├── .gitignore
├── EULA.pdf
├── README.html
├── README.md
├── requirements.txt
├── src
│   ├── __init__.py
│   ├── dataset.py
│   ├── evaluation.py
│   ├── features.py
│   ├── files.py
│   ├── general.py
│   ├── sound_event_detection.py
│   └── ui.py
├── task1_scene_classification.py
├── task1_scene_classification.yaml
├── task3_sound_event_detection_in_real_life_audio.py
└── task3_sound_event_detection_in_real_life_audio.yaml

--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | # Byte-compiled / optimized / DLL files
2 | __pycache__/
3 | *.py[cod]
4 | *$py.class
5 | 
6 | # C extensions
7 | *.so
8 | 
9 | # Distribution / packaging
10 | .Python
11 | env/
12 | build/
13 | develop-eggs/
14 | dist/
15 | downloads/
16 | eggs/
17 | .eggs/
18 | lib/
19 | lib64/
20 | parts/
21 | sdist/
22 | var/
23 | *.egg-info/
24 | .installed.cfg
25 | *.egg
26 | 
27 | # PyInstaller
28 | # Usually these files are written by a python script from a template
29 | # before PyInstaller builds the exe, so as to inject date/other infos into it.
30 | *.manifest
31 | *.spec
32 | 
33 | # Installer logs
34 | pip-log.txt
35 | pip-delete-this-directory.txt
36 | 
37 | # Unit test / coverage reports
38 | htmlcov/
39 | .tox/
40 | .coverage
41 | .coverage.*
42 | .cache
43 | nosetests.xml
44 | coverage.xml
45 | *,cover
46 | .hypothesis/
47 | 
48 | # Translations
49 | *.mo
50 | *.pot
51 | 
52 | # Django stuff:
53 | *.log
54 | 
55 | # Sphinx documentation
56 | docs/_build/
57 | 
58 | # PyBuilder
59 | target/
60 | 
61 | # IPython Notebook
62 | .ipynb_checkpoints
63 | 
64 | data/
65 | system/
66 | .idea/
--------------------------------------------------------------------------------
/EULA.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TUT-ARG/DCASE2016-baseline-system-python/8e311066e3b670c52f4fcfe2a7060c18c9969cf8/EULA.pdf
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | DCASE2016 Baseline system
2 | =========================
3 | [Audio Research Group / Tampere University of Technology](http://arg.cs.tut.fi/)
4 | 
5 | *Python implementation*
6 | 
7 | Systems:
8 | - Task 1 - Acoustic scene classification
9 | - Task 3 - Sound event detection in real life audio
10 | 
11 | Authors:
12 | - Toni Heittola (, )
13 | - Annamaria Mesaros (, )
14 | - Tuomas Virtanen (, )
15 | 
16 | Table of Contents
17 | =================
18 | 1. [Introduction](#1-introduction)
19 | 2. [Installation](#2-installation)
20 | 3. [Usage](#3-usage)
21 | 4. [System blocks](#4-system-blocks)
22 | 5. [System evaluation](#5-system-evaluation)
23 | 6. [System parameters](#6-system-parameters)
24 | 7. [Changelog](#7-changelog)
25 | 8. [License](#8-license)
26 | 
27 | 1. Introduction
28 | ===============
29 | This document describes the Python implementation of the baseline systems for the [Detection and Classification of Acoustic Scenes and Events 2016 (DCASE2016) challenge](http://www.cs.tut.fi/sgn/arg/dcase2016/) **[task 1](#11-acoustic-scene-classification)** and **[task 3](#12-sound-event-detection)**. The challenge consists of four tasks:
30 | 
31 | 1. [Acoustic scene classification](http://www.cs.tut.fi/sgn/arg/dcase2016/task-acoustic-scene-classification)
32 | 2. [Sound event detection in synthetic audio](http://www.cs.tut.fi/sgn/arg/dcase2016/task-sound-event-detection-in-synthetic-audio)
33 | 3. 
[Sound event detection in real life audio](http://www.cs.tut.fi/sgn/arg/dcase2016/task-sound-event-detection-in-real-life-audio)
34 | 4. [Domestic audio tagging](http://www.cs.tut.fi/sgn/arg/dcase2016/task-audio-tagging)
35 | 
36 | The baseline systems for tasks 1 and 3 share the same basic approach: [MFCC](https://en.wikipedia.org/wiki/Mel-frequency_cepstrum) based acoustic features and a [GMM](https://en.wikipedia.org/wiki/Mixture_model) based classifier. The main motivation for having a similar approach in both tasks was to provide a low entry barrier and to allow easy switching between the tasks.
37 | 
38 | The dataset handling is hidden behind a dataset access class, which should help DCASE challenge participants implement their own systems.
39 | 
40 | The [Matlab implementation](https://github.com/TUT-ARG/DCASE2016-baseline-system-matlab) is also available.
41 | 
42 | #### 1.1. Acoustic scene classification
43 | 
44 | The acoustic features include MFCC static coefficients (with the 0th coefficient), delta coefficients and acceleration coefficients. The system learns one acoustic model per acoustic scene class, and performs the classification with a maximum likelihood classification scheme.
45 | 
46 | #### 1.2. Sound event detection
47 | 
48 | The acoustic features include MFCC static coefficients (0th coefficient omitted), delta coefficients and acceleration coefficients. The system has a binary classifier for each included sound event class. For each classifier, two acoustic models are trained from the mixture signals: one with positive examples (target sound event active) and one with negative examples (target sound event non-active). The classification is done between these two models as a likelihood ratio. Post-processing is applied to get the sound event detection output.
49 | 
50 | 2. Installation
51 | ===============
52 | 
53 | The systems are developed for [Python 2.7.0](https://www.python.org/). Currently, the baseline system is tested only on the Linux operating system.
54 | 
55 | Run the following to ensure that all external modules are installed:
56 | 
57 |     pip install -r requirements.txt
58 | 
59 | **External modules required**
60 | 
61 | [*numpy*](http://www.numpy.org/), [*scipy*](http://www.scipy.org/), [*scikit-learn*](http://scikit-learn.org/)
62 | `pip install numpy scipy scikit-learn`
63 | 
64 | Scikit-learn (version >= 0.16) is required for the machine learning implementations.
65 | 
66 | [*PyYAML*](http://pyyaml.org/)
67 | `pip install pyyaml`
68 | 
69 | PyYAML is required for handling the configuration files.
70 | 
71 | [*librosa*](https://github.com/bmcfee/librosa)
72 | `pip install librosa`
73 | 
74 | Librosa is required for the feature extraction.
75 | 
76 | 3. Usage
77 | ========
78 | 
79 | For each task there is a separate executable (.py file):
80 | 
81 | 1. *task1_scene_classification.py*, Acoustic scene classification
82 | 3. *task3_sound_event_detection_in_real_life_audio.py*, Real life audio sound event detection
83 | 
84 | Each system has two operating modes: **Development mode** and **Challenge mode**.
85 | 
86 | All the usage parameters are shown by `python task1_scene_classification.py -h` and `python task3_sound_event_detection_in_real_life_audio.py -h`.
87 | 
88 | The system parameters are defined in `task1_scene_classification.yaml` and `task3_sound_event_detection_in_real_life_audio.yaml`.
89 | 
90 | With default parameter settings, the system will download the needed dataset from the Internet and extract it under the directory `data` (the storage path is controlled with the parameter `path->data`). 
91 | 
92 | #### Development mode
93 | 
94 | In this mode, the system is trained and evaluated with the development dataset. This is the default operating mode.
95 | 
96 | To run the system in this mode:
97 | `python task1_scene_classification.py`
98 | or `python task1_scene_classification.py -development`.
99 | 
100 | #### Challenge mode
101 | 
102 | In this mode, the system is trained with the provided development dataset and the evaluation dataset is run through the developed system. Output files are generated in the correct format for the challenge submission. The system output is saved in the path specified with the parameter `path->challenge_results`.
103 | 
104 | To run the system in this mode:
105 | `python task1_scene_classification.py -challenge`.
106 | 
107 | 
108 | 4. System blocks
109 | ================
110 | 
111 | The system implements the following blocks:
112 | 
113 | 1. Dataset initialization
114 |   - Downloads the dataset from the Internet if needed
115 |   - Extracts the dataset package if needed
116 |   - Makes sure that the meta files are appropriately formatted
117 | 
118 | 2. Feature extraction (`do_feature_extraction`)
119 |   - Goes through all the training material and extracts the acoustic features
120 |   - Features are stored file-by-file on the local disk (pickle files)
121 | 
122 | 3. Feature normalization (`do_feature_normalization`)
123 |   - Goes through the training material in the evaluation folds, and calculates the global mean and standard deviation of the data
124 |   - Stores the normalization factors (pickle files)
125 | 
126 | 4. System training (`do_system_training`)
127 |   - Trains the system
128 |   - Stores the trained models and feature normalization factors together on the local disk (pickle files)
129 | 
130 | 5. System testing (`do_system_testing`)
131 |   - Goes through the testing material and does the classification / detection
132 |   - Stores the results (text files)
133 | 
134 | 6. System evaluation (`do_system_evaluation`)
135 |   - Reads the ground truth and the output of the system and calculates evaluation metrics
136 | 
137 | 5. System evaluation
138 | ====================
139 | 
140 | ## Task 1 - Acoustic scene classification
141 | 
142 | ### Metrics
143 | 
144 | The scoring of acoustic scene classification will be based on classification accuracy: the number of correctly classified segments among the total number of segments. Each segment is considered an independent test sample. A minimal sketch of this computation is shown after the evaluation setup below.
145 | 
146 | ### Results
147 | 
148 | ##### TUT Acoustic scenes 2016, development set
149 | 
150 | [Dataset](https://zenodo.org/record/45739)
151 | 
152 | *Evaluation setup*
153 | 
154 | - 4 cross-validation folds, average classification accuracy over folds
155 | - 15 acoustic scene classes
156 | - Classification unit: one file (30 seconds of audio). 
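As a minimal sketch of the fold-averaged accuracy described under Metrics above (equivalent in spirit to the confusion-matrix computation in `DCASE2016_SceneClassification_Metrics` in `src/evaluation.py`; the fold results below are illustrative, not actual system output):

    import numpy

    def class_wise_accuracy(y_true, y_pred, labels):
        """Per-class accuracy: correctly classified / reference count per scene label."""
        correct = {label: 0 for label in labels}
        total = {label: 0 for label in labels}
        for truth, prediction in zip(y_true, y_pred):
            total[truth] += 1
            if truth == prediction:
                correct[truth] += 1
        return [correct[l] / (float(total[l]) + numpy.spacing(1)) for l in labels]

    # Illustrative only: two folds with three test segments each
    labels = ['home', 'office']
    fold_results = [
        (['home', 'home', 'office'], ['home', 'office', 'office']),  # fold 1
        (['office', 'home', 'office'], ['office', 'home', 'home']),  # fold 2
    ]
    per_fold = numpy.array([class_wise_accuracy(t, p, labels) for t, p in fold_results])
    print('Overall accuracy: %.1f %%' % (100 * per_fold.mean()))  # mean over classes and folds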
157 | 
158 | *System parameters*
159 | 
160 | - Frame size: 40 ms (with 50% hop size)
161 | - Number of Gaussians per acoustic scene class model: 16
162 | - Feature vector: 20 MFCC static coefficients (including 0th) + 20 delta MFCC coefficients + 20 acceleration MFCC coefficients = 60 values
163 | - Trained and tested on full audio
164 | 
165 | | Scene                | Accuracy   |
166 | |----------------------|------------|
167 | | Beach                | 63.3 %     |
168 | | Bus                  | 79.6 %     |
169 | | Cafe/restaurant      | 83.2 %     |
170 | | Car                  | 87.2 %     |
171 | | City center          | 85.5 %     |
172 | | Forest path          | 81.0 %     |
173 | | Grocery store        | 65.0 %     |
174 | | Home                 | 82.1 %     |
175 | | Library              | 50.4 %     |
176 | | Metro station        | 94.7 %     |
177 | | Office               | 98.6 %     |
178 | | Park                 | 13.9 %     |
179 | | Residential area     | 77.7 %     |
180 | | Train                | 33.6 %     |
181 | | Tram                 | 85.4 %     |
182 | | **Overall accuracy** | **72.5 %** |
183 | 
184 | ##### DCASE 2013 Scene classification, development set
185 | 
186 | [Dataset](http://c4dm.eecs.qmul.ac.uk/rdr/handle/123456789/29)
187 | 
188 | *Evaluation setup*
189 | 
190 | - 5-fold average
191 | - 10 acoustic scene classes
192 | - Classification unit: one file (30 seconds of audio).
193 | 
194 | *System parameters*
195 | 
196 | - Frame size: 40 ms (with 50% hop size)
197 | - Number of Gaussians per acoustic scene class model: 16
198 | - Feature vector: 20 MFCC static coefficients (including 0th) + 20 delta MFCC coefficients + 20 acceleration MFCC coefficients = 60 values
199 | 
200 | | Scene                | Accuracy   |
201 | |----------------------|------------|
202 | | Bus                  | 93.3 %     |
203 | | Busy street          | 80.0 %     |
204 | | Office               | 86.7 %     |
205 | | Open air market      | 73.3 %     |
206 | | Park                 | 26.7 %     |
207 | | Quiet street         | 53.3 %     |
208 | | Restaurant           | 40.0 %     |
209 | | Supermarket          | 26.7 %     |
210 | | Tube                 | 66.7 %     |
211 | | Tube station         | 53.3 %     |
212 | | **Overall accuracy** | **60.0 %** |
213 | 
214 | 
215 | ## Task 3 - Real life audio sound event detection
216 | 
217 | ### Metrics
218 | 
219 | **Segment-based metrics**
220 | 
221 | Segment-based evaluation is done on a fixed time grid, using segments of one second length to compare the ground truth and the system output.
222 | 
223 | - **Total error rate (ER)** is the main metric for this task. The error rate, as defined in [Poliner2007](https://www.ee.columbia.edu/~dpwe/pubs/PoliE06-piano.pdf), is evaluated in one-second segments over the entire test set.
224 | 
225 | - **F-score** is calculated over all test data based on the total number of false positives, false negatives and true positives.
226 | 
227 | **Event-based metrics**
228 | 
229 | Event-based evaluation considers true positives, false positives and false negatives with respect to event instances.
230 | 
231 | **Definition**: An event in the system output is considered correctly detected if its temporal position overlaps with the temporal position of an event with the same label in the ground truth. A tolerance is allowed for the onset and offset (200 ms for the onset, and 200 ms or half of the event length for the offset).
232 | 
233 | - **Error rate** is calculated as described in [Poliner2007](https://www.ee.columbia.edu/~dpwe/pubs/PoliE06-piano.pdf) over all test data, based on the total number of insertions, deletions and substitutions.
234 | 
235 | - **F-score** is calculated over all test data based on the total number of false positives, false negatives and true positives.
236 | 
237 | A detailed description of the metrics can be found on the [DCASE2016 website](http://www.cs.tut.fi/sgn/arg/dcase2016/sound-event-detection-metrics). The segment-based error rate computation is sketched below. 
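As a minimal sketch of the segment-based error rate (following the per-segment computation in `DCASE2016_EventDetection_SegmentBasedMetrics` in `src/evaluation.py`; the activity rolls below are illustrative):

    import numpy

    def segment_based_error_rate(reference_roll, system_roll):
        """Total ER from binary activity rolls of shape (n_segments, n_classes)."""
        total_error = 0.0
        total_ref = 0.0
        for ref_seg, sys_seg in zip(reference_roll, system_roll):
            Ntp = sum(ref_seg + sys_seg > 1)         # classes active in both
            Nref = sum(ref_seg)
            Nsys = sum(sys_seg)
            total_error += min(Nref, Nsys) - Ntp     # substitutions
            total_error += max(0, Nref - Nsys)       # deletions
            total_error += max(0, Nsys - Nref)       # insertions
            total_ref += Nref
        return total_error / (total_ref + numpy.spacing(1))

    # Illustrative only: three one-second segments, two event classes
    reference = numpy.array([[1, 0], [1, 1], [0, 1]])
    system    = numpy.array([[1, 1], [0, 1], [0, 0]])
    print('ER: %.2f' % segment_based_error_rate(reference, system))  # ER: 0.75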
238 | 239 | ### Results 240 | 241 | ##### TUT Sound events 2016, development set 242 | 243 | [Dataset](https://zenodo.org/record/45759) 244 | 245 | *Evaluation setup* 246 | 247 | - 4 cross-validation folds 248 | 249 | *System parameters* 250 | 251 | - Frame size: 40 ms (with 50% hop size) 252 | - Number of Gaussians per sound event model (positive and negative): 16 253 | - Feature vector: 20 MFCC static coefficients (excluding 0th) + 20 delta MFCC coefficients + 20 acceleration MFCC coefficients = 60 values 254 | - Decision_threshold: 140 255 | 256 | *Segment based metrics - overall* 257 | 258 | | Scene | ER | ER / S | ER / D | ER / I | F1 | 259 | |-----------------------|-------------|-------------|-------------|-------------|-------------| 260 | | Home | 0.96 | 0.08 | 0.82 | 0.06 | 15.9 % | 261 | | Residential area | 0.86 | 0.05 | 0.74 | 0.07 | 31.5 % | 262 | | **Average** | **0.91** | | | | **23.7 %** | 263 | 264 | *Segment based metrics - class-wise* 265 | 266 | | Scene | ER | F1 | 267 | |-----------------------|-------------|-------------| 268 | | Home | 1.06 | 9.2 % | 269 | | Residential area | 1.03 | 17.6 % | 270 | | **Average** | **1.04** | **13.4 %** | 271 | 272 | *Event based metrics (onset-only) - overall* 273 | 274 | | Scene | ER | F1 | 275 | |-----------------------|-------------|-------------| 276 | | Home | 1.28 | 4.7 % | 277 | | Residential area | 1.92 | 2.9 % | 278 | | **Average** | **1.60** | **3.8 %** | 279 | 280 | *Event based metrics (onset-only) - class-wise* 281 | 282 | | Scene | ER | F1 | 283 | |-----------------------|-------------|-------------| 284 | | Home | 1.27 | 4.3 % | 285 | | Residential area | 1.97 | 1.5 % | 286 | | **Average** | **1.62** | **2.9 %** | 287 | 288 | 289 | 6. System parameters 290 | ==================== 291 | All the parameters are set in `task1_scene_classification.yaml`, and `task3_sound_event_detection_in_real_life_audio.yaml`. 292 | 293 | **Controlling the system flow** 294 | 295 | The blocks of the system can be controlled through the configuration file. Usually all of them can be kept on. 296 | 297 | flow: 298 | initialize: true 299 | extract_features: true 300 | feature_normalizer: true 301 | train_system: true 302 | test_system: true 303 | evaluate_system: true 304 | 305 | **General parameters** 306 | 307 | The selection of used dataset. 308 | 309 | general: 310 | development_dataset: TUTSoundEvents_2016_DevelopmentSet 311 | challenge_dataset: TUTSoundEvents_2016_EvaluationSet 312 | 313 | overwrite: false # Overwrite previously stored data 314 | 315 | `development_dataset: TUTSoundEvents_2016_DevelopmentSet` 316 | : The dataset handler class used while running the system in development mode. If one wants to handle a new dataset, inherit a new class from the Dataset class (`src/dataset.py`). 317 | 318 | `challenge_dataset: TUTSoundEvents_2016_EvaluationSet` 319 | : The dataset handler class used while running the system in challenge mode. If one wants to handle a new dataset, inherit a new class from the Dataset class (`src/dataset.py`). 
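For illustration, a skeleton of such a handler could look like the sketch below. Note that this is hypothetical: the attribute names are placeholders and are not taken from `src/dataset.py`; consult the `Dataset` base class for the actual interface to override.

    # Hypothetical sketch only -- attribute names below are placeholders,
    # not the real src/dataset.py API; check the Dataset base class.
    from src.dataset import Dataset

    class MyCustomSoundEvents_DevelopmentSet(Dataset):
        def __init__(self, data_path='data'):
            Dataset.__init__(self, data_path=data_path)
            self.name = 'MyCustomSoundEvents'   # placeholder dataset name
            self.evaluation_folds = 4           # placeholder fold setup

The new class name is then referenced from the configuration file, e.g. `development_dataset: MyCustomSoundEvents_DevelopmentSet`.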
320 | 
321 | Available dataset handler classes:
322 | 
323 | **DCASE 2016**
324 | 
325 | - TUTAcousticScenes_2016_DevelopmentSet
326 | - TUTAcousticScenes_2016_EvaluationSet
327 | - TUTSoundEvents_2016_DevelopmentSet
328 | - TUTSoundEvents_2016_EvaluationSet
329 | 
330 | **DCASE 2013**
331 | 
332 | - DCASE2013_Scene_DevelopmentSet
333 | - DCASE2013_Scene_EvaluationSet
334 | - DCASE2013_Event_DevelopmentSet
335 | - DCASE2013_Event_EvaluationSet
336 | 
337 | 
338 | `overwrite: false`
339 | : Switch to allow the system to always overwrite existing data on disk.
340 | 
341 | `challenge_submission_mode: false`
342 | : Switch to control where the system output is saved. If true, `path->challenge_results` is used, and all results are overwritten by default.
343 | 
344 | 
345 | **System paths**
346 | 
347 | This section contains the storage paths.
348 | 
349 |     path:
350 |       data: data/
351 | 
352 |       base: system/baseline_dcase2016_task1/
353 |       features: features/
354 |       feature_normalizers: feature_normalizers/
355 |       models: acoustic_models/
356 |       results: evaluation_results/
357 | 
358 |       challenge_results: challenge_submission/task_1_acoustic_scene_classification/
359 | 
360 | These parameters define the folder structure used to store acoustic features, feature normalization data, acoustic models and evaluation results.
361 | 
362 | `data: data/`
363 | : Defines the path where the dataset data is downloaded and stored. The path is relative to the main script.
364 | 
365 | `base: system/baseline_dcase2016_task1/`
366 | : Defines the base path where the system stores its data. The other paths are stored under this path. If the specified directory does not exist, it is created. The path is relative to the main script.
367 | 
368 | `challenge_results: challenge_submission/task_1_acoustic_scene_classification/`
369 | : Defines where the system output is stored while running the system in challenge mode.
370 | 
371 | **Feature extraction**
372 | 
373 | This section contains the feature extraction related parameters.
374 | 
375 |     features:
376 |       fs: 44100
377 |       win_length_seconds: 0.04
378 |       hop_length_seconds: 0.02
379 | 
380 |       include_mfcc0: true
381 |       include_delta: true
382 |       include_acceleration: true
383 | 
384 |       mfcc:
385 |         window: hamming_asymmetric # [hann_asymmetric, hamming_asymmetric]
386 |         n_mfcc: 20                 # Number of MFCC coefficients
387 |         n_mels: 40                 # Number of MEL bands used
388 |         n_fft: 2048                # FFT length
389 |         fmin: 0                    # Minimum frequency when constructing MEL bands
390 |         fmax: 22050                # Maximum frequency when constructing MEL bands
391 |         htk: false                 # Switch for HTK-styled MEL-frequency equation
392 | 
393 |       mfcc_delta:
394 |         width: 9
395 | 
396 |       mfcc_acceleration:
397 |         width: 9
398 | 
399 | `fs: 44100`
400 | : Default sampling frequency. If a given dataset does not fulfill this criterion, the audio data is resampled.
401 | 
402 | 
403 | `win_length_seconds: 0.04`
404 | : Feature extraction frame length in seconds.
405 | 
406 | 
407 | `hop_length_seconds: 0.02`
408 | : Feature extraction frame hop length in seconds.
409 | 
410 | 
411 | `include_mfcc0: true`
412 | : Switch to include the zeroth static MFCC coefficient in the feature vector.
413 | 
414 | 
415 | `include_delta: true`
416 | : Switch to include delta coefficients in the feature vector. The zeroth MFCC coefficient is always included in the delta coefficients. The width of the delta window is set in `mfcc_delta->width: 9`.
417 | 
418 | 
419 | `include_acceleration: true`
420 | : Switch to include acceleration (delta-delta) coefficients in the feature vector. The zeroth MFCC coefficient is always included in the acceleration coefficients. 
The width of the acceleration window is set in `mfcc_acceleration->width: 9`.
421 | 
422 | `mfcc->n_mfcc: 20`
423 | : Number of MFCC coefficients.
424 | 
425 | `mfcc->fmax: 22050`
426 | : Maximum frequency for the MEL bands. Usually, this is set to half of the sampling frequency.
427 | 
428 | **Classifier**
429 | 
430 | This section contains the frame classifier related parameters. These parameters are used when the chosen classifier is trained.
431 | 
432 |     classifier:
433 |       method: gmm                # The system supports only gmm
434 | 
435 |       audio_error_handling:      # Handling audio errors (temporary microphone failure and radio signal interferences from mobile phones)
436 |         clean_data: false        # Exclude audio errors from training audio
437 | 
438 |       parameters: !!null         # Parameters are copied from classifier_parameters based on the defined method
439 | 
440 |     classifier_parameters:
441 |       gmm:
442 |         n_components: 16         # Number of Gaussian components
443 |         covariance_type: diag    # Diagonal or full covariance matrix
444 |         random_state: 0
445 |         thresh: !!null
446 |         tol: 0.001
447 |         min_covar: 0.001
448 |         n_iter: 40
449 |         n_init: 1
450 |         params: wmc
451 |         init_params: wmc
452 | 
453 | `audio_error_handling->clean_data: false`
454 | : Some datasets provide audio error annotations. With this switch, these annotations can be used to exclude the segments containing audio errors from the feature matrix fed to the classifier during training. Audio errors can be temporary microphone failures or radio signal interference from mobile phones.
455 | 
456 | `classifier_parameters->gmm->n_components: 16`
457 | : Number of Gaussians used in the modeling.
458 | 
459 | In order to add new classifiers to the system, add their parameters under `classifier_parameters` with a new tag. Set `classifier->method` accordingly and add the appropriate code where the `classifier_method` variable is used in the system block API (look into the `do_system_training` and `do_system_testing` methods). In addition to this, one might want to modify the filename methods (`get_model_filename` and `get_result_filename`) to allow multiple classifier methods to co-exist in the system.
460 | 
461 | **Recognizer**
462 | 
463 | This section contains the sound recognition related parameters (used in `task1_scene_classification.py`).
464 | 
465 |     recognizer:
466 |       audio_error_handling:      # Handling audio errors (temporary microphone failure and radio signal interferences from mobile phones)
467 |         clean_data: false        # Exclude audio errors from test audio
468 | 
469 | `audio_error_handling->clean_data: false`
470 | : Some datasets provide audio error annotations. With this switch, these annotations can be used to exclude the segments containing audio errors from the feature matrix fed to the recognizer. Audio errors can be temporary microphone failures or radio signal interference from mobile phones.
471 | 
472 | **Detector**
473 | 
474 | This section contains the sound event detection related parameters (used in `task3_sound_event_detection_in_real_life_audio.py`).
475 | 
476 |     detector:
477 |       decision_threshold: 140.0
478 |       smoothing_window_length: 1.0 # seconds
479 |       minimum_event_length: 0.1    # seconds
480 |       minimum_event_gap: 0.1       # seconds
481 | 
482 | `decision_threshold: 140.0`
483 | : Decision threshold used to make the final classification decision. This can be used to control the sensitivity of the system. 
With log-likelihoods: `event_activity = (positive - negative) > decision_threshold`.
484 | 
485 | 
486 | `smoothing_window_length: 1.0`
487 | : Size of the sliding accumulation window (in seconds) used before the frame-wise classification decision.
488 | 
489 | 
490 | `minimum_event_length: 0.1`
491 | : Minimum length (in seconds) of output events. Events shorter than this are filtered out of the system output.
492 | 
493 | 
494 | `minimum_event_gap: 0.1`
495 | : Minimum gap (in seconds) between events of the same event class in the output. Consecutive events (with the same event label) separated by a shorter gap than this are merged together.
496 | 
497 | 7. Changelog
498 | ============
499 | #### 1.2 / 2016-11-10
500 | * Added evaluation in challenge mode for task 1
501 | 
502 | #### 1.1 / 2016-05-19
503 | * Added audio error handling
504 | 
505 | #### 1.0 / 2016-02-08
506 | * Initial commit
507 | 
508 | 8. License
509 | ==========
510 | 
511 | See file [EULA.pdf](EULA.pdf)
512 | 
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | scipy>=0.15.1
2 | numpy>=1.9.2
3 | scikit-learn==0.16.1
4 | pyyaml>=3.11
5 | librosa==0.4.0
6 | soundfile>=0.9.0
--------------------------------------------------------------------------------
/src/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TUT-ARG/DCASE2016-baseline-system-python/8e311066e3b670c52f4fcfe2a7060c18c9969cf8/src/__init__.py
--------------------------------------------------------------------------------
/src/evaluation.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python
2 | # -*- coding: utf-8 -*-
3 | 
4 | import sys
5 | import numpy
6 | import math
7 | from sklearn import metrics
8 | 
9 | class DCASE2016_SceneClassification_Metrics():
10 |     """DCASE 2016 scene classification metrics
11 | 
12 |     Examples
13 |     --------
14 | 
15 |     >>> dcase2016_scene_metric = DCASE2016_SceneClassification_Metrics(class_list=dataset.scene_labels)
16 |     >>> for fold in dataset.folds(mode=dataset_evaluation_mode):
17 |     >>>     results = []
18 |     >>>     result_filename = get_result_filename(fold=fold, path=result_path)
19 |     >>>
20 |     >>>     if os.path.isfile(result_filename):
21 |     >>>         with open(result_filename, 'rt') as f:
22 |     >>>             for row in csv.reader(f, delimiter='\t'):
23 |     >>>                 results.append(row)
24 |     >>>
25 |     >>>     y_true = []
26 |     >>>     y_pred = []
27 |     >>>     for result in results:
28 |     >>>         y_true.append(dataset.file_meta(result[0])[0]['scene_label'])
29 |     >>>         y_pred.append(result[1])
30 |     >>>
31 |     >>>     dcase2016_scene_metric.evaluate(system_output=y_pred, annotated_ground_truth=y_true)
32 |     >>>
33 |     >>> results = dcase2016_scene_metric.results()
34 | 
35 |     """
36 | 
37 |     def __init__(self, class_list):
38 |         """__init__ method. 
39 | 40 | Parameters 41 | ---------- 42 | class_list : list 43 | Evaluated scene labels in the list 44 | 45 | """ 46 | self.accuracies_per_class = None 47 | self.correct_per_class = None 48 | self.Nsys = None 49 | self.Nref = None 50 | self.class_list = class_list 51 | self.eps = numpy.spacing(1) 52 | 53 | def __enter__(self): 54 | return self 55 | 56 | def __exit__(self, type, value, traceback): 57 | return self.results() 58 | 59 | def accuracies(self, y_true, y_pred, labels): 60 | """Calculate accuracy 61 | 62 | Parameters 63 | ---------- 64 | y_true : numpy.array 65 | Ground truth array, list of scene labels 66 | 67 | y_pred : numpy.array 68 | System output array, list of scene labels 69 | 70 | labels : list 71 | list of scene labels 72 | 73 | Returns 74 | ------- 75 | array : numpy.array [shape=(number of scene labels,)] 76 | Accuracy per scene label class 77 | 78 | """ 79 | 80 | confusion_matrix = metrics.confusion_matrix(y_true=y_true, y_pred=y_pred, labels=labels).astype(float) 81 | return (numpy.diag(confusion_matrix), numpy.divide(numpy.diag(confusion_matrix), numpy.sum(confusion_matrix, 1)+self.eps)) 82 | 83 | def evaluate(self, annotated_ground_truth, system_output): 84 | """Evaluate system output and annotated ground truth pair. 85 | 86 | Use results method to get results. 87 | 88 | Parameters 89 | ---------- 90 | annotated_ground_truth : numpy.array 91 | Ground truth array, list of scene labels 92 | 93 | system_output : numpy.array 94 | System output array, list of scene labels 95 | 96 | Returns 97 | ------- 98 | nothing 99 | 100 | """ 101 | 102 | correct_per_class, accuracies_per_class = self.accuracies(y_pred=system_output, y_true=annotated_ground_truth, labels=self.class_list) 103 | 104 | if self.accuracies_per_class is None: 105 | self.accuracies_per_class = accuracies_per_class 106 | else: 107 | self.accuracies_per_class = numpy.vstack((self.accuracies_per_class, accuracies_per_class)) 108 | 109 | if self.correct_per_class is None: 110 | self.correct_per_class = correct_per_class 111 | else: 112 | self.correct_per_class = numpy.vstack((self.correct_per_class, correct_per_class)) 113 | 114 | Nref = numpy.zeros(len(self.class_list)) 115 | Nsys = numpy.zeros(len(self.class_list)) 116 | 117 | for class_id, class_label in enumerate(self.class_list): 118 | for item in system_output: 119 | if item == class_label: 120 | Nsys[class_id] += 1 121 | 122 | for item in annotated_ground_truth: 123 | if item == class_label: 124 | Nref[class_id] += 1 125 | 126 | if self.Nref is None: 127 | self.Nref = Nref 128 | else: 129 | self.Nref = numpy.vstack((self.Nref, Nref)) 130 | 131 | if self.Nsys is None: 132 | self.Nsys = Nsys 133 | else: 134 | self.Nsys = numpy.vstack((self.Nsys, Nsys)) 135 | 136 | def results(self): 137 | """Get results 138 | 139 | Outputs results in dict, format: 140 | 141 | { 142 | 'class_wise_data': 143 | { 144 | 'office': { 145 | 'Nsys': 10, 146 | 'Nref': 7, 147 | }, 148 | } 149 | 'class_wise_accuracy': 150 | { 151 | 'office': 0.6, 152 | 'home': 0.4, 153 | } 154 | 'overall_accuracy': numpy.mean(self.accuracies_per_class) 155 | 'Nsys': 100, 156 | 'Nref': 100, 157 | } 158 | 159 | Parameters 160 | ---------- 161 | nothing 162 | 163 | Returns 164 | ------- 165 | results : dict 166 | Results dict 167 | 168 | """ 169 | 170 | results = { 171 | 'class_wise_data': {}, 172 | 'class_wise_accuracy': {}, 173 | 'overall_accuracy': float(numpy.mean(self.accuracies_per_class)), 174 | 'class_wise_correct_count': self.correct_per_class.tolist(), 175 | 176 | } 177 | if 
len(self.Nsys.shape) == 2:
178 |             results['Nsys'] = int(sum(sum(self.Nsys)))
179 |             results['Nref'] = int(sum(sum(self.Nref)))
180 |         else:
181 |             results['Nsys'] = int(sum(self.Nsys))
182 |             results['Nref'] = int(sum(self.Nref))
183 | 
184 |         for class_id, class_label in enumerate(self.class_list):
185 |             if len(self.accuracies_per_class.shape) == 2:
186 |                 results['class_wise_accuracy'][class_label] = numpy.mean(self.accuracies_per_class[:, class_id])
187 |                 results['class_wise_data'][class_label] = {
188 |                     'Nsys': int(sum(self.Nsys[:, class_id])),
189 |                     'Nref': int(sum(self.Nref[:, class_id])),
190 |                 }
191 |             else:
192 |                 results['class_wise_accuracy'][class_label] = numpy.mean(self.accuracies_per_class[class_id])
193 |                 results['class_wise_data'][class_label] = {
194 |                     'Nsys': int(self.Nsys[class_id]),
195 |                     'Nref': int(self.Nref[class_id]),
196 |                 }
197 | 
198 |         return results
199 | 
200 | 
201 | class EventDetectionMetrics(object):
202 |     """Base class for sound event metric classes.
203 |     """
204 | 
205 |     def __init__(self, class_list):
206 |         """__init__ method.
207 | 
208 |         Parameters
209 |         ----------
210 |         class_list : list
211 |             List of class labels to be evaluated.
212 | 
213 |         """
214 | 
215 |         self.class_list = class_list
216 |         self.eps = numpy.spacing(1)
217 | 
218 |     def max_event_offset(self, data):
219 |         """Get maximum event offset from event list
220 | 
221 |         Parameters
222 |         ----------
223 |         data : list
224 |             Event list, list of event dicts
225 | 
226 |         Returns
227 |         -------
228 |         max : float > 0
229 |             Maximum event offset
230 |         """
231 | 
232 |         max = 0
233 |         for event in data:
234 |             if event['event_offset'] > max:
235 |                 max = event['event_offset']
236 |         return max
237 | 
238 |     def list_to_roll(self, data, time_resolution=0.01):
239 |         """Convert event list into event roll.
240 |         Event roll is a binary matrix indicating event activity within time segments defined by time_resolution.
241 | 
242 |         Parameters
243 |         ----------
244 |         data : list
245 |             Event list, list of event dicts
246 | 
247 |         time_resolution : float > 0
248 |             Time resolution used when converting event into event roll. 
249 | 250 | Returns 251 | ------- 252 | event_roll : numpy.ndarray [shape=(math.ceil(data_length * 1 / time_resolution), amount of classes)] 253 | Event roll 254 | """ 255 | 256 | # Initialize 257 | data_length = self.max_event_offset(data) 258 | event_roll = numpy.zeros(( int(math.ceil(data_length * 1 / time_resolution)), len(self.class_list))) 259 | 260 | # Fill-in event_roll 261 | for event in data: 262 | pos = self.class_list.index(event['event_label'].rstrip()) 263 | 264 | onset = int(math.floor(event['event_onset'] * 1 / time_resolution)) 265 | offset = int(math.ceil(event['event_offset'] * 1 / time_resolution)) 266 | 267 | event_roll[onset:offset, pos] = 1 268 | 269 | return event_roll 270 | 271 | 272 | class DCASE2016_EventDetection_SegmentBasedMetrics(EventDetectionMetrics): 273 | """DCASE2016 Segment based metrics for sound event detection 274 | 275 | Supported metrics: 276 | - Overall 277 | - Error rate (ER), Substitutions (S), Insertions (I), Deletions (D) 278 | - F-score (F1) 279 | - Class-wise 280 | - Error rate (ER), Insertions (I), Deletions (D) 281 | - F-score (F1) 282 | 283 | Examples 284 | -------- 285 | 286 | >>> overall_metrics_per_scene = {} 287 | >>> for scene_id, scene_label in enumerate(dataset.scene_labels): 288 | >>> dcase2016_segment_based_metric = DCASE2016_EventDetection_SegmentBasedMetrics(class_list=dataset.event_labels(scene_label=scene_label)) 289 | >>> for fold in dataset.folds(mode=dataset_evaluation_mode): 290 | >>> results = [] 291 | >>> result_filename = get_result_filename(fold=fold, scene_label=scene_label, path=result_path) 292 | >>> 293 | >>> if os.path.isfile(result_filename): 294 | >>> with open(result_filename, 'rt') as f: 295 | >>> for row in csv.reader(f, delimiter='\t'): 296 | >>> results.append(row) 297 | >>> 298 | >>> for file_id, item in enumerate(dataset.test(fold,scene_label=scene_label)): 299 | >>> current_file_results = [] 300 | >>> for result_line in results: 301 | >>> if result_line[0] == dataset.absolute_to_relative(item['file']): 302 | >>> current_file_results.append( 303 | >>> {'file': result_line[0], 304 | >>> 'event_onset': float(result_line[1]), 305 | >>> 'event_offset': float(result_line[2]), 306 | >>> 'event_label': result_line[3] 307 | >>> } 308 | >>> ) 309 | >>> meta = dataset.file_meta(dataset.absolute_to_relative(item['file'])) 310 | >>> dcase2016_segment_based_metric.evaluate(system_output=current_file_results, annotated_ground_truth=meta) 311 | >>> overall_metrics_per_scene[scene_label]['segment_based_metrics'] = dcase2016_segment_based_metric.results() 312 | 313 | """ 314 | 315 | def __init__(self, class_list, time_resolution=1.0): 316 | """__init__ method. 317 | 318 | Parameters 319 | ---------- 320 | class_list : list 321 | List of class labels to be evaluated. 322 | 323 | time_resolution : float > 0 324 | Time resolution used when converting event into event roll. 
325 | (Default value = 1.0) 326 | 327 | """ 328 | 329 | self.time_resolution = time_resolution 330 | 331 | self.overall = { 332 | 'Ntp': 0.0, 333 | 'Ntn': 0.0, 334 | 'Nfp': 0.0, 335 | 'Nfn': 0.0, 336 | 'Nref': 0.0, 337 | 'Nsys': 0.0, 338 | 'ER': 0.0, 339 | 'S': 0.0, 340 | 'D': 0.0, 341 | 'I': 0.0, 342 | } 343 | self.class_wise = {} 344 | 345 | for class_label in class_list: 346 | self.class_wise[class_label] = { 347 | 'Ntp': 0.0, 348 | 'Ntn': 0.0, 349 | 'Nfp': 0.0, 350 | 'Nfn': 0.0, 351 | 'Nref': 0.0, 352 | 'Nsys': 0.0, 353 | } 354 | 355 | EventDetectionMetrics.__init__(self, class_list=class_list) 356 | 357 | def __enter__(self): 358 | # Initialize class and return it 359 | return self 360 | 361 | def __exit__(self, type, value, traceback): 362 | # Finalize evaluation and return results 363 | return self.results() 364 | 365 | def evaluate(self, annotated_ground_truth, system_output): 366 | """Evaluate system output and annotated ground truth pair. 367 | 368 | Use results method to get results. 369 | 370 | Parameters 371 | ---------- 372 | annotated_ground_truth : numpy.array 373 | Ground truth array, list of scene labels 374 | 375 | system_output : numpy.array 376 | System output array, list of scene labels 377 | 378 | Returns 379 | ------- 380 | nothing 381 | 382 | """ 383 | 384 | # Convert event list into frame-based representation 385 | system_event_roll = self.list_to_roll(data=system_output, time_resolution=self.time_resolution) 386 | annotated_event_roll = self.list_to_roll(data=annotated_ground_truth, time_resolution=self.time_resolution) 387 | 388 | # Fix durations of both event_rolls to be equal 389 | if annotated_event_roll.shape[0] > system_event_roll.shape[0]: 390 | padding = numpy.zeros((annotated_event_roll.shape[0] - system_event_roll.shape[0], len(self.class_list))) 391 | system_event_roll = numpy.vstack((system_event_roll, padding)) 392 | 393 | if system_event_roll.shape[0] > annotated_event_roll.shape[0]: 394 | padding = numpy.zeros((system_event_roll.shape[0] - annotated_event_roll.shape[0], len(self.class_list))) 395 | annotated_event_roll = numpy.vstack((annotated_event_roll, padding)) 396 | 397 | # Compute segment-based overall metrics 398 | for segment_id in range(0, annotated_event_roll.shape[0]): 399 | annotated_segment = annotated_event_roll[segment_id, :] 400 | system_segment = system_event_roll[segment_id, :] 401 | 402 | Ntp = sum(system_segment + annotated_segment > 1) 403 | Ntn = sum(system_segment + annotated_segment == 0) 404 | Nfp = sum(system_segment - annotated_segment > 0) 405 | Nfn = sum(annotated_segment - system_segment > 0) 406 | 407 | Nref = sum(annotated_segment) 408 | Nsys = sum(system_segment) 409 | 410 | S = min(Nref, Nsys) - Ntp 411 | D = max(0, Nref - Nsys) 412 | I = max(0, Nsys - Nref) 413 | ER = max(Nref, Nsys) - Ntp 414 | 415 | self.overall['Ntp'] += Ntp 416 | self.overall['Ntn'] += Ntn 417 | self.overall['Nfp'] += Nfp 418 | self.overall['Nfn'] += Nfn 419 | self.overall['Nref'] += Nref 420 | self.overall['Nsys'] += Nsys 421 | self.overall['S'] += S 422 | self.overall['D'] += D 423 | self.overall['I'] += I 424 | self.overall['ER'] += ER 425 | 426 | for class_id, class_label in enumerate(self.class_list): 427 | annotated_segment = annotated_event_roll[:, class_id] 428 | system_segment = system_event_roll[:, class_id] 429 | 430 | Ntp = sum(system_segment + annotated_segment > 1) 431 | Ntn = sum(system_segment + annotated_segment == 0) 432 | Nfp = sum(system_segment - annotated_segment > 0) 433 | Nfn = sum(annotated_segment - system_segment > 
0) 434 | 435 | Nref = sum(annotated_segment) 436 | Nsys = sum(system_segment) 437 | 438 | self.class_wise[class_label]['Ntp'] += Ntp 439 | self.class_wise[class_label]['Ntn'] += Ntn 440 | self.class_wise[class_label]['Nfp'] += Nfp 441 | self.class_wise[class_label]['Nfn'] += Nfn 442 | self.class_wise[class_label]['Nref'] += Nref 443 | self.class_wise[class_label]['Nsys'] += Nsys 444 | 445 | return self 446 | 447 | def results(self): 448 | """Get results 449 | 450 | Outputs results in dict, format: 451 | 452 | { 453 | 'overall': 454 | { 455 | 'Pre': 456 | 'Rec': 457 | 'F': 458 | 'ER': 459 | 'S': 460 | 'D': 461 | 'I': 462 | } 463 | 'class_wise': 464 | { 465 | 'office': { 466 | 'Pre': 467 | 'Rec': 468 | 'F': 469 | 'ER': 470 | 'D': 471 | 'I': 472 | 'Nref': 473 | 'Nsys': 474 | 'Ntp': 475 | 'Nfn': 476 | 'Nfp': 477 | }, 478 | } 479 | 'class_wise_average': 480 | { 481 | 'F': 482 | 'ER': 483 | } 484 | } 485 | 486 | Parameters 487 | ---------- 488 | nothing 489 | 490 | Returns 491 | ------- 492 | results : dict 493 | Results dict 494 | 495 | """ 496 | 497 | results = {'overall': {}, 498 | 'class_wise': {}, 499 | 'class_wise_average': {}, 500 | } 501 | 502 | # Overall metrics 503 | results['overall']['Pre'] = self.overall['Ntp'] / (self.overall['Nsys'] + self.eps) 504 | results['overall']['Rec'] = self.overall['Ntp'] / self.overall['Nref'] 505 | results['overall']['F'] = 2 * ((results['overall']['Pre'] * results['overall']['Rec']) / (results['overall']['Pre'] + results['overall']['Rec'] + self.eps)) 506 | 507 | results['overall']['ER'] = self.overall['ER'] / self.overall['Nref'] 508 | results['overall']['S'] = self.overall['S'] / self.overall['Nref'] 509 | results['overall']['D'] = self.overall['D'] / self.overall['Nref'] 510 | results['overall']['I'] = self.overall['I'] / self.overall['Nref'] 511 | 512 | # Class-wise metrics 513 | class_wise_F = [] 514 | class_wise_ER = [] 515 | for class_id, class_label in enumerate(self.class_list): 516 | if class_label not in results['class_wise']: 517 | results['class_wise'][class_label] = {} 518 | results['class_wise'][class_label]['Pre'] = self.class_wise[class_label]['Ntp'] / (self.class_wise[class_label]['Nsys'] + self.eps) 519 | results['class_wise'][class_label]['Rec'] = self.class_wise[class_label]['Ntp'] / (self.class_wise[class_label]['Nref'] + self.eps) 520 | results['class_wise'][class_label]['F'] = 2 * ((results['class_wise'][class_label]['Pre'] * results['class_wise'][class_label]['Rec']) / (results['class_wise'][class_label]['Pre'] + results['class_wise'][class_label]['Rec'] + self.eps)) 521 | 522 | results['class_wise'][class_label]['ER'] = (self.class_wise[class_label]['Nfn'] + self.class_wise[class_label]['Nfp']) / (self.class_wise[class_label]['Nref'] + self.eps) 523 | results['class_wise'][class_label]['D'] = self.class_wise[class_label]['Nfn'] / (self.class_wise[class_label]['Nref'] + self.eps) 524 | results['class_wise'][class_label]['I'] = self.class_wise[class_label]['Nfp'] / (self.class_wise[class_label]['Nref'] + self.eps) 525 | 526 | results['class_wise'][class_label]['Nref'] = self.class_wise[class_label]['Nref'] 527 | results['class_wise'][class_label]['Nsys'] = self.class_wise[class_label]['Nsys'] 528 | results['class_wise'][class_label]['Ntp'] = self.class_wise[class_label]['Ntp'] 529 | results['class_wise'][class_label]['Nfn'] = self.class_wise[class_label]['Nfn'] 530 | results['class_wise'][class_label]['Nfp'] = self.class_wise[class_label]['Nfp'] 531 | 532 | class_wise_F.append(results['class_wise'][class_label]['F']) 533 | 
class_wise_ER.append(results['class_wise'][class_label]['ER']) 534 | 535 | results['class_wise_average']['F'] = numpy.mean(class_wise_F) 536 | results['class_wise_average']['ER'] = numpy.mean(class_wise_ER) 537 | 538 | return results 539 | 540 | 541 | class DCASE2016_EventDetection_EventBasedMetrics(EventDetectionMetrics): 542 | """DCASE2016 Event based metrics for sound event detection 543 | 544 | Supported metrics: 545 | - Overall 546 | - Error rate (ER), Substitutions (S), Insertions (I), Deletions (D) 547 | - F-score (F1) 548 | - Class-wise 549 | - Error rate (ER), Insertions (I), Deletions (D) 550 | - F-score (F1) 551 | 552 | Examples 553 | -------- 554 | 555 | >>> overall_metrics_per_scene = {} 556 | >>> for scene_id, scene_label in enumerate(dataset.scene_labels): 557 | >>> dcase2016_event_based_metric = DCASE2016_EventDetection_EventBasedMetrics(class_list=dataset.event_labels(scene_label=scene_label)) 558 | >>> for fold in dataset.folds(mode=dataset_evaluation_mode): 559 | >>> results = [] 560 | >>> result_filename = get_result_filename(fold=fold, scene_label=scene_label, path=result_path) 561 | >>> 562 | >>> if os.path.isfile(result_filename): 563 | >>> with open(result_filename, 'rt') as f: 564 | >>> for row in csv.reader(f, delimiter='\t'): 565 | >>> results.append(row) 566 | >>> 567 | >>> for file_id, item in enumerate(dataset.test(fold,scene_label=scene_label)): 568 | >>> current_file_results = [] 569 | >>> for result_line in results: 570 | >>> if result_line[0] == dataset.absolute_to_relative(item['file']): 571 | >>> current_file_results.append( 572 | >>> {'file': result_line[0], 573 | >>> 'event_onset': float(result_line[1]), 574 | >>> 'event_offset': float(result_line[2]), 575 | >>> 'event_label': result_line[3] 576 | >>> } 577 | >>> ) 578 | >>> meta = dataset.file_meta(dataset.absolute_to_relative(item['file'])) 579 | >>> dcase2016_event_based_metric.evaluate(system_output=current_file_results, annotated_ground_truth=meta) 580 | >>> overall_metrics_per_scene[scene_label]['event_based_metrics'] = dcase2016_event_based_metric.results() 581 | 582 | """ 583 | 584 | def __init__(self, class_list, t_collar=0.2, use_onset_condition=True, use_offset_condition=True): 585 | """__init__ method. 586 | 587 | Parameters 588 | ---------- 589 | class_list : list 590 | List of class labels to be evaluated. 
591 | 592 | t_collar : float > 0 593 | Time collar for event onset and offset condition 594 | (Default value = 0.2) 595 | 596 | use_onset_condition : bool 597 | Use onset condition when finding correctly detected events 598 | (Default value = True) 599 | 600 | use_offset_condition : bool 601 | Use offset condition when finding correctly detected events 602 | (Default value = True) 603 | 604 | """ 605 | 606 | self.t_collar = t_collar 607 | self.use_onset_condition = use_onset_condition 608 | self.use_offset_condition = use_offset_condition 609 | 610 | self.overall = { 611 | 'Nref': 0.0, 612 | 'Nsys': 0.0, 613 | 'Nsubs': 0.0, 614 | 'Ntp': 0.0, 615 | 'Nfp': 0.0, 616 | 'Nfn': 0.0, 617 | } 618 | self.class_wise = {} 619 | 620 | for class_label in class_list: 621 | self.class_wise[class_label] = { 622 | 'Nref': 0.0, 623 | 'Nsys': 0.0, 624 | 'Ntp': 0.0, 625 | 'Ntn': 0.0, 626 | 'Nfp': 0.0, 627 | 'Nfn': 0.0, 628 | } 629 | 630 | EventDetectionMetrics.__init__(self, class_list=class_list) 631 | 632 | def __enter__(self): 633 | # Initialize class and return it 634 | return self 635 | 636 | def __exit__(self, type, value, traceback): 637 | # Finalize evaluation and return results 638 | return self.results() 639 | 640 | def evaluate(self, annotated_ground_truth, system_output): 641 | """Evaluate system output and annotated ground truth pair. 642 | 643 | Use results method to get results. 644 | 645 | Parameters 646 | ---------- 647 | annotated_ground_truth : numpy.array 648 | Ground truth array, list of scene labels 649 | 650 | system_output : numpy.array 651 | System output array, list of scene labels 652 | 653 | Returns 654 | ------- 655 | nothing 656 | 657 | """ 658 | 659 | # Overall metrics 660 | 661 | # Total number of detected and reference events 662 | Nsys = len(system_output) 663 | Nref = len(annotated_ground_truth) 664 | 665 | sys_correct = numpy.zeros(Nsys, dtype=bool) 666 | ref_correct = numpy.zeros(Nref, dtype=bool) 667 | 668 | # Number of correctly transcribed events, onset/offset within a t_collar range 669 | for j in range(0, len(annotated_ground_truth)): 670 | for i in range(0, len(system_output)): 671 | if not sys_correct[i]: # skip already matched events 672 | label_condition = annotated_ground_truth[j]['event_label'] == system_output[i]['event_label'] 673 | if self.use_onset_condition: 674 | onset_condition = self.onset_condition(annotated_event=annotated_ground_truth[j], 675 | system_event=system_output[i], 676 | t_collar=self.t_collar) 677 | else: 678 | onset_condition = True 679 | 680 | if self.use_offset_condition: 681 | offset_condition = self.offset_condition(annotated_event=annotated_ground_truth[j], 682 | system_event=system_output[i], 683 | t_collar=self.t_collar) 684 | else: 685 | offset_condition = True 686 | 687 | if label_condition and onset_condition and offset_condition: 688 | ref_correct[j] = True 689 | sys_correct[i] = True 690 | break 691 | 692 | Ntp = numpy.sum(sys_correct) 693 | 694 | sys_leftover = numpy.nonzero(numpy.negative(sys_correct))[0] 695 | ref_leftover = numpy.nonzero(numpy.negative(ref_correct))[0] 696 | 697 | # Substitutions 698 | Nsubs = 0 699 | sys_counted = numpy.zeros(Nsys, dtype=bool) 700 | for j in ref_leftover: 701 | for i in sys_leftover: 702 | if not sys_counted[i]: 703 | if self.use_onset_condition: 704 | onset_condition = self.onset_condition(annotated_event=annotated_ground_truth[j], 705 | system_event=system_output[i], 706 | t_collar=self.t_collar) 707 | else: 708 | onset_condition = True 709 | 710 | if self.use_offset_condition: 711 | 
offset_condition = self.offset_condition(annotated_event=annotated_ground_truth[j], 712 | system_event=system_output[i], 713 | t_collar=self.t_collar) 714 | else: 715 | offset_condition = True 716 | 717 | if onset_condition and offset_condition: 718 | sys_counted[i] = True 719 | Nsubs += 1 720 | break 721 | 722 | Nfp = Nsys - Ntp - Nsubs 723 | Nfn = Nref - Ntp - Nsubs 724 | 725 | self.overall['Nref'] += Nref 726 | self.overall['Nsys'] += Nsys 727 | self.overall['Ntp'] += Ntp 728 | self.overall['Nsubs'] += Nsubs 729 | self.overall['Nfp'] += Nfp 730 | self.overall['Nfn'] += Nfn 731 | 732 | # Class-wise metrics 733 | for class_id, class_label in enumerate(self.class_list): 734 | Nref = 0.0 735 | Nsys = 0.0 736 | Ntp = 0.0 737 | 738 | # Count event frequencies in the ground truth 739 | for i in range(0, len(annotated_ground_truth)): 740 | if annotated_ground_truth[i]['event_label'] == class_label: 741 | Nref += 1 742 | 743 | # Count event frequencies in the system output 744 | for i in range(0, len(system_output)): 745 | if system_output[i]['event_label'] == class_label: 746 | Nsys += 1 747 | 748 | sys_counted = numpy.zeros(len(system_output), dtype=bool) 749 | for j in range(0, len(annotated_ground_truth)): 750 | if annotated_ground_truth[j]['event_label'] == class_label: 751 | for i in range(0, len(system_output)): 752 | if system_output[i]['event_label'] == class_label and not sys_counted[i]: 753 | if self.use_onset_condition: 754 | onset_condition = self.onset_condition(annotated_event=annotated_ground_truth[j], 755 | system_event=system_output[i], 756 | t_collar=self.t_collar) 757 | else: 758 | onset_condition = True 759 | 760 | if self.use_offset_condition: 761 | offset_condition = self.offset_condition(annotated_event=annotated_ground_truth[j], 762 | system_event=system_output[i], 763 | t_collar=self.t_collar) 764 | else: 765 | offset_condition = True 766 | 767 | if onset_condition and offset_condition: 768 | sys_counted[i] = True 769 | Ntp += 1 770 | break 771 | 772 | Nfp = Nsys - Ntp 773 | Nfn = Nref - Ntp 774 | 775 | self.class_wise[class_label]['Nref'] += Nref 776 | self.class_wise[class_label]['Nsys'] += Nsys 777 | 778 | self.class_wise[class_label]['Ntp'] += Ntp 779 | self.class_wise[class_label]['Nfp'] += Nfp 780 | self.class_wise[class_label]['Nfn'] += Nfn 781 | 782 | 783 | def onset_condition(self, annotated_event, system_event, t_collar=0.200): 784 | """Onset condition, checked does the event pair fulfill condition 785 | 786 | Condition: 787 | 788 | - event onsets are within t_collar each other 789 | 790 | Parameters 791 | ---------- 792 | annotated_event : dict 793 | Event dict 794 | 795 | system_event : dict 796 | Event dict 797 | 798 | t_collar : float > 0 799 | Defines how close event onsets have to be in order to be considered match. In seconds. 
800 | (Default value = 0.2) 801 | 802 | Returns 803 | ------- 804 | result : bool 805 | Condition result 806 | 807 | """ 808 | 809 | return math.fabs(annotated_event['event_onset'] - system_event['event_onset']) <= t_collar 810 | 811 | def offset_condition(self, annotated_event, system_event, t_collar=0.200, percentage_of_length=0.5): 812 | """Offset condition, checking does the event pair fulfill condition 813 | 814 | Condition: 815 | 816 | - event offsets are within t_collar each other 817 | or 818 | - system event offset is within the percentage_of_length*annotated event_length 819 | 820 | Parameters 821 | ---------- 822 | annotated_event : dict 823 | Event dict 824 | 825 | system_event : dict 826 | Event dict 827 | 828 | t_collar : float > 0 829 | Defines how close event onsets have to be in order to be considered match. In seconds. 830 | (Default value = 0.2) 831 | 832 | percentage_of_length : float [0-1] 833 | 834 | 835 | Returns 836 | ------- 837 | result : bool 838 | Condition result 839 | 840 | """ 841 | annotated_length = annotated_event['event_offset'] - annotated_event['event_onset'] 842 | return math.fabs(annotated_event['event_offset'] - system_event['event_offset']) <= max(t_collar, percentage_of_length * annotated_length) 843 | 844 | def results(self): 845 | """Get results 846 | 847 | Outputs results in dict, format: 848 | 849 | { 850 | 'overall': 851 | { 852 | 'Pre': 853 | 'Rec': 854 | 'F': 855 | 'ER': 856 | 'S': 857 | 'D': 858 | 'I': 859 | } 860 | 'class_wise': 861 | { 862 | 'office': { 863 | 'Pre': 864 | 'Rec': 865 | 'F': 866 | 'ER': 867 | 'D': 868 | 'I': 869 | 'Nref': 870 | 'Nsys': 871 | 'Ntp': 872 | 'Nfn': 873 | 'Nfp': 874 | }, 875 | } 876 | 'class_wise_average': 877 | { 878 | 'F': 879 | 'ER': 880 | } 881 | } 882 | 883 | Parameters 884 | ---------- 885 | nothing 886 | 887 | Returns 888 | ------- 889 | results : dict 890 | Results dict 891 | 892 | """ 893 | 894 | results = { 895 | 'overall': {}, 896 | 'class_wise': {}, 897 | 'class_wise_average': {}, 898 | } 899 | 900 | # Overall metrics 901 | results['overall']['Pre'] = self.overall['Ntp'] / (self.overall['Nsys'] + self.eps) 902 | results['overall']['Rec'] = self.overall['Ntp'] / self.overall['Nref'] 903 | results['overall']['F'] = 2 * ((results['overall']['Pre'] * results['overall']['Rec']) / (results['overall']['Pre'] + results['overall']['Rec'] + self.eps)) 904 | 905 | results['overall']['ER'] = (self.overall['Nfn'] + self.overall['Nfp'] + self.overall['Nsubs']) / self.overall['Nref'] 906 | results['overall']['S'] = self.overall['Nsubs'] / self.overall['Nref'] 907 | results['overall']['D'] = self.overall['Nfn'] / self.overall['Nref'] 908 | results['overall']['I'] = self.overall['Nfp'] / self.overall['Nref'] 909 | 910 | # Class-wise metrics 911 | class_wise_F = [] 912 | class_wise_ER = [] 913 | 914 | for class_label in self.class_list: 915 | if class_label not in results['class_wise']: 916 | results['class_wise'][class_label] = {} 917 | 918 | results['class_wise'][class_label]['Pre'] = self.class_wise[class_label]['Ntp'] / (self.class_wise[class_label]['Nsys'] + self.eps) 919 | results['class_wise'][class_label]['Rec'] = self.class_wise[class_label]['Ntp'] / (self.class_wise[class_label]['Nref'] + self.eps) 920 | results['class_wise'][class_label]['F'] = 2 * ((results['class_wise'][class_label]['Pre'] * results['class_wise'][class_label]['Rec']) / (results['class_wise'][class_label]['Pre'] + results['class_wise'][class_label]['Rec'] + self.eps)) 921 | 922 | results['class_wise'][class_label]['ER'] = 
(self.class_wise[class_label]['Nfn']+self.class_wise[class_label]['Nfp']) / (self.class_wise[class_label]['Nref'] + self.eps)
923 |             results['class_wise'][class_label]['D'] = self.class_wise[class_label]['Nfn'] / (self.class_wise[class_label]['Nref'] + self.eps)
924 |             results['class_wise'][class_label]['I'] = self.class_wise[class_label]['Nfp'] / (self.class_wise[class_label]['Nref'] + self.eps)
925 | 
926 |             results['class_wise'][class_label]['Nref'] = self.class_wise[class_label]['Nref']
927 |             results['class_wise'][class_label]['Nsys'] = self.class_wise[class_label]['Nsys']
928 |             results['class_wise'][class_label]['Ntp'] = self.class_wise[class_label]['Ntp']
929 |             results['class_wise'][class_label]['Nfn'] = self.class_wise[class_label]['Nfn']
930 |             results['class_wise'][class_label]['Nfp'] = self.class_wise[class_label]['Nfp']
931 | 
932 |             class_wise_F.append(results['class_wise'][class_label]['F'])
933 |             class_wise_ER.append(results['class_wise'][class_label]['ER'])
934 | 
935 |         # Class-wise average
936 |         results['class_wise_average']['F'] = numpy.mean(class_wise_F)
937 |         results['class_wise_average']['ER'] = numpy.mean(class_wise_ER)
938 | 
939 |         return results
940 | 
941 | 
942 | class DCASE2013_EventDetection_Metrics(EventDetectionMetrics):
943 |     """Legacy DCASE2013 metrics, converted from the provided Matlab implementation
944 | 
945 |     Supported metrics:
946 |     - Frame based
947 |       - F-score (F)
948 |       - AEER
949 |     - Event based
950 |       - Onset
951 |         - F-Score (F)
952 |         - AEER
953 |       - Onset-offset
954 |         - F-Score (F)
955 |         - AEER
956 |     - Class based
957 |       - Onset
958 |         - F-Score (F)
959 |         - AEER
960 |       - Onset-offset
961 |         - F-Score (F)
962 |         - AEER
963 |     """
964 | 
965 |     #
966 | 
967 |     def frame_based(self, annotated_ground_truth, system_output, resolution=0.01):
968 |         # Convert event list into frame-based representation
969 |         system_event_roll = self.list_to_roll(data=system_output, time_resolution=resolution)
970 |         annotated_event_roll = self.list_to_roll(data=annotated_ground_truth, time_resolution=resolution)
971 | 
972 |         # Fix durations of both event rolls to be equal
973 |         if annotated_event_roll.shape[0] > system_event_roll.shape[0]:
974 |             padding = numpy.zeros((annotated_event_roll.shape[0] - system_event_roll.shape[0], len(self.class_list)))
975 |             system_event_roll = numpy.vstack((system_event_roll, padding))
976 | 
977 |         if system_event_roll.shape[0] > annotated_event_roll.shape[0]:
978 |             padding = numpy.zeros((system_event_roll.shape[0] - annotated_event_roll.shape[0], len(self.class_list)))
979 |             annotated_event_roll = numpy.vstack((annotated_event_roll, padding))
980 | 
981 |         # Compute frame-based metrics
982 |         Nref = sum(sum(annotated_event_roll))
983 |         Ntot = sum(sum(system_event_roll))
984 |         Ntp = sum(sum(system_event_roll + annotated_event_roll > 1))
985 |         Nfp = sum(sum(system_event_roll - annotated_event_roll > 0))
986 |         Nfn = sum(sum(annotated_event_roll - system_event_roll > 0))
987 |         Nsubs = min(Nfp, Nfn)
988 | 
989 |         eps = numpy.spacing(1)
990 | 
991 |         results = dict()
992 |         results['Rec'] = Ntp / (Nref + eps)
993 |         results['Pre'] = Ntp / (Ntot + eps)
994 |         results['F'] = 2 * ((results['Pre'] * results['Rec']) / (results['Pre'] + results['Rec'] + eps))
995 |         results['AEER'] = (Nfn + Nfp + Nsubs) / (Nref + eps)
996 | 
997 |         return results
998 | 
999 |     def event_based(self, annotated_ground_truth, system_output):
1000 |         # Event-based evaluation for event detection task
1001 |         # outputFile: the output of the event detection system
1002 |         # GTFile: the ground truth list of events
1003 | 
1004 | # Total number of detected and reference events 1005 | Ntot = len(system_output) 1006 | Nref = len(annotated_ground_truth) 1007 | 1008 | # Number of correctly transcribed events, onset within a +/-100 ms range 1009 | Ncorr = 0 1010 | NcorrOff = 0 1011 | for j in range(0, len(annotated_ground_truth)): 1012 | for i in range(0, len(system_output)): 1013 | if annotated_ground_truth[j]['event_label'] == system_output[i]['event_label'] and (math.fabs(annotated_ground_truth[j]['event_onset'] - system_output[i]['event_onset']) <= 0.1): 1014 | Ncorr += 1 1015 | 1016 | # If offset within a +/-100 ms range or within 50% of ground-truth event's duration 1017 | if math.fabs(annotated_ground_truth[j]['event_offset'] - system_output[i]['event_offset']) <= max(0.1, 0.5 * (annotated_ground_truth[j]['event_offset'] - annotated_ground_truth[j]['event_onset'])): 1018 | NcorrOff += 1 1019 | 1020 | break # In order to not evaluate duplicates 1021 | 1022 | # Compute onset-only event-based metrics 1023 | eps = numpy.spacing(1) 1024 | results = { 1025 | 'onset': {}, 1026 | 'onset-offset': {}, 1027 | } 1028 | 1029 | Nfp = Ntot - Ncorr 1030 | Nfn = Nref - Ncorr 1031 | Nsubs = min(Nfp, Nfn) 1032 | results['onset']['Rec'] = Ncorr / (Nref + eps) 1033 | results['onset']['Pre'] = Ncorr / (Ntot + eps) 1034 | results['onset']['F'] = 2 * ( 1035 | (results['onset']['Pre'] * results['onset']['Rec']) / ( 1036 | results['onset']['Pre'] + results['onset']['Rec'] + eps)) 1037 | results['onset']['AEER'] = (Nfn + Nfp + Nsubs) / (Nref + eps) 1038 | 1039 | # Compute onset-offset event-based metrics 1040 | NfpOff = Ntot - NcorrOff 1041 | NfnOff = Nref - NcorrOff 1042 | NsubsOff = min(NfpOff, NfnOff) 1043 | results['onset-offset']['Rec'] = NcorrOff / (Nref + eps) 1044 | results['onset-offset']['Pre'] = NcorrOff / (Ntot + eps) 1045 | results['onset-offset']['F'] = 2 * ((results['onset-offset']['Pre'] * results['onset-offset']['Rec']) / ( 1046 | results['onset-offset']['Pre'] + results['onset-offset']['Rec'] + eps)) 1047 | results['onset-offset']['AEER'] = (NfnOff + NfpOff + NsubsOff) / (Nref + eps) 1048 | 1049 | return results 1050 | 1051 | def class_based(self, annotated_ground_truth, system_output): 1052 | # Class-wise event-based evaluation for event detection task 1053 | # outputFile: the output of the event detection system 1054 | # GTFile: the ground truth list of events 1055 | 1056 | # Total number of detected and reference events per class 1057 | Ntot = numpy.zeros((len(self.class_list), 1)) 1058 | for event in system_output: 1059 | pos = self.class_list.index(event['event_label']) 1060 | Ntot[pos] += 1 1061 | 1062 | Nref = numpy.zeros((len(self.class_list), 1)) 1063 | for event in annotated_ground_truth: 1064 | pos = self.class_list.index(event['event_label']) 1065 | Nref[pos] += 1 1066 | 1067 | I = (Nref > 0).nonzero()[0] # index for classes present in ground-truth 1068 | 1069 | # Number of correctly transcribed events per class, onset within a +/-100 ms range 1070 | Ncorr = numpy.zeros((len(self.class_list), 1)) 1071 | NcorrOff = numpy.zeros((len(self.class_list), 1)) 1072 | 1073 | for j in range(0, len(annotated_ground_truth)): 1074 | for i in range(0, len(system_output)): 1075 | if annotated_ground_truth[j]['event_label'] == system_output[i]['event_label'] and ( 1076 | math.fabs( 1077 | annotated_ground_truth[j]['event_onset'] - system_output[i]['event_onset']) <= 0.1): 1078 | pos = self.class_list.index(system_output[i]['event_label']) 1079 | Ncorr[pos] += 1 1080 | 1081 | # If offset within a +/-100 ms range or 
within 50% of ground-truth event's duration 1082 | if math.fabs(annotated_ground_truth[j]['event_offset'] - system_output[i]['event_offset']) <= max( 1083 | 0.1, 0.5 * ( 1084 | annotated_ground_truth[j]['event_offset'] - annotated_ground_truth[j][ 1085 | 'event_onset'])): 1086 | pos = self.class_list.index(system_output[i]['event_label']) 1087 | NcorrOff[pos] += 1 1088 | 1089 | break # In order to not evaluate duplicates 1090 | 1091 | # Compute onset-only class-wise event-based metrics 1092 | eps = numpy.spacing(1) 1093 | results = { 1094 | 'onset': {}, 1095 | 'onset-offset': {}, 1096 | } 1097 | 1098 | Nfp = Ntot - Ncorr 1099 | Nfn = Nref - Ncorr 1100 | Nsubs = numpy.minimum(Nfp, Nfn) 1101 | tempRec = Ncorr[I] / (Nref[I] + eps) 1102 | tempPre = Ncorr[I] / (Ntot[I] + eps) 1103 | results['onset']['Rec'] = numpy.mean(tempRec) 1104 | results['onset']['Pre'] = numpy.mean(tempPre) 1105 | tempF = 2 * ((tempPre * tempRec) / (tempPre + tempRec + eps)) 1106 | results['onset']['F'] = numpy.mean(tempF) 1107 | tempAEER = (Nfn[I] + Nfp[I] + Nsubs[I]) / (Nref[I] + eps) 1108 | results['onset']['AEER'] = numpy.mean(tempAEER) 1109 | 1110 | # Compute onset-offset class-wise event-based metrics 1111 | NfpOff = Ntot - NcorrOff 1112 | NfnOff = Nref - NcorrOff 1113 | NsubsOff = numpy.minimum(NfpOff, NfnOff) 1114 | tempRecOff = NcorrOff[I] / (Nref[I] + eps) 1115 | tempPreOff = NcorrOff[I] / (Ntot[I] + eps) 1116 | results['onset-offset']['Rec'] = numpy.mean(tempRecOff) 1117 | results['onset-offset']['Pre'] = numpy.mean(tempPreOff) 1118 | tempFOff = 2 * ((tempPreOff * tempRecOff) / (tempPreOff + tempRecOff + eps)) 1119 | results['onset-offset']['F'] = numpy.mean(tempFOff) 1120 | tempAEEROff = (NfnOff[I] + NfpOff[I] + NsubsOff[I]) / (Nref[I] + eps) 1121 | results['onset-offset']['AEER'] = numpy.mean(tempAEEROff) 1122 | 1123 | return results 1124 | 1125 | 1126 | def main(argv): 1127 | # Examples to show usage and required data structures 1128 | class_list = ['class1', 'class2', 'class3'] 1129 | system_output = [ 1130 | { 1131 | 'event_label': 'class1', 1132 | 'event_onset': 0.1, 1133 | 'event_offset': 1.0 1134 | }, 1135 | { 1136 | 'event_label': 'class2', 1137 | 'event_onset': 4.1, 1138 | 'event_offset': 4.7 1139 | }, 1140 | { 1141 | 'event_label': 'class3', 1142 | 'event_onset': 5.5, 1143 | 'event_offset': 6.7 1144 | } 1145 | ] 1146 | annotated_groundtruth = [ 1147 | { 1148 | 'event_label': 'class1', 1149 | 'event_onset': 0.1, 1150 | 'event_offset': 1.0 1151 | }, 1152 | { 1153 | 'event_label': 'class2', 1154 | 'event_onset': 4.2, 1155 | 'event_offset': 5.4 1156 | }, 1157 | { 1158 | 'event_label': 'class3', 1159 | 'event_onset': 5.5, 1160 | 'event_offset': 6.7 1161 | } 1162 | ] 1163 | dcase2013metric = DCASE2013_EventDetection_Metrics(class_list=class_list) 1164 | 1165 | print 'DCASE2013' 1166 | print 'Frame-based:', dcase2013metric.frame_based(system_output=system_output, 1167 | annotated_ground_truth=annotated_groundtruth) 1168 | print 'Event-based:', dcase2013metric.event_based(system_output=system_output, 1169 | annotated_ground_truth=annotated_groundtruth) 1170 | print 'Class-based:', dcase2013metric.class_based(system_output=system_output, 1171 | annotated_ground_truth=annotated_groundtruth) 1172 | 1173 | dcase2016_metric = DCASE2016_EventDetection_SegmentBasedMetrics(class_list=class_list) 1174 | print 'DCASE2016' 1175 | print dcase2016_metric.evaluate(system_output=system_output, annotated_ground_truth=annotated_groundtruth).results() 1176 | 1177 | 1178 | if __name__ == "__main__": 1179 | 
sys.exit(main(sys.argv)) 1180 | -------------------------------------------------------------------------------- /src/features.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # -*- coding: utf-8 -*- 3 | 4 | import numpy 5 | import librosa 6 | import scipy 7 | 8 | 9 | def feature_extraction(y, fs=44100, statistics=True, include_mfcc0=True, include_delta=True, 10 | include_acceleration=True, mfcc_params=None, delta_params=None, acceleration_params=None): 11 | """Feature extraction, MFCC based features 12 | 13 | Outputs features in dict, format: 14 | 15 | { 16 | 'feat': feature_matrix [shape=(frame count, feature vector size)], 17 | 'stat': { 18 | 'mean': numpy.mean(feature_matrix, axis=0), 19 | 'std': numpy.std(feature_matrix, axis=0), 20 | 'N': feature_matrix.shape[0], 21 | 'S1': numpy.sum(feature_matrix, axis=0), 22 | 'S2': numpy.sum(feature_matrix ** 2, axis=0), 23 | } 24 | } 25 | 26 | Parameters 27 | ---------- 28 | y: numpy.array [shape=(signal_length, )] 29 | Audio 30 | 31 | fs: int > 0 [scalar] 32 | Sample rate 33 | (Default value=44100) 34 | 35 | statistics: bool 36 | Calculate feature statistics for extracted matrix 37 | (Default value=True) 38 | 39 | include_mfcc0: bool 40 | Include 0th MFCC coefficient into static coefficients. 41 | (Default value=True) 42 | 43 | include_delta: bool 44 | Include delta MFCC coefficients. 45 | (Default value=True) 46 | 47 | include_acceleration: bool 48 | Include acceleration MFCC coefficients. 49 | (Default value=True) 50 | 51 | mfcc_params: dict or None 52 | Parameters for extraction of static MFCC coefficients. 53 | 54 | delta_params: dict or None 55 | Parameters for extraction of delta MFCC coefficients. 56 | 57 | acceleration_params: dict or None 58 | Parameters for extraction of acceleration MFCC coefficients. 
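    A typical mfcc_params dict looks like the following (illustrative values;
    the actual defaults are read from the task YAML files):

        {'window': 'hamming_asymmetric', 'n_fft': 2048, 'win_length': 1764,
         'hop_length': 882, 'n_mfcc': 20, 'n_mels': 40, 'fmin': 0,
         'fmax': 22050, 'htk': False}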
59 | 60 | Returns 61 | ------- 62 | result: dict 63 | Feature dict 64 | 65 | """ 66 | 67 | eps = numpy.spacing(1) 68 | 69 | # Windowing function 70 | if mfcc_params['window'] == 'hamming_asymmetric': 71 | window = scipy.signal.hamming(mfcc_params['n_fft'], sym=False) 72 | elif mfcc_params['window'] == 'hamming_symmetric': 73 | window = scipy.signal.hamming(mfcc_params['n_fft'], sym=True) 74 | elif mfcc_params['window'] == 'hann_asymmetric': 75 | window = scipy.signal.hann(mfcc_params['n_fft'], sym=False) 76 | elif mfcc_params['window'] == 'hann_symmetric': 77 | window = scipy.signal.hann(mfcc_params['n_fft'], sym=True) 78 | else: 79 | window = None 80 | 81 | # Calculate Static Coefficients 82 | power_spectrogram = numpy.abs(librosa.stft(y + eps, 83 | n_fft=mfcc_params['n_fft'], 84 | win_length=mfcc_params['win_length'], 85 | hop_length=mfcc_params['hop_length'], 86 | center=True, 87 | window=window))**2 88 | mel_basis = librosa.filters.mel(sr=fs, 89 | n_fft=mfcc_params['n_fft'], 90 | n_mels=mfcc_params['n_mels'], 91 | fmin=mfcc_params['fmin'], 92 | fmax=mfcc_params['fmax'], 93 | htk=mfcc_params['htk']) 94 | mel_spectrum = numpy.dot(mel_basis, power_spectrogram) 95 | mfcc = librosa.feature.mfcc(S=librosa.logamplitude(mel_spectrum), 96 | n_mfcc=mfcc_params['n_mfcc']) 97 | 98 | # Collect the feature matrix 99 | feature_matrix = mfcc 100 | if include_delta: 101 | # Delta coefficients 102 | mfcc_delta = librosa.feature.delta(mfcc, **delta_params) 103 | 104 | # Add Delta Coefficients to feature matrix 105 | feature_matrix = numpy.vstack((feature_matrix, mfcc_delta)) 106 | 107 | if include_acceleration: 108 | # Acceleration coefficients (aka delta delta) 109 | mfcc_delta2 = librosa.feature.delta(mfcc, order=2, **acceleration_params) 110 | 111 | # Add Acceleration Coefficients to feature matrix 112 | feature_matrix = numpy.vstack((feature_matrix, mfcc_delta2)) 113 | 114 | if not include_mfcc0: 115 | # Omit mfcc0 116 | feature_matrix = feature_matrix[1:, :] 117 | 118 | feature_matrix = feature_matrix.T 119 | 120 | # Collect into data structure 121 | if statistics: 122 | return { 123 | 'feat': feature_matrix, 124 | 'stat': { 125 | 'mean': numpy.mean(feature_matrix, axis=0), 126 | 'std': numpy.std(feature_matrix, axis=0), 127 | 'N': feature_matrix.shape[0], 128 | 'S1': numpy.sum(feature_matrix, axis=0), 129 | 'S2': numpy.sum(feature_matrix ** 2, axis=0), 130 | } 131 | } 132 | else: 133 | return { 134 | 'feat': feature_matrix} 135 | 136 | 137 | class FeatureNormalizer(object): 138 | """Feature normalizer class 139 | 140 | Accumulates feature statistics 141 | 142 | Examples 143 | -------- 144 | 145 | >>> normalizer = FeatureNormalizer() 146 | >>> for feature_matrix in training_items: 147 | >>> normalizer.accumulate(feature_matrix) 148 | >>> 149 | >>> normalizer.finalize() 150 | 151 | >>> for feature_matrix in test_items: 152 | >>> feature_matrix_normalized = normalizer.normalize(feature_matrix) 153 | >>> # used the features 154 | 155 | """ 156 | def __init__(self, feature_matrix=None): 157 | """__init__ method. 
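        If a feature_matrix is given, statistics are computed and finalized
        immediately; with no argument, the accumulators start from zero and
        accumulate() / finalize() are used as in the class example above.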
158 | 
159 |         Parameters
160 |         ----------
161 |         feature_matrix : numpy.ndarray [shape=(frames, number of feature values)] or None
162 |             Feature matrix to be used in the initialization
163 | 
164 |         """
165 |         if feature_matrix is None:
166 |             self.N = 0
167 |             self.mean = 0
168 |             self.S1 = 0
169 |             self.S2 = 0
170 |             self.std = 0
171 |         else:
172 |             self.mean = numpy.mean(feature_matrix, axis=0)
173 |             self.std = numpy.std(feature_matrix, axis=0)
174 |             self.N = feature_matrix.shape[0]
175 |             self.S1 = numpy.sum(feature_matrix, axis=0)
176 |             self.S2 = numpy.sum(feature_matrix ** 2, axis=0)
177 |             self.finalize()
178 | 
179 |     def __enter__(self):
180 |         # Initialize Normalization class and return it
181 |         self.N = 0
182 |         self.mean = 0
183 |         self.S1 = 0
184 |         self.S2 = 0
185 |         self.std = 0
186 |         return self
187 | 
188 |     def __exit__(self, type, value, traceback):
189 |         # Finalize accumulated calculation
190 |         self.finalize()
191 | 
192 |     def accumulate(self, stat):
193 |         """Accumulate statistics
194 | 
195 |         Input is statistics dict, format:
196 | 
197 |             {
198 |                 'mean': numpy.mean(feature_matrix, axis=0),
199 |                 'std': numpy.std(feature_matrix, axis=0),
200 |                 'N': feature_matrix.shape[0],
201 |                 'S1': numpy.sum(feature_matrix, axis=0),
202 |                 'S2': numpy.sum(feature_matrix ** 2, axis=0),
203 |             }
204 | 
205 |         Parameters
206 |         ----------
207 |         stat : dict
208 |             Statistics dict
209 | 
210 |         Returns
211 |         -------
212 |         nothing
213 | 
214 |         """
215 |         self.N += stat['N']
216 |         self.mean += stat['mean']
217 |         self.S1 += stat['S1']
218 |         self.S2 += stat['S2']
219 | 
220 |     def finalize(self):
221 |         """Finalize statistics calculation
222 | 
223 |         Accumulated values are used to get mean and std for the seen feature data.
224 | 
225 |         Parameters
226 |         ----------
227 |         nothing
228 | 
229 |         Returns
230 |         -------
231 |         nothing
232 | 
233 |         """
234 | 
235 |         # Finalize statistics
236 |         self.mean = self.S1 / self.N
237 |         self.std = numpy.sqrt((self.N * self.S2 - (self.S1 * self.S1)) / (self.N * (self.N - 1)))
238 | 
239 |         # In case we have very brain-dead material we get std = NaN => 0.0
240 |         self.std = numpy.nan_to_num(self.std)
241 | 
242 |         self.mean = numpy.reshape(self.mean, [1, -1])
243 |         self.std = numpy.reshape(self.std, [1, -1])
244 | 
245 |     def normalize(self, feature_matrix):
246 |         """Normalize feature matrix with internal statistics of the class
247 | 
248 |         Parameters
249 |         ----------
250 |         feature_matrix : numpy.ndarray [shape=(frames, number of feature values)]
251 |             Feature matrix to be normalized
252 | 
253 |         Returns
254 |         -------
255 |         feature_matrix : numpy.ndarray [shape=(frames, number of feature values)]
256 |             Normalized feature matrix
257 | 
258 |         """
259 | 
260 |         return (feature_matrix - self.mean) / self.std
261 | 
--------------------------------------------------------------------------------
/src/files.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python
2 | # -*- coding: utf-8 -*-
3 | 
4 | import os
5 | import wave
6 | import numpy
7 | import csv
8 | import cPickle as pickle
9 | import librosa
10 | import yaml
11 | import soundfile
12 | 
13 | def load_audio(filename, mono=True, fs=44100):
14 |     """Load audio file into numpy array
15 | 
16 |     Supports 24-bit wav format, and flac audio through librosa.
17 | 
18 |     Parameters
19 |     ----------
20 |     filename: str
21 |         Path to audio file
22 | 
23 |     mono : bool
24 |         In case of multi-channel audio, channels are averaged into a single channel.
25 |         (Default value=True)
26 | 
27 |     fs : int > 0 [scalar]
28 |         Target sample rate; if the input audio does not match it, the audio is resampled.
29 |         (Default value=44100)
30 | 
31 |     Returns
32 |     -------
33 |     audio_data : numpy.ndarray [shape=(signal_length, channel)]
34 |         Audio
35 | 
36 |     sample_rate : integer
37 |         Sample rate
38 | 
39 |     """
40 | 
41 |     file_base, file_extension = os.path.splitext(filename)
42 |     if file_extension == '.wav':
43 |         # Load audio
44 |         audio_data, sample_rate = soundfile.read(filename)
45 |         audio_data = audio_data.T
46 | 
47 |         if mono and len(audio_data.shape) > 1:
48 |             # Down-mix audio
49 |             audio_data = numpy.mean(audio_data, axis=0)
50 | 
51 |         # Resample
52 |         if fs != sample_rate:
53 |             audio_data = librosa.core.resample(audio_data, sample_rate, fs)
54 |             sample_rate = fs
55 | 
56 |         return audio_data, sample_rate
57 | 
58 |     elif file_extension == '.flac':
59 |         audio_data, sample_rate = librosa.load(filename, sr=fs, mono=mono)
60 | 
61 |         return audio_data, sample_rate
62 | 
63 |     return None, None
64 | 
65 | 
66 | def load_event_list(file):
67 |     """Load event list from tab delimited text file (csv-formatted)
68 | 
69 |     Supported input formats:
70 | 
71 |     - [event_onset (float)][tab][event_offset (float)]
72 |     - [event_onset (float)][tab][event_offset (float)][tab][event_label (string)]
73 |     - [file (string)][tab][scene_label (string)][tab][event_onset (float)][tab][event_offset (float)][tab][event_label (string)]
74 | 
75 |     Event dict format:
76 | 
77 |     {
78 |         'file': 'filename',
79 |         'scene_label': 'office',
80 |         'event_onset': 0.0,
81 |         'event_offset': 1.0,
82 |         'event_label': 'people_walking',
83 |     }
84 | 
85 |     Parameters
86 |     ----------
87 |     file : str
88 |         Path to the event list in text format (csv)
89 | 
90 |     Returns
91 |     -------
92 |     data : list of event dicts
93 |         List containing event dicts
94 | 
95 |     """
96 |     data = []
97 |     with open(file, 'rt') as f:
98 |         for row in csv.reader(f, delimiter='\t'):
99 |             if len(row) == 2:
100 |                 data.append(
101 |                     {
102 |                         'event_onset': float(row[0]),
103 |                         'event_offset': float(row[1])
104 |                     }
105 |                 )
106 |             elif len(row) == 3:
107 |                 data.append(
108 |                     {
109 |                         'event_onset': float(row[0]),
110 |                         'event_offset': float(row[1]),
111 |                         'event_label': row[2]
112 |                     }
113 |                 )
114 |             elif len(row) == 4:
115 |                 data.append(
116 |                     {
117 |                         'file': row[0],
118 |                         'event_onset': float(row[1]),
119 |                         'event_offset': float(row[2]),
120 |                         'event_label': row[3]
121 |                     }
122 |                 )
123 |             elif len(row) == 5:
124 |                 data.append(
125 |                     {
126 |                         'file': row[0],
127 |                         'scene_label': row[1],
128 |                         'event_onset': float(row[2]),
129 |                         'event_offset': float(row[3]),
130 |                         'event_label': row[4]
131 |                     }
132 |                 )
133 |     return data
134 | 
135 | 
136 | def save_data(filename, data):
137 |     """Save variable into a pickle file
138 | 
139 |     Parameters
140 |     ----------
141 |     filename: str
142 |         Path to file
143 | 
144 |     data: list or dict
145 |         Data to be saved.
146 | 
147 |     Returns
148 |     -------
149 |     nothing
150 | 
151 |     """
152 | 
153 |     pickle.dump(data, open(filename, 'wb'), protocol=pickle.HIGHEST_PROTOCOL)
154 | 
155 | 
156 | def load_data(filename):
157 |     """Load data from pickle file
158 | 
159 |     Parameters
160 |     ----------
161 |     filename: str
162 |         Path to file
163 | 
164 |     Returns
165 |     -------
166 |     data: list or dict
167 |         Loaded file.
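        Example (illustrative round trip with save_data above; the file name
        is arbitrary):

        >>> save_data('mfcc_fold1.cpickle', {'feat': feature_matrix})
        >>> feature_matrix = load_data('mfcc_fold1.cpickle')['feat']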
168 | 169 | """ 170 | 171 | return pickle.load(open(filename, "rb")) 172 | 173 | 174 | def save_parameters(filename, parameters): 175 | """Save parameters to YAML-file 176 | 177 | Parameters 178 | ---------- 179 | filename: str 180 | Path to file 181 | parameters: dict 182 | Dict containing parameters to be saved 183 | 184 | Returns 185 | ------- 186 | Nothing 187 | 188 | """ 189 | 190 | with open(filename, 'w') as outfile: 191 | outfile.write(yaml.dump(parameters, default_flow_style=False)) 192 | 193 | 194 | def load_parameters(filename): 195 | """Load parameters from YAML-file 196 | 197 | Parameters 198 | ---------- 199 | filename: str 200 | Path to file 201 | 202 | Returns 203 | ------- 204 | parameters: dict 205 | Dict containing loaded parameters 206 | 207 | Raises 208 | ------- 209 | IOError 210 | file is not found. 211 | 212 | """ 213 | 214 | if os.path.isfile(filename): 215 | with open(filename, 'r') as f: 216 | return yaml.load(f) 217 | else: 218 | raise IOError("Parameter file not found [%s]" % filename) 219 | 220 | 221 | def save_text(filename, text): 222 | """Save text into text file. 223 | 224 | Parameters 225 | ---------- 226 | filename: str 227 | Path to file 228 | 229 | text: str 230 | String to be saved. 231 | 232 | Returns 233 | ------- 234 | nothing 235 | 236 | """ 237 | 238 | with open(filename, "w") as text_file: 239 | text_file.write(text) 240 | 241 | 242 | def load_text(filename): 243 | """Load text file 244 | 245 | Parameters 246 | ---------- 247 | filename: str 248 | Path to file 249 | 250 | Returns 251 | ------- 252 | text: string 253 | Loaded text. 254 | 255 | """ 256 | 257 | with open(filename, 'r') as f: 258 | return f.readlines() 259 | -------------------------------------------------------------------------------- /src/general.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # -*- coding: utf-8 -*- 3 | 4 | import os 5 | import hashlib 6 | import json 7 | 8 | 9 | def check_path(path): 10 | """Check if path exists, if not creates one 11 | 12 | Parameters 13 | ---------- 14 | path : str 15 | Path to be checked. 
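        Intermediate directories are created as well (os.makedirs), so nested
        targets such as the hash-named feature directories used by the systems
        are created in one call.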
16 | 
17 |     Returns
18 |     -------
19 |     Nothing
20 | 
21 |     """
22 | 
23 |     if not os.path.isdir(path):
24 |         os.makedirs(path)
25 | 
26 | 
27 | def get_parameter_hash(params):
28 |     """Get unique hash string (md5) for given parameter dict
29 | 
30 |     Parameters
31 |     ----------
32 |     params : dict
33 |         Input parameters
34 | 
35 |     Returns
36 |     -------
37 |     md5_hash : str
38 |         Unique hash for parameter dict
39 | 
40 |     """
41 | 
42 |     md5 = hashlib.md5()
43 |     md5.update(str(json.dumps(params, sort_keys=True)))
44 |     return md5.hexdigest()
45 | 
46 | 
--------------------------------------------------------------------------------
/src/sound_event_detection.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python
2 | # -*- coding: utf-8 -*-
3 | 
4 | import numpy
5 | 
6 | 
7 | def event_detection(feature_data, model_container, hop_length_seconds=0.01, smoothing_window_length_seconds=1.0, decision_threshold=0.0, minimum_event_length=0.1, minimum_event_gap=0.1):
8 |     """Sound event detection
9 | 
10 |     Parameters
11 |     ----------
12 |     feature_data : numpy.ndarray [shape=(t, n_features)]
13 |         Feature matrix
14 | 
15 |     model_container : dict
16 |         Sound event model pairs [positive and negative] in dict
17 | 
18 |     hop_length_seconds : float > 0.0
19 |         Feature hop length in seconds, used to convert feature index into time-stamp
20 |         (Default value=0.01)
21 | 
22 |     smoothing_window_length_seconds : float > 0.0
23 |         Accumulation window (look-back) length; within the window likelihoods are accumulated.
24 |         (Default value=1.0)
25 | 
26 |     decision_threshold : float > 0.0
27 |         Likelihood ratio threshold for making the decision.
28 |         (Default value=0.0)
29 | 
30 |     minimum_event_length : float > 0.0
31 |         Minimum event length in seconds; events shorter than this are filtered out from the output.
32 |         (Default value=0.1)
33 | 
34 |     minimum_event_gap : float > 0.0
35 |         Minimum allowed gap in seconds between events of the same event label class.
36 |         (Default value=0.1)
37 | 
38 |     Returns
39 |     -------
40 |     results : list of (event onset, event offset, event label) tuples
41 |         Detection result, event list
42 | 
43 |     """
44 | 
45 |     smoothing_window = int(smoothing_window_length_seconds / hop_length_seconds)
46 | 
47 |     results = []
48 |     for event_label in model_container['models']:
49 |         positive = model_container['models'][event_label]['positive'].score_samples(feature_data)[0]
50 |         negative = model_container['models'][event_label]['negative'].score_samples(feature_data)[0]
51 | 
52 |         # Let's keep the system causal and use look-back while smoothing (accumulating) likelihoods
53 |         for stop_id in range(0, feature_data.shape[0]):
54 |             start_id = stop_id - smoothing_window
55 |             if start_id < 0:
56 |                 start_id = 0
57 |             positive[start_id] = sum(positive[start_id:stop_id])
58 |             negative[start_id] = sum(negative[start_id:stop_id])
59 | 
60 |         likelihood_ratio = positive - negative
61 |         event_activity = likelihood_ratio > decision_threshold
62 | 
63 |         # Find contiguous segments and convert frame-ids into times
64 |         event_segments = contiguous_regions(event_activity) * hop_length_seconds
65 | 
66 |         # Post-process the event segments
67 |         event_segments = postprocess_event_segments(event_segments=event_segments,
68 |                                                     minimum_event_length=minimum_event_length,
69 |                                                     minimum_event_gap=minimum_event_gap)
70 | 
71 |         for event in event_segments:
72 |             results.append((event[0], event[1], event_label))
73 | 
74 |     return results
75 | 
76 | 
77 | def contiguous_regions(activity_array):
78 |     """Find contiguous regions from bool valued numpy.array.
79 |     Transforms boolean values for each frame into pairs of onsets and offsets.
80 | 
81 |     Parameters
82 |     ----------
83 |     activity_array : numpy.array [shape=(t)]
84 |         Event activity array, bool values
85 | 
86 |     Returns
87 |     -------
88 |     change_indices : numpy.ndarray [shape=(number of regions, 2)]
89 |         Onset and offset index pairs as rows of the matrix
90 | 
91 |     """
92 | 
93 |     # Find the changes in the activity_array
94 |     change_indices = numpy.diff(activity_array).nonzero()[0]
95 | 
96 |     # Shift change_index with one, focus on frame after the change.
97 |     change_indices += 1
98 | 
99 |     if activity_array[0]:
100 |         # If the first element of activity_array is True add 0 at the beginning
101 |         change_indices = numpy.r_[0, change_indices]
102 | 
103 |     if activity_array[-1]:
104 |         # If the last element of activity_array is True, add the length of the array
105 |         change_indices = numpy.r_[change_indices, activity_array.size]
106 | 
107 |     # Reshape the result into two columns
108 |     return change_indices.reshape((-1, 2))
109 | 
110 | 
111 | def postprocess_event_segments(event_segments, minimum_event_length=0.1, minimum_event_gap=0.1):
112 |     """Post-process event segment list. Makes sure that minimum event length and minimum event gap conditions are met.
113 | 
114 |     Parameters
115 |     ----------
116 |     event_segments : numpy.ndarray [shape=(number of events, 2)]
117 |         Event segments, first column has the onset, second has the offset.
118 | 
119 |     minimum_event_length : float > 0.0
120 |         Minimum event length in seconds; events shorter than this are filtered out from the output.
121 |         (Default value=0.1)
122 | 
123 |     minimum_event_gap : float > 0.0
124 |         Minimum allowed gap in seconds between events of the same event label class.
125 |         (Default value=0.1)
126 | 
127 |     Returns
128 |     -------
129 |     event_results : list of (onset, offset) tuples
130 |         Post-processed event segments
131 | 
132 |     """
133 | 
134 |     # 1. remove short events
135 |     event_results_1 = []
136 |     for event in event_segments:
137 |         if event[1]-event[0] >= minimum_event_length:
138 |             event_results_1.append((event[0], event[1]))
139 | 
140 |     if len(event_results_1):
141 |         # 2.
remove small gaps between events
142 |         event_results_2 = []
143 | 
144 |         # Load first event into event buffer
145 |         buffered_event_onset = event_results_1[0][0]
146 |         buffered_event_offset = event_results_1[0][1]
147 |         for i in range(1, len(event_results_1)):
148 |             if event_results_1[i][0] - buffered_event_offset > minimum_event_gap:
149 |                 # The gap between the current event and the buffered one is bigger than the minimum event gap,
150 |                 # store event, and replace buffered event
151 |                 event_results_2.append((buffered_event_onset, buffered_event_offset))
152 |                 buffered_event_onset = event_results_1[i][0]
153 |                 buffered_event_offset = event_results_1[i][1]
154 |             else:
155 |                 # The gap between the current event and the buffered one is smaller than the minimum event gap,
156 |                 # extend the buffered event until the current offset
157 |                 buffered_event_offset = event_results_1[i][1]
158 | 
159 |         # Store last event from buffer
160 |         event_results_2.append((buffered_event_onset, buffered_event_offset))
161 | 
162 |         return event_results_2
163 |     else:
164 |         return event_results_1
165 | 
--------------------------------------------------------------------------------
/src/ui.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python
2 | # -*- coding: utf-8 -*-
3 | 
4 | import sys
5 | import itertools
6 | 
7 | spinner = itertools.cycle(['-', '\\', '|', '/'])
8 | 
9 | 
10 | def title(text):
11 |     """Prints title
12 | 
13 |     Parameters
14 |     ----------
15 |     text : str
16 |         Title
17 | 
18 |     Returns
19 |     -------
20 |     Nothing
21 | 
22 |     """
23 | 
24 |     print "--------------------------------"
25 |     print text
26 |     print "--------------------------------"
27 | 
28 | 
29 | def section_header(text):
30 |     """Prints section header
31 | 
32 |     Parameters
33 |     ----------
34 |     text : str
35 |         Section header
36 | 
37 |     Returns
38 |     -------
39 |     Nothing
40 | 
41 |     """
42 | 
43 |     print " "
44 |     print text
45 |     print "================================"
46 | 
47 | 
48 | def foot():
49 |     """Prints footer
50 | 
51 |     Parameters
52 |     ----------
53 |     Nothing
54 | 
55 |     Returns
56 |     -------
57 |     Nothing
58 | 
59 |     """
60 | 
61 |     print "  [Done]           "
62 | 
63 | 
64 | def progress(title_text=None, fold=None, percentage=None, note=None, label=None):
65 |     """Prints progress line
66 | 
67 |     Parameters
68 |     ----------
69 |     title_text : str or None
70 |         Title
71 | 
72 |     fold : int > 0 [scalar] or None
73 |         Fold number
74 | 
75 |     percentage : float [0-1] or None
76 |         Progress percentage.
77 | 78 | note : str or None 79 | Note 80 | 81 | label : str or None 82 | Label 83 | 84 | Returns 85 | ------- 86 | Nothing 87 | 88 | """ 89 | 90 | if title_text is not None and fold is not None and percentage is not None and note is not None and label is None: 91 | print " {:2s} {:20s} fold[{:1d}] [{:3.0f}%] [{:20s}] \r".format(spinner.next(), title_text, fold,percentage * 100, note), 92 | 93 | elif title_text is not None and fold is not None and percentage is None and note is not None and label is None: 94 | print " {:2s} {:20s} fold[{:1d}] [{:20s}] \r".format(spinner.next(), title_text, fold, note), 95 | 96 | elif title_text is not None and fold is None and percentage is not None and note is not None and label is None: 97 | print " {:2s} {:20s} [{:3.0f}%] [{:20s}] \r".format(spinner.next(), title_text, percentage * 100, note), 98 | 99 | elif title_text is not None and fold is None and percentage is not None and note is None and label is None: 100 | print " {:2s} {:20s} [{:3.0f}%] \r".format(spinner.next(), title_text, percentage * 100), 101 | 102 | elif title_text is not None and fold is None and percentage is None and note is not None and label is None: 103 | print " {:2s} {:20s} [{:20s}] \r".format(spinner.next(), title_text, note), 104 | 105 | elif title_text is not None and fold is None and percentage is None and note is not None and label is not None: 106 | print " {:2s} {:20s} [{:20s}] [{:20s}] \r".format(spinner.next(), title_text, label, note), 107 | 108 | elif title_text is not None and fold is None and percentage is not None and note is not None and label is not None: 109 | print " {:2s} {:20s} [{:20s}] [{:3.0f}%] [{:20s}] \r".format(spinner.next(), title_text, label, percentage * 100, note), 110 | 111 | elif title_text is not None and fold is not None and percentage is not None and note is not None and label is not None: 112 | print " {:2s} {:20s} fold[{:1d}] [{:10s}] [{:3.0f}%] [{:20s}] \r".format(spinner.next(), title_text, fold, label, percentage * 100, note), 113 | 114 | elif title_text is not None and fold is not None and percentage is None and note is None and label is not None: 115 | print " {:2s} {:20s} fold[{:1d}] [{:10s}] \r".format(spinner.next(), title_text, fold, label), 116 | 117 | sys.stdout.flush() 118 | -------------------------------------------------------------------------------- /task1_scene_classification.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # -*- coding: utf-8 -*- 3 | # 4 | # DCASE 2016::Acoustic Scene Classification / Baseline System 5 | 6 | from src.ui import * 7 | from src.general import * 8 | from src.files import * 9 | 10 | from src.features import * 11 | from src.dataset import * 12 | from src.evaluation import * 13 | 14 | import numpy 15 | import csv 16 | import argparse 17 | import textwrap 18 | import copy 19 | 20 | from sklearn import mixture 21 | 22 | __version_info__ = ('1', '0', '0') 23 | __version__ = '.'.join(__version_info__) 24 | 25 | 26 | def main(argv): 27 | numpy.random.seed(123456) # let's make randomization predictable 28 | 29 | parser = argparse.ArgumentParser( 30 | prefix_chars='-+', 31 | formatter_class=argparse.RawDescriptionHelpFormatter, 32 | description=textwrap.dedent('''\ 33 | DCASE 2016 34 | Task 1: Acoustic Scene Classification 35 | Baseline system 36 | --------------------------------------------- 37 | Tampere University of Technology / Audio Research Group 38 | Author: Toni Heittola ( toni.heittola@tut.fi ) 39 | 40 | System description 41 | 
This is a baseline implementation for the DCASE 2016 challenge acoustic scene classification task.
42 |             Features: MFCC (static+delta+acceleration)
43 |             Classifier: GMM
44 | 
45 |         '''))
46 | 
47 |     # Setup argument handling
48 |     parser.add_argument("-development", help="Use the system in the development mode", action='store_true',
49 |                         default=False, dest='development')
50 |     parser.add_argument("-challenge", help="Use the system in the challenge mode", action='store_true',
51 |                         default=False, dest='challenge')
52 | 
53 |     parser.add_argument('-v', '--version', action='version', version='%(prog)s ' + __version__)
54 |     args = parser.parse_args()
55 | 
56 |     # Load parameters from config file
57 |     parameter_file = os.path.join(os.path.dirname(os.path.realpath(__file__)),
58 |                                   os.path.splitext(os.path.basename(__file__))[0]+'.yaml')
59 |     params = load_parameters(parameter_file)
60 |     params = process_parameters(params)
61 |     make_folders(params)
62 | 
63 |     title("DCASE 2016::Acoustic Scene Classification / Baseline System")
64 | 
65 |     # Check if mode is defined
66 |     if not (args.development or args.challenge):
67 |         args.development = True
68 |         args.challenge = False
69 | 
70 |     dataset_evaluation_mode = 'folds'
71 |     if args.development and not args.challenge:
72 |         print "Running system in development mode"
73 |         dataset_evaluation_mode = 'folds'
74 |     elif not args.development and args.challenge:
75 |         print "Running system in challenge mode"
76 |         dataset_evaluation_mode = 'full'
77 | 
78 |     # Get dataset container class
79 |     dataset = eval(params['general']['development_dataset'])(data_path=params['path']['data'])
80 | 
81 |     # Fetch data over internet and setup the data
82 |     # ==================================================
83 |     if params['flow']['initialize']:
84 |         dataset.fetch()
85 | 
86 |     # Extract features for all audio files in the dataset
87 |     # ==================================================
88 |     if params['flow']['extract_features']:
89 |         section_header('Feature extraction')
90 | 
91 |         # Collect files in train sets and test sets
92 |         files = []
93 |         for fold in dataset.folds(mode=dataset_evaluation_mode):
94 |             for item_id, item in enumerate(dataset.train(fold)):
95 |                 if item['file'] not in files:
96 |                     files.append(item['file'])
97 |             for item_id, item in enumerate(dataset.test(fold)):
98 |                 if item['file'] not in files:
99 |                     files.append(item['file'])
100 |         files = sorted(files)
101 | 
102 |         # Go through files and make sure all features are extracted
103 |         do_feature_extraction(files=files,
104 |                               dataset=dataset,
105 |                               feature_path=params['path']['features'],
106 |                               params=params['features'],
107 |                               overwrite=params['general']['overwrite'])
108 | 
109 |         foot()
110 | 
111 |     # Prepare feature normalizers
112 |     # ==================================================
113 |     if params['flow']['feature_normalizer']:
114 |         section_header('Feature normalizer')
115 | 
116 |         do_feature_normalization(dataset=dataset,
117 |                                  feature_normalizer_path=params['path']['feature_normalizers'],
118 |                                  feature_path=params['path']['features'],
119 |                                  dataset_evaluation_mode=dataset_evaluation_mode,
120 |                                  overwrite=params['general']['overwrite'])
121 | 
122 |         foot()
123 | 
124 |     # System training
125 |     # ==================================================
126 |     if params['flow']['train_system']:
127 |         section_header('System training')
128 | 
129 |         do_system_training(dataset=dataset,
130 |                            model_path=params['path']['models'],
131 |                            feature_normalizer_path=params['path']['feature_normalizers'],
132 |                            feature_path=params['path']['features'],
133
| feature_params=params['features'], 134 | classifier_params=params['classifier']['parameters'], 135 | classifier_method=params['classifier']['method'], 136 | dataset_evaluation_mode=dataset_evaluation_mode, 137 | clean_audio_errors=params['classifier']['audio_error_handling']['clean_data'], 138 | overwrite=params['general']['overwrite'] 139 | ) 140 | 141 | foot() 142 | 143 | # System evaluation in development mode 144 | if args.development and not args.challenge: 145 | 146 | # System testing 147 | # ================================================== 148 | if params['flow']['test_system']: 149 | section_header('System testing') 150 | 151 | do_system_testing(dataset=dataset, 152 | feature_path=params['path']['features'], 153 | result_path=params['path']['results'], 154 | model_path=params['path']['models'], 155 | feature_params=params['features'], 156 | dataset_evaluation_mode=dataset_evaluation_mode, 157 | classifier_method=params['classifier']['method'], 158 | clean_audio_errors=params['recognizer']['audio_error_handling']['clean_data'], 159 | overwrite=params['general']['overwrite'] 160 | ) 161 | 162 | foot() 163 | 164 | # System evaluation 165 | # ================================================== 166 | if params['flow']['evaluate_system']: 167 | section_header('System evaluation') 168 | 169 | do_system_evaluation(dataset=dataset, 170 | dataset_evaluation_mode=dataset_evaluation_mode, 171 | result_path=params['path']['results']) 172 | 173 | foot() 174 | 175 | # System evaluation with challenge data 176 | elif not args.development and args.challenge: 177 | # Fetch data over internet and setup the data 178 | challenge_dataset = eval(params['general']['challenge_dataset'])(data_path=params['path']['data']) 179 | if params['general']['challenge_submission_mode']: 180 | result_path = params['path']['challenge_results'] 181 | else: 182 | result_path = params['path']['results'] 183 | 184 | if params['flow']['initialize']: 185 | challenge_dataset.fetch() 186 | 187 | if not params['general']['challenge_submission_mode']: 188 | section_header('Feature extraction for challenge data') 189 | 190 | # Extract feature if not running in challenge submission mode. 
191 |             # Collect test files
192 |             files = []
193 |             for fold in challenge_dataset.folds(mode=dataset_evaluation_mode):
194 |                 for item_id, item in enumerate(challenge_dataset.test(fold)):
195 |                     if item['file'] not in files:
196 |                         files.append(item['file'])
197 |             files = sorted(files)
198 | 
199 |             # Go through files and make sure all features are extracted
200 |             do_feature_extraction(files=files,
201 |                                   dataset=challenge_dataset,
202 |                                   feature_path=params['path']['features'],
203 |                                   params=params['features'],
204 |                                   overwrite=params['general']['overwrite'])
205 |             foot()
206 | 
207 |         # System testing
208 |         if params['flow']['test_system']:
209 |             section_header('System testing with challenge data')
210 | 
211 |             do_system_testing(dataset=challenge_dataset,
212 |                               feature_path=params['path']['features'],
213 |                               result_path=result_path,
214 |                               model_path=params['path']['models'],
215 |                               feature_params=params['features'],
216 |                               dataset_evaluation_mode=dataset_evaluation_mode,
217 |                               classifier_method=params['classifier']['method'],
218 |                               clean_audio_errors=params['recognizer']['audio_error_handling']['clean_data'],
219 |                               overwrite=params['general']['overwrite'] or params['general']['challenge_submission_mode']
220 |                               )
221 |             foot()
222 | 
223 |             if params['general']['challenge_submission_mode']:
224 |                 print " "
225 |                 print "Your results for the challenge data are stored at ["+params['path']['challenge_results']+"]"
226 |                 print " "
227 | 
228 |         # System evaluation if not in challenge submission mode
229 |         if params['flow']['evaluate_system'] and not params['general']['challenge_submission_mode']:
230 |             section_header('System evaluation with challenge data')
231 |             do_system_evaluation(dataset=challenge_dataset,
232 |                                  dataset_evaluation_mode=dataset_evaluation_mode,
233 |                                  result_path=result_path)
234 | 
235 |             foot()
236 | 
237 |     return 0
238 | 
239 | 
240 | def process_parameters(params):
241 |     """Parameter post-processing.
242 | 
243 |     Parameters
244 |     ----------
245 |     params : dict
246 |         parameters in dict
247 | 
248 |     Returns
249 |     -------
250 |     params : dict
251 |         processed parameters
252 | 
253 |     """
254 | 
255 |     # Convert feature extraction window and hop sizes from seconds to samples
256 |     params['features']['mfcc']['win_length'] = int(params['features']['win_length_seconds'] * params['features']['fs'])
257 |     params['features']['mfcc']['hop_length'] = int(params['features']['hop_length_seconds'] * params['features']['fs'])
258 | 
259 |     # Copy parameters for current classifier method
260 |     params['classifier']['parameters'] = params['classifier_parameters'][params['classifier']['method']]
261 | 
262 |     # Hash
263 |     params['features']['hash'] = get_parameter_hash(params['features'])
264 | 
265 |     # Let's keep hashes backwards compatible after new parameters were added.
266 |     # Audio error handling parameters are included in the hash only when they are enabled.
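    # Illustrative note: get_parameter_hash() returns a 32-character md5 hex
    # digest, so with the default YAML paths the results below end up under
    # something like <base>/results/<features_hash>/<classifier_hash>/<recognizer_hash>/.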
267 | classifier_params = copy.copy(params['classifier']) 268 | if not classifier_params['audio_error_handling']['clean_data']: 269 | del classifier_params['audio_error_handling'] 270 | params['classifier']['hash'] = get_parameter_hash(classifier_params) 271 | 272 | params['recognizer']['hash'] = get_parameter_hash(params['recognizer']) 273 | 274 | # Paths 275 | params['path']['data'] = os.path.join(os.path.dirname(os.path.realpath(__file__)), params['path']['data']) 276 | params['path']['base'] = os.path.join(os.path.dirname(os.path.realpath(__file__)), params['path']['base']) 277 | 278 | # Features 279 | params['path']['features_'] = params['path']['features'] 280 | params['path']['features'] = os.path.join(params['path']['base'], 281 | params['path']['features'], 282 | params['features']['hash']) 283 | 284 | # Feature normalizers 285 | params['path']['feature_normalizers_'] = params['path']['feature_normalizers'] 286 | params['path']['feature_normalizers'] = os.path.join(params['path']['base'], 287 | params['path']['feature_normalizers'], 288 | params['features']['hash']) 289 | 290 | # Models 291 | params['path']['models_'] = params['path']['models'] 292 | params['path']['models'] = os.path.join(params['path']['base'], 293 | params['path']['models'], 294 | params['features']['hash'], 295 | params['classifier']['hash']) 296 | # Results 297 | params['path']['results_'] = params['path']['results'] 298 | params['path']['results'] = os.path.join(params['path']['base'], 299 | params['path']['results'], 300 | params['features']['hash'], 301 | params['classifier']['hash'], 302 | params['recognizer']['hash']) 303 | 304 | return params 305 | 306 | 307 | def make_folders(params, parameter_filename='parameters.yaml'): 308 | """Create all needed folders, and saves parameters in yaml-file for easier manual browsing of data. 309 | 310 | Parameters 311 | ---------- 312 | params : dict 313 | parameters in dict 314 | 315 | parameter_filename : str 316 | filename to save parameters used to generate the folder name 317 | 318 | Returns 319 | ------- 320 | nothing 321 | 322 | """ 323 | 324 | # Check that target path exists, create if not 325 | check_path(params['path']['features']) 326 | check_path(params['path']['feature_normalizers']) 327 | check_path(params['path']['models']) 328 | check_path(params['path']['results']) 329 | 330 | # Save parameters into folders to help manual browsing of files. 
331 | 332 | # Features 333 | feature_parameter_filename = os.path.join(params['path']['features'], parameter_filename) 334 | if not os.path.isfile(feature_parameter_filename): 335 | save_parameters(feature_parameter_filename, params['features']) 336 | 337 | # Feature normalizers 338 | feature_normalizer_parameter_filename = os.path.join(params['path']['feature_normalizers'], parameter_filename) 339 | if not os.path.isfile(feature_normalizer_parameter_filename): 340 | save_parameters(feature_normalizer_parameter_filename, params['features']) 341 | 342 | # Models 343 | model_features_parameter_filename = os.path.join(params['path']['base'], 344 | params['path']['models_'], 345 | params['features']['hash'], 346 | parameter_filename) 347 | if not os.path.isfile(model_features_parameter_filename): 348 | save_parameters(model_features_parameter_filename, params['features']) 349 | 350 | model_models_parameter_filename = os.path.join(params['path']['base'], 351 | params['path']['models_'], 352 | params['features']['hash'], 353 | params['classifier']['hash'], 354 | parameter_filename) 355 | if not os.path.isfile(model_models_parameter_filename): 356 | save_parameters(model_models_parameter_filename, params['classifier']) 357 | 358 | # Results 359 | # Save parameters into folders to help manual browsing of files. 360 | result_features_parameter_filename = os.path.join(params['path']['base'], 361 | params['path']['results_'], 362 | params['features']['hash'], 363 | parameter_filename) 364 | if not os.path.isfile(result_features_parameter_filename): 365 | save_parameters(result_features_parameter_filename, params['features']) 366 | 367 | result_models_parameter_filename = os.path.join(params['path']['base'], 368 | params['path']['results_'], 369 | params['features']['hash'], 370 | params['classifier']['hash'], 371 | parameter_filename) 372 | if not os.path.isfile(result_models_parameter_filename): 373 | save_parameters(result_models_parameter_filename, params['classifier']) 374 | 375 | result_models_parameter_filename = os.path.join(params['path']['base'], 376 | params['path']['results_'], 377 | params['features']['hash'], 378 | params['classifier']['hash'], 379 | params['recognizer']['hash'], 380 | parameter_filename) 381 | if not os.path.isfile(result_models_parameter_filename): 382 | save_parameters(result_models_parameter_filename, params['recognizer']) 383 | 384 | def get_feature_filename(audio_file, path, extension='cpickle'): 385 | """Get feature filename 386 | 387 | Parameters 388 | ---------- 389 | audio_file : str 390 | audio file name from which the features are extracted 391 | 392 | path : str 393 | feature path 394 | 395 | extension : str 396 | file extension 397 | (Default value='cpickle') 398 | 399 | Returns 400 | ------- 401 | feature_filename : str 402 | full feature filename 403 | 404 | """ 405 | 406 | audio_filename = os.path.split(audio_file)[1] 407 | return os.path.join(path, os.path.splitext(audio_filename)[0] + '.' + extension) 408 | 409 | 410 | def get_feature_normalizer_filename(fold, path, extension='cpickle'): 411 | """Get normalizer filename 412 | 413 | Parameters 414 | ---------- 415 | fold : int >= 0 416 | evaluation fold number 417 | 418 | path : str 419 | normalizer path 420 | 421 | extension : str 422 | file extension 423 | (Default value='cpickle') 424 | 425 | Returns 426 | ------- 427 | normalizer_filename : str 428 | full normalizer filename 429 | 430 | """ 431 | 432 | return os.path.join(path, 'scale_fold' + str(fold) + '.' 
+ extension)
433 | 
434 | 
435 | def get_model_filename(fold, path, extension='cpickle'):
436 |     """Get model filename
437 | 
438 |     Parameters
439 |     ----------
440 |     fold : int >= 0
441 |         evaluation fold number
442 | 
443 |     path : str
444 |         model path
445 | 
446 |     extension : str
447 |         file extension
448 |         (Default value='cpickle')
449 | 
450 |     Returns
451 |     -------
452 |     model_filename : str
453 |         full model filename
454 | 
455 |     """
456 | 
457 |     return os.path.join(path, 'model_fold' + str(fold) + '.' + extension)
458 | 
459 | 
460 | def get_result_filename(fold, path, extension='txt'):
461 |     """Get result filename
462 | 
463 |     Parameters
464 |     ----------
465 |     fold : int >= 0
466 |         evaluation fold number
467 | 
468 |     path : str
469 |         result path
470 | 
471 |     extension : str
472 |         file extension
473 |         (Default value='txt')
474 | 
475 |     Returns
476 |     -------
477 |     result_filename : str
478 |         full result filename
479 | 
480 |     """
481 | 
482 |     if fold == 0:
483 |         return os.path.join(path, 'results.' + extension)
484 |     else:
485 |         return os.path.join(path, 'results_fold' + str(fold) + '.' + extension)
486 | 
487 | 
488 | def do_feature_extraction(files, dataset, feature_path, params, overwrite=False):
489 |     """Feature extraction
490 | 
491 |     Parameters
492 |     ----------
493 |     files : list
494 |         file list
495 | 
496 |     dataset : class
497 |         dataset class
498 | 
499 |     feature_path : str
500 |         path where the features are saved
501 | 
502 |     params : dict
503 |         parameter dict
504 | 
505 |     overwrite : bool
506 |         overwrite existing feature files
507 |         (Default value=False)
508 | 
509 |     Returns
510 |     -------
511 |     nothing
512 | 
513 |     Raises
514 |     -------
515 |     IOError
516 |         Audio file not found.
517 | 
518 |     """
519 | 
520 |     # Check that target path exists, create if not
521 |     check_path(feature_path)
522 | 
523 |     for file_id, audio_filename in enumerate(files):
524 |         # Get feature filename
525 |         current_feature_file = get_feature_filename(audio_file=os.path.split(audio_filename)[1], path=feature_path)
526 | 
527 |         progress(title_text='Extracting',
528 |                  percentage=(float(file_id) / len(files)),
529 |                  note=os.path.split(audio_filename)[1])
530 | 
531 |         if not os.path.isfile(current_feature_file) or overwrite:
532 |             # Load audio data
533 |             if os.path.isfile(dataset.relative_to_absolute_path(audio_filename)):
534 |                 y, fs = load_audio(filename=dataset.relative_to_absolute_path(audio_filename), mono=True, fs=params['fs'])
535 |             else:
536 |                 raise IOError("Audio file not found [%s]" % audio_filename)
537 | 
538 |             # Extract features
539 |             feature_data = feature_extraction(y=y,
540 |                                               fs=fs,
541 |                                               include_mfcc0=params['include_mfcc0'],
542 |                                               include_delta=params['include_delta'],
543 |                                               include_acceleration=params['include_acceleration'],
544 |                                               mfcc_params=params['mfcc'],
545 |                                               delta_params=params['mfcc_delta'],
546 |                                               acceleration_params=params['mfcc_acceleration'])
547 |             # Save
548 |             save_data(current_feature_file, feature_data)
549 | 
550 | 
551 | def do_feature_normalization(dataset, feature_normalizer_path, feature_path, dataset_evaluation_mode='folds', overwrite=False):
552 |     """Feature normalization
553 | 
554 |     Calculates normalization factors for each evaluation fold based on the training material available.
555 | 
556 |     Parameters
557 |     ----------
558 |     dataset : class
559 |         dataset class
560 | 
561 |     feature_normalizer_path : str
562 |         path where the feature normalizers are saved.
563 | 
564 |     feature_path : str
565 |         path where the features are saved.
566 | 567 | dataset_evaluation_mode : str ['folds', 'full'] 568 | evaluation mode, 'full' all material available is considered to belong to one fold. 569 | (Default value='folds') 570 | 571 | overwrite : bool 572 | overwrite existing normalizers 573 | (Default value=False) 574 | 575 | Returns 576 | ------- 577 | nothing 578 | 579 | Raises 580 | ------- 581 | IOError 582 | Feature file not found. 583 | 584 | """ 585 | 586 | # Check that target path exists, create if not 587 | check_path(feature_normalizer_path) 588 | 589 | for fold in dataset.folds(mode=dataset_evaluation_mode): 590 | current_normalizer_file = get_feature_normalizer_filename(fold=fold, path=feature_normalizer_path) 591 | 592 | if not os.path.isfile(current_normalizer_file) or overwrite: 593 | # Initialize statistics 594 | file_count = len(dataset.train(fold)) 595 | normalizer = FeatureNormalizer() 596 | 597 | for item_id, item in enumerate(dataset.train(fold)): 598 | progress(title_text='Collecting data', 599 | fold=fold, 600 | percentage=(float(item_id) / file_count), 601 | note=os.path.split(item['file'])[1]) 602 | # Load features 603 | if os.path.isfile(get_feature_filename(audio_file=item['file'], path=feature_path)): 604 | feature_data = load_data(get_feature_filename(audio_file=item['file'], path=feature_path))['stat'] 605 | else: 606 | raise IOError("Feature file not found [%s]" % (item['file'])) 607 | 608 | # Accumulate statistics 609 | normalizer.accumulate(feature_data) 610 | 611 | # Calculate normalization factors 612 | normalizer.finalize() 613 | 614 | # Save 615 | save_data(current_normalizer_file, normalizer) 616 | 617 | 618 | def do_system_training(dataset, model_path, feature_normalizer_path, feature_path, feature_params, classifier_params, 619 | dataset_evaluation_mode='folds', classifier_method='gmm', clean_audio_errors=False, overwrite=False): 620 | """System training 621 | 622 | model container format: 623 | 624 | { 625 | 'normalizer': normalizer class 626 | 'models' : 627 | { 628 | 'office' : mixture.GMM class 629 | 'home' : mixture.GMM class 630 | ... 631 | } 632 | } 633 | 634 | Parameters 635 | ---------- 636 | dataset : class 637 | dataset class 638 | 639 | model_path : str 640 | path where the models are saved. 641 | 642 | feature_normalizer_path : str 643 | path where the feature normalizers are saved. 644 | 645 | feature_path : str 646 | path where the features are saved. 647 | 648 | feature_params : dict 649 | parameter dict 650 | 651 | classifier_params : dict 652 | parameter dict 653 | 654 | dataset_evaluation_mode : str ['folds', 'full'] 655 | evaluation mode, 'full' all material available is considered to belong to one fold. 656 | (Default value='folds') 657 | 658 | classifier_method : str ['gmm'] 659 | classifier method, currently only GMM supported 660 | (Default value='gmm') 661 | 662 | clean_audio_errors : bool 663 | Remove audio errors from the training data 664 | (Default value=False) 665 | 666 | overwrite : bool 667 | overwrite existing models 668 | (Default value=False) 669 | 670 | Returns 671 | ------- 672 | nothing 673 | 674 | Raises 675 | ------- 676 | ValueError 677 | classifier_method is unknown. 678 | 679 | IOError 680 | Feature normalizer not found. 681 | Feature file not found. 
682 | 683 | """ 684 | 685 | if classifier_method != 'gmm': 686 | raise ValueError("Unknown classifier method ["+classifier_method+"]") 687 | 688 | # Check that target path exists, create if not 689 | check_path(model_path) 690 | 691 | for fold in dataset.folds(mode=dataset_evaluation_mode): 692 | current_model_file = get_model_filename(fold=fold, path=model_path) 693 | if not os.path.isfile(current_model_file) or overwrite: 694 | # Load normalizer 695 | feature_normalizer_filename = get_feature_normalizer_filename(fold=fold, path=feature_normalizer_path) 696 | if os.path.isfile(feature_normalizer_filename): 697 | normalizer = load_data(feature_normalizer_filename) 698 | else: 699 | raise IOError("Feature normalizer not found [%s]" % feature_normalizer_filename) 700 | 701 | # Initialize model container 702 | model_container = {'normalizer': normalizer, 'models': {}} 703 | 704 | # Collect training examples 705 | file_count = len(dataset.train(fold)) 706 | data = {} 707 | for item_id, item in enumerate(dataset.train(fold)): 708 | progress(title_text='Collecting data', 709 | fold=fold, 710 | percentage=(float(item_id) / file_count), 711 | note=os.path.split(item['file'])[1]) 712 | 713 | # Load features 714 | feature_filename = get_feature_filename(audio_file=item['file'], path=feature_path) 715 | if os.path.isfile(feature_filename): 716 | feature_data = load_data(feature_filename)['feat'] 717 | else: 718 | raise IOError("Features not found [%s]" % (item['file'])) 719 | 720 | # Scale features 721 | feature_data = model_container['normalizer'].normalize(feature_data) 722 | 723 | # Audio error removal 724 | if clean_audio_errors: 725 | current_errors = dataset.file_error_meta(item['file']) 726 | if current_errors: 727 | removal_mask = numpy.ones((feature_data.shape[0]), dtype=bool) 728 | for error_event in current_errors: 729 | onset_frame = int(numpy.floor(error_event['event_onset'] / feature_params['hop_length_seconds'])) 730 | offset_frame = int(numpy.ceil(error_event['event_offset'] / feature_params['hop_length_seconds'])) 731 | if offset_frame > feature_data.shape[0]: 732 | offset_frame = feature_data.shape[0] 733 | removal_mask[onset_frame:offset_frame] = False 734 | feature_data = feature_data[removal_mask, :] 735 | 736 | # Store features per class label 737 | if item['scene_label'] not in data: 738 | data[item['scene_label']] = feature_data 739 | else: 740 | data[item['scene_label']] = numpy.vstack((data[item['scene_label']], feature_data)) 741 | 742 | # Train models for each class 743 | for label in data: 744 | progress(title_text='Train models', 745 | fold=fold, 746 | note=label) 747 | if classifier_method == 'gmm': 748 | model_container['models'][label] = mixture.GMM(**classifier_params).fit(data[label]) 749 | else: 750 | raise ValueError("Unknown classifier method ["+classifier_method+"]") 751 | 752 | # Save models 753 | save_data(current_model_file, model_container) 754 | 755 | 756 | def do_system_testing(dataset, result_path, feature_path, model_path, feature_params, 757 | dataset_evaluation_mode='folds', classifier_method='gmm', clean_audio_errors=False, overwrite=False): 758 | """System testing. 759 | 760 | If extracted features are not found from disk, they are extracted but not saved. 761 | 762 | Parameters 763 | ---------- 764 | dataset : class 765 | dataset class 766 | 767 | result_path : str 768 | path where the results are saved. 769 | 770 | feature_path : str 771 | path where the features are saved. 772 | 773 | model_path : str 774 | path where the models are saved. 
775 | 776 | feature_params : dict 777 | parameter dict 778 | 779 | dataset_evaluation_mode : str ['folds', 'full'] 780 | evaluation mode, 'full' all material available is considered to belong to one fold. 781 | (Default value='folds') 782 | 783 | classifier_method : str ['gmm'] 784 | classifier method, currently only GMM supported 785 | (Default value='gmm') 786 | 787 | clean_audio_errors : bool 788 | Remove audio errors from the training data 789 | (Default value=False) 790 | 791 | overwrite : bool 792 | overwrite existing models 793 | (Default value=False) 794 | 795 | Returns 796 | ------- 797 | nothing 798 | 799 | Raises 800 | ------- 801 | ValueError 802 | classifier_method is unknown. 803 | 804 | IOError 805 | Model file not found. 806 | Audio file not found. 807 | 808 | """ 809 | 810 | if classifier_method != 'gmm': 811 | raise ValueError("Unknown classifier method ["+classifier_method+"]") 812 | 813 | # Check that target path exists, create if not 814 | check_path(result_path) 815 | 816 | for fold in dataset.folds(mode=dataset_evaluation_mode): 817 | current_result_file = get_result_filename(fold=fold, path=result_path) 818 | if not os.path.isfile(current_result_file) or overwrite: 819 | results = [] 820 | 821 | # Load class model container 822 | model_filename = get_model_filename(fold=fold, path=model_path) 823 | if os.path.isfile(model_filename): 824 | model_container = load_data(model_filename) 825 | else: 826 | raise IOError("Model file not found [%s]" % model_filename) 827 | 828 | file_count = len(dataset.test(fold)) 829 | for file_id, item in enumerate(dataset.test(fold)): 830 | progress(title_text='Testing', 831 | fold=fold, 832 | percentage=(float(file_id) / file_count), 833 | note=os.path.split(item['file'])[1]) 834 | 835 | # Load features 836 | feature_filename = get_feature_filename(audio_file=item['file'], path=feature_path) 837 | 838 | if os.path.isfile(feature_filename): 839 | feature_data = load_data(feature_filename)['feat'] 840 | else: 841 | # Load audio 842 | if os.path.isfile(dataset.relative_to_absolute_path(item['file'])): 843 | y, fs = load_audio(filename=dataset.relative_to_absolute_path(item['file']), mono=True, fs=feature_params['fs']) 844 | else: 845 | raise IOError("Audio file not found [%s]" % (item['file'])) 846 | 847 | feature_data = feature_extraction(y=y, 848 | fs=fs, 849 | include_mfcc0=feature_params['include_mfcc0'], 850 | include_delta=feature_params['include_delta'], 851 | include_acceleration=feature_params['include_acceleration'], 852 | mfcc_params=feature_params['mfcc'], 853 | delta_params=feature_params['mfcc_delta'], 854 | acceleration_params=feature_params['mfcc_acceleration'], 855 | statistics=False)['feat'] 856 | 857 | # Scale features 858 | feature_data = model_container['normalizer'].normalize(feature_data) 859 | 860 | if clean_audio_errors: 861 | current_errors = dataset.file_error_meta(item['file']) 862 | if current_errors: 863 | removal_mask = numpy.ones((feature_data.shape[0]), dtype=bool) 864 | for error_event in current_errors: 865 | onset_frame = int(numpy.floor(error_event['event_onset'] / feature_params['hop_length_seconds'])) 866 | offset_frame = int(numpy.ceil(error_event['event_offset'] / feature_params['hop_length_seconds'])) 867 | if offset_frame > feature_data.shape[0]: 868 | offset_frame = feature_data.shape[0] 869 | removal_mask[onset_frame:offset_frame] = False 870 | feature_data = feature_data[removal_mask, :] 871 | 872 | # Do classification for the block 873 | if classifier_method == 'gmm': 874 | current_result = 
do_classification_gmm(feature_data, model_container) 875 | else: 876 | raise ValueError("Unknown classifier method ["+classifier_method+"]") 877 | 878 | # Store the result 879 | results.append((dataset.absolute_to_relative(item['file']), current_result)) 880 | 881 | # Save testing results 882 | with open(current_result_file, 'wt') as f: 883 | writer = csv.writer(f, delimiter='\t') 884 | for result_item in results: 885 | writer.writerow(result_item) 886 | 887 | 888 | def do_classification_gmm(feature_data, model_container): 889 | """GMM classification for a given feature matrix 890 | 891 | model container format: 892 | 893 | { 894 | 'normalizer': normalizer class 895 | 'models' : 896 | { 897 | 'office' : mixture.GMM class 898 | 'home' : mixture.GMM class 899 | ... 900 | } 901 | } 902 | 903 | Parameters 904 | ---------- 905 | feature_data : numpy.ndarray [shape=(t, feature vector length)] 906 | feature matrix 907 | 908 | model_container : dict 909 | model container 910 | 911 | Returns 912 | ------- 913 | result : str 914 | classification result as scene label 915 | 916 | """ 917 | 918 | # Initialize log-likelihood vector to -inf 919 | logls = numpy.empty(len(model_container['models'])) 920 | logls.fill(-numpy.inf) 921 | 922 | for label_id, label in enumerate(model_container['models']): 923 | logls[label_id] = numpy.sum(model_container['models'][label].score(feature_data)) 924 | 925 | classification_result_id = numpy.argmax(logls) 926 | return model_container['models'].keys()[classification_result_id] 927 | 928 | 929 | def do_system_evaluation(dataset, result_path, dataset_evaluation_mode='folds'): 930 | """System evaluation. Testing outputs are collected and evaluated. Evaluation results are printed. 931 | 932 | Parameters 933 | ---------- 934 | dataset : class 935 | dataset class 936 | 937 | result_path : str 938 | path where the results are saved. 939 | 940 | dataset_evaluation_mode : str ['folds', 'full'] 941 | evaluation mode, 'full' all material available is considered to belong to one fold. 
942 | (Default value='folds') 943 | 944 | Returns 945 | ------- 946 | nothing 947 | 948 | Raises 949 | ------- 950 | IOError 951 | Result file not found 952 | 953 | """ 954 | 955 | dcase2016_scene_metric = DCASE2016_SceneClassification_Metrics(class_list=dataset.scene_labels) 956 | results_fold = [] 957 | for fold in dataset.folds(mode=dataset_evaluation_mode): 958 | dcase2016_scene_metric_fold = DCASE2016_SceneClassification_Metrics(class_list=dataset.scene_labels) 959 | results = [] 960 | result_filename = get_result_filename(fold=fold, path=result_path) 961 | 962 | if os.path.isfile(result_filename): 963 | with open(result_filename, 'rt') as f: 964 | for row in csv.reader(f, delimiter='\t'): 965 | results.append(row) 966 | else: 967 | raise IOError("Result file not found [%s]" % result_filename) 968 | 969 | y_true = [] 970 | y_pred = [] 971 | for result in results: 972 | y_true.append(dataset.file_meta(result[0])[0]['scene_label']) 973 | y_pred.append(result[1]) 974 | dcase2016_scene_metric.evaluate(system_output=y_pred, annotated_ground_truth=y_true) 975 | dcase2016_scene_metric_fold.evaluate(system_output=y_pred, annotated_ground_truth=y_true) 976 | results_fold.append(dcase2016_scene_metric_fold.results()) 977 | results = dcase2016_scene_metric.results() 978 | 979 | print " File-wise evaluation, over %d folds" % dataset.fold_count 980 | fold_labels = '' 981 | separator = ' =====================+======+======+==========+ +' 982 | if dataset.fold_count > 1: 983 | for fold in dataset.folds(mode=dataset_evaluation_mode): 984 | fold_labels += " {:8s} |".format('Fold'+str(fold)) 985 | separator += "==========+" 986 | print " {:20s} | {:4s} : {:4s} | {:8s} | |".format('Scene label', 'Nref', 'Nsys', 'Accuracy')+fold_labels 987 | print separator 988 | for label_id, label in enumerate(sorted(results['class_wise_accuracy'])): 989 | fold_values = '' 990 | if dataset.fold_count > 1: 991 | for fold in dataset.folds(mode=dataset_evaluation_mode): 992 | fold_values += " {:5.1f} % |".format(results_fold[fold-1]['class_wise_accuracy'][label] * 100) 993 | print " {:20s} | {:4d} : {:4d} | {:5.1f} % | |".format(label, 994 | results['class_wise_data'][label]['Nref'], 995 | results['class_wise_data'][label]['Nsys'], 996 | results['class_wise_accuracy'][label] * 100)+fold_values 997 | print separator 998 | fold_values = '' 999 | if dataset.fold_count > 1: 1000 | for fold in dataset.folds(mode=dataset_evaluation_mode): 1001 | fold_values += " {:5.1f} % |".format(results_fold[fold-1]['overall_accuracy'] * 100) 1002 | 1003 | print " {:20s} | {:4d} : {:4d} | {:5.1f} % | |".format('Overall accuracy', 1004 | results['Nref'], 1005 | results['Nsys'], 1006 | results['overall_accuracy'] * 100)+fold_values 1007 | 1008 | if __name__ == "__main__": 1009 | try: 1010 | sys.exit(main(sys.argv)) 1011 | except (ValueError, IOError) as e: 1012 | sys.exit(e) 1013 | -------------------------------------------------------------------------------- /task1_scene_classification.yaml: -------------------------------------------------------------------------------- 1 | # ========================================================== 2 | # Flow 3 | # ========================================================== 4 | flow: 5 | initialize: true 6 | extract_features: true 7 | feature_normalizer: true 8 | train_system: true 9 | test_system: true 10 | evaluate_system: true 11 | 12 | # ========================================================== 13 | # General 14 | # ========================================================== 15 | general: 16 | 
development_dataset: TUTAcousticScenes_2016_DevelopmentSet 17 | challenge_dataset: TUTAcousticScenes_2016_EvaluationSet 18 | 19 | overwrite: false # Overwrite previously stored data 20 | 21 | challenge_submission_mode: false # save results into path->challenge_results for challenge submission 22 | 23 | # ========================================================== 24 | # Paths 25 | # ========================================================== 26 | path: 27 | data: data/ 28 | 29 | base: system/baseline_dcase2016_task1/ 30 | features: features/ 31 | feature_normalizers: feature_normalizers/ 32 | models: acoustic_models/ 33 | results: evaluation_results/ 34 | 35 | challenge_results: challenge_submission/task_1_acoustic_scene_classification/ 36 | 37 | # ========================================================== 38 | # Feature extraction 39 | # ========================================================== 40 | features: 41 | fs: 44100 42 | win_length_seconds: 0.04 43 | hop_length_seconds: 0.02 44 | 45 | include_mfcc0: true # 46 | include_delta: true # 47 | include_acceleration: true # 48 | 49 | mfcc: 50 | window: hamming_asymmetric # [hann_asymmetric, hamming_asymmetric] 51 | n_mfcc: 20 # Number of MFCC coefficients 52 | n_mels: 40 # Number of MEL bands used 53 | n_fft: 2048 # FFT length 54 | fmin: 0 # Minimum frequency when constructing MEL bands 55 | fmax: 22050 # Maximum frequency when constructing MEL band 56 | htk: false # Switch for HTK-styled MEL-frequency equation 57 | 58 | mfcc_delta: 59 | width: 9 60 | 61 | mfcc_acceleration: 62 | width: 9 63 | 64 | # ========================================================== 65 | # Classifier 66 | # ========================================================== 67 | classifier: 68 | method: gmm # The system supports only gmm 69 | 70 | audio_error_handling: # Handling audio errors (temporary microphone failure and radio signal interferences from mobile phones) 71 | clean_data: false # Exclude audio errors from training audio 72 | 73 | parameters: !!null # Parameters are copied from classifier_parameters based on defined method 74 | 75 | classifier_parameters: 76 | gmm: 77 | n_components: 16 # Number of Gaussian components 78 | covariance_type: diag # [diag|full] Diagonal or full covariance matrix 79 | random_state: 0 80 | thresh: !!null 81 | tol: 0.001 82 | min_covar: 0.001 83 | n_iter: 40 84 | n_init: 1 85 | params: wmc 86 | init_params: wmc 87 | 88 | # ========================================================== 89 | # Recognizer 90 | # ========================================================== 91 | recognizer: 92 | audio_error_handling: # Handling audio errors (temporary microphone failure and radio signal interferences from mobile phones) 93 | clean_data: false # Exclude audio errors from test audio -------------------------------------------------------------------------------- /task3_sound_event_detection_in_real_life_audio.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # -*- coding: utf-8 -*- 3 | # 4 | # DCASE 2016::Sound Event Detection in Real-life Audio / Baseline System 5 | 6 | from src.ui import * 7 | from src.general import * 8 | from src.files import * 9 | 10 | from src.features import * 11 | from src.sound_event_detection import * 12 | from src.dataset import * 13 | from src.evaluation import * 14 | 15 | import numpy 16 | import csv 17 | import warnings 18 | import argparse 19 | import textwrap 20 | import math 21 | 22 | from sklearn import mixture 23 | 24 | __version_info__ = 
('1', '0', '1') 25 | __version__ = '.'.join(__version_info__) 26 | 27 | 28 | def main(argv): 29 | numpy.random.seed(123456) # let's make randomization predictable 30 | 31 | parser = argparse.ArgumentParser( 32 | prefix_chars='-+', 33 | formatter_class=argparse.RawDescriptionHelpFormatter, 34 | description=textwrap.dedent('''\ 35 | DCASE 2016 36 | Task 3: Sound Event Detection in Real-life Audio 37 | Baseline System 38 | --------------------------------------------- 39 | Tampere University of Technology / Audio Research Group 40 | Author: Toni Heittola ( toni.heittola@tut.fi ) 41 | 42 | System description 43 | This is a baseline implementation for the DCASE 2016 task 3 - Sound event detection in real life audio. 44 | The system has a binary classifier for each included sound event class. The GMM classifier is trained with 45 | the positive and negative examples from the mixture signals, and classification is done between these 46 | two models as a likelihood ratio. Acoustic features are MFCC+Delta+Acceleration (MFCC0 omitted). 47 | 48 | ''')) 49 | 50 | parser.add_argument("-development", help="Use the system in the development mode", action='store_true', 51 | default=False, dest='development') 52 | parser.add_argument("-challenge", help="Use the system in the challenge mode", action='store_true', 53 | default=False, dest='challenge') 54 | 55 | parser.add_argument('-v', '--version', action='version', version='%(prog)s ' + __version__) 56 | args = parser.parse_args() 57 | 58 | # Load parameters from config file 59 | parameter_file = os.path.join(os.path.dirname(os.path.realpath(__file__)), 60 | os.path.splitext(os.path.basename(__file__))[0]+'.yaml') 61 | params = load_parameters(parameter_file) 62 | params = process_parameters(params) 63 | make_folders(params) 64 | 65 | title("DCASE 2016::Sound Event Detection in Real-life Audio / Baseline System") 66 | 67 | # Check if mode is defined 68 | if not (args.development or args.challenge): 69 | args.development = True 70 | args.challenge = False 71 | 72 | dataset_evaluation_mode = 'folds' 73 | if args.development and not args.challenge: 74 | print "Running system in development mode" 75 | dataset_evaluation_mode = 'folds' 76 | elif not args.development and args.challenge: 77 | print "Running system in challenge mode" 78 | dataset_evaluation_mode = 'full' 79 | 80 | # Get dataset container class 81 | dataset = eval(params['general']['development_dataset'])(data_path=params['path']['data']) 82 | 83 | # Fetch data over internet and setup the data 84 | # ================================================== 85 | if params['flow']['initialize']: 86 | dataset.fetch() 87 | 88 | # Extract features for all audio files in the dataset 89 | # ================================================== 90 | if params['flow']['extract_features']: 91 | section_header('Feature extraction [Development data]') 92 | 93 | # Collect files from evaluation sets 94 | files = [] 95 | for fold in dataset.folds(mode=dataset_evaluation_mode): 96 | for item_id, item in enumerate(dataset.train(fold)): 97 | if item['file'] not in files: 98 | files.append(item['file']) 99 | for item_id, item in enumerate(dataset.test(fold)): 100 | if item['file'] not in files: 101 | files.append(item['file']) 102 | 103 | # Go through files and make sure all features are extracted 104 | do_feature_extraction(files=files, 105 | dataset=dataset, 106 | feature_path=params['path']['features'], 107 | params=params['features'], 108 | overwrite=params['general']['overwrite']) 109 | 110 | foot() 111 | 112 | # Prepare 
feature normalizers 113 | # ================================================== 114 | if params['flow']['feature_normalizer']: 115 | section_header('Feature normalizer [Development data]') 116 | 117 | do_feature_normalization(dataset=dataset, 118 | feature_normalizer_path=params['path']['feature_normalizers'], 119 | feature_path=params['path']['features'], 120 | dataset_evaluation_mode=dataset_evaluation_mode, 121 | overwrite=params['general']['overwrite']) 122 | 123 | foot() 124 | 125 | # System training 126 | # ================================================== 127 | if params['flow']['train_system']: 128 | section_header('System training [Development data]') 129 | 130 | do_system_training(dataset=dataset, 131 | model_path=params['path']['models'], 132 | feature_normalizer_path=params['path']['feature_normalizers'], 133 | feature_path=params['path']['features'], 134 | hop_length_seconds=params['features']['hop_length_seconds'], 135 | classifier_params=params['classifier']['parameters'], 136 | dataset_evaluation_mode=dataset_evaluation_mode, 137 | classifier_method=params['classifier']['method'], 138 | overwrite=params['general']['overwrite'] 139 | ) 140 | 141 | foot() 142 | 143 | # System evaluation in development mode 144 | if args.development and not args.challenge: 145 | 146 | # System testing 147 | # ================================================== 148 | if params['flow']['test_system']: 149 | section_header('System testing [Development data]') 150 | 151 | do_system_testing(dataset=dataset, 152 | result_path=params['path']['results'], 153 | feature_path=params['path']['features'], 154 | model_path=params['path']['models'], 155 | feature_params=params['features'], 156 | detector_params=params['detector'], 157 | dataset_evaluation_mode=dataset_evaluation_mode, 158 | classifier_method=params['classifier']['method'], 159 | overwrite=params['general']['overwrite'] 160 | ) 161 | foot() 162 | 163 | # System evaluation 164 | # ================================================== 165 | if params['flow']['evaluate_system']: 166 | section_header('System evaluation [Development data]') 167 | 168 | do_system_evaluation(dataset=dataset, 169 | dataset_evaluation_mode=dataset_evaluation_mode, 170 | result_path=params['path']['results']) 171 | 172 | foot() 173 | 174 | # System evaluation with challenge data 175 | elif not args.development and args.challenge: 176 | # Fetch data over internet and setup the data 177 | challenge_dataset = eval(params['general']['challenge_dataset'])(data_path=params['path']['data']) 178 | 179 | if params['flow']['initialize']: 180 | challenge_dataset.fetch() 181 | 182 | # System testing 183 | if params['flow']['test_system']: 184 | section_header('System testing [Challenge data]') 185 | 186 | do_system_testing(dataset=challenge_dataset, 187 | result_path=params['path']['challenge_results'], 188 | feature_path=params['path']['features'], 189 | model_path=params['path']['models'], 190 | feature_params=params['features'], 191 | detector_params=params['detector'], 192 | dataset_evaluation_mode=dataset_evaluation_mode, 193 | classifier_method=params['classifier']['method'], 194 | overwrite=True 195 | ) 196 | foot() 197 | 198 | print " " 199 | print "Your results for the challenge data are stored at ["+params['path']['challenge_results']+"]" 200 | print " " 201 | 202 | 203 | def process_parameters(params): 204 | """Parameter post-processing. 
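    Computes frame parameters in samples from the second-based settings, copies the parameters
    of the selected classifier method into place, calculates parameter hashes, and expands the
    storage paths with those hashes so that runs with different parametrizations do not
    overwrite each other's data.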
205 | 206 | Parameters 207 | ---------- 208 | params : dict 209 | parameters in dict 210 | 211 | Returns 212 | ------- 213 | params : dict 214 | processed parameters 215 | 216 | """ 217 | 218 | params['features']['mfcc']['win_length'] = int(params['features']['win_length_seconds'] * params['features']['fs']) 219 | params['features']['mfcc']['hop_length'] = int(params['features']['hop_length_seconds'] * params['features']['fs']) 220 | 221 | # Copy parameters for current classifier method 222 | params['classifier']['parameters'] = params['classifier_parameters'][params['classifier']['method']] 223 | 224 | # Hash 225 | params['features']['hash'] = get_parameter_hash(params['features']) 226 | params['classifier']['hash'] = get_parameter_hash(params['classifier']) 227 | params['detector']['hash'] = get_parameter_hash(params['detector']) 228 | 229 | # Paths 230 | params['path']['data'] = os.path.join(os.path.dirname(os.path.realpath(__file__)), params['path']['data']) 231 | params['path']['base'] = os.path.join(os.path.dirname(os.path.realpath(__file__)), params['path']['base']) 232 | 233 | # Features 234 | params['path']['features_'] = params['path']['features'] 235 | params['path']['features'] = os.path.join(params['path']['base'], 236 | params['path']['features'], 237 | params['features']['hash']) 238 | 239 | # Feature normalizers 240 | params['path']['feature_normalizers_'] = params['path']['feature_normalizers'] 241 | params['path']['feature_normalizers'] = os.path.join(params['path']['base'], 242 | params['path']['feature_normalizers'], 243 | params['features']['hash']) 244 | 245 | # Models 246 | # Save parameters into folders to help manual browsing of files. 247 | params['path']['models_'] = params['path']['models'] 248 | params['path']['models'] = os.path.join(params['path']['base'], 249 | params['path']['models'], 250 | params['features']['hash'], 251 | params['classifier']['hash']) 252 | 253 | # Results 254 | params['path']['results_'] = params['path']['results'] 255 | params['path']['results'] = os.path.join(params['path']['base'], 256 | params['path']['results'], 257 | params['features']['hash'], 258 | params['classifier']['hash'], 259 | params['detector']['hash']) 260 | return params 261 | 262 | 263 | def make_folders(params, parameter_filename='parameters.yaml'): 264 | """Create all needed folders, and saves parameters in yaml-file for easier manual browsing of data. 265 | 266 | Parameters 267 | ---------- 268 | params : dict 269 | parameters in dict 270 | 271 | parameter_filename : str 272 | filename to save parameters used to generate the folder name 273 | 274 | Returns 275 | ------- 276 | nothing 277 | 278 | """ 279 | 280 | # Check that target path exists, create if not 281 | check_path(params['path']['features']) 282 | check_path(params['path']['feature_normalizers']) 283 | check_path(params['path']['models']) 284 | check_path(params['path']['results']) 285 | 286 | # Save parameters into folders to help manual browsing of files. 
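    # The layout produced under path->base is keyed by parameter hashes, for example
    # (hash values illustrative):
    #
    #   system/baseline_dcase2016_task3/
    #       features/<features_hash>/
    #       feature_normalizers/<features_hash>/
    #       acoustic_models/<features_hash>/<classifier_hash>/
    #       evaluation_results/<features_hash>/<classifier_hash>/<detector_hash>/
    #
    # A parameters.yaml snapshot is written at each hash level below.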
287 | 288 | # Features 289 | feature_parameter_filename = os.path.join(params['path']['features'], parameter_filename) 290 | if not os.path.isfile(feature_parameter_filename): 291 | save_parameters(feature_parameter_filename, params['features']) 292 | 293 | # Feature normalizers 294 | feature_normalizer_parameter_filename = os.path.join(params['path']['feature_normalizers'], parameter_filename) 295 | if not os.path.isfile(feature_normalizer_parameter_filename): 296 | save_parameters(feature_normalizer_parameter_filename, params['features']) 297 | 298 | # Models 299 | model_features_parameter_filename = os.path.join(params['path']['base'], 300 | params['path']['models_'], 301 | params['features']['hash'], 302 | parameter_filename) 303 | if not os.path.isfile(model_features_parameter_filename): 304 | save_parameters(model_features_parameter_filename, params['features']) 305 | 306 | model_models_parameter_filename = os.path.join(params['path']['base'], 307 | params['path']['models_'], 308 | params['features']['hash'], 309 | params['classifier']['hash'], 310 | parameter_filename) 311 | if not os.path.isfile(model_models_parameter_filename): 312 | save_parameters(model_models_parameter_filename, params['classifier']) 313 | 314 | # Results 315 | # Save parameters into folders to help manual browsing of files. 316 | result_features_parameter_filename = os.path.join(params['path']['base'], 317 | params['path']['results_'], 318 | params['features']['hash'], 319 | parameter_filename) 320 | if not os.path.isfile(result_features_parameter_filename): 321 | save_parameters(result_features_parameter_filename, params['features']) 322 | 323 | result_models_parameter_filename = os.path.join(params['path']['base'], 324 | params['path']['results_'], 325 | params['features']['hash'], 326 | params['classifier']['hash'], 327 | parameter_filename) 328 | if not os.path.isfile(result_models_parameter_filename): 329 | save_parameters(result_models_parameter_filename, params['classifier']) 330 | 331 | result_detector_parameter_filename = os.path.join(params['path']['base'], 332 | params['path']['results_'], 333 | params['features']['hash'], 334 | params['classifier']['hash'], 335 | params['detector']['hash'], 336 | parameter_filename) 337 | if not os.path.isfile(result_detector_parameter_filename): 338 | save_parameters(result_detector_parameter_filename, params['detector']) 339 | 340 | 341 | def get_feature_filename(audio_file, path, extension='cpickle'): 342 | """Get feature filename 343 | 344 | Parameters 345 | ---------- 346 | audio_file : str 347 | audio file name from which the features are extracted 348 | 349 | path : str 350 | feature path 351 | 352 | extension : str 353 | file extension 354 | (Default value='cpickle') 355 | 356 | Returns 357 | ------- 358 | feature_filename : str 359 | full feature filename 360 | 361 | """ 362 | 363 | return os.path.join(path, 'sequence_' + os.path.splitext(audio_file)[0] + '.' 
+ extension) 364 | 365 | 366 | def get_feature_normalizer_filename(fold, scene_label, path, extension='cpickle'): 367 | """Get normalizer filename 368 | 369 | Parameters 370 | ---------- 371 | fold : int >= 0 372 | evaluation fold number 373 | 374 | scene_label : str 375 | scene label 376 | 377 | path : str 378 | normalizer path 379 | 380 | extension : str 381 | file extension 382 | (Default value='cpickle') 383 | 384 | Returns 385 | ------- 386 | normalizer_filename : str 387 | full normalizer filename 388 | 389 | """ 390 | 391 | return os.path.join(path, 'scale_fold' + str(fold) + '_' + str(scene_label) + '.' + extension) 392 | 393 | 394 | def get_model_filename(fold, scene_label, path, extension='cpickle'): 395 | """Get model filename 396 | 397 | Parameters 398 | ---------- 399 | fold : int >= 0 400 | evaluation fold number 401 | 402 | scene_label : str 403 | scene label 404 | 405 | path : str 406 | model path 407 | 408 | extension : str 409 | file extension 410 | (Default value='cpickle') 411 | 412 | Returns 413 | ------- 414 | model_filename : str 415 | full model filename 416 | 417 | """ 418 | 419 | return os.path.join(path, 'model_fold' + str(fold) + '_' + str(scene_label) + '.' + extension) 420 | 421 | 422 | def get_result_filename(fold, scene_label, path, extension='txt'): 423 | """Get result filename 424 | 425 | Parameters 426 | ---------- 427 | fold : int >= 0 428 | evaluation fold number 429 | 430 | scene_label : str 431 | scene label 432 | 433 | path : str 434 | result path 435 | 436 | extension : str 437 | file extension 438 | (Default value='txt') 439 | 440 | Returns 441 | ------- 442 | result_filename : str 443 | full result filename 444 | 445 | """ 446 | 447 | if fold == 0: 448 | return os.path.join(path, 'results_' + str(scene_label) + '.' + extension) 449 | else: 450 | return os.path.join(path, 'results_fold' + str(fold) + '_' + str(scene_label) + '.' + extension) 451 | 452 | 453 | def do_feature_extraction(files, dataset, feature_path, params, overwrite=False): 454 | """Feature extraction 455 | 456 | Parameters 457 | ---------- 458 | files : list 459 | file list 460 | 461 | dataset : class 462 | dataset class 463 | 464 | feature_path : str 465 | path where the features are saved 466 | 467 | params : dict 468 | parameter dict 469 | 470 | overwrite : bool 471 | overwrite existing feature files 472 | (Default value=False) 473 | 474 | Returns 475 | ------- 476 | nothing 477 | 478 | Raises 479 | ------- 480 | IOError 481 | Audio file not found. 
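    Examples
    --------
    A minimal sketch; the dataset class comes from the configuration file, the paths are
    illustrative, and params is assumed to hold the loaded feature settings:

    >>> dataset = TUTSoundEvents_2016_DevelopmentSet(data_path='data/')
    >>> files = [item['file'] for item in dataset.train(fold=1)]
    >>> do_feature_extraction(files=files,
    ...                       dataset=dataset,
    ...                       feature_path='features/',
    ...                       params=params['features'])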
482 | 483 | """ 484 | 485 | for file_id, audio_filename in enumerate(files): 486 | # Get feature filename 487 | current_feature_file = get_feature_filename(audio_file=os.path.split(audio_filename)[1], path=feature_path) 488 | 489 | progress(title_text='Extracting [sequences]', 490 | percentage=(float(file_id) / len(files)), 491 | note=os.path.split(audio_filename)[1]) 492 | 493 | if not os.path.isfile(current_feature_file) or overwrite: 494 | # Load audio 495 | if os.path.isfile(dataset.relative_to_absolute_path(audio_filename)): 496 | y, fs = load_audio(filename=dataset.relative_to_absolute_path(audio_filename), mono=True, fs=params['fs']) 497 | else: 498 | raise IOError("Audio file not found [%s]" % audio_filename) 499 | 500 | # Extract features 501 | feature_data = feature_extraction(y=y, 502 | fs=fs, 503 | include_mfcc0=params['include_mfcc0'], 504 | include_delta=params['include_delta'], 505 | include_acceleration=params['include_acceleration'], 506 | mfcc_params=params['mfcc'], 507 | delta_params=params['mfcc_delta'], 508 | acceleration_params=params['mfcc_acceleration']) 509 | # Save 510 | save_data(current_feature_file, feature_data) 511 | 512 | 513 | def do_feature_normalization(dataset, feature_normalizer_path, feature_path, dataset_evaluation_mode='folds', overwrite=False): 514 | """Feature normalization 515 | 516 | Calculates normalization factors for each evaluation fold based on the available training material. 517 | 518 | Parameters 519 | ---------- 520 | dataset : class 521 | dataset class 522 | 523 | feature_normalizer_path : str 524 | path where the feature normalizers are saved. 525 | 526 | feature_path : str 527 | path where the features are saved. 528 | 529 | dataset_evaluation_mode : str ['folds', 'full'] 530 | evaluation mode, 'full' all material available is considered to belong to one fold. 531 | (Default value='folds') 532 | 533 | overwrite : bool 534 | overwrite existing normalizers 535 | (Default value=False) 536 | 537 | Returns 538 | ------- 539 | nothing 540 | 541 | Raises 542 | ------- 543 | IOError 544 | Feature file not found. 
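    Examples
    --------
    The normalizer follows a two-pass accumulate/finalize pattern; a sketch for a single
    feature file (filenames illustrative):

    >>> normalizer = FeatureNormalizer()
    >>> normalizer.accumulate(load_data('features/sequence_a001.cpickle')['stat'])
    >>> normalizer.finalize()
    >>> save_data('scale_fold1_home.cpickle', normalizer)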
545 | 546 | """ 547 | 548 | for fold in dataset.folds(mode=dataset_evaluation_mode): 549 | for scene_id, scene_label in enumerate(dataset.scene_labels): 550 | current_normalizer_file = get_feature_normalizer_filename(fold=fold, scene_label=scene_label, path=feature_normalizer_path) 551 | 552 | if not os.path.isfile(current_normalizer_file) or overwrite: 553 | # Collect sequence files from scene class 554 | files = [] 555 | for item_id, item in enumerate(dataset.train(fold, scene_label=scene_label)): 556 | if item['file'] not in files: 557 | files.append(item['file']) 558 | 559 | file_count = len(files) 560 | 561 | # Initialize statistics 562 | normalizer = FeatureNormalizer() 563 | 564 | for file_id, audio_filename in enumerate(files): 565 | progress(title_text='Collecting data', 566 | fold=fold, 567 | percentage=(float(file_id) / file_count), 568 | note=os.path.split(audio_filename)[1]) 569 | 570 | # Load features 571 | feature_filename = get_feature_filename(audio_file=os.path.split(audio_filename)[1], path=feature_path) 572 | if os.path.isfile(feature_filename): 573 | feature_data = load_data(feature_filename)['stat'] 574 | else: 575 | raise IOError("Feature file not found [%s]" % audio_filename) 576 | 577 | # Accumulate statistics 578 | normalizer.accumulate(feature_data) 579 | 580 | # Calculate normalization factors 581 | normalizer.finalize() 582 | 583 | # Save 584 | save_data(current_normalizer_file, normalizer) 585 | 586 | 587 | def do_system_training(dataset, model_path, feature_normalizer_path, feature_path, hop_length_seconds, classifier_params, 588 | dataset_evaluation_mode='folds', classifier_method='gmm', overwrite=False): 589 | """System training 590 | 591 | Train a model pair for each sound event class, one for activity and one for inactivity. 592 | 593 | model container format: 594 | 595 | { 596 | 'normalizer': normalizer class 597 | 'models' : 598 | { 599 | 'mouse click' : 600 | { 601 | 'positive': mixture.GMM class, 602 | 'negative': mixture.GMM class 603 | } 604 | 'keyboard typing' : 605 | { 606 | 'positive': mixture.GMM class, 607 | 'negative': mixture.GMM class 608 | } 609 | ... 610 | } 611 | } 612 | 613 | Parameters 614 | ---------- 615 | dataset : class 616 | dataset class 617 | 618 | model_path : str 619 | path where the models are saved. 620 | 621 | feature_normalizer_path : str 622 | path where the feature normalizers are saved. 623 | 624 | feature_path : str 625 | path where the features are saved. 626 | 627 | hop_length_seconds : float > 0 628 | feature frame hop length in seconds 629 | 630 | classifier_params : dict 631 | parameter dict 632 | 633 | dataset_evaluation_mode : str ['folds', 'full'] 634 | evaluation mode, 'full' all material available is considered to belong to one fold. 635 | (Default value='folds') 636 | 637 | classifier_method : str ['gmm'] 638 | classifier method, currently only GMM supported 639 | (Default value='gmm') 640 | 641 | overwrite : bool 642 | overwrite existing models 643 | (Default value=False) 644 | 645 | Returns 646 | ------- 647 | nothing 648 | 649 | Raises 650 | ------- 651 | ValueError 652 | classifier_method is unknown. 653 | 654 | IOError 655 | Feature normalizer not found. 656 | Feature file not found. 
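    Examples
    --------
    A minimal invocation sketch (paths illustrative, GMM parameters as in the configuration
    file):

    >>> dataset = TUTSoundEvents_2016_DevelopmentSet(data_path='data/')
    >>> do_system_training(dataset=dataset,
    ...                    model_path='acoustic_models/',
    ...                    feature_normalizer_path='feature_normalizers/',
    ...                    feature_path='features/',
    ...                    hop_length_seconds=0.02,
    ...                    classifier_params={'n_components': 16, 'covariance_type': 'diag'},
    ...                    classifier_method='gmm')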
657 | 658 | """ 659 | 660 | if classifier_method != 'gmm': 661 | raise ValueError("Unknown classifier method ["+classifier_method+"]") 662 | 663 | for fold in dataset.folds(mode=dataset_evaluation_mode): 664 | for scene_id, scene_label in enumerate(dataset.scene_labels): 665 | current_model_file = get_model_filename(fold=fold, scene_label=scene_label, path=model_path) 666 | if not os.path.isfile(current_model_file) or overwrite: 667 | 668 | # Load normalizer 669 | feature_normalizer_filename = get_feature_normalizer_filename(fold=fold, scene_label=scene_label, path=feature_normalizer_path) 670 | if os.path.isfile(feature_normalizer_filename): 671 | normalizer = load_data(feature_normalizer_filename) 672 | else: 673 | raise IOError("Feature normalizer not found [%s]" % feature_normalizer_filename) 674 | 675 | # Initialize model container 676 | model_container = {'normalizer': normalizer, 'models': {}} 677 | 678 | # Restructure training data in to structure[files][events] 679 | ann = {} 680 | for item_id, item in enumerate(dataset.train(fold=fold, scene_label=scene_label)): 681 | filename = os.path.split(item['file'])[1] 682 | if filename not in ann: 683 | ann[filename] = {} 684 | if item['event_label'] not in ann[filename]: 685 | ann[filename][item['event_label']] = [] 686 | ann[filename][item['event_label']].append((item['event_onset'], item['event_offset'])) 687 | 688 | # Collect training examples 689 | data_positive = {} 690 | data_negative = {} 691 | file_count = len(ann) 692 | for item_id, audio_filename in enumerate(ann): 693 | progress(title_text='Collecting data', 694 | fold=fold, 695 | percentage=(float(item_id) / file_count), 696 | note=scene_label+" / "+os.path.split(audio_filename)[1]) 697 | 698 | # Load features 699 | feature_filename = get_feature_filename(audio_file=audio_filename, path=feature_path) 700 | if os.path.isfile(feature_filename): 701 | feature_data = load_data(feature_filename)['feat'] 702 | else: 703 | raise IOError("Feature file not found [%s]" % feature_filename) 704 | 705 | # Normalize features 706 | feature_data = model_container['normalizer'].normalize(feature_data) 707 | 708 | for event_label in ann[audio_filename]: 709 | positive_mask = numpy.zeros((feature_data.shape[0]), dtype=bool) 710 | 711 | for event in ann[audio_filename][event_label]: 712 | start_frame = int(math.floor(event[0] / hop_length_seconds)) 713 | stop_frame = int(math.ceil(event[1] / hop_length_seconds)) 714 | 715 | if stop_frame > feature_data.shape[0]: 716 | stop_frame = feature_data.shape[0] 717 | 718 | positive_mask[start_frame:stop_frame] = True 719 | 720 | # Store positive examples 721 | if event_label not in data_positive: 722 | data_positive[event_label] = feature_data[positive_mask, :] 723 | else: 724 | data_positive[event_label] = numpy.vstack((data_positive[event_label], feature_data[positive_mask, :])) 725 | 726 | # Store negative examples 727 | if event_label not in data_negative: 728 | data_negative[event_label] = feature_data[~positive_mask, :] 729 | else: 730 | data_negative[event_label] = numpy.vstack((data_negative[event_label], feature_data[~positive_mask, :])) 731 | 732 | # Train models for each class 733 | for event_label in data_positive: 734 | progress(title_text='Train models', 735 | fold=fold, 736 | note=scene_label+" / "+event_label) 737 | if classifier_method == 'gmm': 738 | model_container['models'][event_label] = {} 739 | model_container['models'][event_label]['positive'] = mixture.GMM(**classifier_params).fit(data_positive[event_label]) 740 | 
model_container['models'][event_label]['negative'] = mixture.GMM(**classifier_params).fit(data_negative[event_label]) 741 | else: 742 | raise ValueError("Unknown classifier method ["+classifier_method+"]") 743 | 744 | # Save models 745 | save_data(current_model_file, model_container) 746 | 747 | 748 | def do_system_testing(dataset, result_path, feature_path, model_path, feature_params, detector_params, 749 | dataset_evaluation_mode='folds', classifier_method='gmm', overwrite=False): 750 | """System testing. 751 | 752 | If extracted features are not found from disk, they are extracted but not saved. 753 | 754 | Parameters 755 | ---------- 756 | dataset : class 757 | dataset class 758 | 759 | result_path : str 760 | path where the results are saved. 761 | 762 | feature_path : str 763 | path where the features are saved. 764 | 765 | model_path : str 766 | path where the models are saved. 767 | 768 | feature_params : dict 769 | parameter dict 770 | detector_params : dict 771 | parameter dict for the detection stage 772 | dataset_evaluation_mode : str ['folds', 'full'] 773 | evaluation mode, 'full' all material available is considered to belong to one fold. (Default value='folds') 774 | 775 | classifier_method : str ['gmm'] 776 | classifier method, currently only GMM supported 777 | (Default value='gmm') 778 | 779 | overwrite : bool 780 | overwrite existing result files 781 | (Default value=False) 782 | 783 | Returns 784 | ------- 785 | nothing 786 | 787 | Raises 788 | ------- 789 | ValueError 790 | classifier_method is unknown. 791 | 792 | IOError 793 | Model file not found. 794 | Audio file not found. 795 | 796 | """ 797 | 798 | if classifier_method != 'gmm': 799 | raise ValueError("Unknown classifier method ["+classifier_method+"]") 800 | 801 | # Check that target path exists, create if not 802 | check_path(result_path) 803 | 804 | for fold in dataset.folds(mode=dataset_evaluation_mode): 805 | for scene_id, scene_label in enumerate(dataset.scene_labels): 806 | current_result_file = get_result_filename(fold=fold, scene_label=scene_label, path=result_path) 807 | 808 | if not os.path.isfile(current_result_file) or overwrite: 809 | results = [] 810 | 811 | # Load class model container 812 | model_filename = get_model_filename(fold=fold, scene_label=scene_label, path=model_path) 813 | if os.path.isfile(model_filename): 814 | model_container = load_data(model_filename) 815 | else: 816 | raise IOError("Model file not found [%s]" % model_filename) 817 | 818 | file_count = len(dataset.test(fold, scene_label=scene_label)) 819 | for file_id, item in enumerate(dataset.test(fold=fold, scene_label=scene_label)): 820 | progress(title_text='Testing', 821 | fold=fold, 822 | percentage=(float(file_id) / file_count), 823 | note=scene_label+" / "+os.path.split(item['file'])[1]) 824 | 825 | # Load features 826 | feature_filename = get_feature_filename(audio_file=os.path.split(item['file'])[1], path=feature_path) 827 | 828 | if os.path.isfile(feature_filename): 829 | feature_data = load_data(feature_filename)['feat'] 830 | else: 831 | # Load audio 832 | if os.path.isfile(dataset.relative_to_absolute_path(item['file'])): 833 | y, fs = load_audio(filename=dataset.relative_to_absolute_path(item['file']), mono=True, fs=feature_params['fs']) 834 | else: 835 | raise IOError("Audio file not found [%s]" % item['file']) 836 | 837 | # Extract features 838 | feature_data = feature_extraction(y=y, 839 | fs=fs, 840 | include_mfcc0=feature_params['include_mfcc0'], 841 | include_delta=feature_params['include_delta'], 842 | include_acceleration=feature_params['include_acceleration'], 843 | mfcc_params=feature_params['mfcc'], 844 | 
delta_params=feature_params['mfcc_delta'], 845 | acceleration_params=feature_params['mfcc_acceleration'], 846 | statistics=False)['feat'] 847 | 848 | # Normalize features 849 | feature_data = model_container['normalizer'].normalize(feature_data) 850 | 851 | current_results = event_detection(feature_data=feature_data, 852 | model_container=model_container, 853 | hop_length_seconds=feature_params['hop_length_seconds'], 854 | smoothing_window_length_seconds=detector_params['smoothing_window_length'], 855 | decision_threshold=detector_params['decision_threshold'], 856 | minimum_event_length=detector_params['minimum_event_length'], 857 | minimum_event_gap=detector_params['minimum_event_gap']) 858 | 859 | # Store the result 860 | for event in current_results: 861 | results.append((dataset.absolute_to_relative(item['file']), event[0], event[1], event[2] )) 862 | 863 | # Save testing results 864 | with open(current_result_file, 'wt') as f: 865 | writer = csv.writer(f, delimiter='\t') 866 | for result_item in results: 867 | writer.writerow(result_item) 868 | 869 | 870 | def do_system_evaluation(dataset, result_path, dataset_evaluation_mode='folds'): 871 | """System evaluation. Testing outputs are collected and evaluated. Evaluation results are printed. 872 | 873 | Parameters 874 | ---------- 875 | dataset : class 876 | dataset class 877 | 878 | result_path : str 879 | path where the results are saved. 880 | 881 | dataset_evaluation_mode : str ['folds', 'full'] 882 | evaluation mode, 'full' all material available is considered to belong to one fold. 883 | (Default value='folds') 884 | 885 | Returns 886 | ------- 887 | nothing 888 | 889 | Raises 890 | ------- 891 | IOError 892 | Result file not found 893 | 894 | """ 895 | 896 | # Set warnings off, sklearn metrics will trigger warning for classes without 897 | # predicted samples in F1-scoring. This is just to keep printing clean. 
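    # Note on the printed metrics: the segment-based error rate decomposes as
    # ER = (S + D + I) / N, where S, D and I are substitutions, deletions and
    # insertions counted per segment and N is the number of reference events;
    # the ER/S, ER/D and ER/I columns printed below are these three terms.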
898 | warnings.simplefilter("ignore") 899 | 900 | overall_metrics_per_scene = {} 901 | 902 | for scene_id, scene_label in enumerate(dataset.scene_labels): 903 | if scene_label not in overall_metrics_per_scene: 904 | overall_metrics_per_scene[scene_label] = {} 905 | 906 | dcase2016_segment_based_metric = DCASE2016_EventDetection_SegmentBasedMetrics(class_list=dataset.event_labels(scene_label=scene_label)) 907 | dcase2016_event_based_metric = DCASE2016_EventDetection_EventBasedMetrics(class_list=dataset.event_labels(scene_label=scene_label), use_onset_condition=True, use_offset_condition=False) 908 | 909 | for fold in dataset.folds(mode=dataset_evaluation_mode): 910 | results = [] 911 | result_filename = get_result_filename(fold=fold, scene_label=scene_label, path=result_path) 912 | 913 | if os.path.isfile(result_filename): 914 | with open(result_filename, 'rt') as f: 915 | for row in csv.reader(f, delimiter='\t'): 916 | results.append(row) 917 | else: 918 | raise IOError("Result file not found [%s]" % result_filename) 919 | 920 | for file_id, item in enumerate(dataset.test(fold, scene_label=scene_label)): 921 | current_file_results = [] 922 | for result_line in results: 923 | if len(result_line) != 0 and result_line[0] == dataset.absolute_to_relative(item['file']): 924 | current_file_results.append( 925 | {'file': result_line[0], 926 | 'event_onset': float(result_line[1]), 927 | 'event_offset': float(result_line[2]), 928 | 'event_label': result_line[3].rstrip() 929 | } 930 | ) 931 | meta = dataset.file_meta(dataset.absolute_to_relative(item['file'])) 932 | 933 | dcase2016_segment_based_metric.evaluate(system_output=current_file_results, annotated_ground_truth=meta) 934 | dcase2016_event_based_metric.evaluate(system_output=current_file_results, annotated_ground_truth=meta) 935 | 936 | overall_metrics_per_scene[scene_label]['segment_based_metrics'] = dcase2016_segment_based_metric.results() 937 | overall_metrics_per_scene[scene_label]['event_based_metrics'] = dcase2016_event_based_metric.results() 938 | 939 | print " Evaluation over %d folds" % dataset.fold_count 940 | print " " 941 | print " Results per scene " 942 | print " {:18s} | {:5s} | | {:39s} ".format('', 'Main', 'Secondary metrics') 943 | print " {:18s} | {:5s} | | {:38s} | {:14s} | {:14s} | {:14s} ".format('', '', 'Seg/Overall','Seg/Class', 'Event/Overall','Event/Class') 944 | print " {:18s} | {:5s} | | {:6s} : {:5s} : {:5s} : {:5s} : {:5s} | {:6s} : {:5s} | {:6s} : {:5s} | {:6s} : {:5s} |".format('Scene', 'ER', 'F1', 'ER', 'ER/S', 'ER/D', 'ER/I', 'F1', 'ER', 'F1', 'ER', 'F1', 'ER') 945 | print " -------------------+-------+ +--------+-------+-------+-------+-------+--------+-------+--------+-------+--------+-------+" 946 | averages = { 947 | 'segment_based_metrics': { 948 | 'overall': { 949 | 'ER': [], 950 | 'F': [], 951 | }, 952 | 'class_wise_average': { 953 | 'ER': [], 954 | 'F': [], 955 | } 956 | }, 957 | 'event_based_metrics': { 958 | 'overall': { 959 | 'ER': [], 960 | 'F': [], 961 | }, 962 | 'class_wise_average': { 963 | 'ER': [], 964 | 'F': [], 965 | } 966 | }, 967 | } 968 | for scene_id, scene_label in enumerate(dataset.scene_labels): 969 | print " {:18s} | {:5.2f} | | {:4.1f} % : {:5.2f} : {:5.2f} : {:5.2f} : {:5.2f} | {:4.1f} % : {:5.2f} | {:4.1f} % : {:5.2f} | {:4.1f} % : {:5.2f} |".format(scene_label, 970 | overall_metrics_per_scene[scene_label]['segment_based_metrics']['overall']['ER'], 971 | overall_metrics_per_scene[scene_label]['segment_based_metrics']['overall']['F'] * 100, 972 | 
overall_metrics_per_scene[scene_label]['segment_based_metrics']['overall']['ER'], 973 | overall_metrics_per_scene[scene_label]['segment_based_metrics']['overall']['S'], 974 | overall_metrics_per_scene[scene_label]['segment_based_metrics']['overall']['D'], 975 | overall_metrics_per_scene[scene_label]['segment_based_metrics']['overall']['I'], 976 | overall_metrics_per_scene[scene_label]['segment_based_metrics']['class_wise_average']['F']*100, 977 | overall_metrics_per_scene[scene_label]['segment_based_metrics']['class_wise_average']['ER'], 978 | overall_metrics_per_scene[scene_label]['event_based_metrics']['overall']['F']*100, 979 | overall_metrics_per_scene[scene_label]['event_based_metrics']['overall']['ER'], 980 | overall_metrics_per_scene[scene_label]['event_based_metrics']['class_wise_average']['F']*100, 981 | overall_metrics_per_scene[scene_label]['event_based_metrics']['class_wise_average']['ER'], 982 | ) 983 | averages['segment_based_metrics']['overall']['ER'].append(overall_metrics_per_scene[scene_label]['segment_based_metrics']['overall']['ER']) 984 | averages['segment_based_metrics']['overall']['F'].append(overall_metrics_per_scene[scene_label]['segment_based_metrics']['overall']['F']) 985 | averages['segment_based_metrics']['class_wise_average']['ER'].append(overall_metrics_per_scene[scene_label]['segment_based_metrics']['class_wise_average']['ER']) 986 | averages['segment_based_metrics']['class_wise_average']['F'].append(overall_metrics_per_scene[scene_label]['segment_based_metrics']['class_wise_average']['F']) 987 | averages['event_based_metrics']['overall']['ER'].append(overall_metrics_per_scene[scene_label]['event_based_metrics']['overall']['ER']) 988 | averages['event_based_metrics']['overall']['F'].append(overall_metrics_per_scene[scene_label]['event_based_metrics']['overall']['F']) 989 | averages['event_based_metrics']['class_wise_average']['ER'].append(overall_metrics_per_scene[scene_label]['event_based_metrics']['class_wise_average']['ER']) 990 | averages['event_based_metrics']['class_wise_average']['F'].append(overall_metrics_per_scene[scene_label]['event_based_metrics']['class_wise_average']['F']) 991 | 992 | print " -------------------+-------+ +--------+-------+-------+-------+-------+--------+-------+--------+-------+--------+-------+" 993 | print " {:18s} | {:5.2f} | | {:4.1f} % : {:5.2f} : {:21s} | {:4.1f} % : {:5.2f} | {:4.1f} % : {:5.2f} | {:4.1f} % : {:5.2f} |".format('Average', 994 | numpy.mean(averages['segment_based_metrics']['overall']['ER']), 995 | numpy.mean(averages['segment_based_metrics']['overall']['F'])*100, 996 | numpy.mean(averages['segment_based_metrics']['overall']['ER']), 997 | ' ', 998 | numpy.mean(averages['segment_based_metrics']['class_wise_average']['F'])*100, 999 | numpy.mean(averages['segment_based_metrics']['class_wise_average']['ER']), 1000 | numpy.mean(averages['event_based_metrics']['overall']['F'])*100, 1001 | numpy.mean(averages['event_based_metrics']['overall']['ER']), 1002 | numpy.mean(averages['event_based_metrics']['class_wise_average']['F'])*100, 1003 | numpy.mean(averages['event_based_metrics']['class_wise_average']['ER']), 1004 | ) 1005 | 1006 | print " " 1007 | # Restore warnings to default settings 1008 | warnings.simplefilter("default") 1009 | print " Results per events " 1010 | 1011 | for scene_id, scene_label in enumerate(dataset.scene_labels): 1012 | print " " 1013 | print " "+scene_label.upper() 1014 | print " {:20s} | {:30s} | | {:15s} ".format('', 'Segment-based', 'Event-based') 1015 | print " {:20s} | {:5s} : {:5s} 
: {:6s} : {:5s} | | {:5s} : {:5s} : {:6s} : {:5s} |".format('Event', 'Nref', 'Nsys', 'F1', 'ER', 'Nref', 'Nsys', 'F1', 'ER') 1016 | print " ---------------------+-------+-------+--------+-------+ +-------+-------+--------+-------+" 1017 | seg_Nref = 0 1018 | seg_Nsys = 0 1019 | 1020 | event_Nref = 0 1021 | event_Nsys = 0 1022 | for event_label in sorted(overall_metrics_per_scene[scene_label]['segment_based_metrics']['class_wise']): 1023 | print " {:20s} | {:5d} : {:5d} : {:4.1f} % : {:5.2f} | | {:5d} : {:5d} : {:4.1f} % : {:5.2f} |".format(event_label, 1024 | int(overall_metrics_per_scene[scene_label]['segment_based_metrics']['class_wise'][event_label]['Nref']), 1025 | int(overall_metrics_per_scene[scene_label]['segment_based_metrics']['class_wise'][event_label]['Nsys']), 1026 | overall_metrics_per_scene[scene_label]['segment_based_metrics']['class_wise'][event_label]['F']*100, 1027 | overall_metrics_per_scene[scene_label]['segment_based_metrics']['class_wise'][event_label]['ER'], 1028 | int(overall_metrics_per_scene[scene_label]['event_based_metrics']['class_wise'][event_label]['Nref']), 1029 | int(overall_metrics_per_scene[scene_label]['event_based_metrics']['class_wise'][event_label]['Nsys']), 1030 | overall_metrics_per_scene[scene_label]['event_based_metrics']['class_wise'][event_label]['F']*100, 1031 | overall_metrics_per_scene[scene_label]['event_based_metrics']['class_wise'][event_label]['ER']) 1032 | seg_Nref += int(overall_metrics_per_scene[scene_label]['segment_based_metrics']['class_wise'][event_label]['Nref']) 1033 | seg_Nsys += int(overall_metrics_per_scene[scene_label]['segment_based_metrics']['class_wise'][event_label]['Nsys']) 1034 | 1035 | event_Nref += int(overall_metrics_per_scene[scene_label]['event_based_metrics']['class_wise'][event_label]['Nref']) 1036 | event_Nsys += int(overall_metrics_per_scene[scene_label]['event_based_metrics']['class_wise'][event_label]['Nsys']) 1037 | print " ---------------------+-------+-------+--------+-------+ +-------+-------+--------+-------+" 1038 | print " {:20s} | {:5d} : {:5d} : {:14s} | | {:5d} : {:5d} : {:14s} |".format('Sum', 1039 | seg_Nref, 1040 | seg_Nsys, 1041 | '', 1042 | event_Nref, 1043 | event_Nsys, 1044 | '') 1045 | print " {:20s} | {:5s} {:5s} : {:4.1f} % : {:5.2f} | | {:5s} {:5s} : {:4.1f} % : {:5.2f} |".format('Average', 1046 | '', '', 1047 | overall_metrics_per_scene[scene_label]['segment_based_metrics']['class_wise_average']['F']*100, 1048 | overall_metrics_per_scene[scene_label]['segment_based_metrics']['class_wise_average']['ER'], 1049 | '', '', 1050 | overall_metrics_per_scene[scene_label]['event_based_metrics']['class_wise_average']['F']*100, 1051 | overall_metrics_per_scene[scene_label]['event_based_metrics']['class_wise_average']['ER']) 1052 | print " " 1053 | 1054 | if __name__ == "__main__": 1055 | try: 1056 | sys.exit(main(sys.argv)) 1057 | except (ValueError, IOError) as e: 1058 | sys.exit(e) -------------------------------------------------------------------------------- /task3_sound_event_detection_in_real_life_audio.yaml: -------------------------------------------------------------------------------- 1 | # ========================================================== 2 | # Flow 3 | # ========================================================== 4 | flow: 5 | initialize: true 6 | extract_features: true 7 | feature_normalizer: true 8 | train_system: true 9 | test_system: true 10 | evaluate_system: true 11 | 12 | # ========================================================== 13 | # General 14 | # 
========================================================== 15 | general: 16 | development_dataset: TUTSoundEvents_2016_DevelopmentSet 17 | challenge_dataset: TUTSoundEvents_2016_EvaluationSet 18 | 19 | overwrite: false # Overwrite previously stored data 20 | 21 | # ========================================================== 22 | # Paths 23 | # ========================================================== 24 | path: 25 | data: data/ 26 | 27 | base: system/baseline_dcase2016_task3/ 28 | features: features/ 29 | feature_normalizers: feature_normalizers/ 30 | models: acoustic_models/ 31 | results: evaluation_results/ 32 | 33 | challenge_results: challenge_submission/task_3_sound_event_detection_in_real_life_audio/ 34 | 35 | # ========================================================== 36 | # Feature extraction 37 | # ========================================================== 38 | features: 39 | fs: 44100 40 | win_length_seconds: 0.04 41 | hop_length_seconds: 0.02 42 | 43 | include_mfcc0: false 44 | include_delta: true 45 | include_acceleration: true 46 | 47 | mfcc: 48 | window: hamming_asymmetric # [hann_asymmetric, hamming_asymmetric] 49 | n_mfcc: 20 # Number of MFCC coefficients 50 | n_mels: 40 # Number of MEL bands used 51 | n_fft: 2048 # FFT length, make sure this is larger than win_length_seconds*fs 52 | fmin: 0 # Minimum frequency when constructing MEL bands 53 | fmax: 22050 # Maximum frequency when constructing MEL band 54 | htk: false # Switch for HTK-styled MEL-frequency equation 55 | 56 | mfcc_delta: 57 | width: 9 58 | 59 | mfcc_acceleration: 60 | width: 9 61 | 62 | # ========================================================== 63 | # Classifier 64 | # ========================================================== 65 | classifier: 66 | method: gmm # The system supports only gmm 67 | parameters: !!null # Parameters are copied from classifier_parameters based on defined method 68 | 69 | classifier_parameters: 70 | gmm: 71 | n_components: 16 # Number of Gaussian components 72 | covariance_type: diag # [diag|full] Diagonal or full covariance matrix 73 | random_state: 0 74 | thresh: !!null 75 | tol: 0.001 76 | min_covar: 0.001 77 | n_iter: 40 78 | n_init: 1 79 | params: wmc 80 | init_params: wmc 81 | 82 | # ========================================================== 83 | # Detector 84 | # ========================================================== 85 | detector: 86 | decision_threshold: 160.0 87 | smoothing_window_length: 1.0 # seconds 88 | minimum_event_length: 0.1 # seconds 89 | minimum_event_gap: 0.1 # seconds 90 | --------------------------------------------------------------------------------
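The detector block above parametrizes the post-processing applied to the frame-wise positive/negative log-likelihood ratio in src/sound_event_detection.py (event_detection()). As a rough, self-contained sketch of how such parameters interact (not the baseline's actual implementation; detect_events is a hypothetical helper, and the sliding-window accumulation is an assumption):

    from __future__ import print_function
    import numpy

    def detect_events(llr, hop_length_seconds=0.02, decision_threshold=160.0,
                      smoothing_window_length=1.0, minimum_event_length=0.1,
                      minimum_event_gap=0.1):
        # Accumulate the frame-wise log-likelihood ratio over a sliding window.
        win = max(1, int(smoothing_window_length / hop_length_seconds))
        smoothed = numpy.convolve(llr, numpy.ones(win), mode='same')

        # Frames where the accumulated ratio exceeds the threshold are active.
        active = smoothed > decision_threshold

        # Collect contiguous active regions as [onset, offset] pairs in seconds.
        events = []
        onset = None
        for frame, is_active in enumerate(active):
            if is_active and onset is None:
                onset = frame
            elif not is_active and onset is not None:
                events.append([onset * hop_length_seconds, frame * hop_length_seconds])
                onset = None
        if onset is not None:
            events.append([onset * hop_length_seconds, len(active) * hop_length_seconds])

        # Merge events separated by less than minimum_event_gap.
        merged = []
        for event in events:
            if merged and event[0] - merged[-1][1] < minimum_event_gap:
                merged[-1][1] = event[1]
            else:
                merged.append(event)

        # Discard events shorter than minimum_event_length.
        return [event for event in merged if event[1] - event[0] >= minimum_event_length]

    # Synthetic trace: one clearly active region around frames 100-200.
    llr = numpy.zeros(500)
    llr[100:200] = 5.0
    print(detect_events(llr))

With the defaults above this prints a single event roughly spanning the active region; raising decision_threshold or minimum_event_length prunes weaker or shorter detections.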