├── .gitignore
├── LICENSE.md
├── README.md
├── batch_feature_extraction.py
├── calculate_dev_results_from_dcase_output.py
├── cls_data_generator.py
├── cls_feature_class.py
├── display_specs.py
├── fold1_room1_mix050_ov2.png
├── gammatone
    ├── COPYING
    ├── README.md
    ├── auditory_toolkit
    │   ├── COPYING
    │   ├── ERBFilterBank.m
    │   ├── ERBSpace.m
    │   ├── MakeERBFilters.m
    │   ├── README
    │   ├── demo_gammatone.m
    │   ├── fft2gammatonemx.m
    │   ├── gammatone_demo.m
    │   ├── gammatonegram.m
    │   └── specgram.m
    ├── doc
    │   ├── FurElise.png
    │   ├── Makefile
    │   ├── conf.py
    │   ├── details.rst
    │   ├── fftweight.rst
    │   ├── filters.rst
    │   ├── gtgram.rst
    │   ├── index.rst
    │   ├── make.bat
    │   └── plot.rst
    ├── gammatone
    │   ├── __init__.py
    │   ├── __main__.py
    │   ├── fftweight.py
    │   ├── filters.py
    │   ├── gtgram.py
    │   └── plot.py
    ├── setup.py
    ├── test_generation
    │   ├── README
    │   ├── test_ERBFilterBank.m
    │   ├── test_ERBSpace.m
    │   ├── test_MakeERBFilters.m
    │   ├── test_fft2gammatonemx.m
    │   ├── test_fft_gammatonegram.m
    │   ├── test_gammatonegram.m
    │   └── test_specgram.m
    └── tests
    │   ├── __init__.py
    │   ├── data
    │       ├── test_erb_filter_data.mat
    │       ├── test_erbspace_data.mat
    │       ├── test_fft2gtmx_data.mat
    │       ├── test_fft_gammatonegram_data.mat
    │       ├── test_filterbank_data.mat
    │       ├── test_gammatonegram_data.mat
    │       └── test_specgram_data.mat
    │   ├── test_cfs.py
    │   ├── test_erb_space.py
    │   ├── test_fft_gtgram.py
    │   ├── test_fft_weights.py
    │   ├── test_filterbank.py
    │   ├── test_gammatone_filters.py
    │   ├── test_gammatonegram.py
    │   └── test_specgram.py
├── images
    ├── CRNN_SELDT_DCASE2020.png
    ├── SELDnet_output.jpg
    ├── scse_cropped.pdf
    ├── seld-squeeze-structure.pdf
    └── seld_squeeze_structure_image.jpg
├── keras_model.py
├── metrics
    ├── LICENSE.md
    ├── SELD_evaluation_metrics.py
    └── evaluation_metrics.py
├── parameter.py
├── seld.py
└── visualize_SELD_output.py


/.gitignore:
--------------------------------------------------------------------------------
1 | /home/jose/DCASE2020_Task3/base_folder/*
2 | /home/jose/DCASE2020_Task3/input_feature/*
3 | base_folder/*
4 | input_feature/*
5 | .vscode/*
6 | __pycache__/*
7 | metrics/__pycache__/*
8 | gammatone/gammatone/__pycache__/*
9 | 


--------------------------------------------------------------------------------
/LICENSE.md:
--------------------------------------------------------------------------------
 1 | -----------COPYRIGHT NOTICE STARTS WITH THIS LINE------------
 2 | Copyright (c) 2020 Tampere University and its licensors
 3 | All rights reserved.
 4 | 
 5 | Permission is hereby granted, without written agreement and without
 6 | license or royalty fees, to use and copy the code for the Sound Event
 7 | Localization and Detection using Convolutional Recurrent Neural Network
 8 | method/architecture, present in the GitHub repository with the handle
 9 | seld-dcase2020, (“Work”) described in the paper with title "Sound event
10 | localization and detection of overlapping sources using 
11 | convolutional recurrent neural network" and composed of files with
12 | code in the Python programming language. This grant is only for experimental and
13 | non-commercial purposes, provided that the copyright notice in its entirety
14 | appear in all copies of this Work, and the original source of this Work,
15 | Audio Research Group at Tampere University, is acknowledged in any publication
16 | that reports research using this Work.
17 | 
18 | Any commercial use of the Work or any part thereof is strictly prohibited.
19 | Commercial use include, but is not limited to:
20 | - selling or reproducing the Work
21 | - selling or distributing the results or content achieved by use of the Work
22 | - providing services by using the Work.
23 | 
24 | IN NO EVENT SHALL TAMPERE UNIVERSITY OR ITS LICENSORS BE LIABLE TO
25 | ANY PARTY FOR DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES
26 | ARISING OUT OF THE USE OF THIS WORK AND ITS DOCUMENTATION, EVEN IF TAMPERE
27 | UNIVERSITY OR ITS LICENSORS HAS BEEN ADVISED OF THE POSSIBILITY
28 | OF SUCH DAMAGE.
29 | 
30 | TAMPERE UNIVERSITY AND ALL ITS LICENSORS SPECIFICALLY DISCLAIMS
31 | ANY WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
32 | MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE WORK PROVIDED HEREUNDER
33 | IS ON AN "AS IS" BASIS, AND THE TAMPERE UNIVERSITY HAS NO OBLIGATION
34 | TO PROVIDE MAINTENANCE, SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS.
35 | 
36 | -----------COPYRIGHT NOTICE ENDS WITH THIS LINE------------
37 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
  1 | 
  2 | # DCASE 2020: SELD using squeeze-excitation residual networks
  3 | [Please visit the official webpage of the DCASE 2020 Challenge for comparison with other submissions](http://dcase.community/challenge2020/task-sound-event-localization-and-detection-results). 
  4 |    
  5 | The main objective of this submission was to study how squeeze-excitation techniques can improve the behavior of sound event detection and localization (SELD) systems. To do so, we start from the network presented as a baseline consisting of a CRNN and replace the convolutional layers by Conv-StandardPOST blocks. This block was presented in:
  6 | 
  7 | > Naranjo-Alcazar, J., Perez-Castanos, S., Zuccarello, P., & Cobos, M. (2020). Acoustic Scene Classification with Squeeze-Excitation Residual Networks. IEEE Access.
  8 | 
  9 | This repo implementation is presented in:
 10 | 
 11 | > Naranjo-Alcazar, Javier, et al. "Sound Event Localization and Detection using Squeeze-Excitation Residual CNNs." arXiv preprint arXiv:2006.14436 (2020).
 12 | 
 13 | Please consider citing these works if you use this code or anything presented in them.
 14 | 
 15 | ## BASELINE METHOD
 16 | 
 17 | In comparison to the SELDnet studied in the papers above, we have changed the following to improve its performance and to evaluate it better.
 18 |  * **Features**: The original SELDnet employed naive phase and magnitude components of the spectrogram as the input feature for all input formats of audio. In this baseline method, we use separate features for first-order Ambisonic (FOA) and microphone array (MIC) datasets. As the interaural level difference feature, we employ the 64-band mel energies extracted from each channel of the input audio for both FOA and MIC. To encode the interaural time difference features, we employ intensity vector features for FOA, and generalized cross correlation features for MIC. 
 19 |  * **Loss/Objective**: The original SELDnet employed mean square error (MSE) for the DOA loss estimation, and this was computed irrespective of the presence or absence of the sound event. In the current baseline, we use a masked MSE, which computes the MSE only when the sound event is active in the reference.
 20 |  * **Evaluation metrics**: The performance of the original SELDnet was evaluated with stand-alone metrics for detection and localization, mainly because there was no suitable metric that could jointly evaluate the performance of localization and detection. Since then, we have proposed a new metric that can jointly evaluate the performance (more about it is described in the metrics section below), and we employ this new metric for evaluation here.
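
The masked MSE described in the list above can be sketched in NumPy as follows. This is only an illustration, not the loss used in `keras_model.py`: the function name `masked_mse`, the `[x..., y..., z...]` label layout, and the normalization by the number of active entries are all assumptions made for the example.

```python
import numpy as np

def masked_mse(sed_ref, doa_ref, doa_pred):
    """MSE over DOA coordinates, counted only where the reference event is active.

    sed_ref:  (frames, classes) binary reference activity
    doa_ref:  (frames, 3 * classes) reference coordinates, assumed layout [x..., y..., z...]
    doa_pred: same shape as doa_ref
    """
    mask = np.tile(sed_ref, (1, 3))            # repeat the activity mask for x, y and z
    sq_err = mask * (doa_ref - doa_pred) ** 2  # errors of inactive classes become zero
    n_active = mask.sum()
    return sq_err.sum() / n_active if n_active > 0 else 0.0

# Two frames, two classes; only class 0 is active, and only in frame 0.
sed_ref = np.array([[1, 0], [0, 0]])
doa_ref = np.array([[1.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0] * 6])
doa_pred = np.array([[0.5, 0.9, 0.0, 0.9, 0.0, 0.9], [0.7] * 6])
loss = masked_mse(sed_ref, doa_ref, doa_pred)
```

In this toy input the large errors predicted for the inactive class and the silent frame do not contribute; only the 0.5 error on the x coordinate of the active class does.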
 21 |  
 22 | The final SELDnet architecture is as shown below. The input is the multichannel audio, from which the different acoustic features are extracted based on the input format of the audio. Based on the chosen dataset (FOA or MIC), the baseline method takes a sequence of consecutive feature-frames and predicts all the active sound event classes for each input frame along with their respective spatial locations, producing the temporal activity and DOA trajectory for each sound event class. In particular, a convolutional recurrent neural network (CRNN) is used to map the frame sequence to the two outputs in parallel. At the first output, SED is performed as a multi-label multi-class classification task, allowing the network to simultaneously estimate the presence of multiple sound events for each frame. At the second output, DOA estimates in the continuous 3D space are obtained as a multi-output regression task, where each sound event class is associated with three regressors that estimate the x, y and z Cartesian coordinates of the DOA on a unit sphere around the microphone.
 23 | 
 24 | <p align="center">
 25 |    <img src="https://github.com/sharathadavanne/seld-dcase2020/blob/master/images/CRNN_SELDT_DCASE2020.png" width="400" title="SELDnet Architecture">
 26 | </p>
 27 | 
 28 | The SED output of the network is in the continuous range [0, 1] for each sound event in the dataset, and this value is thresholded to obtain a binary decision for the respective sound event activity. Finally, the respective DOA estimates for these active sound event classes provide their spatial locations.
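
As a rough sketch of this decoding step for a single frame: the helper name `decode_frame`, the 0.5 threshold, and the `(classes, 3)` DOA layout are assumptions for illustration and are not taken from the repository code.

```python
import numpy as np

def decode_frame(sed_prob, doa_xyz, threshold=0.5):
    """Return {class_index: (x, y, z)} for classes whose SED output exceeds the threshold.

    sed_prob: (classes,) SED outputs in [0, 1]
    doa_xyz:  (classes, 3) Cartesian DOA estimates, one row per class
    """
    active = np.flatnonzero(sed_prob > threshold)  # binary activity decision per class
    return {int(c): tuple(doa_xyz[c]) for c in active}

sed_prob = np.array([0.9, 0.2, 0.6])
doa_xyz = np.array([[1.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0],
                    [0.0, 0.0, 1.0]])
events = decode_frame(sed_prob, doa_xyz)
```

Here classes 0 and 2 are reported as active, each with its own DOA estimate, while the DOA regressors of class 1 are ignored.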
 29 | 
 30 | ## SUBMISSION MODIFICATION
 31 | 
 32 | This image shows the submission architecture:
 33 | 
 34 | <p align="center">
 35 |    <img src="images/seld_squeeze_structure_image.jpg" width="400" height="400">
 36 | </p>
 37 | 
 38 | <!--![seldnet_squeeze_excitation](images/seld-squeeze-structure_image.jpg =250x) -->
 39 | 
 40 | ## DATASET
 41 | 
 42 | The dataset used is:
 43 | 
 44 |  * **TAU-NIGENS Spatial Sound Events 2020 - Microphone Array**
 45 | 
 46 | **TAU-NIGENS Spatial Sound Events 2020 - Microphone Array** provides four-channel directional microphone recordings from a tetrahedral array configuration. This format is extracted from the same microphone array, and additional information on the spatial characteristics of each format can be found below. This dataset consists of a development and an evaluation set. The development set consists of 600 one-minute recordings sampled at 24000 Hz. We use 400 recordings for the training split (folds 3 to 6), 100 for validation (fold 2) and 100 for testing (fold 1). The evaluation set consists of 200 one-minute recordings, and will be released at a later point. 
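
The fold-to-split assignment above can be sketched as below. The helper `split_for` is hypothetical; it assumes file names of the form `fold1_room1_mix050_ov2.*`, with the fold number as the fifth character, which is also how `cls_data_generator.py` reads it (`int(filename[4])`).

```python
def split_for(filename):
    """Map a recording name such as 'fold1_room1_mix050_ov2.wav' to its split."""
    fold = int(filename[4])  # the character right after 'fold'
    if fold == 1:
        return 'test'
    if fold == 2:
        return 'val'
    if 3 <= fold <= 6:
        return 'train'
    raise ValueError('unexpected fold {} in {}'.format(fold, filename))

# The file names below are illustrative only.
splits = [split_for(name) for name in ('fold1_room1_mix050_ov2.wav',
                                       'fold2_room1_mix001_ov1.wav',
                                       'fold5_room2_mix010_ov1.wav')]
```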
 47 | 
 48 | More details on the recording procedure and dataset can be read on the [DCASE 2020 task webpage](http://dcase.community/challenge2020/task-sound-event-localization-and-detection).
 49 | 
 50 | The two development datasets can be downloaded from the link - [**TAU-NIGENS Spatial Sound Events 2020 - Ambisonic and Microphone Array**, Development dataset](https://doi.org/10.5281/zenodo.3740236) [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.3740236.svg)](https://doi.org/10.5281/zenodo.3740236) 
 51 | 
 52 | 
 53 | ## Getting Started
 54 | 
 55 | This repository consists of multiple Python scripts forming one big architecture used to train the SELDnet.
 56 | * The `batch_feature_extraction.py` is a standalone wrapper script that extracts the features and labels, and normalizes the training and test split features for a given dataset. Make sure you update the location of the downloaded datasets before running it.
 57 | * The `parameter.py` script consists of all the training, model, and feature parameters. To change some parameters, create a sub-task with a unique id here. Check the code for examples.
 58 | * The `cls_feature_class.py` script has routines for labels creation, features extraction and normalization.
 59 | * The `cls_data_generator.py` script provides feature + label data in generator mode for training.
 60 | * The `keras_model.py` script implements the SELDnet architecture.
 61 | * The `evaluation_metrics.py` script implements the core metrics from the sound event detection evaluation module http://tut-arg.github.io/sed_eval/ and the DOA metrics explained in the paper. These were used in the DCASE 2019 SELD task. We use them here only for legacy comparison.
 62 | * The `SELD_evaluation_metrics.py` script implements the metrics for joint evaluation of detection and localization.
 63 | * The `seld.py` is a wrapper script that trains the SELDnet. The training stops when the SELD error (check paper) stops improving.
 64 | 
 65 | Additionally, we also provide supporting scripts that help analyse the results.
 66 |  * `visualize_SELD_output.py` script to visualize the SELDnet output
 67 |  
 68 | 
 69 | ### Prerequisites
 70 | 
 71 | The provided codebase has been tested on python 3.6.9/3.7.3 and Keras 2.2.4/2.3.1
 72 | 
 73 | 
 74 | ### Training the SELDnet
 75 | 
 76 | In order to quickly train SELDnet, follow the steps below.
 77 | 
 78 | * For the chosen dataset (Ambisonic or Microphone), download the respective zip file. This contains both the audio files and the respective metadata. Unzip the files under the same 'base_folder/', i.e., if you are using the Ambisonic dataset, then 'base_folder/' should have two folders - 'foa_dev/' and 'metadata_dev/' after unzipping.
 79 | 
 80 | * Now update the respective dataset name and its path in `parameter.py` script. For the above example, you will change `dataset='foa'` and `dataset_dir='base_folder/'`. Also provide a directory path `feat_label_dir` in the same `parameter.py` script where all the features and labels will be dumped. 
 81 | 
 82 | * Extract features from the downloaded dataset by running the `batch_feature_extraction.py` script. Run the script as shown below. This will dump the normalized features and labels in the `feat_label_dir` folder.
 83 | 
 84 | ```
 85 | python3 batch_feature_extraction.py
 86 | ```
 87 | 
 88 | You can now train the SELDnet using the modifications of this submission. The parameters `--baseline` and `--ratio` MUST be provided:
 89 | ```
 90 | python3 seld.py --baseline False --ratio 4
 91 | ```
 92 | 
 93 | This runs the ConvStandard modules with ratio = 4. To run the baseline code, set `--baseline` to True. To run residual learning without squeeze-excitation:
 94 | 
 95 | ```
 96 | python3 seld.py --baseline False --ratio 0
 97 | ```
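
This README does not spell out what `--ratio` controls. In the squeeze-excitation literature, a ratio usually denotes the channel-reduction factor of the bottleneck inside the block. Assuming that reading, a channel squeeze-excitation step can be sketched in NumPy as below. The function name, the `(channels, time, freq)` layout, and the random weights are illustrative assumptions and are not taken from `keras_model.py`.

```python
import numpy as np

def squeeze_excite(x, w1, w2):
    """Channel squeeze-excitation on a (channels, time, freq) feature map."""
    squeezed = x.mean(axis=(1, 2))                # squeeze: global average per channel
    hidden = np.maximum(0.0, w1 @ squeezed)       # bottleneck with channels // ratio units
    gates = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))  # excitation: per-channel gate in (0, 1)
    return x * gates[:, None, None]               # rescale every channel by its gate

channels, ratio = 64, 4
rng = np.random.default_rng(0)
x = rng.standard_normal((channels, 10, 8))
w1 = rng.standard_normal((channels // ratio, channels)) * 0.1  # hypothetical weights
w2 = rng.standard_normal((channels, channels // ratio)) * 0.1
y = squeeze_excite(x, w1, w2)
```

Under this assumption, a larger ratio means a narrower bottleneck (64 channels with ratio 4 give 16 hidden units). The sketch does not cover `--ratio 0`, which the text above describes as residual learning without squeeze-excitation.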
 98 | 
 99 | 
100 | * By default, the code runs in `quick_test = False` mode. Setting `quick_test = True` in `parameter.py` trains the network for 2 epochs on only 2 mini-batches.
101 | 
102 | * The code also plots training curves, intermediate results and saves models in the `model_dir` path provided by the user in `parameter.py` file.
103 | 
104 | * In order to visualize the output of SELDnet and for submission of results, set `dcase_output=True` and provide `dcase_dir` directory. This will dump file-wise results in the directory, which can be individually visualized using `visualize_SELD_output.py` script.
105 | 
106 | ## Results on development dataset (baseline)
107 | 
108 | As the evaluation metrics we use two different approaches as discussed in our recent paper below
109 | 
110 | > Annamaria Mesaros, Sharath Adavanne, Archontis Politis, Toni Heittola, and Tuomas Virtanen. Joint measurement of localization and detection of sound events. In IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA). New Paltz, NY, Oct 2019.
111 | 
112 | The first metric is more focused on the detection part, also referred to as location-aware detection, which gives us the error rate (ER) and F-score (F) in one-second non-overlapping segments. We consider the prediction to be correct if the prediction and reference class are the same, and the distance between them is below 20&deg;.
113 | The second metric is more focused on the localization part, also referred to as class-aware localization, which gives us the DOA error (DE) and F-score (DE_F) in one-second non-overlapping segments. Unlike location-aware detection, we do not use any distance threshold, but estimate the distance between the correct prediction and reference.
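
The location-aware correctness test can be illustrated with a small sketch. It measures the distance as the angle between two DOA vectors; the function names are hypothetical, and the code is not the implementation in `metrics/SELD_evaluation_metrics.py`.

```python
import numpy as np

def angular_distance_deg(u, v):
    """Angle in degrees between two DOA vectors given in Cartesian coordinates."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))  # clip guards against rounding

def is_correct(pred_class, ref_class, pred_doa, ref_doa, threshold_deg=20.0):
    """Location-aware detection: same class and DOA within the angular threshold."""
    return bool(pred_class == ref_class
                and angular_distance_deg(pred_doa, ref_doa) < threshold_deg)

ref = [1.0, 0.0, 0.0]
near = [np.cos(np.radians(10.0)), np.sin(np.radians(10.0)), 0.0]  # 10 degrees away
far = [0.0, 1.0, 0.0]                                             # 90 degrees away
```

With these inputs, `is_correct(3, 3, near, ref)` holds, while `far` fails the distance test and a class mismatch fails regardless of distance. Class-aware localization would instead report `angular_distance_deg` directly for predictions of the correct class, without applying the threshold.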
114 | 
115 | The evaluation metric scores for the test split of the development dataset are given below.
116 | 
117 | ### Baseline results
118 | 
119 | | Dataset | ER | F | DE | DE_F |
120 | | ----| --- | --- | --- | --- |
121 | | Microphone Array (MIC) | 0.78 | 31.4 % | 27.3&deg; | 59.0 % |
122 | 
123 | 
124 | **Note:** The reported baseline system performance is not exactly reproducible due to varying setups. However, you should be able to obtain very similar results.
125 | 
126 | ## Submission results
127 | 
128 | ### Development stage (results on testing folder)
129 | 
130 | Set mode to dev.
131 | 
132 | | ratio | ER | F | DE | DE_F |
133 | | ----| --- | --- | --- | --- |
134 | | 0 | 0.68 | 42.3 | 22.5 | 65.1 |
135 | | 1 | 0.70 | 39.2 | 23.5 | 63.6 |
136 | | 2 | 0.69 | 40.4 | 23.2 | 62.1 |
137 | | 4 | 0.68 | 40.9 | 23.3 | 65.0 |
138 | | 8 | 0.69 | 40.8 | 23.5 | 63.8 |
139 | | 16 | 0.69 | 40.7 | 23.3 | 62.8 |
140 | 
141 | ### Challenge results 
142 | 
143 | * The team submission ranked 11/15
144 | 
145 | * The best system (ratio = 1) ranked 30/43
146 | 
147 | | ratio | ER | F | DE | DE_F |
148 | | :----:| --- | --- | --- | --- |
149 | | organization baseline | 0.70 | 39.5 | 23.2 | 62.1 |
150 | | 0 | 0.61 | 48.3 | 19.2 | 65.9 |
151 | | 1 | 0.61 | 49.1 | 19.5 | 67.1 |
152 | | 8 | 0.64 | 46.7 | 20.0 | 64.5 |
153 | | 16 | 0.63 | 47.3 | 19.5 | 65.5 |
154 | 
155 | 
156 | 
157 | 


--------------------------------------------------------------------------------
/batch_feature_extraction.py:
--------------------------------------------------------------------------------
 1 | # Extracts the features, labels, and normalizes the development and evaluation split features.
 2 | 
 3 | import cls_feature_class
 4 | import parameter
 5 | 
 6 | process_str = 'eval'  # 'dev' or 'eval' extracts features for the respective set;
 7 |                       # 'dev, eval' extracts features for both sets together
 8 | 
 9 | params = parameter.get_params()
10 | 
11 | 
12 | if 'dev' in process_str:
13 |     # -------------- Extract features and labels for development set -----------------------------
14 |     dev_feat_cls = cls_feature_class.FeatureClass(params, is_eval=False)
15 | 
16 |     # Extract features and normalize them
17 |     dev_feat_cls.extract_all_feature()
18 |     dev_feat_cls.preprocess_features()
19 | 
20 |     # Extract labels in regression mode
21 |     dev_feat_cls.extract_all_labels()
22 | 
23 | 
24 | if 'eval' in process_str:
25 |     # -----------------------------Extract ONLY features for evaluation set-----------------------------
26 |     eval_feat_cls = cls_feature_class.FeatureClass(params, is_eval=True)
27 | 
28 |     # Extract features and normalize them
29 |     eval_feat_cls.extract_all_feature()
30 |     eval_feat_cls.preprocess_features()
31 | 
32 | 


--------------------------------------------------------------------------------
/calculate_dev_results_from_dcase_output.py:
--------------------------------------------------------------------------------
 1 | import os
 2 | from metrics import SELD_evaluation_metrics
 3 | import cls_feature_class
 4 | import parameter
 5 | import numpy as np
 6 | 
 7 | 
 8 | def get_nb_files(_pred_file_list, _group='all'):
 9 |     _group_ind = {'ir': 4, 'ov': 21}
10 |     _cnt_dict = {}
11 |     for _filename in _pred_file_list:
12 | 
13 |         if _group == 'all':
14 |             _ind = 0
15 |         else:
16 |             _ind = int(_filename[_group_ind[_group]])
17 | 
18 |         if _ind not in _cnt_dict:
19 |             _cnt_dict[_ind] = []
20 |         _cnt_dict[_ind].append(_filename)
21 | 
22 |     return _cnt_dict
23 | 
24 | 
25 | # --------------------------- MAIN SCRIPT STARTS HERE -------------------------------------------
26 | 
27 | 
28 | # INPUT DIRECTORY
29 | ref_desc_files = '/scratch/asignal/sharath/DCASE2020_SELD_dataset/metadata_dev' # reference description directory location
30 | pred_output_format_files = 'results/2_mic_dev' # predicted output format directory location
31 | use_polar_format = True # Compute SELD metrics using polar or Cartesian coordinates
32 | 
33 | # Load feature class
34 | params = parameter.get_params()
35 | feat_cls = cls_feature_class.FeatureClass(params)
36 | 
37 | # collect reference files info
38 | ref_files = os.listdir(ref_desc_files)
39 | nb_ref_files = len(ref_files)
40 | 
41 | # collect predicted files info
42 | pred_files = os.listdir(pred_output_format_files)
43 | nb_pred_files = len(pred_files)
44 | 
45 | # Calculate scores for different splits, overlapping sound events, and impulse responses (reverberant scenes)
46 | score_type_list = ['all', 'ov', 'ir']
47 | print('Number of predicted files: {}\nNumber of reference files: {}'.format(nb_pred_files, nb_ref_files))
48 | print('\nCalculating {} scores for {}'.format(score_type_list, os.path.basename(pred_output_format_files)))
49 | 
50 | for score_type in score_type_list:
51 |     print('\n\n---------------------------------------------------------------------------------------------------')
52 |     print('------------------------------------  {}   ---------------------------------------------'.format('Total score' if score_type=='all' else 'score per {}'.format(score_type)))
53 |     print('---------------------------------------------------------------------------------------------------')
54 | 
55 |     split_cnt_dict = get_nb_files(pred_files, _group=score_type) # collect files corresponding to score_type
56 |     # Calculate scores across files for a given score_type
57 |     for split_key in np.sort(list(split_cnt_dict)):
58 |         # Load evaluation metric class
59 |         eval = SELD_evaluation_metrics.SELDMetrics(nb_classes=feat_cls.get_nb_classes(), doa_threshold=params['lad_doa_thresh'])
60 |         for pred_cnt, pred_file in enumerate(split_cnt_dict[split_key]):
61 |             # Load predicted output format file
62 |             pred_dict = feat_cls.load_output_format_file(os.path.join(pred_output_format_files, pred_file))
63 |             if use_polar_format:
64 |                 pred_dict_polar = feat_cls.convert_output_format_cartesian_to_polar(pred_dict)
65 |                 pred_labels = feat_cls.segment_labels(pred_dict_polar, feat_cls.get_nb_frames())
66 |             else:
67 |                 pred_labels = feat_cls.segment_labels(pred_dict, feat_cls.get_nb_frames())
68 | 
69 |             # Load reference description file
70 |             gt_dict_polar = feat_cls.load_output_format_file(os.path.join(ref_desc_files, pred_file.replace('.npy', '.csv')))
71 |             if use_polar_format:
72 |                 gt_labels = feat_cls.segment_labels(gt_dict_polar, feat_cls.get_nb_frames())
73 |             else:
74 |                 gt_dict = feat_cls.convert_output_format_polar_to_cartesian(gt_dict_polar)
75 |                 gt_labels = feat_cls.segment_labels(gt_dict, feat_cls.get_nb_frames())
76 | 
77 |             # Calculate scores
78 |             if use_polar_format:
79 |                 eval.update_seld_scores(pred_labels, gt_labels)
80 |             else:
81 |                 eval.update_seld_scores_xyz(pred_labels, gt_labels)
82 | 
83 | 
84 |         # Overall SED and DOA scores
85 |         er, f, de, de_f = eval.compute_seld_scores()
86 |         seld_scr = SELD_evaluation_metrics.early_stopping_metric([er, f], [de, de_f])
87 | 
88 |         print('\nAverage score for {} {} data using {} coordinates'.format(score_type, 'fold' if score_type=='all' else split_key, 'Polar' if use_polar_format else 'Cartesian' ))
89 |         print('SELD score (early stopping metric): {:0.2f}'.format(seld_scr))
90 |         print('SED metrics: Error rate: {:0.2f}, F-score:{:0.1f}'.format(er, 100*f))
91 |         print('DOA metrics: DOA error: {:0.1f}, F-score:{:0.1f}'.format(de, 100*de_f))
92 | 


--------------------------------------------------------------------------------
/cls_data_generator.py:
--------------------------------------------------------------------------------
  1 | #
  2 | # Data generator for training the SELDnet
  3 | #
  4 | 
  5 | import os
  6 | import numpy as np
  7 | import cls_feature_class
  8 | from IPython import embed
  9 | from collections import deque
 10 | import random
 11 | 
 12 | 
 13 | class DataGenerator(object):
 14 |     def __init__(
 15 |             self, params, split=1, shuffle=True, per_file=False, is_eval=False
 16 |     ):
 17 |         self._per_file = per_file
 18 |         self._is_eval = is_eval
 19 |         self._splits = np.array(split)
 20 |         self._batch_size = params['batch_size']
 21 |         self._feature_seq_len = params['feature_sequence_length']
 22 |         self._label_seq_len = params['label_sequence_length']
 23 |         self._shuffle = shuffle
 24 |         self._feat_cls = cls_feature_class.FeatureClass(params=params, is_eval=self._is_eval)
 25 |         self._label_dir = self._feat_cls.get_label_dir()
 26 |         self._feat_dir = self._feat_cls.get_normalized_feat_dir()
 27 | 
 28 |         self._filenames_list = list()
 29 |         self._nb_frames_file = 0     # Using a fixed number of frames in feat files. Updated in _get_filenames_list_and_feat_label_sizes()
 30 |         self._nb_mel_bins = self._feat_cls.get_nb_mel_bins()
 31 |         self._nb_ch = None
 32 |         self._label_len = None  # total length of label - DOA + SED
 33 |         self._doa_len = None    # DOA label length
 34 |         self._class_dict = self._feat_cls.get_classes()
 35 |         self._nb_classes = self._feat_cls.get_nb_classes()
 36 |         self._get_filenames_list_and_feat_label_sizes()
 37 | 
 38 |         self._feature_batch_seq_len = self._batch_size*self._feature_seq_len
 39 |         self._label_batch_seq_len = self._batch_size*self._label_seq_len
 40 |         self._circ_buf_feat = None
 41 |         self._circ_buf_label = None
 42 | 
 43 |         if self._per_file:
 44 |             self._nb_total_batches = len(self._filenames_list)
 45 |         else:
 46 |             self._nb_total_batches = int(np.floor((len(self._filenames_list) * self._nb_frames_file /
 47 |                                                float(self._feature_batch_seq_len))))
 48 | 
 49 |         # self._dummy_feat_vec = np.ones(self._feat_len.shape) *
 50 | 
 51 |         print(
 52 |             '\tDatagen_mode: {}, nb_files: {}, nb_classes:{}\n'
 53 |             '\tnb_frames_file: {}, feat_len: {}, nb_ch: {}, label_len:{}\n'.format(
 54 |                 'eval' if self._is_eval else 'dev', len(self._filenames_list),  self._nb_classes,
 55 |                 self._nb_frames_file, self._nb_mel_bins, self._nb_ch, self._label_len
 56 |                 )
 57 |         )
 58 | 
 59 |         print(
 60 |             '\tDataset: {}, split: {}\n'
 61 |             '\tbatch_size: {}, feat_seq_len: {}, label_seq_len: {}, shuffle: {}\n'
 62 |             '\tTotal batches in dataset: {}\n'
 63 |             '\tlabel_dir: {}\n'
 64 |             '\tfeat_dir: {}\n'.format(
 65 |                 params['dataset'], split,
 66 |                 self._batch_size, self._feature_seq_len, self._label_seq_len, self._shuffle,
 67 |                 self._nb_total_batches,
 68 |                 self._label_dir, self._feat_dir
 69 |             )
 70 |         )
 71 | 
 72 |     def get_data_sizes(self):
 73 |         feat_shape = (self._batch_size, self._nb_ch, self._feature_seq_len, self._nb_mel_bins)
 74 |         if self._is_eval:
 75 |             label_shape = None
 76 |         else:
 77 |             label_shape = [
 78 |                 (self._batch_size, self._label_seq_len, self._nb_classes),
 79 |                 (self._batch_size, self._label_seq_len, self._nb_classes*3)
 80 |             ]
 81 |         return feat_shape, label_shape
 82 | 
 83 |     def get_total_batches_in_data(self):
 84 |         return self._nb_total_batches
 85 | 
 86 |     def _get_filenames_list_and_feat_label_sizes(self):
 87 |         
 88 |         for filename in os.listdir(self._feat_dir):
 89 |             if self._is_eval:
 90 |                 self._filenames_list.append(filename)
 91 |             else:
 92 |                 if int(filename[4]) in self._splits: # check which split the file belongs to
 93 |                     self._filenames_list.append(filename)
 94 | 
 95 |         temp_feat = np.load(os.path.join(self._feat_dir, self._filenames_list[0]))
 96 |         self._nb_frames_file = temp_feat.shape[0]
 97 |         self._nb_ch = temp_feat.shape[1] // self._nb_mel_bins
 98 | 
 99 |         if not self._is_eval:
100 |             temp_label = np.load(os.path.join(self._label_dir, self._filenames_list[0]))
101 |             self._label_len = temp_label.shape[-1]
102 |             self._doa_len = (self._label_len - self._nb_classes)//self._nb_classes
103 | 
104 |         if self._per_file:
105 |             self._batch_size = int(np.ceil(temp_feat.shape[0]/float(self._feature_seq_len)))
106 | 
107 |         return
108 | 
109 |     def generate(self):
110 |         """
111 |         Generates batches of samples
112 |         :return: 
113 |         """
114 | 
115 |         while 1:
116 |             if self._shuffle:
117 |                 random.shuffle(self._filenames_list)
118 | 
119 |             # Ideally this should have been outside the while loop. But while generating the test data we want the data
120 |             # to be exactly the same for every epoch, hence we keep it here.
121 |             self._circ_buf_feat = deque()
122 |             self._circ_buf_label = deque()
123 | 
124 |             file_cnt = 0
125 |             if self._is_eval:
126 |                 for i in range(self._nb_total_batches):
127 |                     # Load features into the circular buffer. Always maintain at least one batch worth of features in the
128 |                     # circular buffer. If not, keep refilling it.
129 |                     while len(self._circ_buf_feat) < self._feature_batch_seq_len:
130 |                         temp_feat = np.load(os.path.join(self._feat_dir, self._filenames_list[file_cnt]))
131 | 
132 |                         for row_cnt, row in enumerate(temp_feat):
133 |                             self._circ_buf_feat.append(row)
134 | 
135 |                         # If self._per_file is True, this returns the sequences belonging to a single audio recording
136 |                         if self._per_file:
137 |                             extra_frames = self._feature_batch_seq_len - temp_feat.shape[0]
138 |                             extra_feat = np.ones((extra_frames, temp_feat.shape[1])) * 1e-6
139 | 
140 |                             for row_cnt, row in enumerate(extra_feat):
141 |                                 self._circ_buf_feat.append(row)
142 | 
143 |                         file_cnt = file_cnt + 1
144 | 
145 |                     # Read one batch size from the circular buffer
146 |                     feat = np.zeros((self._feature_batch_seq_len, self._nb_mel_bins * self._nb_ch))
147 |                     for j in range(self._feature_batch_seq_len):
148 |                         feat[j, :] = self._circ_buf_feat.popleft()
149 |                     feat = np.reshape(feat, (self._feature_batch_seq_len, self._nb_mel_bins, self._nb_ch))
150 | 
151 |                     # Split to sequences
152 |                     feat = self._split_in_seqs(feat, self._feature_seq_len)
153 |                     feat = np.transpose(feat, (0, 3, 1, 2))
154 | 
155 |                     yield feat
156 | 
157 |             else:
158 |                 for i in range(self._nb_total_batches):
159 | 
160 |                     # Load features and labels into the circular buffers. Always keep at least one batch worth of
161 |                     # features and labels in the buffers; if they hold fewer, keep refilling them.
162 |                     while len(self._circ_buf_feat) < self._feature_batch_seq_len:
163 |                         temp_feat = np.load(os.path.join(self._feat_dir, self._filenames_list[file_cnt]))
164 |                         temp_label = np.load(os.path.join(self._label_dir, self._filenames_list[file_cnt]))
165 | 
166 |                         for f_row in temp_feat:
167 |                             self._circ_buf_feat.append(f_row)
168 |                         for l_row in temp_label:
169 |                             self._circ_buf_label.append(l_row)
170 | 
171 |                         # If self._per_file is True, this returns the sequences belonging to a single audio recording
172 |                         if self._per_file:
173 |                             feat_extra_frames = self._feature_batch_seq_len - temp_feat.shape[0]
174 |                             extra_feat = np.ones((feat_extra_frames, temp_feat.shape[1])) * 1e-6
175 | 
176 |                             label_extra_frames = self._label_batch_seq_len - temp_label.shape[0]
177 |                             extra_labels = np.zeros((label_extra_frames, temp_label.shape[1]))
178 | 
179 |                             for f_row in extra_feat:
180 |                                 self._circ_buf_feat.append(f_row)
181 |                             for l_row in extra_labels:
182 |                                 self._circ_buf_label.append(l_row)
183 | 
184 |                         file_cnt = file_cnt + 1
185 | 
186 |                     # Read one batch size from the circular buffer
187 |                     feat = np.zeros((self._feature_batch_seq_len, self._nb_mel_bins * self._nb_ch))
188 |                     label = np.zeros((self._label_batch_seq_len, self._label_len))
189 |                     for j in range(self._feature_batch_seq_len):
190 |                         feat[j, :] = self._circ_buf_feat.popleft()
191 |                     for j in range(self._label_batch_seq_len):
192 |                         label[j, :] = self._circ_buf_label.popleft()
193 |                     feat = np.reshape(feat, (self._feature_batch_seq_len, self._nb_mel_bins, self._nb_ch))
194 | 
195 |                     # Split to sequences
196 |                     feat = self._split_in_seqs(feat, self._feature_seq_len)
197 |                     feat = np.transpose(feat, (0, 3, 1, 2))
198 |                     label = self._split_in_seqs(label, self._label_seq_len)
199 | 
200 |                     label = [
201 |                         label[:, :, :self._nb_classes],  # SED labels
202 |                         label # SED + DOA labels
203 |                          ]
204 |                     yield feat, label
205 | 
206 |     def _split_in_seqs(self, data, _seq_len):
207 |         if len(data.shape) == 1:
208 |             if data.shape[0] % _seq_len:
209 |                 data = data[:-(data.shape[0] % _seq_len)]
210 |             data = data.reshape((data.shape[0] // _seq_len, _seq_len, 1))
211 |         elif len(data.shape) == 2:
212 |             if data.shape[0] % _seq_len:
213 |                 data = data[:-(data.shape[0] % _seq_len), :]
214 |             data = data.reshape((data.shape[0] // _seq_len, _seq_len, data.shape[1]))
215 |         elif len(data.shape) == 3:
216 |             if data.shape[0] % _seq_len:
217 |                 data = data[:-(data.shape[0] % _seq_len), :, :]
218 |             data = data.reshape((data.shape[0] // _seq_len, _seq_len, data.shape[1], data.shape[2]))
219 |         else:
220 |             print('ERROR: Unknown data dimensions: {}'.format(data.shape))
221 |             exit()
222 |         return data
223 | 
224 |     @staticmethod
225 |     def split_multi_channels(data, num_channels):
226 |         tmp = None
227 |         in_shape = data.shape
228 |         if len(in_shape) == 3:
229 |             hop = in_shape[2] // num_channels  # integer division: hop is used as an array size and slice index
230 |             tmp = np.zeros((in_shape[0], num_channels, in_shape[1], hop))
231 |             for i in range(num_channels):
232 |                 tmp[:, i, :, :] = data[:, :, i * hop:(i + 1) * hop]
233 |         elif len(in_shape) == 4 and num_channels == 1:
234 |             tmp = np.zeros((in_shape[0], 1, in_shape[1], in_shape[2], in_shape[3]))
235 |             tmp[:, 0, :, :, :] = data
236 |         else:
237 |             print('ERROR: The input should be a 3D matrix (or 4D with num_channels == 1) but it seems to have dimensions: {}'.format(in_shape))
238 |             exit()
239 |         return tmp
240 | 
241 |     def get_default_elevation(self):
242 |         return self._default_ele
243 | 
244 |     def get_azi_ele_list(self):
245 |         return self._feat_cls.get_azi_ele_list()
246 | 
247 |     def get_nb_classes(self):
248 |         return self._nb_classes
249 | 
250 |     def nb_frames_1s(self):
251 |         return self._feat_cls.nb_frames_1s()
252 | 
253 |     def get_hop_len_sec(self):
254 |         return self._feat_cls.get_hop_len_sec()
255 | 
256 |     def get_classes(self):
257 |         return self._feat_cls.get_classes()
258 |     
259 |     def get_filelist(self):
260 |         return self._filenames_list
261 | 
262 |     def get_frame_per_file(self):
263 |         return self._label_batch_seq_len
264 | 
265 |     def get_nb_frames(self):
266 |         return self._feat_cls.get_nb_frames()
267 |     
268 |     def get_data_gen_mode(self):
269 |         return self._is_eval
270 | 
271 |     def write_output_format_file(self, _out_file, _out_dict):
272 |         return self._feat_cls.write_output_format_file(_out_file, _out_dict)


--------------------------------------------------------------------------------
/display_specs.py:
--------------------------------------------------------------------------------
 1 | import librosa.display
 2 | import numpy as np
 3 | import matplotlib.pyplot as plt
 4 | 
 5 | 
 6 | 
 7 | gamma = np.load('/home/javier/repos/DCASE2020-Task3/input_feature/gammatone_gcclogmel/mic_dev/fold1_room1_mix001_ov1.npy')
 8 | 
 9 | gamma_ch1 = gamma[:,0:64]
10 | 
11 | plt.subplot(2, 2, 1)
12 | gamma_ch1 = gamma_ch1.T
13 | librosa.display.specshow(np.flip(gamma_ch1,1))
14 | plt.colorbar()
15 | plt.title('fold1_room1_mix001_ov1 gammatone scale to max')
16 | 
17 | gamma_norm = np.load('/home/javier/repos/DCASE2020-Task3/input_feature/gammatone_nomax_gcclogmel/mic_dev/fold1_room1_mix001_ov1.npy')
18 | 
19 | gamma_norm_ch1 = gamma_norm[:,0:64]
20 | #gamma_norm_ch1 = gamma_norm_ch1.T
21 | plt.subplot(2, 2, 2)
22 | librosa.display.specshow(gamma_norm_ch1.T)
23 | plt.colorbar()
24 | plt.title('fold1_room1_mix001_ov1 gammatone no scale to max')
25 | 
26 | spec = np.load('/home/javier/repos/DCASE2020-Task3/input_feature/baseline_log_mel/mic_dev/fold1_room1_mix001_ov1.npy')
27 | 
28 | spec_ch1 = spec[:,0:64]
29 | 
30 | plt.subplot(2, 2, 3)
31 | #spec_ch1 = spec_ch1.T
32 | librosa.display.specshow(spec_ch1.T)
33 | plt.colorbar()
34 | plt.title('fold1_room1_mix001_ov1 mel spectrogram')
35 | 
36 | spec_norm = np.load('/home/javier/repos/DCASE2020-Task3/input_feature/baseline_log_mel/mic_dev_norm/fold1_room1_mix001_ov1.npy')
37 | 
38 | spec_norm_ch1 = spec_norm[:,0:64]
39 | 
40 | plt.subplot(2, 2, 4)
41 | librosa.display.specshow(spec_norm_ch1.T)
42 | plt.colorbar()
43 | plt.title('fold1_room1_mix001_ov1 mel norm spectrogram')
44 | 
45 | plt.show()


--------------------------------------------------------------------------------
/fold1_room1_mix050_ov2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/janaal1/DCASE2020-Task3/bc7d5e87faa2fbe014dc47cccb48d9927b4ed3ec/fold1_room1_mix050_ov2.png


--------------------------------------------------------------------------------
/gammatone/COPYING:
--------------------------------------------------------------------------------
 1 | Copyright (c) 1998, Malcolm Slaney <malcolm@interval.com>
 2 | Copyright (c) 2009, Dan Ellis <dpwe@ee.columbia.edu>
 3 | Copyright (c) 2014, Jason Heeris <jason.heeris@gmail.com>
 4 | All rights reserved.
 5 | 
 6 | Redistribution and use in source and binary forms, with or without
 7 | modification, are permitted provided that the following conditions are met:
 8 |     * Redistributions of source code must retain the above copyright
 9 |       notice, this list of conditions and the following disclaimer.
10 |     * Redistributions in binary form must reproduce the above copyright
11 |       notice, this list of conditions and the following disclaimer in the
12 |       documentation and/or other materials provided with the distribution.
13 |     * Neither the name of the copyright holder nor the names of its contributors
14 |       may be used to endorse or promote products derived from this software
15 |       without specific prior written permission.
16 | 
17 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
18 | ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
19 | WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
20 | DISCLAIMED. IN NO EVENT SHALL <COPYRIGHT HOLDER> BE LIABLE FOR ANY
21 | DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
22 | (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
23 | LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
24 | ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
25 | (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
26 | SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
27 | 


--------------------------------------------------------------------------------
/gammatone/README.md:
--------------------------------------------------------------------------------
  1 | Gammatone Filterbank Toolkit
  2 | ============================
  3 | 
  4 | *Utilities for analysing sound using perceptual models of human hearing.*
  5 | 
  6 | Jason Heeris, 2013
  7 | 
  8 | Summary
  9 | -------
 10 | 
 11 | This is a port of Malcolm Slaney's and Dan Ellis' gammatone filterbank MATLAB
 12 | code, detailed below, to Python 2 and 3 using NumPy and SciPy. It analyses signals by
 13 | running them through banks of gammatone filters, similar to Fourier-based
 14 | spectrogram analysis.
 15 | 
 16 | ![Gammatone-based spectrogram of Für Elise](doc/FurElise.png)
 17 | 
 18 | Installation
 19 | ------------
 20 | 
 21 | You can install directly from this git repository using:
 22 | 
 23 | ```text
 24 | pip install git+https://github.com/detly/gammatone.git
 25 | ```
 26 | 
 27 | ...or you can clone the git repository however you prefer, and do:
 28 | 
 29 | ```text
 30 | pip install .
 31 | ```
 32 | 
 33 | ...or:
 34 | 
 35 | ```
 36 | python setup.py install
 37 | ```
 38 | 
 39 | ...from the cloned tree.
 40 | 
 41 | ### Dependencies
 42 | 
 43 |  - numpy
 44 |  - scipy
 45 |  - nose
 46 |  - mock
 47 |  - matplotlib
 48 | 
 49 | Using the Code
 50 | --------------
 51 | 
 52 | See the [API documentation](http://detly.github.io/gammatone/). For a
 53 | demonstration, find a `.wav` file (for example,
 54 | [Für Elise](http://heeris.id.au/samples/FurElise.wav)) and run:
 55 | 
 56 | ```text
 57 | python -m gammatone FurElise.wav -d 10
 58 | ```
 59 | 
 60 | ...to see a gammatone-gram of the first ten seconds of the track. If you've
 61 | installed via `pip` or `setup.py install`, you should also be able to just run:
 62 | 
 63 | ```text
 64 | gammatone FurElise.wav -d 10
 65 | ```
 66 | 
 67 | Basis
 68 | -----
 69 | 
 70 | This project is based on research into how humans perceive audio, originally
 71 | published by Malcolm Slaney:
 72 | 
 73 | [Malcolm Slaney (1998) "Auditory Toolbox Version 2", Technical Report #1998-010,
 74 | Interval Research Corporation, 1998.](
 75 | http://cobweb.ecn.purdue.edu/~malcolm/interval/1998-010/
 76 | )
 77 | 
 78 | Slaney's report describes a way of modelling how the human ear perceives,
 79 | emphasises and separates different frequencies of sound. A series of gammatone
 80 | filters are constructed whose width increases with increasing centre frequency,
 81 | and this bank of filters is applied to a time-domain signal. The result of this
 82 | is a spectrum that should represent the human experience of sound better than,
 83 | say, a Fourier-domain spectrum would.
 84 | 
 85 | A gammatone filter has an impulse response that is a sine wave multiplied by a
 86 | gamma distribution function. It is a common approach to modelling the auditory
 87 | system.
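
As a rough illustration of the description above, the sketch below samples a gammatone impulse response with NumPy: a gamma-shaped envelope multiplied by a cosine at the centre frequency. The helper name `gammatone_impulse_response`, the fourth-order default and the 50 ms duration are assumptions made for this example; the ERB constants and the 1.019 bandwidth factor are the ones that appear in `MakeERBFilters.m`. It is not the filter implementation used by this toolkit.

```python
import numpy as np


def gammatone_impulse_response(cf, fs, duration=0.05, order=4):
    """Hypothetical helper: sampled gammatone impulse response.

    cf is the centre frequency in Hz and fs the sampling rate in Hz.
    """
    t = np.arange(int(round(duration * fs))) / fs
    # Glasberg and Moore ERB parameters, as in MakeERBFilters.m
    ear_q, min_bw = 9.26449, 24.7
    erb = cf / ear_q + min_bw
    b = 1.019 * erb  # filter bandwidth in Hz
    # Gamma-shaped envelope: t^(order - 1) * exp(-2*pi*b*t)
    envelope = t ** (order - 1) * np.exp(-2 * np.pi * b * t)
    # Multiply the envelope by a tone at the centre frequency
    ir = envelope * np.cos(2 * np.pi * cf * t)
    # Normalise to unit peak amplitude for easier plotting
    return ir / np.max(np.abs(ir))


ir = gammatone_impulse_response(cf=1000.0, fs=16000)
```

Plotting `ir` against time shows a short burst that rises quickly and then decays, which is the shape the paragraph above describes.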
 88 | 
 89 | The gammatone filterbank approach can be considered analogous (but not
 90 | equivalent) to a discrete Fourier transform where the frequency axis is
 91 | logarithmic. For example, a series of notes spaced an octave apart would appear
 92 | to be roughly linearly spaced; or a sound that was distributed across the same
 93 | linear frequency range would appear to have more spread at lower frequencies.
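
To make the spacing concrete, here is a minimal sketch that follows the formula in `auditory_toolkit/ERBSpace.m` to compute centre frequencies on an ERB scale. The function name `erb_space_sketch` and the example arguments are assumptions for illustration, not the toolkit's API.

```python
import numpy as np


def erb_space_sketch(low_freq=100.0, high_freq=11025.0, num=100):
    """Hypothetical helper: `num` centre frequencies spaced uniformly on an
    ERB scale, ordered from just below high_freq down to low_freq."""
    # Glasberg and Moore parameters, as in ERBSpace.m
    ear_q, min_bw = 9.26449, 24.7
    c = ear_q * min_bw
    k = np.arange(1, num + 1)
    # Same expression as ERBSpace.m, with the two logarithms reordered
    step = (np.log(low_freq + c) - np.log(high_freq + c)) / num
    return -c + np.exp(k * step) * (high_freq + c)


cfs = erb_space_sketch(100.0, 8000.0, 10)
gaps = -np.diff(cfs)  # positive gaps between neighbouring centre frequencies
```

The gaps in `gaps` shrink towards the low-frequency end, so the filters sit more densely at low frequencies than a linear frequency axis would place them.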
 94 | 
 95 | The real goal of this toolkit is to allow easy computation of the gammatone
 96 | equivalent of a spectrogram — a time-varying spectrum of energy over audible
 97 | frequencies based on a gammatone filterbank.
 98 | 
 99 | Slaney demonstrated his research with an initial implementation in MATLAB. This
100 | implementation was later extended by Dan Ellis, who found a way to approximate a
101 | "gammatone-gram" by using the fast Fourier transform. Ellis' code calculates a
102 | matrix of weights that can be applied to the output of a FFT so that a
103 | Fourier-based spectrogram can easily be transformed into such an approximation.
104 | 
105 | Ellis' code and documentation are here: [Gammatone-like spectrograms](
106 | http://labrosa.ee.columbia.edu/matlab/gammatonegram/
107 | )
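
The weighting idea described above amounts to a single matrix product. The sketch below shows only the shapes involved: the spectrogram and the weight matrix are random placeholders, and the sizes (257 FFT bins, 64 bands, 20 frames) are assumptions for this example. Real weights would be derived from the gammatone filter responses, which this sketch does not attempt.

```python
import numpy as np

rng = np.random.default_rng(0)
n_fft_bins, n_frames, n_bands = 257, 20, 64

# Magnitude spectrogram from any STFT, one column per frame (placeholder data)
spectrogram = np.abs(rng.standard_normal((n_fft_bins, n_frames)))

# Placeholder weight matrix: one row per gammatone band, one column per FFT bin
weights = rng.random((n_bands, n_fft_bins))

# Each output band is a weighted sum over FFT bins, for all frames at once
gammatonegram = weights @ spectrogram
```

Because the weights depend only on the analysis parameters, they can be computed once and reused for every signal analysed with the same settings.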
108 | 
109 | Interest
110 | --------
111 | 
112 | I became interested in this because of my background in science communication
113 | and my general interest in the teaching of signal processing. I find that the
114 | spectrogram approach to visualising signals is adequate for illustrating
115 | abstract systems or the mathematical properties of transforms, but bears little
116 | correspondence to a person's own experience of sound. If someone wants to see
117 | what their favourite piece of music "looks like," a normal Fourier transform
118 | based spectrogram is actually quite a poor way to visualise it. Features of the
119 | audio seem to be oddly spaced or unnaturally emphasised or de-emphasised
120 | depending on where they are in the frequency domain.
121 | 
122 | The gammatone filterbank approach seems to be closer to what someone might
123 | intuitively expect a visualisation of sound to look like, and can help develop
124 | an intuition about alternative representations of signals.
125 | 
126 | Verifying the port
127 | ------------------
128 | 
129 | Since this is a port of existing MATLAB code, I've written tests to verify the
130 | Python implementation against the original code. These tests aren't unit tests,
131 | but they do generally test single functions. Running the tests has the same
132 | workflow:
133 | 
134 |   1. Run the scripts in the `test_generation` directory. This will create a
135 |      `.mat` file containing test data in `tests/data`.
136 | 
137 |   2. Run `nosetests3` in the top level directory. This will find and run all the
138 |      tests in the `tests` directory.
139 | 
140 | Although I'm usually loath to check in generated files to version control, I'm
141 | willing to make an exception for the `.mat` files containing the test data. My
142 | reasoning is that they represent the decoupling of my code from the MATLAB code,
143 | and if the two projects were separated, they would be considered a part of the
144 | Python code, not the original MATLAB code.
145 | 
146 | 


--------------------------------------------------------------------------------
/gammatone/auditory_toolkit/COPYING:
--------------------------------------------------------------------------------
 1 | Copyright (c) 1998, Malcolm Slaney <malcolm@interval.com>
 2 | Copyright (c) 2009, Dan Ellis <dpwe@ee.columbia.edu>
 3 | All rights reserved.
 4 | 
 5 | Redistribution and use in source and binary forms, with or without
 6 | modification, are permitted provided that the following conditions are met:
 7 |     * Redistributions of source code must retain the above copyright
 8 |       notice, this list of conditions and the following disclaimer.
 9 |     * Redistributions in binary form must reproduce the above copyright
10 |       notice, this list of conditions and the following disclaimer in the
11 |       documentation and/or other materials provided with the distribution.
12 |     * Neither the name of the copyright holder nor the names of its contributors
13 |       may be used to endorse or promote products derived from this software
14 |       without specific prior written permission.
15 | 
16 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
17 | ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
18 | WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
19 | DISCLAIMED. IN NO EVENT SHALL <COPYRIGHT HOLDER> BE LIABLE FOR ANY
20 | DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
21 | (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
22 | LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
23 | ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
24 | (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
25 | SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
26 | 


--------------------------------------------------------------------------------
/gammatone/auditory_toolkit/ERBFilterBank.m:
--------------------------------------------------------------------------------
 1 | function output = ERBFilterBank(x, fcoefs)
 2 | % function output = ERBFilterBank(x, fcoefs)
 3 | % Process an input waveform with a gammatone filter bank. This function 
 4 | % takes a single sound vector, and returns an array of filter outputs, one 
 5 | % channel per row.
 6 | %
 7 | % The fcoefs parameter, which completely specifies the Gammatone filterbank,
 8 | % should be designed with the MakeERBFilters function.  If it is omitted,
 9 | % the filter coefficients are computed for you assuming a 22050Hz sampling
10 | % rate and 64 filters regularly spaced on an ERB scale from fs/2 down to 100Hz.
11 | %
12 | 
13 | % Malcolm Slaney @ Interval, June 11, 1998.
14 | % (c) 1998 Interval Research Corporation  
15 | % Thanks to Alain de Cheveigne' for his suggestions and improvements.
16 | 
17 | if nargin < 1
18 | 	error('Syntax: output_array = ERBFilterBank(input_vector[, fcoefs]);');
19 | end
20 | 
21 | if nargin < 2
22 | 	fcoefs = MakeERBFilters(22050,64,100);
23 | end
24 | 
25 | if size(fcoefs,2) ~= 10
26 | 	error('fcoefs parameter passed to ERBFilterBank is the wrong size.');
27 | end
28 | 
29 | if size(x,2) < size(x,1)
30 | 	x = x';
31 | end
32 | 
33 | A0  = fcoefs(:,1);
34 | A11 = fcoefs(:,2);
35 | A12 = fcoefs(:,3);
36 | A13 = fcoefs(:,4);
37 | A14 = fcoefs(:,5);
38 | A2  = fcoefs(:,6);
39 | B0  = fcoefs(:,7);
40 | B1  = fcoefs(:,8);
41 | B2  = fcoefs(:,9);
42 | gain= fcoefs(:,10);	
43 | 
44 | output = zeros(size(gain,1), length(x));
45 | for chan = 1: size(gain,1)
46 | 	y1=filter([A0(chan)/gain(chan) A11(chan)/gain(chan) ...
47 | 		   A2(chan)/gain(chan)], ...
48 | 				[B0(chan) B1(chan) B2(chan)], x);
49 | 	y2=filter([A0(chan) A12(chan) A2(chan)], ...
50 | 				[B0(chan) B1(chan) B2(chan)], y1);
51 | 	y3=filter([A0(chan) A13(chan) A2(chan)], ...
52 | 				[B0(chan) B1(chan) B2(chan)], y2);
53 | 	y4=filter([A0(chan) A14(chan) A2(chan)], ...
54 | 				[B0(chan) B1(chan) B2(chan)], y3);
55 | 	output(chan, :) = y4;
56 | end
57 | 
58 | if 0
59 | 	semilogx((0:(length(x)-1))*(fs/length(x)),20*log10(abs(fft(output))));
60 | end
61 | 


--------------------------------------------------------------------------------
/gammatone/auditory_toolkit/ERBSpace.m:
--------------------------------------------------------------------------------
 1 | function cfArray = ERBSpace(lowFreq, highFreq, N)
 2 | % function cfArray = ERBSpace(lowFreq, highFreq, N)
 3 | % This function computes an array of N frequencies uniformly spaced between
 4 | % highFreq and lowFreq on an ERB scale.  N is set to 100 if not specified.
 5 | %
 6 | % See also linspace, logspace, MakeERBCoeffs, MakeERBFilters.
 7 | %
 8 | % For a definition of ERB, see Moore, B. C. J., and Glasberg, B. R. (1983).
 9 | % "Suggested formulae for calculating auditory-filter bandwidths and
10 | % excitation patterns," J. Acoust. Soc. Am. 74, 750-753.
11 | 
12 | if nargin < 1
13 | 	lowFreq = 100;
14 | end
15 | 
16 | if nargin < 2
17 | 	highFreq = 44100/4;
18 | end
19 | 
20 | if nargin < 3
21 | 	N = 100;
22 | end
23 | 
24 | % Change the following three parameters if you wish to use a different
25 | % ERB scale.  Must change in MakeERBCoeffs too.
26 | EarQ = 9.26449;				%  Glasberg and Moore Parameters
27 | minBW = 24.7;
28 | order = 1;
29 | 
30 | % All of the following expressions are derived in Apple TR #35, "An
31 | % Efficient Implementation of the Patterson-Holdsworth Cochlear
32 | % Filter Bank."  See pages 33-34.
33 | cfArray = -(EarQ*minBW) + exp((1:N)'*(-log(highFreq + EarQ*minBW) + ...
34 | 		log(lowFreq + EarQ*minBW))/N) * (highFreq + EarQ*minBW);
35 | 
36 | 


--------------------------------------------------------------------------------
/gammatone/auditory_toolkit/MakeERBFilters.m:
--------------------------------------------------------------------------------
  1 | function [fcoefs,cf]=MakeERBFilters(fs,numChannels,lowFreq)
  2 | % function [fcoefs,cf]=MakeERBFilters(fs,numChannels,lowFreq)
  3 | % This function computes the filter coefficients for a bank of 
  4 | % Gammatone filters.  These filters were defined by Patterson and 
  5 | % Holdsworth for simulating the cochlea.  
  6 | % 
  7 | % The result is returned as an array of filter coefficients.  Each row 
  8 | % of the filter arrays contains the coefficients for four second order 
  9 | % filters.  The transfer function for these four filters share the same
 10 | % denominator (poles) but have different numerators (zeros).  All of these
 11 | % coefficients are assembled into one vector that the ERBFilterBank 
 12 | % can take apart to implement the filter.
 13 | %
 14 | % The filter bank contains "numChannels" channels that extend from
 15 | % half the sampling rate (fs) to "lowFreq".  Alternatively, if the numChannels
 16 | % input argument is a vector, then the values of this vector are taken to
 17 | % be the center frequency of each desired filter.  (The lowFreq argument is
 18 | % ignored in this case.)
 19 | 
 20 | % Note this implementation fixes a problem in the original code by
 21 | % computing four separate second order filters.  This avoids a big
 22 | % problem with round off errors in cases of very small cfs (100Hz) and
 23 | % large sample rates (44kHz).  The problem is caused by roundoff error
 24 | % when a number of poles are combined, all very close to the unit
 25 | % circle.  Small errors in the eighth order coefficient are multiplied
 26 | % when the eighth root is taken to give the pole location.  These small
 27 | % errors lead to poles outside the unit circle and instability.  Thanks
 28 | % to Julius Smith for leading me to the proper explanation.
 29 | 
 30 | % Execute the following code to evaluate the frequency
 31 | % response of a 10 channel filterbank.
 32 | %	fcoefs = MakeERBFilters(16000,10,100);
 33 | %	y = ERBFilterBank([1 zeros(1,511)], fcoefs);
 34 | %	resp = 20*log10(abs(fft(y')));
 35 | %	freqScale = (0:511)/512*16000;
 36 | %	semilogx(freqScale(1:255),resp(1:255,:));
 37 | %	axis([100 16000 -60 0])
 38 | %	xlabel('Frequency (Hz)'); ylabel('Filter Response (dB)');
 39 | 
 40 | % Rewritten by Malcolm Slaney@Interval.  June 11, 1998.
 41 | % (c) 1998 Interval Research Corporation  
 42 | 
 43 | T = 1/fs;
 44 | if length(numChannels) == 1
 45 | 	cf = ERBSpace(lowFreq, fs/2, numChannels);
 46 | else
 47 | 	cf = numChannels(1:end);
 48 | 	if size(cf,2) > size(cf,1)
 49 | 		cf = cf';
 50 | 	end
 51 | end
 52 | 
 53 | % Change the following three parameters if you wish to use a different
 54 | % ERB scale.  Must change in ERBSpace too.
 55 | EarQ = 9.26449;				%  Glasberg and Moore Parameters
 56 | minBW = 24.7;
 57 | order = 1;
 58 | 
 59 | ERB = ((cf/EarQ).^order + minBW^order).^(1/order);
 60 | B=1.019*2*pi*ERB;
 61 | 
 62 | A0 = T;
 63 | A2 = 0;
 64 | B0 = 1;
 65 | B1 = -2*cos(2*cf*pi*T)./exp(B*T);
 66 | B2 = exp(-2*B*T);
 67 | 
 68 | A11 = -(2*T*cos(2*cf*pi*T)./exp(B*T) + 2*sqrt(3+2^1.5)*T*sin(2*cf*pi*T)./ ...
 69 | 		exp(B*T))/2;
 70 | A12 = -(2*T*cos(2*cf*pi*T)./exp(B*T) - 2*sqrt(3+2^1.5)*T*sin(2*cf*pi*T)./ ...
 71 | 		exp(B*T))/2;
 72 | A13 = -(2*T*cos(2*cf*pi*T)./exp(B*T) + 2*sqrt(3-2^1.5)*T*sin(2*cf*pi*T)./ ...
 73 | 		exp(B*T))/2;
 74 | A14 = -(2*T*cos(2*cf*pi*T)./exp(B*T) - 2*sqrt(3-2^1.5)*T*sin(2*cf*pi*T)./ ...
 75 | 		exp(B*T))/2;
 76 | 
 77 | gain = abs((-2*exp(4*i*cf*pi*T)*T + ...
 78 |                  2*exp(-(B*T) + 2*i*cf*pi*T).*T.* ...
 79 |                          (cos(2*cf*pi*T) - sqrt(3 - 2^(3/2))* ...
 80 |                           sin(2*cf*pi*T))) .* ...
 81 |            (-2*exp(4*i*cf*pi*T)*T + ...
 82 |              2*exp(-(B*T) + 2*i*cf*pi*T).*T.* ...
 83 |               (cos(2*cf*pi*T) + sqrt(3 - 2^(3/2)) * ...
 84 |                sin(2*cf*pi*T))).* ...
 85 |            (-2*exp(4*i*cf*pi*T)*T + ...
 86 |              2*exp(-(B*T) + 2*i*cf*pi*T).*T.* ...
 87 |               (cos(2*cf*pi*T) - ...
 88 |                sqrt(3 + 2^(3/2))*sin(2*cf*pi*T))) .* ...
 89 |            (-2*exp(4*i*cf*pi*T)*T + 2*exp(-(B*T) + 2*i*cf*pi*T).*T.* ...
 90 |            (cos(2*cf*pi*T) + sqrt(3 + 2^(3/2))*sin(2*cf*pi*T))) ./ ...
 91 |           (-2 ./ exp(2*B*T) - 2*exp(4*i*cf*pi*T) +  ...
 92 |            2*(1 + exp(4*i*cf*pi*T))./exp(B*T)).^4);
 93 | 	
 94 | allfilts = ones(length(cf),1);
 95 | fcoefs = [A0*allfilts A11 A12 A13 A14 A2*allfilts B0*allfilts B1 B2 gain];
 96 | 
 97 | if (0)						% Test Code
 98 | 	A0  = fcoefs(:,1);
 99 | 	A11 = fcoefs(:,2);
100 | 	A12 = fcoefs(:,3);
101 | 	A13 = fcoefs(:,4);
102 | 	A14 = fcoefs(:,5);
103 | 	A2  = fcoefs(:,6);
104 | 	B0  = fcoefs(:,7);
105 | 	B1  = fcoefs(:,8);
106 | 	B2  = fcoefs(:,9);
107 | 	gain= fcoefs(:,10);	
108 | 	chan=1;
109 | 	x = [1 zeros(1, 511)];
110 | 	y1=filter([A0(chan)/gain(chan) A11(chan)/gain(chan) ...
111 | 		A2(chan)/gain(chan)],[B0(chan) B1(chan) B2(chan)], x);
112 | 	y2=filter([A0(chan) A12(chan) A2(chan)], ...
113 | 			[B0(chan) B1(chan) B2(chan)], y1);
114 | 	y3=filter([A0(chan) A13(chan) A2(chan)], ...
115 | 			[B0(chan) B1(chan) B2(chan)], y2);
116 | 	y4=filter([A0(chan) A14(chan) A2(chan)], ...
117 | 			[B0(chan) B1(chan) B2(chan)], y3);
118 | 	semilogx((0:(length(x)-1))*(fs/length(x)),20*log10(abs(fft(y4))));
119 | end
120 | 


--------------------------------------------------------------------------------
/gammatone/auditory_toolkit/README:
--------------------------------------------------------------------------------
1 | These files are the original auditory toolkit/gammatone filterbank code created
2 | by Malcolm Slaney and Dan Ellis, published at:
3 | 
4 | http://labrosa.ee.columbia.edu/matlab/gammatonegram/
5 | https://engineering.purdue.edu/~malcolm/interval/1998-010/
6 | 
7 | Any non-code assets (ie. the sample WAV file and associated graphs) have been
8 | removed.
9 | 


--------------------------------------------------------------------------------
/gammatone/auditory_toolkit/demo_gammatone.m:
--------------------------------------------------------------------------------
  1 | %% Gammatone-like spectrograms
  2 | % Gammatone filters are a popular linear approximation to the
  3 | % filtering performed by the ear.  This routine provides a simple
  4 | % wrapper for generating time-frequency surfaces based on a
  5 | % gammatone analysis, which can be used as a replacement for a
  6 | % conventional spectrogram.  It also provides a fast approximation
  7 | % to this surface based on weighting the output of a conventional
  8 | % FFT. 
  9 | 
 10 | %% Introduction
 11 | % It is very natural to visualize sound as a time-varying
 12 | % distribution of energy in frequency - not least because this is
 13 | % one way of describing the information our brains get from our
 14 | % ears via the auditory nerve.  The spectrogram is the traditional
 15 | % time-frequency visualization, but it actually has some important
 16 | % differences from how sound is analyzed by the ear, most
 17 | % significantly that the ear's frequency subbands get wider for
 18 | % higher frequencies, whereas the spectrogram has a constant
 19 | % bandwidth across all frequency channels.
 20 | % 
 21 | % There have been many signal-processing approximations proposed
 22 | % for the frequency analysis performed by the ear; one of the most
 23 | % popular is the Gammatone filterbank originally proposed by 
 24 | % Roy Patterson and colleagues in 1992.  Gammatone filters were 
 25 | % conceived as a simple fit to experimental observations of 
 26 | % the mammalian cochlea, and have a repeated pole structure leading
 27 | % to an impulse response that is the product of a Gamma envelope 
 28 | % g(t) = t^n e^{-t} and a sinusoid (tone).
 29 | %
 30 | % One reason for the popularity of this approach is the
 31 | % availability of an implementation by Malcolm Slaney, as 
 32 | % described in:
 33 | %
 34 | % Malcolm Slaney (1998) "Auditory Toolbox Version 2", 
 35 | % Technical Report #1998-010, Interval Research Corporation, 1998. 
 36 | % http://cobweb.ecn.purdue.edu/~malcolm/interval/1998-010/
 37 | %
 38 | % Malcolm's toolbox includes routines to design a Gammatone 
 39 | % filterbank and to process a signal by every filter in a bank, 
 40 | % but in order to convert this into a time-frequency visualization 
 41 | % it is necessary to sum up the energy within regular time bins.
 42 | % While this is not complicated, the function here provides a 
 43 | % convenient wrapper to achieve this final step, for applications 
 44 | % that are content to work with time-frequency magnitude
 45 | % distributions instead of going down to the waveform levels.  In
 46 | % this mode of operation, the routine uses Malcolm's MakeERBFilters 
 47 | % and ERBFilterBank routines.
 48 | %
 49 | % This is, however, quite a computationally expensive approach, so
 50 | % we also provide an alternative algorithm that gives very similar
 51 | % results.  In this mode, the Gammatone-based spectrogram is
 52 | % constructed by first calculating a conventional, fixed-bandwidth
 53 | % spectrogram, then combining the fine frequency resolution of the
 54 | % FFT-based spectra into the coarser, smoother Gammatone responses
 55 | % via a weighting function.  This calculates the time-frequency
 56 | % distribution some 30-40x faster than the full approach.
 57 | 
 58 | %% Routines
 59 | % The code consists of a main routine, <gammatonegram.m gammatonegram>, 
 60 | % which takes a waveform and other parameters and returns a
 61 | % spectrogram-like time-frequency matrix, and a helper function 
 62 | % <fft2gammatonemx.m fft2gammatonemx>, which constructs the
 63 | % weighting matrix to convert FFT output spectra into gammatone
 64 | % approximations. 
 65 | 
 66 | %% Example usage
 67 | % First, we calculate a Gammatone-based spectrogram-like image of 
 68 | % a speech waveform using the fast approximation.  Then we do the 
 69 | % same thing using the full filtering approach, for comparison.
 70 | 
 71 | % Load a waveform, calculate its gammatone spectrogram, then display:
 72 | [d,sr] = audioread('sa2.wav');
 73 | tic; [D,F] = gammatonegram(d,sr); toc
 74 | %Elapsed time is 0.140742 seconds.
 75 | subplot(211)
 76 | imagesc(20*log10(D)); axis xy
 77 | caxis([-90 -30])
 78 | colorbar
 79 | % F returns the center frequencies of each band;
 80 | % display whichever elements were shown by the autoscaling
 81 | set(gca,'YTickLabel',round(F(get(gca,'YTick'))));
 82 | ylabel('freq / Hz');
 83 | xlabel('time / 10 ms steps');
 84 | title('Gammatonegram - fast method')
 85 | 
 86 | % Now repeat with flag to use actual subband filters.
 87 | % Since it's the last argument, we have to include all the other
 88 | % arguments.  These are the default values for: summation window 
 89 | % (0.025 sec), hop between successive windows (0.010 sec), 
 90 | % number of gammatone channels (64), lowest frequency (50 Hz), 
 91 | % and highest frequency (sr/2).  The last argument as zero 
 92 | % means not to use the FFT approach.
 93 | tic; [D2,F2] = gammatonegram(d,sr,0.025,0.010,64,50,sr/2,0); toc
 94 | %Elapsed time is 3.165083 seconds.
 95 | subplot(212)
 96 | imagesc(20*log10(D2)); axis xy
 97 | caxis([-90 -30])
 98 | colorbar
 99 | set(gca,'YTickLabel',round(F(get(gca,'YTick'))));
100 | ylabel('freq / Hz');
101 | xlabel('time / 10 ms steps');
102 | title('Gammatonegram - accurate method')
103 | % Actual gammatone filters appear somewhat narrower.  The fast 
104 | % version assumes coherence of addition of amplitude from 
105 | % different channels, whereas the actual subband energies will
106 | % depend on how the energy in different frequencies combines.
107 | % Also notice the visible time smearing in the low frequency 
108 | % channels that does not occur in the fast version.
109 | 
110 | %% Validation
111 | % We can check the frequency responses of the filterbank 
112 | % simulated with the fast method against the actual filters 
113 | % from Malcolm's toolbox.  They match very closely, but of 
114 | % course this still doesn't mean the two approaches will give 
115 | % identical results - because the fast method ignores the phase 
116 | % of each frequency channel when summing up.
117 | 
118 | % Check the frequency responses to see that they match:
119 | % Put an impulse through the Slaney ERB filters, then take the 
120 | % frequency response of each impulse response.
121 | fcfs = flipud(MakeERBFilters(16000,64,50));
122 | gtir = ERBFilterBank([1, zeros(1,1000)],fcfs);
123 | H = zeros(64,512);
124 | for i = 1:64; H(i,:) = abs(freqz(gtir(i,:),1,512)); end
125 | % The weighting matrix for the FFT is the frequency response 
126 | % of each output filter
127 | gtm = fft2gammatonemx(1024,16000,64,1,50,8000,512);
128 | % Plot every 5th channel from both.  Offset by 3 dB just so we can
129 | % see both
130 | fs = [0:511]/512*8000;
131 | figure
132 | plot(fs,20*log10(H(5:5:64,:))','b',fs, -3 + 20*log10(gtm(5:5:64,:))','r')
133 | axis([0 8000 -150 0])
134 | grid
135 | % Line up pretty well, apart from wiggles below -100 dB
136 | % (from truncating the impulse response at 1000 samples?)
137 | 
138 | %% Download
139 | % You can download all the code and data for these examples here:
140 | % <gammatonegram.tgz gammatonegram.tgz>.
141 | 
142 | %% Referencing
143 | % If you use this work in a publication, I would be grateful 
144 | % if you referenced this page as follows:
145 | %
146 | % D. P. W. Ellis (2009).  "Gammatone-like spectrograms", web resource.
147 | % http://www.ee.columbia.edu/~dpwe/resources/matlab/gammatonegram/
148 | 
149 | %% Acknowledgment
150 | % This project was supported in part by the NSF under 
151 | % grant IIS-0535168. Any opinions, findings and conclusions 
152 | % or recommendations expressed in this material are those of the 
153 | % authors and do not necessarily reflect the views of the Sponsors.
154 | 
155 | % Last updated: $Date: 2009/07/07 14:14:11 $
156 | % Dan Ellis <dpwe@ee.columbia.edu>
157 | 
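
The demo above notes that the fast method adds amplitudes from different FFT bins coherently, and gammatonegram.m carries a commented-out power-domain alternative. The following is a minimal Python sketch of the difference between the two combinations, not the toolkit's implementation. The function names and the toy input are hypothetical, chosen only for illustration.

```python
import math

def combine_amplitude(weights, mags, nfft):
    """Weight FFT magnitudes directly, as the fast method does."""
    return [sum(w * m for w, m in zip(row, mags)) / nfft for row in weights]

def combine_power(weights, mags, nfft):
    """Weight squared magnitudes, then take the square root
    (the commented-out alternative in gammatonegram.m)."""
    return [math.sqrt(sum(w * m * m for w, m in zip(row, mags))) / nfft
            for row in weights]

# Toy input (an assumption): one band covering two FFT bins with unit weights.
weights = [[1.0, 1.0]]
mags = [3.0, 4.0]
amp = combine_amplitude(weights, mags, nfft=1)  # 3 + 4
pwr = combine_power(weights, mags, nfft=1)      # sqrt(3^2 + 4^2)
```

For this toy band the amplitude sum is larger than the power-domain result, which is in line with the demo's remark that the fast version treats contributions from different bins as adding coherently.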


--------------------------------------------------------------------------------
/gammatone/auditory_toolkit/fft2gammatonemx.m:
--------------------------------------------------------------------------------
  1 | function [wts,gain] = fft2gammatonemx(nfft, sr, nfilts, width, minfreq, maxfreq, maxlen)
  2 | % [wts,gain] = fft2gammatonemx(nfft, sr, nfilts, width, minfreq, maxfreq, maxlen)
  3 | %      Generate a matrix of weights to combine FFT bins into
  4 | %      Gammatone bins.  nfft defines the source FFT size at
  5 | %      sampling rate sr.  Optional nfilts specifies the number of
  6 | %      output bands required (default 64), and width scales the
  7 | %      bandwidth of each band relative to its ERB (default 1).
  8 | %      minfreq, maxfreq specify range covered in Hz (100, sr/2).
  9 | %      While wts has nfft columns, the second half are all zero. 
 10 | %      Hence, aud spectrum is
 11 | %      fft2gammatonemx(nfft,sr)*abs(fft(xincols,nfft));
 12 | %      maxlen truncates the rows to this many bins; gain returns
 13 | %      the gain used to normalize each row of wts.
 13 | %
 14 | % 2004-09-05  Dan Ellis dpwe@ee.columbia.edu  based on rastamat/audspec.m
 15 | % Last updated: $Date: 2009/02/22 02:29:25 $
 16 | 
 17 | if nargin < 2;    sr = 16000; end
 18 | if nargin < 3;    nfilts = 64; end
 19 | if nargin < 4;    width = 1.0; end
 20 | if nargin < 5;    minfreq = 100; end
 21 | if nargin < 6;    maxfreq = sr/2; end
 22 | if nargin < 7;    maxlen = nfft; end
 23 | 
 24 | wts = zeros(nfilts, nfft);
 25 | 
 26 | % after Slaney's MakeERBFilters
 27 | EarQ = 9.26449;
 28 | minBW = 24.7;
 29 | order = 1;
 30 | 
 31 | cfreqs = -(EarQ*minBW) + exp((1:nfilts)'*(-log(maxfreq + EarQ*minBW) + ...
 32 |                 log(minfreq + EarQ*minBW))/nfilts) * (maxfreq + EarQ*minBW);
 33 | cfreqs = flipud(cfreqs);
 34 | 
 35 | GTord = 4;
 36 | 
 37 | ucirc = exp(j*2*pi*[0:(nfft/2)]/nfft);
 38 | 
 39 | justpoles = 0;
 40 | 
 41 | for i = 1:nfilts
 42 |   cf = cfreqs(i);
 43 |   ERB = width*((cf/EarQ).^order + minBW^order).^(1/order);
 44 |   B = 1.019*2*pi*ERB;
 45 |   r = exp(-B/sr);
 46 |   theta = 2*pi*cf/sr;
 47 |   pole = r*exp(j*theta);
 48 | 
 49 |   if justpoles == 1
 50 |     % point on unit circle of maximum gain, from differentiating magnitude
 51 |     cosomegamax = (1+r*r)/(2*r)*cos(theta);
 52 |     if abs(cosomegamax) > 1
 53 |       if theta < pi/2;  omegamax = 0; 
 54 |       else              omegamax = pi;   end
 55 |     else
 56 |       omegamax = acos(cosomegamax);
 57 |     end
 58 |     center = exp(j*omegamax);
 59 |     gain = abs((pole-center).*(pole'-center)).^GTord;
 60 |     wts(i,1:(nfft/2+1)) = gain * (abs((pole-ucirc).*(pole'- ...
 61 |                                                      ucirc)).^-GTord);
 62 |   else
 63 |     % poles and zeros, following Malcolm's MakeERBFilter
 64 |     T = 1/sr;
 65 |     A11 = -(2*T*cos(2*cf*pi*T)./exp(B*T) + 2*sqrt(3+2^1.5)*T*sin(2* ...
 66 |                                                       cf*pi*T)./exp(B*T))/2; 
 67 |     A12 = -(2*T*cos(2*cf*pi*T)./exp(B*T) - 2*sqrt(3+2^1.5)*T*sin(2* ...
 68 |                                                       cf*pi*T)./exp(B*T))/2;
 69 |     A13 = -(2*T*cos(2*cf*pi*T)./exp(B*T) + 2*sqrt(3-2^1.5)*T*sin(2* ...
 70 |                                                       cf*pi*T)./exp(B*T))/2; 
 71 |     A14 = -(2*T*cos(2*cf*pi*T)./exp(B*T) - 2*sqrt(3-2^1.5)*T*sin(2* ...
 72 |                                                       cf*pi*T)./exp(B*T))/2; 
 73 |     zros = -[A11 A12 A13 A14]/T;
 74 |     
 75 |     gain(i) =  abs((-2*exp(4*j*cf*pi*T)*T + ...
 76 |                 2*exp(-(B*T) + 2*j*cf*pi*T).*T.* ...
 77 |                 (cos(2*cf*pi*T) - sqrt(3 - 2^(3/2))* ...
 78 |                  sin(2*cf*pi*T))) .* ...
 79 |                (-2*exp(4*j*cf*pi*T)*T + ...
 80 |                 2*exp(-(B*T) + 2*j*cf*pi*T).*T.* ...
 81 |                 (cos(2*cf*pi*T) + sqrt(3 - 2^(3/2)) * ...
 82 |                  sin(2*cf*pi*T))).* ...
 83 |                (-2*exp(4*j*cf*pi*T)*T + ...
 84 |                 2*exp(-(B*T) + 2*j*cf*pi*T).*T.* ...
 85 |                 (cos(2*cf*pi*T) - ...
 86 |                  sqrt(3 + 2^(3/2))*sin(2*cf*pi*T))) .* ...
 87 |                (-2*exp(4*j*cf*pi*T)*T + 2*exp(-(B*T) + 2*j*cf*pi*T).*T.* ...
 88 |                 (cos(2*cf*pi*T) + sqrt(3 + 2^(3/2))*sin(2*cf*pi*T))) ./ ...
 89 |                (-2 ./ exp(2*B*T) - 2*exp(4*j*cf*pi*T) +  ...
 90 |                 2*(1 + exp(4*j*cf*pi*T))./exp(B*T)).^4);
 91 |     wts(i,1:(nfft/2+1)) = ((T^4)/gain(i)) ...
 92 |         * abs(ucirc-zros(1)).*abs(ucirc-zros(2))...
 93 |         .*abs(ucirc-zros(3)).*abs(ucirc-zros(4))...
 94 |         .*(abs((pole-ucirc).*(pole'-ucirc)).^-GTord);
 95 |   end
 96 | end
 97 | 
 98 | wts = wts(:,1:maxlen);
 99 | 
100 | 
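
The center-frequency expression in fft2gammatonemx.m is dense, so here is a short Python sketch of the same arithmetic. It assumes the constants shown in the listing (EarQ, minBW, order = 1); the function names are hypothetical and this is not the code of the Python port in this repository.

```python
import math

EAR_Q = 9.26449   # constants copied from the MATLAB listing
MIN_BW = 24.7

def erb_center_freqs(nfilts=64, minfreq=100.0, maxfreq=8000.0):
    """Center frequencies in Hz, ascending, mirroring the cfreqs expression."""
    c = EAR_Q * MIN_BW
    step = (math.log(minfreq + c) - math.log(maxfreq + c)) / nfilts
    descending = [-c + math.exp(i * step) * (maxfreq + c)
                  for i in range(1, nfilts + 1)]
    return descending[::-1]  # same effect as flipud

def erb_bandwidth(cf, width=1.0):
    """ERB in Hz for order = 1, as computed inside the filter loop."""
    return width * (cf / EAR_Q + MIN_BW)

cfs = erb_center_freqs(64, 50.0, 8000.0)
```

With these settings the lowest center frequency equals minfreq, while the highest lands somewhat below maxfreq, because the index runs from 1 to nfilts rather than from 0.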


--------------------------------------------------------------------------------
/gammatone/auditory_toolkit/gammatone_demo.m:
--------------------------------------------------------------------------------
  1 | %% Gammatone-like spectrograms
  2 | % Gammatone filters are a popular linear approximation to the
  3 | % filtering performed by the ear.  This routine provides a simple
  4 | % wrapper for generating time-frequency surfaces based on a
  5 | % gammatone analysis, which can be used as a replacement for a
  6 | % conventional spectrogram.  It also provides a fast approximation
  7 | % to this surface based on weighting the output of a conventional
  8 | % FFT. 
  9 | 
 10 | %% Introduction
 11 | % It is very natural to visualize sound as a time-varying
 12 | % distribution of energy in frequency - not least because this is
 13 | % one way of describing the information our brains get from our
 14 | % ears via the auditory nerve.  The spectrogram is the traditional
 15 | % time-frequency visualization, but it actually has some important
 16 | % differences from how sound is analyzed by the ear, most
 17 | % significantly that the ear's frequency subbands get wider for
 18 | % higher frequencies, whereas the spectrogram has a constant
 19 | % bandwidth across all frequency channels.
 20 | % 
 21 | % There have been many signal-processing approximations proposed
 22 | % for the frequency analysis performed by the ear; one of the most
 23 | % popular is the Gammatone filterbank originally proposed by 
 24 | % Roy Patterson and colleagues in 1992.  Gammatone filters were 
 25 | % conceived as a simple fit to experimental observations of 
 26 | % the mammalian cochlea, and have a repeated pole structure leading
 27 | % to an impulse response that is the product of a Gamma envelope 
 28 | % g(t) = t^n e^{-t} and a sinusoid (tone).
 29 | %
 30 | % One reason for the popularity of this approach is the
 31 | % availability of an implementation by Malcolm Slaney, as 
 32 | % described in:
 33 | %
 34 | % Malcolm Slaney (1998) "Auditory Toolbox Version 2", 
 35 | % Technical Report #1998-010, Interval Research Corporation, 1998. 
 36 | % http://cobweb.ecn.purdue.edu/~malcolm/interval/1998-010/
 37 | %
 38 | % Malcolm's toolbox includes routines to design a Gammatone 
 39 | % filterbank and to process a signal by every filter in a bank, 
 40 | % but in order to convert this into a time-frequency visualization 
 41 | % it is necessary to sum up the energy within regular time bins.
 42 | % While this is not complicated, the function here provides a 
 43 | % convenient wrapper to achieve this final step, for applications 
 44 | % that are content to work with time-frequency magnitude
 45 | % distributions instead of going down to the waveform level.  In
 46 | % this mode of operation, the routine uses Malcolm's MakeERBFilters 
 47 | % and ERBFilterBank routines.
 48 | %
 49 | % This is, however, quite a computationally expensive approach, so
 50 | % we also provide an alternative algorithm that gives very similar
 51 | % results.  In this mode, the Gammatone-based spectrogram is
 52 | % constructed by first calculating a conventional, fixed-bandwidth
 53 | % spectrogram, then combining the fine frequency resolution of the
 54 | % FFT-based spectra into the coarser, smoother Gammatone responses
 55 | % via a weighting function.  This calculates the time-frequency
 56 | % distribution some 30-40x faster than the full approach.
 57 | 
 58 | %% Routines
 59 | % The code consists of a main routine, <gammatonegram.m gammatonegram>, 
 60 | % which takes a waveform and other parameters and returns a
 61 | % spectrogram-like time-frequency matrix, and a helper function 
 62 | % <fft2gammatonemx.m fft2gammatonemx>, which constructs the
 63 | % weighting matrix to convert FFT output spectra into gammatone
 64 | % approximations. 
 65 | 
 66 | %% Example usage
 67 | % First, we calculate a Gammatone-based spectrogram-like image of 
 68 | % a speech waveform using the fast approximation.  Then we do the 
 69 | % same thing using the full filtering approach, for comparison.
 70 | 
 71 | % Load a waveform, calculate its gammatone spectrogram, then display:
 72 | [d,sr] = audioread('sa2.wav');
 73 | tic; D = gammatonegram(d,sr); toc
 74 | %Elapsed time is 0.140742 seconds.
 75 | subplot(211)
 76 | imagesc(20*log10(D)); axis xy
 77 | caxis([-90 -30])
 78 | colorbar
 79 | title('Gammatonegram - fast method')
 80 | 
 81 | % Now repeat with flag to use actual subband filters.
 82 | % Since it's the last argument, we have to include all the other
 83 | % arguments.  These are the default values for: summation window 
 84 | % (0.025 sec), hop between successive windows (0.010 sec), 
 85 | % number of gammatone channels (64), lowest frequency (50 Hz), 
 86 | % and highest frequency (sr/2).  The last argument as zero 
 87 | % means not to use the FFT approach.
 88 | tic; D2 = gammatonegram(d,sr,0.025,0.010,64,50,sr/2,0); toc
 89 | %Elapsed time is 3.165083 seconds.
 90 | subplot(212)
 91 | imagesc(20*log10(D2)); axis xy
 92 | caxis([-90 -30])
 93 | colorbar
 94 | title('Gammatonegram - accurate method')
 95 | % Actual gammatone filters appear somewhat narrower.  The fast 
 96 | % version assumes coherence of addition of amplitude from 
 97 | % different channels, whereas the actual subband energies will
 98 | % depend on how the energy in different frequencies combines.
 99 | % Also notice the visible time smearing in the low frequency 
100 | % channels that does not occur in the fast version.
101 | 
102 | %% Validation
103 | % We can check the frequency responses of the filterbank 
104 | % simulated with the fast method against the actual filters 
105 | % from Malcolm's toolbox.  They match very closely, but of 
106 | % course this still doesn't mean the two approaches will give 
107 | % identical results - because the fast method ignores the phase 
108 | % of each frequency channel when summing up.
109 | 
110 | % Check the frequency responses to see that they match:
111 | % Put an impulse through the Slaney ERB filters, then take the 
112 | % frequency response of each impulse response.
113 | fcfs = flipud(MakeERBFilters(16000,64,50));
114 | gtir = ERBFilterBank([1, zeros(1,1000)],fcfs);
115 | H = zeros(64,512);
116 | for i = 1:64; H(i,:) = abs(freqz(gtir(i,:),1,512)); end
117 | % The weighting matrix for the FFT is the frequency response 
118 | % of each output filter
119 | gtm = fft2gammatonemx(1024,16000,64,1,50,8000,512);
120 | % Plot every 5th channel from both.  Offset by 3 dB just so we can
121 | % see both
122 | fs = [0:511]/512*8000;
123 | figure
124 | plot(fs,20*log10(H(5:5:64,:))','b',fs, -3 + 20*log10(gtm(5:5:64,:))','r')
125 | axis([0 8000 -150 0])
126 | grid
127 | % Line up pretty well, apart from wiggles below -100 dB
128 | % (from truncating the impulse response at 1000 samples?)
129 | 
130 | %% Download
131 | % You can download all the code and data for these examples here:
132 | % <gammatone.tgz gammatone.tgz>.
133 | 
134 | %% Referencing
135 | % If you use this work in a publication, I would be grateful 
136 | % if you referenced this page as follows:
137 | %
138 | %  D. P. W. Ellis (2009).  "Gammatone-like spectrograms", web resource, http://www.ee.columbia.edu/~dpwe/resources/matlab/gammatonegram/ .
139 | 
140 | %% Acknowledgment
141 | % This project was supported in part by the NSF under 
142 | % grant IIS-0535168. Any opinions, findings and conclusions 
143 | % or recommendations expressed in this material are those of the 
144 | % authors and do not necessarily reflect the views of the Sponsors.
145 | 
146 | % Last updated: $Date: 2009/02/22 01:46:42 $
147 | % Dan Ellis <dpwe@ee.columbia.edu>
148 | 


--------------------------------------------------------------------------------
/gammatone/auditory_toolkit/gammatonegram.m:
--------------------------------------------------------------------------------
 1 | function [Y,F] = gammatonegram(X,SR,TWIN,THOP,N,FMIN,FMAX,USEFFT,WIDTH)
 2 | % [Y,F] = gammatonegram(X,SR,TWIN,THOP,N,FMIN,FMAX,USEFFT,WIDTH)
 3 | %    Calculate a spectrogram-like time frequency magnitude array
 4 | %    based on Gammatone subband filters.  Waveform X (at sample
 5 | %    rate SR) is passed through an N (default 64) channel gammatone 
 6 | %    auditory model filterbank, with lowest frequency FMIN (50) 
 7 | %    and highest frequency FMAX (SR/2).  The outputs of each band 
 8 | %    then have their energy integrated over windows of TWIN secs 
 9 | %    (0.025), advancing by THOP secs (0.010) for successive
10 | %    columns.  These magnitudes are returned as an N-row
11 | %    nonnegative real matrix, Y.
12 | %    If USEFFT is present and zero, revert to actual filtering and
13 | %    summing energy within windows.
14 | %    WIDTH (default 1.0) is how to scale bandwidth of filters 
15 | %    relative to ERB default (for fast method only).
16 | %    F returns the center frequencies in Hz of each row of Y
17 | %    (uniformly spaced on an ERB-rate scale).
18 | %
19 | % 2009-02-18 Dan Ellis dpwe@ee.columbia.edu
20 | % Last updated: $Date: 2009/02/23 21:07:09 $
21 | 
22 | if nargin < 2;  SR = 16000; end
23 | if nargin < 3;  TWIN = 0.025; end
24 | if nargin < 4;  THOP = 0.010; end
25 | if nargin < 5;  N = 64; end
26 | if nargin < 6;  FMIN = 50; end
27 | if nargin < 7;  FMAX = SR/2; end
28 | if nargin < 8;  USEFFT = 1; end
29 | if nargin < 9;  WIDTH = 1.0; end
30 | 
31 | 
32 | if USEFFT == 0 
33 | 
34 |   % Use malcolm's function to filter into subbands
35 |   %%%% IGNORES FMAX! *****
36 |   [fcoefs,F] = MakeERBFilters(SR, N, FMIN);
37 |   fcoefs = flipud(fcoefs);
38 | 
39 |   XF = ERBFilterBank(X,fcoefs);
40 | 
41 |   nwin = round(TWIN*SR);
42 | % Always use rectangular window for now (the Hann window below is computed but unused)
43 | %  if USEHANN == 1
44 |     window = hann(nwin)';
45 | %  else
46 | %    window = ones(1,nwin);
47 | %  end
48 | %  window = window/sum(window);
49 | %  XE = [zeros(N,round(nwin/2)),XF.^2,zeros(N,round(nwin/2))];
50 |   XE = [XF.^2];
51 | 
52 |   hopsamps = round(THOP*SR);
53 | 
54 |   ncols = 1 + floor((size(XE,2)-nwin)/hopsamps);
55 | 
56 |   Y = zeros(N,ncols);
57 | 
58 | %  winmx = repmat(window,N,1);
59 | 
60 |   for i = 1:ncols
61 | %    Y(:,i) = sqrt(sum(winmx.*XE(:,(i-1)*hopsamps + [1:nwin]),2));
62 |     Y(:,i) = sqrt(mean(XE(:,(i-1)*hopsamps + [1:nwin]),2));
63 |   end
64 | 
65 | else 
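66 |   % USEFFT version.  NOTE: F is taken from the second output of fft2gammatonemx, which this copy declares as gain, not center frequencies.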
67 |   % How long a window to use relative to the integration window requested
68 |   winext = 1;
69 |   twinmod = winext * TWIN;
70 |   % first spectrogram
71 |   nfft = 2^(ceil(log(2*twinmod*SR)/log(2)));
72 |   nhop = round(THOP*SR);
73 |   nwin = round(twinmod*SR);
74 |   [gtm,F] = fft2gammatonemx(nfft, SR, N, WIDTH, FMIN, FMAX, nfft/2+1);
75 |   % perform FFT and weighting in amplitude domain
76 |   Y = 1/nfft*gtm*abs(specgram(X,nfft,SR,nwin,nwin-nhop));
77 |   % or the power domain?  doesn't match nearly as well
78 |   %Y = 1/nfft*sqrt(gtm*abs(specgram(X,nfft,SR,nwin,nwin-nhop).^2));
79 | end
80 | 
81 | 
82 | 
83 | 
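
The two branches of gammatonegram.m frame the signal differently: the filtering branch slides a window of nwin samples, while the fast branch hands specgram frames that are nfft samples long. The Python sketch below reproduces that bookkeeping under the defaults shown above. The helper names are hypothetical, and Python's round() is only a stand-in for MATLAB's round (the two differ on exact .5 ties, which do not arise for these defaults).

```python
import math

def fast_method_sizes(sr, twin=0.025, thop=0.010):
    """FFT size, window length and hop in samples, as in the USEFFT branch."""
    nwin = round(twin * sr)
    nhop = round(thop * sr)
    nfft = 2 ** math.ceil(math.log2(2 * twin * sr))
    return nfft, nwin, nhop

def column_counts(nsamples, sr, twin=0.025, thop=0.010):
    """Number of output columns for each branch (assumes nsamples >= nfft)."""
    nfft, nwin, nhop = fast_method_sizes(sr, twin, thop)
    filtered = 1 + (nsamples - nwin) // nhop    # USEFFT == 0 branch
    fft_based = 1 + (nsamples - nfft) // nhop   # specgram frames are nfft long
    return filtered, fft_based

def frame_rms(channel, nwin, nhop):
    """RMS of one subband signal per rectangular window."""
    ncols = 1 + (len(channel) - nwin) // nhop
    return [math.sqrt(sum(x * x for x in channel[i * nhop:i * nhop + nwin]) / nwin)
            for i in range(ncols)]

sizes = fast_method_sizes(16000)
counts = column_counts(16000, 16000)
```

For one second of audio at 16 kHz this gives a 1024-point FFT with a 400-sample window and a 160-sample hop, and the two branches return slightly different numbers of columns, which is worth keeping in mind when comparing their outputs frame by frame.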


--------------------------------------------------------------------------------
/gammatone/auditory_toolkit/specgram.m:
--------------------------------------------------------------------------------
 1 | function y = specgram(x,n,sr,w,ov)
 2 | % Y = specgram(X,NFFT,SR,W,OV)
 3 | %      Substitute for Matlab's specgram; calculates the spectrogram and
 3 | %      displays it only when called with no output argument
 4 | % $Header: /homes/dpwe/tmp/e6820/RCS/myspecgram.m,v 1.1 2002/08/04 19:20:27 dpwe Exp $
 5 | 
 6 | if (size(x,1) > size(x,2))
 7 |   x = x';
 8 | end
 9 | 
10 | s = length(x);
11 | 
12 | if nargin < 2
13 |   n = 256;
14 | end
15 | if nargin < 3
16 |   sr = 1;
17 | end
18 | if nargin < 4
19 |   w = n;
20 | end
21 | if nargin < 5
22 |   ov = w/2;
23 | end
24 | h = w - ov;
25 | 
26 | halflen = w/2;
27 | halff = n/2;   % midpoint of win
28 | acthalflen = min(halff, halflen);
29 | 
30 | halfwin = 0.5 * ( 1 + cos( pi * (0:halflen)/halflen));
31 | win = zeros(1, n);
32 | win((halff+1):(halff+acthalflen)) = halfwin(1:acthalflen);
33 | win((halff+1):-1:(halff-acthalflen+2)) = halfwin(1:acthalflen);
34 | 
35 | c = 1;
36 | 
37 | % pre-allocate output array
38 | ncols = 1+fix((s-n)/h);
39 | d = zeros((1+n/2), ncols);
40 | 
41 | for b = 0:h:(s-n)
42 |   u = win.*x((b+1):(b+n));
43 |   t = fft(u);
44 |   d(:,c) = t([1:(1+n/2)]');
45 |   c = c+1;
46 | end;
47 | 
48 | tt = [0:h:(s-n)]/sr;
49 | ff = [0:(n/2)]*sr/n;
50 | 
51 | if nargout < 1
52 |   imagesc(tt,ff,20*log10(abs(d)));
53 |   axis xy
54 |   xlabel('Time / s');
55 |   ylabel('Frequency / Hz');
56 | else
57 |   y = d;
58 | end
59 | 
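
The window construction in specgram.m builds a raised-cosine (Hann-shaped) window of length w, centered inside a zero-padded frame of n samples. A small Python sketch of the same indexing follows, translated to 0-based indices and assuming even w and n. The function name is hypothetical.

```python
import math

def centered_hann(n, w):
    """Zero-padded frame of length n with a raised-cosine window of
    nominal length w centered on index n // 2 (0-based)."""
    halflen = w // 2
    halff = n // 2
    act = min(halff, halflen)
    win = [0.0] * n
    for m in range(act):
        v = 0.5 * (1.0 + math.cos(math.pi * m / halflen))
        win[halff + m] = v   # right half, starting at the peak
        win[halff - m] = v   # mirrored left half
    return win

win = centered_hann(8, 4)
```

Because the window is shorter than the frame when w < n, each FFT in the fast method sees the windowed samples surrounded by zeros, which interpolates the spectrum without adding time resolution.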


--------------------------------------------------------------------------------
/gammatone/doc/FurElise.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/janaal1/DCASE2020-Task3/bc7d5e87faa2fbe014dc47cccb48d9927b4ed3ec/gammatone/doc/FurElise.png


--------------------------------------------------------------------------------
/gammatone/doc/Makefile:
--------------------------------------------------------------------------------
  1 | # Makefile for Sphinx documentation
  2 | #
  3 | 
  4 | # You can set these variables from the command line.
  5 | SPHINXOPTS    =
  6 | SPHINXBUILD   = sphinx-build
  7 | PAPER         =
  8 | BUILDDIR      = _build
  9 | 
 10 | # Internal variables.
 11 | PAPEROPT_a4     = -D latex_paper_size=a4
 12 | PAPEROPT_letter = -D latex_paper_size=letter
 13 | ALLSPHINXOPTS   = -d $(BUILDDIR)/doctrees $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) .
 14 | # the i18n builder cannot share the environment and doctrees with the others
 15 | I18NSPHINXOPTS  = $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) .
 16 | 
 17 | .PHONY: help clean html dirhtml singlehtml pickle json htmlhelp qthelp devhelp epub latex latexpdf text man changes linkcheck doctest gettext
 18 | 
 19 | help:
 20 | 	@echo "Please use \`make <target>' where <target> is one of"
 21 | 	@echo "  html       to make standalone HTML files"
 22 | 	@echo "  dirhtml    to make HTML files named index.html in directories"
 23 | 	@echo "  singlehtml to make a single large HTML file"
 24 | 	@echo "  pickle     to make pickle files"
 25 | 	@echo "  json       to make JSON files"
 26 | 	@echo "  htmlhelp   to make HTML files and a HTML help project"
 27 | 	@echo "  qthelp     to make HTML files and a qthelp project"
 28 | 	@echo "  devhelp    to make HTML files and a Devhelp project"
 29 | 	@echo "  epub       to make an epub"
 30 | 	@echo "  latex      to make LaTeX files, you can set PAPER=a4 or PAPER=letter"
 31 | 	@echo "  latexpdf   to make LaTeX files and run them through pdflatex"
 32 | 	@echo "  text       to make text files"
 33 | 	@echo "  man        to make manual pages"
 34 | 	@echo "  texinfo    to make Texinfo files"
 35 | 	@echo "  info       to make Texinfo files and run them through makeinfo"
 36 | 	@echo "  gettext    to make PO message catalogs"
 37 | 	@echo "  changes    to make an overview of all changed/added/deprecated items"
 38 | 	@echo "  linkcheck  to check all external links for integrity"
 39 | 	@echo "  doctest    to run all doctests embedded in the documentation (if enabled)"
 40 | 
 41 | clean:
 42 | 	-rm -rf $(BUILDDIR)/*
 43 | 
 44 | html:
 45 | 	$(SPHINXBUILD) -b html $(ALLSPHINXOPTS) $(BUILDDIR)/html
 46 | 	@echo
 47 | 	@echo "Build finished. The HTML pages are in $(BUILDDIR)/html."
 48 | 
 49 | dirhtml:
 50 | 	$(SPHINXBUILD) -b dirhtml $(ALLSPHINXOPTS) $(BUILDDIR)/dirhtml
 51 | 	@echo
 52 | 	@echo "Build finished. The HTML pages are in $(BUILDDIR)/dirhtml."
 53 | 
 54 | singlehtml:
 55 | 	$(SPHINXBUILD) -b singlehtml $(ALLSPHINXOPTS) $(BUILDDIR)/singlehtml
 56 | 	@echo
 57 | 	@echo "Build finished. The HTML page is in $(BUILDDIR)/singlehtml."
 58 | 
 59 | pickle:
 60 | 	$(SPHINXBUILD) -b pickle $(ALLSPHINXOPTS) $(BUILDDIR)/pickle
 61 | 	@echo
 62 | 	@echo "Build finished; now you can process the pickle files."
 63 | 
 64 | json:
 65 | 	$(SPHINXBUILD) -b json $(ALLSPHINXOPTS) $(BUILDDIR)/json
 66 | 	@echo
 67 | 	@echo "Build finished; now you can process the JSON files."
 68 | 
 69 | htmlhelp:
 70 | 	$(SPHINXBUILD) -b htmlhelp $(ALLSPHINXOPTS) $(BUILDDIR)/htmlhelp
 71 | 	@echo
 72 | 	@echo "Build finished; now you can run HTML Help Workshop with the" \
 73 | 	      ".hhp project file in $(BUILDDIR)/htmlhelp."
 74 | 
 75 | qthelp:
 76 | 	$(SPHINXBUILD) -b qthelp $(ALLSPHINXOPTS) $(BUILDDIR)/qthelp
 77 | 	@echo
 78 | 	@echo "Build finished; now you can run "qcollectiongenerator" with the" \
 79 | 	      ".qhcp project file in $(BUILDDIR)/qthelp, like this:"
 80 | 	@echo "# qcollectiongenerator $(BUILDDIR)/qthelp/gammatone.qhcp"
 81 | 	@echo "To view the help file:"
 82 | 	@echo "# assistant -collectionFile $(BUILDDIR)/qthelp/gammatone.qhc"
 83 | 
 84 | devhelp:
 85 | 	$(SPHINXBUILD) -b devhelp $(ALLSPHINXOPTS) $(BUILDDIR)/devhelp
 86 | 	@echo
 87 | 	@echo "Build finished."
 88 | 	@echo "To view the help file:"
 89 | 	@echo "# mkdir -p $$HOME/.local/share/devhelp/gammatone"
 90 | 	@echo "# ln -s $(BUILDDIR)/devhelp $$HOME/.local/share/devhelp/gammatone"
 91 | 	@echo "# devhelp"
 92 | 
 93 | epub:
 94 | 	$(SPHINXBUILD) -b epub $(ALLSPHINXOPTS) $(BUILDDIR)/epub
 95 | 	@echo
 96 | 	@echo "Build finished. The epub file is in $(BUILDDIR)/epub."
 97 | 
 98 | latex:
 99 | 	$(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
100 | 	@echo
101 | 	@echo "Build finished; the LaTeX files are in $(BUILDDIR)/latex."
102 | 	@echo "Run \`make' in that directory to run these through (pdf)latex" \
103 | 	      "(use \`make latexpdf' here to do that automatically)."
104 | 
105 | latexpdf:
106 | 	$(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
107 | 	@echo "Running LaTeX files through pdflatex..."
108 | 	$(MAKE) -C $(BUILDDIR)/latex all-pdf
109 | 	@echo "pdflatex finished; the PDF files are in $(BUILDDIR)/latex."
110 | 
111 | text:
112 | 	$(SPHINXBUILD) -b text $(ALLSPHINXOPTS) $(BUILDDIR)/text
113 | 	@echo
114 | 	@echo "Build finished. The text files are in $(BUILDDIR)/text."
115 | 
116 | man:
117 | 	$(SPHINXBUILD) -b man $(ALLSPHINXOPTS) $(BUILDDIR)/man
118 | 	@echo
119 | 	@echo "Build finished. The manual pages are in $(BUILDDIR)/man."
120 | 
121 | texinfo:
122 | 	$(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo
123 | 	@echo
124 | 	@echo "Build finished. The Texinfo files are in $(BUILDDIR)/texinfo."
125 | 	@echo "Run \`make' in that directory to run these through makeinfo" \
126 | 	      "(use \`make info' here to do that automatically)."
127 | 
128 | info:
129 | 	$(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo
130 | 	@echo "Running Texinfo files through makeinfo..."
131 | 	$(MAKE) -C $(BUILDDIR)/texinfo info
132 | 	@echo "makeinfo finished; the Info files are in $(BUILDDIR)/texinfo."
133 | 
134 | gettext:
135 | 	$(SPHINXBUILD) -b gettext $(I18NSPHINXOPTS) $(BUILDDIR)/locale
136 | 	@echo
137 | 	@echo "Build finished. The message catalogs are in $(BUILDDIR)/locale."
138 | 
139 | changes:
140 | 	$(SPHINXBUILD) -b changes $(ALLSPHINXOPTS) $(BUILDDIR)/changes
141 | 	@echo
142 | 	@echo "The overview file is in $(BUILDDIR)/changes."
143 | 
144 | linkcheck:
145 | 	$(SPHINXBUILD) -b linkcheck $(ALLSPHINXOPTS) $(BUILDDIR)/linkcheck
146 | 	@echo
147 | 	@echo "Link check complete; look for any errors in the above output " \
148 | 	      "or in $(BUILDDIR)/linkcheck/output.txt."
149 | 
150 | doctest:
151 | 	$(SPHINXBUILD) -b doctest $(ALLSPHINXOPTS) $(BUILDDIR)/doctest
152 | 	@echo "Testing of doctests in the sources finished, look at the " \
153 | 	      "results in $(BUILDDIR)/doctest/output.txt."
154 | 


--------------------------------------------------------------------------------
/gammatone/doc/conf.py:
--------------------------------------------------------------------------------
  1 | # -*- coding: utf-8 -*-
  2 | #
  3 | # gammatone documentation build configuration file, created by
  4 | # sphinx-quickstart on Sat Dec  8 23:21:49 2012.
  5 | #
  6 | # This file is execfile()d with the current directory set to its containing dir.
  7 | #
  8 | # Note that not all possible configuration values are present in this
  9 | # autogenerated file.
 10 | #
 11 | # All configuration values have a default; values that are commented out
 12 | # serve to show the default.
 13 | 
 14 | import sys, os
 15 | 
 16 | # If extensions (or modules to document with autodoc) are in another directory,
 17 | # add these directories to sys.path here. If the directory is relative to the
 18 | # documentation root, use os.path.abspath to make it absolute, like shown here.
 19 | #sys.path.insert(0, os.path.abspath('.'))
 20 | 
 21 | # -- General configuration -----------------------------------------------------
 22 | 
 23 | # If your documentation needs a minimal Sphinx version, state it here.
 24 | #needs_sphinx = '1.0'
 25 | 
 26 | # Add any Sphinx extension module names here, as strings. They can be extensions
 27 | # coming with Sphinx (named 'sphinx.ext.*') or your custom ones.
 28 | extensions = ['sphinx.ext.autodoc']
 29 | 
 30 | # Add any paths that contain templates here, relative to this directory.
 31 | templates_path = ['_templates']
 32 | 
 33 | # The suffix of source filenames.
 34 | source_suffix = '.rst'
 35 | 
 36 | # The encoding of source files.
 37 | #source_encoding = 'utf-8-sig'
 38 | 
 39 | # The master toctree document.
 40 | master_doc = 'index'
 41 | 
 42 | # General information about the project.
 43 | project = u'Gammatone Filterbank Toolkit'
 44 | copyright = u'2014, Jason Heeris'
 45 | 
 46 | # The version info for the project you're documenting, acts as replacement for
 47 | # |version| and |release|, also used in various other places throughout the
 48 | # built documents.
 49 | #
 50 | # The short X.Y version.
 51 | version = '1.0'
 52 | # The full version, including alpha/beta/rc tags.
 53 | release = '1.0'
 54 | 
 55 | # The language for content autogenerated by Sphinx. Refer to documentation
 56 | # for a list of supported languages.
 57 | #language = None
 58 | 
 59 | # There are two options for replacing |today|: either, you set today to some
 60 | # non-false value, then it is used:
 61 | #today = ''
 62 | # Else, today_fmt is used as the format for a strftime call.
 63 | #today_fmt = '%B %d, %Y'
 64 | 
 65 | # List of patterns, relative to source directory, that match files and
 66 | # directories to ignore when looking for source files.
 67 | exclude_patterns = ['_build']
 68 | 
 69 | # The reST default role (used for this markup: `text`) to use for all documents.
 70 | #default_role = None
 71 | 
 72 | # If true, '()' will be appended to :func: etc. cross-reference text.
 73 | #add_function_parentheses = True
 74 | 
 75 | # If true, the current module name will be prepended to all description
 76 | # unit titles (such as .. function::).
 77 | #add_module_names = True
 78 | 
 79 | # If true, sectionauthor and moduleauthor directives will be shown in the
 80 | # output. They are ignored by default.
 81 | #show_authors = False
 82 | 
 83 | # The name of the Pygments (syntax highlighting) style to use.
 84 | pygments_style = 'sphinx'
 85 | 
 86 | # A list of ignored prefixes for module index sorting.
 87 | #modindex_common_prefix = []
 88 | 
 89 | 
 90 | # -- Options for HTML output ---------------------------------------------------
 91 | 
 92 | # The theme to use for HTML and HTML Help pages.  See the documentation for
 93 | # a list of builtin themes.
 94 | html_theme = 'haiku'
 95 | 
 96 | # Theme options are theme-specific and customize the look and feel of a theme
 97 | # further.  For a list of options available for each theme, see the
 98 | # documentation.
 99 | #html_theme_options = {}
100 | 
101 | # Add any paths that contain custom themes here, relative to this directory.
102 | #html_theme_path = []
103 | 
104 | # The name for this set of Sphinx documents.  If None, it defaults to
105 | # "<project> v<release> documentation".
106 | html_title = u"%s %s" % (project, release)
107 | 
108 | # A shorter title for the navigation bar.  Default is the same as html_title.
109 | #html_short_title = None
110 | 
111 | # The name of an image file (relative to this directory) to place at the top
112 | # of the sidebar.
113 | #html_logo = None
114 | 
115 | # The name of an image file (within the static path) to use as favicon of the
116 | # docs.  This file should be a Windows icon file (.ico) being 16x16 or 32x32
117 | # pixels large.
118 | #html_favicon = None
119 | 
120 | # Add any paths that contain custom static files (such as style sheets) here,
121 | # relative to this directory. They are copied after the builtin static files,
122 | # so a file named "default.css" will overwrite the builtin "default.css".
123 | html_static_path = ['_static']
124 | 
125 | # If not '', a 'Last updated on:' timestamp is inserted at every page bottom,
126 | # using the given strftime format.
127 | #html_last_updated_fmt = '%b %d, %Y'
128 | 
129 | # If true, SmartyPants will be used to convert quotes and dashes to
130 | # typographically correct entities.
131 | html_use_smartypants = True
132 | 
133 | # Custom sidebar templates, maps document names to template names.
134 | html_sidebars = {
135 | 	'**' : [
136 | 		'localtoc.html',
137 | 		'globaltoc.html',
138 | 		'relations.html',
139 | 		'searchbox.html'
140 | 		],
141 | 	}
142 | 
143 | # Additional templates that should be rendered to pages, maps page names to
144 | # template names.
145 | #html_additional_pages = {}
146 | 
147 | # If false, no module index is generated.
148 | #html_domain_indices = True
149 | 
150 | # If false, no index is generated.
151 | #html_use_index = True
152 | 
153 | # If true, the index is split into individual pages for each letter.
154 | #html_split_index = False
155 | 
156 | # If true, links to the reST sources are added to the pages.
157 | html_show_sourcelink = False
158 | 
159 | # If true, "Created using Sphinx" is shown in the HTML footer. Default is True.
160 | #html_show_sphinx = True
161 | 
162 | # If true, "(C) Copyright ..." is shown in the HTML footer. Default is True.
163 | #html_show_copyright = True
164 | 
165 | # If true, an OpenSearch description file will be output, and all pages will
166 | # contain a <link> tag referring to it.  The value of this option must be the
167 | # base URL from which the finished HTML is served.
168 | #html_use_opensearch = ''
169 | 
170 | # This is the file name suffix for HTML files (e.g. ".xhtml").
171 | #html_file_suffix = None
172 | 
173 | # Output file base name for HTML help builder.
174 | htmlhelp_basename = 'gammatonedoc'
175 | 
176 | 
177 | # -- Options for LaTeX output --------------------------------------------------
178 | 
179 | latex_elements = {
180 | # The paper size ('letterpaper' or 'a4paper').
181 | #'papersize': 'letterpaper',
182 | 
183 | # The font size ('10pt', '11pt' or '12pt').
184 | #'pointsize': '10pt',
185 | 
186 | # Additional stuff for the LaTeX preamble.
187 | #'preamble': '',
188 | }
189 | 
190 | # Grouping the document tree into LaTeX files. List of tuples
191 | # (source start file, target name, title, author, documentclass [howto/manual]).
192 | latex_documents = [
193 |   ('index', 'gammatone.tex', u'Gammatone Documentation',
194 |    u'Jason Heeris', 'manual'),
195 | ]
196 | 
197 | # The name of an image file (relative to this directory) to place at the top of
198 | # the title page.
199 | #latex_logo = None
200 | 
201 | # For "manual" documents, if this is true, then toplevel headings are parts,
202 | # not chapters.
203 | #latex_use_parts = False
204 | 
205 | # If true, show page references after internal links.
206 | #latex_show_pagerefs = False
207 | 
208 | # If true, show URL addresses after external links.
209 | #latex_show_urls = False
210 | 
211 | # Documents to append as an appendix to all manuals.
212 | #latex_appendices = []
213 | 
214 | # If false, no module index is generated.
215 | #latex_domain_indices = True
216 | 
217 | 
218 | # -- Options for manual page output --------------------------------------------
219 | 
220 | # One entry per manual page. List of tuples
221 | # (source start file, name, description, authors, manual section).
222 | man_pages = [
223 |     ('index', 'gammatone', u'Gammatone Documentation',
224 |      [u'Jason Heeris'], 1)
225 | ]
226 | 
227 | # If true, show URL addresses after external links.
228 | #man_show_urls = False
229 | 
230 | 
231 | # -- Options for Texinfo output ------------------------------------------------
232 | 
233 | # Grouping the document tree into Texinfo files. List of tuples
234 | # (source start file, target name, title, author,
235 | #  dir menu entry, description, category)
236 | texinfo_documents = [
237 |   ('index', 'gammatone', u'Gammatone Documentation',
238 |    u'Jason Heeris', 'gammatone', 'Gammatone filterbank construction tools.',
239 |    'Miscellaneous'),
240 | ]
241 | 
242 | # Documents to append as an appendix to all manuals.
243 | #texinfo_appendices = []
244 | 
245 | # If false, no module index is generated.
246 | #texinfo_domain_indices = True
247 | 
248 | # How to display URL addresses: 'footnote', 'no', or 'inline'.
249 | #texinfo_show_urls = 'footnote'
250 | 
251 | # -- Autodoc configuration -----------------------------------------------------
252 | 
253 | # autodoc_default_flags = ['members']
254 | 


--------------------------------------------------------------------------------
/gammatone/doc/details.rst:
--------------------------------------------------------------------------------
  1 | About the Gammatone Filterbank Toolkit
  2 | --------------------------------------
  3 | 
  4 | Summary
  5 | ~~~~~~~
  6 | 
  7 | This is a port of Malcolm Slaney's and Dan Ellis' gammatone filterbank
  8 | MATLAB code, detailed below, to Python 2 and 3 using Numpy and Scipy. It
  9 | analyses signals by running them through banks of gammatone filters,
 10 | similar to Fourier-based spectrogram analysis.
 11 | 
 12 | .. figure:: FurElise.png
 13 |    :align: center
 14 |    :alt: Gammatone-based spectrogram of Für Elise
 15 | 
 16 |    Gammatone-based spectrogram of Für Elise
 17 | 
 18 | Dependencies
 19 | ~~~~~~~~~~~~
 20 | 
 21 | -  numpy
 22 | -  scipy
 23 | -  nose
 24 | -  mock
 25 | -  matplotlib
 26 | 
 27 | Using the Code
 28 | ~~~~~~~~~~~~~~
 29 | 
 30 | For a demonstration, find a `.wav` file (for example,
 31 | `Für Elise <http://heeris.id.au/samples/FurElise.wav>`_) and run::
 32 | 
 33 |     python -m gammatone FurElise.wav -d 10
 34 | 
 35 | ...to see a gammatone-gram of the first ten seconds of Beethoven's "Für
 36 | Elise." If you've installed via
 37 | ``pip`` or ``setup.py install``, you should also be able to just run::
 38 | 
 39 |     gammatone FurElise.wav -d 10
 40 | 
 41 | Basis
 42 | ~~~~~
 43 | 
 44 | This project is based on research into how humans perceive audio,
 45 | originally published by Malcolm Slaney:
 46 | 
 47 | `Malcolm Slaney (1998) "Auditory Toolbox Version 2", Technical Report
 48 | #1998-010, Interval Research Corporation,
 49 | 1998. <http://cobweb.ecn.purdue.edu/~malcolm/interval/1998-010/>`_
 50 | 
 51 | Slaney's report describes a way of modelling how the human ear
 52 | perceives, emphasises and separates different frequencies of sound. A
 53 | series of gammatone filters are constructed whose width increases with
 54 | increasing centre frequency, and this bank of filters is applied to a
 55 | time-domain signal. The result of this is a spectrum that should
 56 | represent the human experience of sound better than, say, a
 57 | Fourier-domain spectrum would.
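
As a rough illustration of how filter width grows with centre frequency, the sketch below evaluates the linear Glasberg and Moore approximation of the equivalent rectangular bandwidth (ERB). The two constants match those in ``gammatone/filters.py``; the helper name ``erb_bandwidth`` is made up for this example and is not part of the toolkit.

```python
# Glasberg and Moore parameters, as used in gammatone/filters.py.
EAR_Q = 9.26449
MIN_BW = 24.7


def erb_bandwidth(centre_freq):
    """Approximate ERB (Hz) of an auditory filter centred at centre_freq (Hz).

    Hypothetical helper for illustration only (assumes the order-1 form).
    """
    return centre_freq / EAR_Q + MIN_BW


# Bandwidth widens as the centre frequency rises.
for cf in (100, 1000, 10000):
    print(cf, round(erb_bandwidth(cf), 1))
```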
 58 | 
 59 | A gammatone filter has an impulse response that is a sine wave
 60 | multiplied by a gamma distribution function. It is a common approach to
 61 | modelling the auditory system.
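
A minimal sketch of such an impulse response is shown below. It assumes the commonly quoted form ``t**(n-1) * exp(-2*pi*b*t) * cos(2*pi*f*t)`` with order ``n = 4`` and ``b = 1.019 * ERB``; those choices and the function name are assumptions for illustration. The toolkit itself builds its filters from cascaded second order sections rather than sampling this expression directly.

```python
import math


def gammatone_impulse_response(centre_freq, fs, duration=0.025, order=4):
    """Sample an (unnormalised) gammatone impulse response.

    Hypothetical helper: a gamma-shaped envelope multiplied by a cosine
    carrier at centre_freq. Not part of the gammatone package.
    """
    erb = centre_freq / 9.26449 + 24.7   # Glasberg and Moore ERB (assumed form)
    b = 1.019 * erb                      # assumed bandwidth scaling
    n_samples = int(round(duration * fs))
    response = []
    for i in range(n_samples):
        t = i / fs
        envelope = t ** (order - 1) * math.exp(-2 * math.pi * b * t)
        response.append(envelope * math.cos(2 * math.pi * centre_freq * t))
    return response


ir = gammatone_impulse_response(1000, 16000)
```

The envelope starts at zero, rises quickly and then decays, which is what gives each filter its finite "ringing" time.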
 62 | 
 63 | The gammatone filterbank approach can be considered analogous (but not
 64 | equivalent) to a discrete Fourier transform where the frequency axis is
 65 | logarithmic. For example, a series of notes spaced an octave apart would
 66 | appear to be roughly linearly spaced; or a sound that was distributed
 67 | across the same linear frequency range would appear to have more spread
 68 | at lower frequencies.
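
The spacing can be seen numerically with a plain-Python sketch that mirrors the ``erb_point``/``erb_space`` calculation in ``gammatone/filters.py`` (without NumPy, and using an illustrative frequency range chosen only for this example):

```python
import math

EAR_Q = 9.26449   # Glasberg and Moore parameters, as in filters.py
MIN_BW = 24.7


def erb_space(low_freq, high_freq, num):
    """Return num frequencies spaced uniformly on an ERB scale.

    Values run from just below high_freq down to low_freq, matching the
    ordering of the NumPy version in filters.py.
    """
    c = EAR_Q * MIN_BW
    log_ratio = math.log(low_freq + c) - math.log(high_freq + c)
    return [
        -c + math.exp((k / num) * log_ratio) * (high_freq + c)
        for k in range(1, num + 1)
    ]


cfs = erb_space(100, 8000, 8)
# Gaps between neighbouring centre frequencies shrink towards low frequencies.
gaps = [a - b for a, b in zip(cfs, cfs[1:])]
```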
 69 | 
 70 | The real goal of this toolkit is to allow easy computation of the
 71 | gammatone equivalent of a spectrogram — a time-varying spectrum of
 72 | energy over audible frequencies based on a gammatone filterbank.
 73 | 
 74 | Slaney demonstrated his research with an initial implementation in
 75 | MATLAB. This implementation was later extended by Dan Ellis, who found a
 76 | way to approximate a "gammatone-gram" by using the fast Fourier
 77 | transform. Ellis' code calculates a matrix of weights that can be
 78 | applied to the output of a FFT so that a Fourier-based spectrogram can
 79 | easily be transformed into such an approximation.
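
The weighting step itself is just a matrix product, as in the last lines of ``fft_gtgram`` in ``gammatone/fftweight.py``. The sketch below shows that step for a single spectrogram column in plain Python. The weights and FFT values are toy numbers invented for the example, and ``apply_fft_weights`` is a hypothetical name; real weights would come from ``fft_weights``.

```python
def apply_fft_weights(weights, spectrum_frame, nfft):
    """Combine FFT bin magnitudes into gammatone-like band values.

    weights: one row per output band, one entry per retained FFT bin.
    spectrum_frame: complex FFT values for one column (nfft // 2 + 1 bins).
    """
    magnitudes = [abs(v) for v in spectrum_frame]
    return [
        sum(w * m for w, m in zip(row, magnitudes)) / nfft
        for row in weights
    ]


# Toy data: 2 bands, nfft = 6, so 4 retained bins.
toy_weights = [
    [1.0, 0.5, 0.0, 0.0],
    [0.0, 0.5, 1.0, 0.5],
]
frame = [4 + 0j, 0 + 2j, -2 + 0j, 0 + 0j]
bands = apply_fft_weights(toy_weights, frame, nfft=6)
```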
 80 | 
 81 | Ellis' code and documentation are here: `Gammatone-like
 82 | spectrograms <http://labrosa.ee.columbia.edu/matlab/gammatonegram/>`_
 83 | 
 84 | Interest
 85 | ~~~~~~~~
 86 | 
 87 | I became interested in this because of my background in science
 88 | communication and my general interest in the teaching of signal
 89 | processing. I find that the spectrogram approach to visualising signals
 90 | is adequate for illustrating abstract systems or the mathematical
 91 | properties of transforms, but bears little correspondence to a person's
 92 | own experience of sound. If someone wants to see what their favourite
 93 | piece of music "looks like," a normal Fourier transform based
 94 | spectrogram is actually quite a poor way to visualise it. Features of
 95 | the audio seem to be oddly spaced or unnaturally emphasised or
 96 | de-emphasised depending on where they are in the frequency domain.
 97 | 
 98 | The gammatone filterbank approach seems to be closer to what someone
 99 | might intuitively expect a visualisation of sound to look like, and can
100 | help develop an intuition about alternative representations of signals.
101 | 
102 | Verifying the port
103 | ~~~~~~~~~~~~~~~~~~
104 | 
105 | Since this is a port of existing MATLAB code, I've written tests to
106 | verify the Python implementation against the original code. These tests
107 | aren't unit tests, but they do generally test single functions. Every test
108 | follows the same two-step workflow:
109 | 
110 | 1. Run the scripts in the ``test_generation`` directory. This will
111 |    create a ``.mat`` file containing test data in ``tests/data``.
112 | 
113 | 2. Run ``nosetests3`` in the top level directory. This will find and run
114 |    all the tests in the ``tests`` directory.
115 | 
116 | Although I'm usually loath to check in generated files to version
117 | control, I'm willing to make an exception for the ``.mat`` files
118 | containing the test data. My reasoning is that they represent the
119 | decoupling of my code from the MATLAB code, and if the two projects were
120 | separated, they would be considered a part of the Python code, not the
121 | original MATLAB code.
122 | 


--------------------------------------------------------------------------------
/gammatone/doc/fftweight.rst:
--------------------------------------------------------------------------------
1 | :mod:`gammatone.fftweight` -- FFT weightings for spectrogram-like gammatone analysis
2 | ====================================================================================
3 | 
4 | .. automodule:: gammatone.fftweight
5 |    :members:
6 | 


--------------------------------------------------------------------------------
/gammatone/doc/filters.rst:
--------------------------------------------------------------------------------
1 | :mod:`gammatone.filters` -- gammatone filterbank construction
2 | =============================================================
3 | 
4 | .. automodule:: gammatone.filters
5 |    :members:
6 | 


--------------------------------------------------------------------------------
/gammatone/doc/gtgram.rst:
--------------------------------------------------------------------------------
1 | :mod:`gammatone.gtgram` -- spectrogram-like gammatone analysis
2 | ==============================================================
3 | 
4 | .. automodule:: gammatone.gtgram
5 |    :members:
6 | 


--------------------------------------------------------------------------------
/gammatone/doc/index.rst:
--------------------------------------------------------------------------------
 1 | .. gammatone documentation master file, created by
 2 |    sphinx-quickstart on Sat Dec  8 23:21:49 2012.
 3 | 
 4 | Index
 5 | =====
 6 | 
 7 | Modules
 8 | -------
 9 | 
10 | .. toctree::
11 |    :maxdepth: 2
12 | 
13 |    filters
14 |    gtgram
15 |    fftweight
16 |    plot
17 | 
18 | .. include:: details.rst
19 |    
20 | Indices and tables
21 | ------------------
22 | 
23 | * :ref:`genindex`
24 | * :ref:`modindex`
25 | * :ref:`search`
26 | 
27 | 


--------------------------------------------------------------------------------
/gammatone/doc/make.bat:
--------------------------------------------------------------------------------
  1 | @ECHO OFF
  2 | 
  3 | REM Command file for Sphinx documentation
  4 | 
  5 | if "%SPHINXBUILD%" == "" (
  6 | 	set SPHINXBUILD=sphinx-build
  7 | )
  8 | set BUILDDIR=_build
  9 | set ALLSPHINXOPTS=-d %BUILDDIR%/doctrees %SPHINXOPTS% .
 10 | set I18NSPHINXOPTS=%SPHINXOPTS% .
 11 | if NOT "%PAPER%" == "" (
 12 | 	set ALLSPHINXOPTS=-D latex_paper_size=%PAPER% %ALLSPHINXOPTS%
 13 | 	set I18NSPHINXOPTS=-D latex_paper_size=%PAPER% %I18NSPHINXOPTS%
 14 | )
 15 | 
 16 | if "%1" == "" goto help
 17 | 
 18 | if "%1" == "help" (
 19 | 	:help
 20 | 	echo.Please use `make ^<target^>` where ^<target^> is one of
 21 | 	echo.  html       to make standalone HTML files
 22 | 	echo.  dirhtml    to make HTML files named index.html in directories
 23 | 	echo.  singlehtml to make a single large HTML file
 24 | 	echo.  pickle     to make pickle files
 25 | 	echo.  json       to make JSON files
 26 | 	echo.  htmlhelp   to make HTML files and a HTML help project
 27 | 	echo.  qthelp     to make HTML files and a qthelp project
 28 | 	echo.  devhelp    to make HTML files and a Devhelp project
 29 | 	echo.  epub       to make an epub
 30 | 	echo.  latex      to make LaTeX files, you can set PAPER=a4 or PAPER=letter
 31 | 	echo.  text       to make text files
 32 | 	echo.  man        to make manual pages
 33 | 	echo.  texinfo    to make Texinfo files
 34 | 	echo.  gettext    to make PO message catalogs
 35 | 	echo.  changes    to make an overview over all changed/added/deprecated items
 36 | 	echo.  linkcheck  to check all external links for integrity
 37 | 	echo.  doctest    to run all doctests embedded in the documentation if enabled
 38 | 	goto end
 39 | )
 40 | 
 41 | if "%1" == "clean" (
 42 | 	for /d %%i in (%BUILDDIR%\*) do rmdir /q /s %%i
 43 | 	del /q /s %BUILDDIR%\*
 44 | 	goto end
 45 | )
 46 | 
 47 | if "%1" == "html" (
 48 | 	%SPHINXBUILD% -b html %ALLSPHINXOPTS% %BUILDDIR%/html
 49 | 	if errorlevel 1 exit /b 1
 50 | 	echo.
 51 | 	echo.Build finished. The HTML pages are in %BUILDDIR%/html.
 52 | 	goto end
 53 | )
 54 | 
 55 | if "%1" == "dirhtml" (
 56 | 	%SPHINXBUILD% -b dirhtml %ALLSPHINXOPTS% %BUILDDIR%/dirhtml
 57 | 	if errorlevel 1 exit /b 1
 58 | 	echo.
 59 | 	echo.Build finished. The HTML pages are in %BUILDDIR%/dirhtml.
 60 | 	goto end
 61 | )
 62 | 
 63 | if "%1" == "singlehtml" (
 64 | 	%SPHINXBUILD% -b singlehtml %ALLSPHINXOPTS% %BUILDDIR%/singlehtml
 65 | 	if errorlevel 1 exit /b 1
 66 | 	echo.
 67 | 	echo.Build finished. The HTML pages are in %BUILDDIR%/singlehtml.
 68 | 	goto end
 69 | )
 70 | 
 71 | if "%1" == "pickle" (
 72 | 	%SPHINXBUILD% -b pickle %ALLSPHINXOPTS% %BUILDDIR%/pickle
 73 | 	if errorlevel 1 exit /b 1
 74 | 	echo.
 75 | 	echo.Build finished; now you can process the pickle files.
 76 | 	goto end
 77 | )
 78 | 
 79 | if "%1" == "json" (
 80 | 	%SPHINXBUILD% -b json %ALLSPHINXOPTS% %BUILDDIR%/json
 81 | 	if errorlevel 1 exit /b 1
 82 | 	echo.
 83 | 	echo.Build finished; now you can process the JSON files.
 84 | 	goto end
 85 | )
 86 | 
 87 | if "%1" == "htmlhelp" (
 88 | 	%SPHINXBUILD% -b htmlhelp %ALLSPHINXOPTS% %BUILDDIR%/htmlhelp
 89 | 	if errorlevel 1 exit /b 1
 90 | 	echo.
 91 | 	echo.Build finished; now you can run HTML Help Workshop with the ^
 92 | .hhp project file in %BUILDDIR%/htmlhelp.
 93 | 	goto end
 94 | )
 95 | 
 96 | if "%1" == "qthelp" (
 97 | 	%SPHINXBUILD% -b qthelp %ALLSPHINXOPTS% %BUILDDIR%/qthelp
 98 | 	if errorlevel 1 exit /b 1
 99 | 	echo.
100 | 	echo.Build finished; now you can run "qcollectiongenerator" with the ^
101 | .qhcp project file in %BUILDDIR%/qthelp, like this:
102 | 	echo.^> qcollectiongenerator %BUILDDIR%\qthelp\gammatone.qhcp
103 | 	echo.To view the help file:
104 | 	echo.^> assistant -collectionFile %BUILDDIR%\qthelp\gammatone.ghc
105 | 	goto end
106 | )
107 | 
108 | if "%1" == "devhelp" (
109 | 	%SPHINXBUILD% -b devhelp %ALLSPHINXOPTS% %BUILDDIR%/devhelp
110 | 	if errorlevel 1 exit /b 1
111 | 	echo.
112 | 	echo.Build finished.
113 | 	goto end
114 | )
115 | 
116 | if "%1" == "epub" (
117 | 	%SPHINXBUILD% -b epub %ALLSPHINXOPTS% %BUILDDIR%/epub
118 | 	if errorlevel 1 exit /b 1
119 | 	echo.
120 | 	echo.Build finished. The epub file is in %BUILDDIR%/epub.
121 | 	goto end
122 | )
123 | 
124 | if "%1" == "latex" (
125 | 	%SPHINXBUILD% -b latex %ALLSPHINXOPTS% %BUILDDIR%/latex
126 | 	if errorlevel 1 exit /b 1
127 | 	echo.
128 | 	echo.Build finished; the LaTeX files are in %BUILDDIR%/latex.
129 | 	goto end
130 | )
131 | 
132 | if "%1" == "text" (
133 | 	%SPHINXBUILD% -b text %ALLSPHINXOPTS% %BUILDDIR%/text
134 | 	if errorlevel 1 exit /b 1
135 | 	echo.
136 | 	echo.Build finished. The text files are in %BUILDDIR%/text.
137 | 	goto end
138 | )
139 | 
140 | if "%1" == "man" (
141 | 	%SPHINXBUILD% -b man %ALLSPHINXOPTS% %BUILDDIR%/man
142 | 	if errorlevel 1 exit /b 1
143 | 	echo.
144 | 	echo.Build finished. The manual pages are in %BUILDDIR%/man.
145 | 	goto end
146 | )
147 | 
148 | if "%1" == "texinfo" (
149 | 	%SPHINXBUILD% -b texinfo %ALLSPHINXOPTS% %BUILDDIR%/texinfo
150 | 	if errorlevel 1 exit /b 1
151 | 	echo.
152 | 	echo.Build finished. The Texinfo files are in %BUILDDIR%/texinfo.
153 | 	goto end
154 | )
155 | 
156 | if "%1" == "gettext" (
157 | 	%SPHINXBUILD% -b gettext %I18NSPHINXOPTS% %BUILDDIR%/locale
158 | 	if errorlevel 1 exit /b 1
159 | 	echo.
160 | 	echo.Build finished. The message catalogs are in %BUILDDIR%/locale.
161 | 	goto end
162 | )
163 | 
164 | if "%1" == "changes" (
165 | 	%SPHINXBUILD% -b changes %ALLSPHINXOPTS% %BUILDDIR%/changes
166 | 	if errorlevel 1 exit /b 1
167 | 	echo.
168 | 	echo.The overview file is in %BUILDDIR%/changes.
169 | 	goto end
170 | )
171 | 
172 | if "%1" == "linkcheck" (
173 | 	%SPHINXBUILD% -b linkcheck %ALLSPHINXOPTS% %BUILDDIR%/linkcheck
174 | 	if errorlevel 1 exit /b 1
175 | 	echo.
176 | 	echo.Link check complete; look for any errors in the above output ^
177 | or in %BUILDDIR%/linkcheck/output.txt.
178 | 	goto end
179 | )
180 | 
181 | if "%1" == "doctest" (
182 | 	%SPHINXBUILD% -b doctest %ALLSPHINXOPTS% %BUILDDIR%/doctest
183 | 	if errorlevel 1 exit /b 1
184 | 	echo.
185 | 	echo.Testing of doctests in the sources finished, look at the ^
186 | results in %BUILDDIR%/doctest/output.txt.
187 | 	goto end
188 | )
189 | 
190 | :end
191 | 


--------------------------------------------------------------------------------
/gammatone/doc/plot.rst:
--------------------------------------------------------------------------------
1 | :mod:`gammatone.plot` -- Plotting utilities for gammatone analysis
2 | ==================================================================
3 | 
4 | .. automodule:: gammatone.plot
5 |    :members:
6 | 


--------------------------------------------------------------------------------
/gammatone/gammatone/__init__.py:
--------------------------------------------------------------------------------
 1 | # Copyright 2014 Jason Heeris, jason.heeris@gmail.com
 2 | # 
 3 | # This file is part of the gammatone toolkit, and is licensed under the 3-clause
 4 | # BSD license: https://github.com/detly/gammatone/blob/master/COPYING
 5 | 
 6 | # Designate gammatone module
 7 | """
 8 | Gammatone filterbank toolkit
 9 | """
10 | 


--------------------------------------------------------------------------------
/gammatone/gammatone/__main__.py:
--------------------------------------------------------------------------------
1 | # Copyright 2014 Jason Heeris, jason.heeris@gmail.com
2 | # 
3 | # This file is part of the gammatone toolkit, and is licensed under the 3-clause
4 | # BSD license: https://github.com/detly/gammatone/blob/master/COPYING
5 | from gammatone.plot import main
6 | main()
7 | 


--------------------------------------------------------------------------------
/gammatone/gammatone/fftweight.py:
--------------------------------------------------------------------------------
  1 | # Copyright 2014 Jason Heeris, jason.heeris@gmail.com
  2 | # 
  3 | # This file is part of the gammatone toolkit, and is licensed under the 3-clause
  4 | # BSD license: https://github.com/detly/gammatone/blob/master/COPYING
  5 | """
  6 | This module contains functions for calculating weights to approximate a
  7 | gammatone filterbank-like "spectrogram" from a Fourier transform.
  8 | """
  9 | from __future__ import division
 10 | import numpy as np
 11 | 
 12 | import gammatone.filters as filters
 13 | import gammatone.gtgram as gtgram
 14 | 
 15 | def specgram_window(
 16 |         nfft,
 17 |         nwin,
 18 |     ):
 19 |     """
 20 |     Window calculation used in specgram replacement function. Hann window of
 21 |     width `nwin` centred in an array of width `nfft`.
 22 |     """
 23 |     halflen = nwin // 2
 24 |     halff = nfft // 2 # midpoint of win
 25 |     acthalflen = int(np.floor(min(halff, halflen)))
 26 |     halfwin = 0.5 * ( 1 + np.cos(np.pi * np.arange(0, halflen+1)/halflen))
 27 |     win = np.zeros((nfft,))
 28 |     win[halff:halff+acthalflen] = halfwin[0:acthalflen]
 29 |     win[halff:halff-acthalflen:-1] = halfwin[0:acthalflen]
 30 |     return win
 31 | 
 32 | 
 33 | def specgram(x, n, sr, w, h):
 34 |     """ Substitute for Matlab's specgram, calculates a simple spectrogram.
 35 | 
 36 |     :param x: The signal to analyse
 37 |     :param n: The FFT length
 38 |     :param sr: The sampling rate
 39 |     :param w: The window length (see :func:`specgram_window`)
 40 |     :param h: The hop size (must be greater than zero)
 41 |     """
 42 |     # Based on Dan Ellis' myspecgram.m,v 1.1 2002/08/04
 43 |     assert h > 0, "Must have a hop size greater than 0"
 44 | 
 45 |     s = x.shape[0]
 46 |     win = specgram_window(n, w)
 47 | 
 48 |     c = 0
 49 | 
 50 |     # pre-allocate output array
 51 |     ncols = 1 + int(np.floor((s - n)/h))
 52 |     d = np.zeros(((1 + n // 2), ncols), np.dtype(complex))
 53 | 
 54 |     for b in range(0, s - n + 1, h):  # include the last full frame
 55 |         u = win * x[b : b + n]
 56 |         t = np.fft.fft(u)
 57 |         d[:, c] = t[0 : (1 + n // 2)].T
 58 |         c = c + 1
 59 | 
 60 |     return d
 61 | 
 62 | 
 63 | def fft_weights(
 64 |     nfft,
 65 |     fs,
 66 |     nfilts,
 67 |     width,
 68 |     fmin,
 69 |     fmax,
 70 |     maxlen):
 71 |     """
 72 |     :param nfft: the source FFT size
 73 |     :param fs: sampling rate (Hz)
 74 |     :param nfilts: the number of output bands required
 75 |     :param width: the constant width of each band in Bark
 76 |     :param fmin: lower limit of frequencies (Hz)
 77 |     :param fmax: upper limit of frequencies (Hz)
 78 |     :param maxlen: number of bins to truncate the rows to
 79 |     
 80 |     :return: a tuple `weights`, `gain` with the calculated weight matrices and
 81 |              gain vectors
 82 |     
 83 |     Generate a matrix of weights to combine FFT bins into Gammatone bins.
 84 |     
 85 |     Note about `maxlen` parameter: While wts has nfft columns, the second half
 86 |     are all zero. Hence, aud spectrum is::
 87 |     
 88 |         fft2gammatonemx(nfft,sr)*abs(fft(xincols,nfft))
 89 |     
 90 |     `maxlen` truncates the rows to this many bins.
 91 |     
 92 |     | (c) 2004-2009 Dan Ellis dpwe@ee.columbia.edu  based on rastamat/audspec.m
 93 |     | (c) 2012 Jason Heeris (Python implementation)
 94 |     """
 95 |     ucirc = np.exp(1j * 2 * np.pi * np.arange(0, nfft / 2 + 1) / nfft)[None, ...]
 96 |     
 97 |     # Common ERB filter code factored out
 98 |     cf_array = filters.erb_space(fmin, fmax, nfilts)[::-1]
 99 | 
100 |     _, A11, A12, A13, A14, _, _, _, B2, gain = (
101 |         filters.make_erb_filters(fs, cf_array, width).T
102 |     )
103 |     
104 |     A11, A12, A13, A14 = A11[..., None], A12[..., None], A13[..., None], A14[..., None]
105 | 
106 |     r = np.sqrt(B2)
107 |     theta = 2 * np.pi * cf_array / fs    
108 |     pole = (r * np.exp(1j * theta))[..., None]
109 |     
110 |     GTord = 4
111 |     
112 |     weights = np.zeros((nfilts, nfft))
113 | 
114 |     weights[:, 0:ucirc.shape[1]] = (
115 |           np.abs(ucirc + A11 * fs) * np.abs(ucirc + A12 * fs)
116 |         * np.abs(ucirc + A13 * fs) * np.abs(ucirc + A14 * fs)
117 |         * np.abs(fs * (pole - ucirc) * (pole.conj() - ucirc)) ** (-GTord)
118 |         / gain[..., None]
119 |     )
120 | 
121 |     weights = weights[:, 0:int(maxlen)]
122 | 
123 |     return weights, gain
124 | 
125 | 
126 | def fft_gtgram(
127 |     wave,
128 |     fs,
129 |     window_time, hop_time,
130 |     channels,
131 |     f_min):
132 |     """
133 |     Calculate a spectrogram-like time frequency magnitude array based on
134 |     an FFT-based approximation to gammatone subband filters.
135 | 
136 |     A matrix of weightings is calculated (using :func:`fft_weights`), and
137 |     applied to the FFT of the input signal (``wave``, using sample rate ``fs``).
138 |     The result is an approximation of full filtering using an ERB gammatone
139 |     filterbank (as per :func:`gtgram.gtgram`).
140 | 
141 |     ``f_min`` determines the frequency cutoff for the corresponding gammatone
142 |     filterbank. ``window_time`` and ``hop_time`` (both in seconds) are the
143 |     length of each analysis window and the step between successive windows.
144 | 
145 |     | 2009-02-23 Dan Ellis dpwe@ee.columbia.edu
146 |     |
147 |     | (c) 2013 Jason Heeris (Python implementation)
148 |     """
149 |     width = 1 # Was a parameter in the MATLAB code
150 | 
151 |     nfft = int(2 ** (np.ceil(np.log2(2 * window_time * fs))))
152 |     nwin, nhop, _ = gtgram.gtgram_strides(fs, window_time, hop_time, 0)
153 | 
154 |     gt_weights, _ = fft_weights(
155 |             nfft,
156 |             fs,
157 |             channels,
158 |             width,
159 |             f_min,
160 |             fs / 2,
161 |             nfft / 2 + 1
162 |         )
163 | 
164 |     sgram = specgram(wave, nfft, fs, nwin, nhop)
165 | 
166 |     result = gt_weights.dot(np.abs(sgram)) / nfft
167 | 
168 |     return result
169 | 


--------------------------------------------------------------------------------
/gammatone/gammatone/filters.py:
--------------------------------------------------------------------------------
  1 | # Copyright 2014 Jason Heeris, jason.heeris@gmail.com
  2 | # 
  3 | # This file is part of the gammatone toolkit, and is licensed under the 3-clause
  4 | # BSD license: https://github.com/detly/gammatone/blob/master/COPYING
  5 | """
  6 | This module contains functions for constructing sets of equivalent rectangular
  7 | bandwidth gammatone filters.
  8 | """
  9 | from __future__ import division
 10 | from collections import namedtuple
 11 | 
 12 | import numpy as np
 13 | import scipy as sp
 14 | from scipy import signal as sgn
 15 | 
 16 | DEFAULT_FILTER_NUM = 100
 17 | DEFAULT_LOW_FREQ = 100
 18 | DEFAULT_HIGH_FREQ = 44100 / 4
 19 | 
 20 | 
 21 | def erb_point(low_freq, high_freq, fraction):
 22 |     """
 23 |     Calculates a single point on an ERB scale between ``low_freq`` and
 24 |     ``high_freq``, determined by ``fraction``. When ``fraction`` is ``1``,
 25 |     ``low_freq`` will be returned. When ``fraction`` is ``0``, ``high_freq``
 26 |     will be returned.
 27 |     
 28 |     ``fraction`` can actually be outside the range ``[0, 1]``, which in general
 29 |     isn't very meaningful, but might be useful when ``fraction`` is rounded a
 30 |     little above or below ``[0, 1]`` (eg. for plot axis labels).
 31 |     """
 32 |     # Change the following three parameters if you wish to use a different ERB
 33 |     # scale. Must change in MakeERBCoeffs too.
 34 |     # TODO: Factor these parameters out
 35 |     ear_q = 9.26449 # Glasberg and Moore Parameters
 36 |     min_bw = 24.7
 37 |     order = 1
 38 | 
 39 |     # All of the following expressions are derived in Apple TR #35, "An
 40 |     # Efficient Implementation of the Patterson-Holdsworth Cochlear Filter
 41 |     # Bank." See pages 33-34.
 42 |     erb_point = (
 43 |         -ear_q * min_bw
 44 |         + np.exp(
 45 |             fraction * (
 46 |                 -np.log(high_freq + ear_q * min_bw)
 47 |                 + np.log(low_freq + ear_q * min_bw)
 48 |                 )
 49 |         ) *
 50 |         (high_freq + ear_q * min_bw)
 51 |     )
 52 |     
 53 |     return erb_point
 54 | 
 55 | 
 56 | def erb_space(
 57 |     low_freq=DEFAULT_LOW_FREQ,
 58 |     high_freq=DEFAULT_HIGH_FREQ,
 59 |     num=DEFAULT_FILTER_NUM):
 60 |     """
 61 |     This function computes an array of ``num`` frequencies uniformly spaced
 62 |     between ``high_freq`` and ``low_freq`` on an ERB scale.
 63 |     
 64 |     For a definition of ERB, see Moore, B. C. J., and Glasberg, B. R. (1983).
 65 |     "Suggested formulae for calculating auditory-filter bandwidths and
 66 |     excitation patterns," J. Acoust. Soc. Am. 74, 750-753.
 67 |     """
 68 |     return erb_point(
 69 |         low_freq,
 70 |         high_freq,
 71 |         np.arange(1, num + 1) / num
 72 |         )
 73 | 
 74 | 
 75 | def centre_freqs(fs, num_freqs, cutoff):
 76 |     """
 77 |     Calculates an array of centre frequencies (for :func:`make_erb_filters`)
 78 |     from a sampling frequency, lower cutoff frequency and the desired number of
 79 |     filters.
 80 |     
 81 |     :param fs: sampling rate
 82 |     :param num_freqs: number of centre frequencies to calculate
 83 |     :type num_freqs: int
 84 |     :param cutoff: lower cutoff frequency
 85 |     :return: same as :func:`erb_space`
 86 |     """
 87 |     return erb_space(cutoff, fs / 2, num_freqs)
 88 | 
 89 | 
 90 | def make_erb_filters(fs, centre_freqs, width=1.0):
 91 |     """
 92 |     This function computes the filter coefficients for a bank of 
 93 |     Gammatone filters. These filters were defined by Patterson and Holdsworth for
 94 |     simulating the cochlea. 
 95 |     
 96 |     The result is returned as a :class:`ERBCoeffArray`. Each row of the
 97 |     filter arrays contains the coefficients for four second order filters. The
 98 |     transfer function for these four filters share the same denominator (poles)
 99 |     but have different numerators (zeros). All of these coefficients are
100 |     assembled into one vector that the ERBFilterBank can take apart to implement
101 |     the filter.
102 |     
103 |     The filter bank contains "numChannels" channels that extend from
104 |     half the sampling rate (fs) to "lowFreq". Alternatively, if the numChannels
105 |     input argument is a vector, then the values of this vector are taken to be
106 |     the center frequency of each desired filter. (The lowFreq argument is
107 |     ignored in this case.)
108 |     
109 |     Note this implementation fixes a problem in the original code by
110 |     computing four separate second order filters. This avoids a big problem with
111 |     round off errors in cases of very small cfs (100Hz) and large sample rates
112 |     (44kHz). The problem is caused by roundoff error when a number of poles are
113 |     combined, all very close to the unit circle. Small errors in the eighth order
114 |     coefficient are multiplied when the eighth root is taken to give the pole
115 |     location. These small errors lead to poles outside the unit circle and
116 |     instability. Thanks to Julius Smith for leading me to the proper
117 |     explanation.
118 |     
119 |     Execute the following code to evaluate the frequency response of a 10
120 |     channel filterbank::
121 |     
122 |         fcoefs = MakeERBFilters(16000,10,100);
123 |         y = ERBFilterBank([1 zeros(1,511)], fcoefs);
124 |         resp = 20*log10(abs(fft(y')));
125 |         freqScale = (0:511)/512*16000;
126 |         semilogx(freqScale(1:255),resp(1:255,:));
127 |         axis([100 16000 -60 0])
128 |         xlabel('Frequency (Hz)'); ylabel('Filter Response (dB)');
129 |     
130 |     | Rewritten by Malcolm Slaney@Interval.  June 11, 1998.
131 |     | (c) 1998 Interval Research Corporation
132 |     |
133 |     | (c) 2012 Jason Heeris (Python implementation)
134 |     """
135 |     T = 1 / fs
136 |     # Change the following three parameters if you wish to use a different
137 |     # ERB scale. Must change in ERBSpace too.
138 |     # TODO: factor these out
139 |     ear_q = 9.26449 # Glasberg and Moore Parameters
140 |     min_bw = 24.7
141 |     order = 1
142 | 
143 |     erb = width*((centre_freqs / ear_q) ** order + min_bw ** order) ** ( 1 /order)
144 |     B = 1.019 * 2 * np.pi * erb
145 | 
146 |     arg = 2 * centre_freqs * np.pi * T
147 |     vec = np.exp(2j * arg)
148 | 
149 |     A0 = T
150 |     A2 = 0
151 |     B0 = 1
152 |     B1 = -2 * np.cos(arg) / np.exp(B * T)
153 |     B2 = np.exp(-2 * B * T)
154 |     
155 |     rt_pos = np.sqrt(3 + 2 ** 1.5)
156 |     rt_neg = np.sqrt(3 - 2 ** 1.5)
157 |     
158 |     common = -T * np.exp(-(B * T))
159 |     
160 |     # TODO: This could be simplified to a matrix calculation involving the
161 |     # constant first term and the alternating rt_pos/rt_neg and +/-1 second
162 |     # terms
163 |     k11 = np.cos(arg) + rt_pos * np.sin(arg)
164 |     k12 = np.cos(arg) - rt_pos * np.sin(arg)
165 |     k13 = np.cos(arg) + rt_neg * np.sin(arg)
166 |     k14 = np.cos(arg) - rt_neg * np.sin(arg)
167 | 
168 |     A11 = common * k11
169 |     A12 = common * k12
170 |     A13 = common * k13
171 |     A14 = common * k14
172 | 
173 |     gain_arg = np.exp(1j * arg - B * T)
174 | 
175 |     gain = np.abs(
176 |             (vec - gain_arg * k11)
177 |           * (vec - gain_arg * k12)
178 |           * (vec - gain_arg * k13)
179 |           * (vec - gain_arg * k14)
180 |           * (  T * np.exp(B * T)
181 |              / (-1 / np.exp(B * T) + 1 + vec * (1 - np.exp(B * T)))
182 |             )**4
183 |         )
184 | 
185 |     allfilts = np.ones_like(centre_freqs)
186 |     
187 |     fcoefs = np.column_stack([
188 |         A0 * allfilts, A11, A12, A13, A14, A2*allfilts,
189 |         B0 * allfilts, B1, B2,
190 |         gain
191 |     ])
192 |     
193 |     return fcoefs
194 | 
195 | 
196 | def erb_filterbank(wave, coefs):
197 |     """
198 |     :param wave: input data (one dimensional sequence)
199 |     :param coefs: gammatone filter coefficients
200 |     
201 |     Process an input waveform with a gammatone filter bank. This function takes
202 |     a single sound vector, and returns an array of filter outputs, one channel
203 |     per row.
204 |     
205 |     The ``coefs`` parameter, which completely specifies the Gammatone filterbank,
206 |     should be designed with the :func:`make_erb_filters` function.
207 |     
208 |     | Malcolm Slaney @ Interval, June 11, 1998.
209 |     | (c) 1998 Interval Research Corporation
210 |     | Thanks to Alain de Cheveigne' for his suggestions and improvements.
211 |     |
212 |     | (c) 2013 Jason Heeris (Python implementation)
213 |     """
214 |     output = np.zeros((coefs[:,9].shape[0], wave.shape[0]))
215 |     
216 |     gain = coefs[:, 9]
217 |     # A0, A11, A2
218 |     As1 = coefs[:, (0, 1, 5)]
219 |     # A0, A12, A2
220 |     As2 = coefs[:, (0, 2, 5)]
221 |     # A0, A13, A2
222 |     As3 = coefs[:, (0, 3, 5)]
223 |     # A0, A14, A2
224 |     As4 = coefs[:, (0, 4, 5)]
225 |     # B0, B1, B2
226 |     Bs = coefs[:, 6:9]
227 |     
228 |     # Loop over channels
229 |     for idx in range(0, coefs.shape[0]):
230 |         # These seem to be reversed (in the sense of A/B order), but that's what
231 |         # the original code did...
232 |         # Replacing these with polynomial multiplications reduces both accuracy
233 |         # and speed.
234 |         y1 = sgn.lfilter(As1[idx], Bs[idx], wave)
235 |         y2 = sgn.lfilter(As2[idx], Bs[idx], y1)
236 |         y3 = sgn.lfilter(As3[idx], Bs[idx], y2)
237 |         y4 = sgn.lfilter(As4[idx], Bs[idx], y3)
238 |         output[idx, :] = y4 / gain[idx]
239 |         
240 |     return output
241 | 
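
The ERB spacing and bandwidth arithmetic used by this module can be tried in isolation. The following is a minimal standard-library sketch, not part of the package: the names `erb_space_sketch`, `filter_bandwidth` and `pole_radius` are hypothetical, and the constants are the Glasberg and Moore values that appear in `make_erb_filters`. It assumes `order = 1`, as in the code above. The pole radius follows from the shared denominator `[1, B1, B2]`, whose roots have magnitude `exp(-B * T)`.

```python
import math

# Glasberg and Moore parameters, as used in make_erb_filters above.
EAR_Q = 9.26449
MIN_BW = 24.7


def erb_space_sketch(low_freq, high_freq, num):
    """Return ``num`` ERB-spaced frequencies, descending towards ``low_freq``."""
    offset = EAR_Q * MIN_BW
    log_ratio = math.log(low_freq + offset) - math.log(high_freq + offset)
    # Fractions 1/num, 2/num, ..., 1 mirror np.arange(1, num + 1) / num.
    return [
        -offset + math.exp((k / num) * log_ratio) * (high_freq + offset)
        for k in range(1, num + 1)
    ]


def filter_bandwidth(centre_freq, width=1.0):
    """Bandwidth parameter B for the order = 1 ERB formula."""
    erb = width * (centre_freq / EAR_Q + MIN_BW)
    return 1.019 * 2 * math.pi * erb


def pole_radius(centre_freq, fs):
    """Magnitude of the conjugate pole pair shared by the four biquads."""
    return math.exp(-filter_bandwidth(centre_freq) / fs)


fs = 16000
cfs = erb_space_sketch(100, fs / 2, 10)
radii = [pole_radius(cf, fs) for cf in cfs]
```

The last centre frequency equals the lower cutoff (up to rounding error), the first lies just below `fs / 2`, and every pole radius is strictly inside the unit circle, which is the stability condition the docstring of `make_erb_filters` discusses.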


--------------------------------------------------------------------------------
/gammatone/gammatone/gtgram.py:
--------------------------------------------------------------------------------
 1 | # Copyright 2014 Jason Heeris, jason.heeris@gmail.com
 2 | # 
 3 | # This file is part of the gammatone toolkit, and is licensed under the 3-clause
 4 | # BSD license: https://github.com/detly/gammatone/blob/master/COPYING
 5 | from __future__ import division
 6 | import numpy as np
 7 | 
 8 | from .filters import make_erb_filters, centre_freqs, erb_filterbank
 9 | 
10 | """
11 | This module contains functions for rendering "spectrograms" which use gammatone
12 | filterbanks instead of Fourier transforms.
13 | """
14 | 
15 | def round_half_away_from_zero(num):
16 |     """ Implement the round-half-away-from-zero rule, where fractional parts of
17 |     0.5 result in rounding up to the next integer for positive numbers, and
18 |     down to the next integer (away from zero) for negative numbers.
19 |     """
20 |     return np.sign(num) * np.floor(np.abs(num) + 0.5)
21 | 
22 | 
23 | def gtgram_strides(fs, window_time, hop_time, filterbank_cols):
24 |     """
25 |     Calculates the window size, hop size and column count for a gammatonegram.
26 |     
27 |     :return: a tuple of (window_size, hop_samples, output_columns)
28 |     """
29 |     nwin        = int(round_half_away_from_zero(window_time * fs))
30 |     hop_samples = int(round_half_away_from_zero(hop_time * fs))
31 |     columns     = (1
32 |                     + int(
33 |                         np.floor(
34 |                             (filterbank_cols - nwin)
35 |                             / hop_samples
36 |                         )
37 |                     )
38 |                   )
39 |         
40 |     return (nwin, hop_samples, columns)
41 | 
42 | 
43 | def gtgram_xe(wave, fs, channels, f_min):
44 |     """ Calculate the intermediate ERB filterbank processed matrix """
45 |     cfs = centre_freqs(fs, channels, f_min)
46 |     fcoefs = np.flipud(make_erb_filters(fs, cfs))
47 |     xf = erb_filterbank(wave, fcoefs)
48 |     xe = np.power(xf, 2)
49 |     return xe
50 | 
51 | 
52 | def gtgram(
53 |     wave,
54 |     fs,
55 |     window_time, hop_time,
56 |     channels,
57 |     f_min):
58 |     """
59 |     Calculate a spectrogram-like time frequency magnitude array based on
60 |     gammatone subband filters. The waveform ``wave`` (at sample rate ``fs``) is
61 |     passed through a multi-channel gammatone auditory model filterbank, with
62 |     lowest frequency ``f_min`` and highest frequency ``fs / 2``. The outputs of
63 |     each band then have their energy integrated over windows of ``window_time``
64 |     seconds, advancing by ``hop_time`` secs for successive columns. These
65 |     magnitudes are returned as a nonnegative real matrix with ``channels`` rows.
66 |     
67 |     | 2009-02-23 Dan Ellis dpwe@ee.columbia.edu
68 |     |
69 |     | (c) 2013 Jason Heeris (Python implementation)
70 |     """
71 |     xe = gtgram_xe(wave, fs, channels, f_min)    
72 |     
73 |     nwin, hop_samples, ncols = gtgram_strides(
74 |         fs,
75 |         window_time,
76 |         hop_time,
77 |         xe.shape[1]
78 |     )
79 |     
80 |     y = np.zeros((channels, ncols))
81 |     
82 |     for cnum in range(ncols):
83 |         segment = xe[:, cnum * hop_samples + np.arange(nwin)]
84 |         y[:, cnum] = np.sqrt(segment.mean(1))
85 |     
86 |     return y
87 | 
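
The rounding rule and the stride arithmetic in `gtgram_strides` can be checked without NumPy. This is a small sketch under that assumption, using only the standard library; the function `strides` is a hypothetical stand-in that takes the sample count directly. The explicit rounding helper matters because Python's built-in `round` sends ties to the nearest even integer, whereas the MATLAB reference code rounds ties away from zero.

```python
import math


def round_half_away_from_zero(value):
    """Round to the nearest integer, with ties going away from zero."""
    return int(math.copysign(math.floor(abs(value) + 0.5), value))


def strides(fs, window_time, hop_time, n_samples):
    """Return (window_size, hop_samples, output_columns) for a signal."""
    nwin = round_half_away_from_zero(window_time * fs)
    hop = round_half_away_from_zero(hop_time * fs)
    # Integer floor division plays the role of np.floor on the quotient.
    columns = 1 + (n_samples - nwin) // hop
    return nwin, hop, columns


print(round(2.5), round_half_away_from_zero(2.5))  # prints "2 3"
print(strides(48000, 0.025, 0.01, 4800))           # prints "(1200, 480, 8)"
```

With 4800 samples, a 1200-sample window and a 480-sample hop, only eight full windows fit, so the trailing partial window is dropped.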


--------------------------------------------------------------------------------
/gammatone/gammatone/plot.py:
--------------------------------------------------------------------------------
  1 | # Copyright 2014 Jason Heeris, jason.heeris@gmail.com
  2 | #
  3 | # This file is part of the gammatone toolkit, and is licensed under the 3-clause
  4 | # BSD license: https://github.com/detly/gammatone/blob/master/COPYING
  5 | """
  6 | Plotting utilities related to gammatone analysis, primarily for use with
  7 | ``matplotlib``.
  8 | """
  9 | from __future__ import division
 10 | import argparse
 11 | import os.path
 12 | 
 13 | import matplotlib.pyplot
 14 | import matplotlib.ticker
 15 | import numpy as np
 16 | import scipy.constants
 17 | import scipy.io.wavfile
 18 | 
 19 | from .filters import erb_point
 20 | import gammatone.gtgram
 21 | import gammatone.fftweight
 22 | 
 23 | 
 24 | class ERBFormatter(matplotlib.ticker.EngFormatter):
 25 |     """
 26 |     Axis formatter for gammatone filterbank analysis. This formatter calculates
 27 |     the ERB spaced frequencies used for analysis, and renders them similarly to
 28 |     the engineering axis formatter.
 29 | 
 30 |     The scale is changed so that `[0, 1]` corresponds to ERB spaced frequencies
 31 |     from ``high_freq`` to ``low_freq`` (note the reversal). It should be used
 32 |     with ``imshow`` where the ``extent`` argument is ``[a, b, 1, 0]`` (again,
 33 |     note the inversion).
 34 |     """
 35 | 
 36 |     def __init__(self, low_freq, high_freq, *args, **kwargs):
 37 |         """
 38 |         Creates a new :class:`ERBFormatter` for use with ``matplotlib`` plots.
 39 |         Note that this class does not supply the ``unit`` or ``places``
 40 |         arguments; typically these would be ``'Hz'`` and ``0``.
 41 | 
 42 |         :param low_freq: the low end of the gammatone filterbank frequency range
 43 |         :param high_freq: the high end of the gammatone filterbank frequency
 44 |           range
 45 |         """
 46 |         self.low_freq = low_freq
 47 |         self.high_freq = high_freq
 48 |         super().__init__(*args, **kwargs)
 49 | 
 50 |     def _erb_axis_scale(self, fraction):
 51 |         return erb_point(self.low_freq, self.high_freq, fraction)
 52 | 
 53 |     def __call__(self, val, pos=None):
 54 |         newval = self._erb_axis_scale(val)
 55 |         return super().__call__(newval, pos)
 56 | 
 57 | 
 58 | def gtgram_plot(
 59 |         gtgram_function,
 60 |         axes, x, fs,
 61 |         window_time, hop_time, channels, f_min,
 62 |         imshow_args=None
 63 |         ):
 64 |     """
 65 |     Plots a spectrogram-like time frequency magnitude array based on gammatone
 66 |     subband filters.
 67 | 
 68 |     :param gtgram_function: A function with signature::
 69 | 
 70 |         fft_gtgram(
 71 |             wave,
 72 |             fs,
 73 |             window_time, hop_time,
 74 |             channels,
 75 |             f_min)
 76 | 
 77 |     See :func:`gammatone.gtgram.gtgram` for details of the parameters.
 78 |     """
 79 |     # Set a nice formatter for the y-axis
 80 |     formatter = ERBFormatter(f_min, fs/2, unit='Hz', places=0)
 81 |     axes.yaxis.set_major_formatter(formatter)
 82 | 
 83 |     # Figure out time axis scaling
 84 |     duration = len(x) / fs
 85 | 
 86 |     # Choose the aspect so the image width:height is the golden ratio
 87 |     aspect_ratio = duration/scipy.constants.golden
 88 | 
 89 |     gtg = gtgram_function(x, fs, window_time, hop_time, channels, f_min)
 90 |     Z = np.flipud(20 * np.log10(gtg))
 91 | 
 92 |     img = axes.imshow(Z, extent=[0, duration, 1, 0], aspect=aspect_ratio)
 93 | 
 94 | 
 95 | # Entry point for CLI script
 96 | 
 97 | HELP_TEXT = """\
 98 | Plots the gammatone filterbank analysis of a WAV file.
 99 | 
100 | If the file contains more than one channel, all channels are averaged before
101 | performing analysis.
102 | """
103 | 
104 | 
105 | def render_audio_from_file(path, duration, function):
106 |     """
107 |     Renders the given ``duration`` of audio from the audio file at ``path``
108 |     using the gammatone spectrogram function ``function``.
109 |     """
110 |     samplerate, data = scipy.io.wavfile.read(path)
111 | 
112 |     # Truncate to the requested duration, if one was given
113 |     if duration:
114 |         nframes = duration * samplerate
115 |         data = data[0 : nframes, :]
116 | 
117 |     signal = data.mean(1)  # average channels; assumes 2-D (multi-channel) data
118 | 
119 |     # Default gammatone-based spectrogram parameters
120 |     twin = 0.08
121 |     thop = twin / 2
122 |     channels = 1024
123 |     fmin = 20
124 | 
125 |     # Set up the plot
126 |     fig = matplotlib.pyplot.figure()
127 |     axes = fig.add_axes([0.1, 0.1, 0.8, 0.8])
128 | 
129 |     gtgram_plot(
130 |         function,
131 |         axes,
132 |         signal,
133 |         samplerate,
134 |         twin, thop, channels, fmin)
135 | 
136 |     axes.set_title(os.path.basename(path))
137 |     axes.set_xlabel("Time (s)")
138 |     axes.set_ylabel("Frequency")
139 | 
140 |     matplotlib.pyplot.show()
141 | 
142 | 
143 | def main():
144 |     """
145 |     Entry point for CLI application to plot gammatonegrams of sound files.
146 |     """
147 |     parser = argparse.ArgumentParser(description=HELP_TEXT)
148 | 
149 |     parser.add_argument(
150 |         'sound_file',
151 |         help="The sound file to graph. See the help text for supported formats.")
152 | 
153 |     parser.add_argument(
154 |         '-d', '--duration', type=int,
155 |         help="The time in seconds from the start of the audio to use for the "
156 |              "graph (default is to use the whole file)."
157 |         )
158 | 
159 |     parser.add_argument(
160 |         '-a', '--accurate', action='store_const', dest='function',
161 |         const=gammatone.gtgram.gtgram, default=gammatone.fftweight.fft_gtgram,
162 |         help="Use the full filterbank approach instead of the weighted FFT "
163 |              "approximation. This is much slower, and uses a lot of memory, but"
164 |              " is more accurate."
165 |         )
166 | 
167 |     args = parser.parse_args()
168 | 
169 |     return render_audio_from_file(args.sound_file, args.duration, args.function)
170 | 
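
The `--accurate` flag relies on `argparse`'s `store_const` action to swap one callable for another. The sketch below isolates that pattern with two placeholder functions; `fast_method`, `accurate_method` and `build_parser` are hypothetical names standing in for `fft_gtgram`, `gtgram` and the parser built in `main`.

```python
import argparse


def fast_method():
    """Placeholder for the weighted-FFT approximation."""
    return "fft"


def accurate_method():
    """Placeholder for the full filterbank computation."""
    return "filterbank"


def build_parser():
    parser = argparse.ArgumentParser()
    parser.add_argument('sound_file')
    parser.add_argument('-d', '--duration', type=int)
    # With store_const, the flag takes no value: its presence stores `const`
    # under `dest`, and its absence leaves `default` in place.
    parser.add_argument(
        '-a', '--accurate', action='store_const', dest='function',
        const=accurate_method, default=fast_method)
    return parser


default_args = build_parser().parse_args(['tone.wav'])
accurate_args = build_parser().parse_args(['tone.wav', '-a', '-d', '3'])
```

Because the parsed namespace carries the function itself, the caller can invoke `args.function` directly without any branching on a boolean flag.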


--------------------------------------------------------------------------------
/gammatone/setup.py:
--------------------------------------------------------------------------------
 1 | # Copyright 2014 Jason Heeris, jason.heeris@gmail.com
 2 | #
 3 | # This file is part of the gammatone toolkit, and is licensed under the 3-clause
 4 | # BSD license: https://github.com/detly/gammatone/blob/master/COPYING
 5 | from setuptools import setup, find_packages
 6 | 
 7 | setup(
 8 |     name = "Gammatone",
 9 |     version = "1.0",
10 |     packages = find_packages(),
11 | 
12 |     install_requires = [
13 |         'numpy',
14 |         'scipy',
15 |         'nose',
16 |         'mock',
17 |         'matplotlib',
18 |     ],
19 | 
20 |     entry_points = {
21 |         'console_scripts': [
22 |             'gammatone = gammatone.plot:main',
23 |         ]
24 |     }
25 | )
26 | 


--------------------------------------------------------------------------------
/gammatone/test_generation/README:
--------------------------------------------------------------------------------
1 | These are Octave/MATLAB scripts that create test data for the Python
2 | implementation of the gammatone library.
3 | 
4 | You must add both this directory and the top level 'auditory_toolkit' directory
5 | to your search path.
6 | 
7 | The scripts are designed to run under MATLAB and Octave (using '--traditional').
8 | 


--------------------------------------------------------------------------------
/gammatone/test_generation/test_ERBFilterBank.m:
--------------------------------------------------------------------------------
 1 | % Copyright 2014 Jason Heeris, jason.heeris@gmail.com
 2 | % 
 3 | % This file is part of the gammatone toolkit, and is licensed under the 3-clause
 4 | % BSD license: https://github.com/detly/gammatone/blob/master/COPYING
 5 | function test_ERBFilterBank()
 6 | 
 7 |     erb_space_inputs = { ...
 8 |         100, 11025,  10, sin(2*pi*220*[0:22050/100]'/22050); ...
 9 |          20, 22050,  10, square(2*pi*150*[0:44100/200]'/44100); ...
10 |          20, 44100,  40, square(2*pi*12000*[0:88200/400]'/88200); ...
11 |         100, 11025, 1000, sawtooth(2*pi*10100*[0:22050/100]'/22050, 0.5); ...
12 |         500, 80000,  200, sawtooth(2*pi*3333*[0:160000/400]'/160000, 0.5); ...
13 |     };
14 |     
15 |     erb_filter_inputs = { ...
16 |         44100, [22050; 2205; 220], square(2*pi*220*[0:44100/200]'/44100); ...
17 |         16000, [8000; 7000; 6000; 5000; 4000; 3000; 2000; 1000], square(2*pi*2000*[0:16000/50]'/16000); ...
18 |         16000, [16000; 8000; 1], square(2*pi*880*[0:16000/50]'/16000); ...
19 |     };
20 |     
21 |     num_tests = size(erb_space_inputs)(1) ...
22 |                 + size(erb_filter_inputs)(1);
23 |     
24 |     erb_filterbank_inputs = {};
25 |     
26 |     erb_filterbank_results = {};
27 |     
28 |     % This will ONLY generate tests that use the centre frequency inputs
29 |     
30 |     % ERBSpace generated inputs
31 |     for tnum=1:size(erb_space_inputs)(1)
32 |         [f_low, f_high, num_f, wave] = deal(erb_space_inputs{tnum,:});
33 |         fs = f_high*2;
34 |         f_arr = ERBSpace(f_low, f_high, num_f);
35 |         fcoefs = MakeERBFilters(fs, f_arr, 0);
36 |         erb_filterbank_inputs(tnum, :) = {fcoefs, wave};
37 |     end
38 |     
39 |     % MakeERBFilters generated inputs
40 |     for tnum=1:size(erb_filter_inputs)(1)
41 |         [fs, f_arr, wave] = deal(erb_filter_inputs{tnum,:});
42 |         fcoefs = MakeERBFilters(fs, f_arr, 0);
43 |         offset = size(erb_space_inputs)(1);
44 |         erb_filterbank_inputs(offset+tnum, :) = {fcoefs, wave};
45 |     end
46 |     
47 |     for tnum=1:num_tests
48 |         fcoefs = erb_filterbank_inputs{tnum, 1};
49 |         wave = erb_filterbank_inputs{tnum, 2};
50 |         erb_filterbank_results(tnum, :) = ERBFilterBank(wave, fcoefs);
51 |     end
52 | 
53 |     results_file = fullfile('..', 'tests', 'data', 'test_filterbank_data.mat');
54 |     save(results_file, 'erb_filterbank_inputs', 'erb_filterbank_results');
55 | end
56 | 


--------------------------------------------------------------------------------
/gammatone/test_generation/test_ERBSpace.m:
--------------------------------------------------------------------------------
 1 | % Copyright 2014 Jason Heeris, jason.heeris@gmail.com
 2 | % 
 3 | % This file is part of the gammatone toolkit, and is licensed under the 3-clause
 4 | % BSD license: https://github.com/detly/gammatone/blob/master/COPYING
 5 | function test_ERBSpace()
 6 |     
 7 |     % Low freq, high freq, N
 8 |     erbspace_inputs = { ...
 9 |         100, 11025,  100; ...
10 |         100, 22050,  100; ...
11 |          20, 22050,  100; ...
12 |          20, 44100,  100; ...
13 |         100, 11025,   10; ...
14 |         100, 11025, 1000; ...
15 |         500, 80000,  200; ...
16 |     };
17 |     
18 |     erbspace_results = {};
19 |     
20 |     num_tests = size(erbspace_inputs)(1);
21 |     
22 |     for tnum=1:num_tests
23 |         [f_low, f_high, num_f] = deal(erbspace_inputs{tnum,:});
24 |         erbspace_results(tnum, :) = ERBSpace(f_low, f_high, num_f);
25 |     end
26 |     
27 |     results_file = fullfile('..', 'tests', 'data', 'test_erbspace_data.mat');
28 |     save(results_file, 'erbspace_inputs', 'erbspace_results');
29 | end
30 | 


--------------------------------------------------------------------------------
/gammatone/test_generation/test_MakeERBFilters.m:
--------------------------------------------------------------------------------
 1 | % Copyright 2014 Jason Heeris, jason.heeris@gmail.com
 2 | % 
 3 | % This file is part of the gammatone toolkit, and is licensed under the 3-clause
 4 | % BSD license: https://github.com/detly/gammatone/blob/master/COPYING
 5 | function test_MakeERBFilters()
 6 |     
 7 |     erb_space_inputs = { ...
 8 |         100, 11025,  100; ...
 9 |         100, 22050,  100; ...
10 |          20, 22050,  100; ...
11 |          20, 44100,  100; ...
12 |         100, 11025,   10; ...
13 |         100, 11025, 1000; ...
14 |         500, 80000,  200; ...
15 |     };
16 |     
17 |     extra_inputs = { ...
18 |         44100, [22050; 2205; 220]; ...
19 |         16000, [8000; 7000; 6000; 5000; 4000; 3000; 2000; 1000]; ...
20 |         16000, [16000; 8000; 1]; ...
21 |     };
22 |      
23 |     num_tests = size(erb_space_inputs)(1) + size(extra_inputs)(1);
24 |     
25 |     erb_filter_inputs = {};
26 |     
27 |     erb_filter_results = {};
28 |     
29 |     % This will ONLY generate tests that use the centre frequency inputs
30 |     
31 |     % ERBSpace generated inputs
32 |     for tnum=1:size(erb_space_inputs)(1)
33 |         [f_low, f_high, num_f] = deal(erb_space_inputs{tnum,:});
34 |         fs = f_high*2;
35 |         cfs = ERBSpace(f_low, f_high, num_f);
36 |         erb_filter_inputs(tnum, :) = {fs, cfs};
37 |     end
38 |     
39 |     erb_filter_inputs = cat(1, erb_filter_inputs, extra_inputs);
40 |     
41 |     for tnum=1:num_tests
42 |         fs = erb_filter_inputs{tnum, 1};
43 |         cfs = erb_filter_inputs{tnum, 2};
44 |         fcoefs = MakeERBFilters(fs, cfs, 0);
45 |         erb_filter_results(tnum, :) = fcoefs;
46 |     end
47 | 
48 |     results_file = fullfile('..', 'tests', 'data', 'test_erb_filter_data.mat');
49 |     save(results_file, 'erb_filter_inputs', 'erb_filter_results');
50 | end
51 | 


--------------------------------------------------------------------------------
/gammatone/test_generation/test_fft2gammatonemx.m:
--------------------------------------------------------------------------------
 1 | % Copyright 2014 Jason Heeris, jason.heeris@gmail.com
 2 | % 
 3 | % This file is part of the gammatone toolkit, and is licensed under the 3-clause
 4 | % BSD license: https://github.com/detly/gammatone/blob/master/COPYING
 5 | function test_fft2gtmx()
 6 |     % Arguments:
 7 |     % nfft, sr, nfilts, width, minfreq, maxfreq, maxlen
 8 |     
 9 |     fft2gtmx_inputs = { ...
10 |         256 , 48000, 64 , 1   , 100, 48000/2 , 256; ...
11 |         % Vary the width parameter
12 |         256 , 48000, 64 , 2   , 100, 48000/2 , 256; ...
13 |         256 , 48000, 64 , 4   , 100, 48000/2 , 256; ...
14 |         256 , 48000, 64 , 0.25, 100, 48000/2 , 256; ...
15 |         % Vary sampling rate
16 |         256 , 96000, 64 , 1   , 100, 96000/2 , 256; ...
17 |         % Vary upper frequency
18 |         256 , 48000, 64 , 1   , 100, 48000/2 , 256; ...
19 |         256 , 48000, 64 , 1   , 100, 48000/4 , 256; ...
20 |         256 , 48000, 64 , 1   , 100, 48000/10, 256; ...
21 |         % Vary maxlen
22 |         256 , 48000, 64 , 1   , 100, 48000/2 , 128; ...
23 |         256 , 48000, 64 , 1   , 100, 48000/2 , 16; ...
24 |         256 , 48000, 64 , 1   , 100, 48000/2 , 99; ...
25 |         % Vary nfft and nfilts (and the sampling rate in the last row)
26 |         1024, 48000, 128, 1   , 100, 48000/2 , 512; ...
27 |         1024, 48000, 128, 1   , 100, 48000/2 , 128; ...
28 |         64  , 44100, 32 , 1   , 20 , 44100/2 , 64; ...
29 |     };
30 |     
31 |     fft2gtmx_results = {};
32 |     
33 |     for tnum=1:size(fft2gtmx_inputs)(1)
34 |         [nfft, sr, nfilts, width, minfreq, maxfreq, maxlen] = deal(fft2gtmx_inputs{tnum,:});
35 |         [wts, gain] = fft2gammatonemx(nfft, sr, nfilts, width, minfreq, maxfreq, maxlen);
36 |         fft2gtmx_results(tnum, :) = {wts, gain};
37 |     end
38 |     
39 |     results_file = fullfile('..', 'tests', 'data', 'test_fft2gtmx_data.mat');
40 |     save(results_file, 'fft2gtmx_inputs', 'fft2gtmx_results');
41 | end
42 | 


--------------------------------------------------------------------------------
/gammatone/test_generation/test_fft_gammatonegram.m:
--------------------------------------------------------------------------------
  1 | % Copyright 2014 Jason Heeris, jason.heeris@gmail.com
  2 | % 
  3 | % This file is part of the gammatone toolkit, and is licensed under the 3-clause
  4 | % BSD license: https://github.com/detly/gammatone/blob/master/COPYING
  5 | function test_fft_gammatonegram()
  6 |     % Need:
  7 |     %  wave
  8 |     %  fs
  9 |     %  window_time
 10 |     %  hop_time
 11 |     %  channels
 12 |     %  f_min
 13 |     %  f_max
 14 |     
 15 |     % Need to mock out:
 16 |     %  make_erb_filters output (elide)
 17 |     %  centre_freqs (elide)
 18 |     %  erb_filterbank (depends on X, SR, N, FMIN)
 19 |     
 20 |     % Ensure reproducible tests
 21 |     rand('state', [3 1 4 1 5 9 2 7]);
 22 |     
 23 |     fft_gammatonegram_inputs = {
 24 |         'sawtooth_01', sawtooth(2*pi*10100*[0:22050 - 1]'/22050, 0.5), 22050, 0.025, 0.010, 64, 50; ...
 25 |         'sin220_01'  , sin(2*pi*220*[0:4800 - 1]'/48000), 48000, 0.01, 0.01, 64, 50; ...
 26 |         'sin220_02'  , sin(2*pi*220*[0:4800 - 1]'/48000), 48000, 0.025, 0.01, 32, 50; ...
 27 |         'rand_01'    , rand([1, 4410 - 1]), 44100, 0.02, 0.015, 128, 500; ...
 28 |         'rand_02'    , rand([1, 9600 - 1]), 96000, 0.01, 0.005, 256, 20; ...
 29 |         'rand_03'    , rand([1, 4800 - 1]), 48000, 0.01, 0.010, 256, 20; ...
 30 |     };
 31 |     
 32 |     % Mocked intermediate results for unit testing
 33 |     fft_gammatonegram_mocks = {};
 34 |     
 35 |     % Actual results
 36 |     fft_gammatonegram_results = {};
 37 |     
 38 |     for tnum=1:size(fft_gammatonegram_inputs)(1)
 39 |         [name, wave, fs, twin, thop, chs, fmin] = deal(fft_gammatonegram_inputs{tnum,:});
 40 | 
 41 |         % This is for mocking the output of the equivalent Python functions
 42 |         nfft = 2^(ceil(log(2*twin*fs)/log(2)));
 43 |         nwin = round(twin * fs);    
 44 |         nhop = round(thop * fs);
 45 |         
 46 |         % Mock out the FFT weights as well
 47 |         wts = fft2gammatonemx( ...
 48 |             nfft, ...
 49 |             fs, ...
 50 |             chs, ...
 51 |             1, ... % width is always 1 in the Python implementation
 52 |             fmin, ...
 53 |             fs/2, ...
 54 |             nfft/2+1 ...
 55 |         );
 56 | 
 57 |         % Mock out windowing function
 58 |         window = gtgram_window(nfft, nwin);
 59 | 
 60 |         res = gammatonegram( ...
 61 |             wave, ...
 62 |             fs, ...
 63 |             twin, ...
 64 |             thop, ...
 65 |             chs, ...
 66 |             fmin, ...
 67 |             fs/2, ... % fmax is always fs/2 in the Python version
 68 |             1 ...     % Use FFT method
 69 |         );
 70 |         
 71 |         fft_gammatonegram_mocks(tnum, :) = { ...
 72 |             wts ...
 73 |         };
 74 |     
 75 |         fft_gammatonegram_results(tnum, :) = { ...
 76 |             res, ...
 77 |             window, ...
 78 |             nfft, ...
 79 |             nwin, ...
 80 |             nhop ...
 81 |         };
 82 |     
 83 |     end;
 84 |     
 85 |     results_file = fullfile('..', 'tests', 'data', 'test_fft_gammatonegram_data.mat');
 86 |     save(results_file, 'fft_gammatonegram_inputs', 'fft_gammatonegram_mocks', 'fft_gammatonegram_results');
 87 | end;
 88 | 
 89 | 
 90 | function win = gtgram_window(n, w)
 91 |     % Reproduction of Dan Ellis' windowing function built in to specgram.m
 92 |     halflen = w/2;
 93 |     halff = n/2;   % midpoint of win
 94 |     acthalflen = min(halff, halflen);
 95 | 
 96 |     halfwin = 0.5 * ( 1 + cos( pi * (0:halflen)/halflen));
 97 |     win = zeros(1, n);
 98 |     win((halff+1):(halff+acthalflen)) = halfwin(1:acthalflen);
 99 |     win((halff+1):-1:(halff-acthalflen+2)) = halfwin(1:acthalflen);
100 | end;
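
The mocked frame parameters in this script (`nfft`, `nwin`, `nhop`) follow simple arithmetic that can be reproduced in Python. This is a sketch only: `matlab_round` and `fft_frame_params` are hypothetical helper names, and `math.log2` is used in place of `log(x)/log(2)`, which should agree except possibly at exact powers of two.

```python
import math


def matlab_round(value):
    """Mimic MATLAB's round(): ties go away from zero."""
    return int(math.copysign(math.floor(abs(value) + 0.5), value))


def fft_frame_params(twin, thop, fs):
    """Return (nfft, nwin, nhop) for a window and hop given in seconds."""
    # Smallest power of two that covers twice the window length in samples.
    nfft = 2 ** math.ceil(math.log2(2 * twin * fs))
    nwin = matlab_round(twin * fs)
    nhop = matlab_round(thop * fs)
    return nfft, nwin, nhop


print(fft_frame_params(0.025, 0.010, 22050))  # prints "(2048, 551, 221)"
```

The printed values match the `nfft`, window and hop entries used for the `sawtooth_01` case in `test_specgram.m`. Note that the hop is 220.5 samples before rounding, so Python's built-in `round` would give 220 here instead of 221.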


--------------------------------------------------------------------------------
/gammatone/test_generation/test_gammatonegram.m:
--------------------------------------------------------------------------------
 1 | % Copyright 2014 Jason Heeris, jason.heeris@gmail.com
 2 | % 
 3 | % This file is part of the gammatone toolkit, and is licensed under the 3-clause
 4 | % BSD license: https://github.com/detly/gammatone/blob/master/COPYING
 5 | function test_gammatonegram()
 6 |     % Need:
 7 |     %  wave
 8 |     %  fs
 9 |     %  window_time
10 |     %  hop_time
11 |     %  channels
12 |     %  f_min
13 |     %  f_max
14 |     
15 |     % Need to mock out:
16 |     %  make_erb_filters output (elide)
17 |     %  centre_freqs (elide)
18 |     %  erb_filterbank (depends on X, SR, N, FMIN)
19 |     
20 |     % Ensure reproducible tests
21 |     rand('state', [3 1 4 1 5 9 2 7]);
22 |     
23 |     gammatonegram_inputs = {
24 |         'sawtooth_01', sawtooth(2*pi*10100*[0:22050 - 1]'/22050, 0.5), 22050, 0.025, 0.010, 64, 50; ...
25 |         'sin220_01'  , sin(2*pi*220*[0:4800 - 1]'/48000), 48000, 0.01, 0.01, 64, 50; ...
26 |         'sin220_02'  , sin(2*pi*220*[0:4800 - 1]'/48000), 48000, 0.025, 0.01, 32, 50; ...
27 |         'rand_01'    , rand([1, 4410 - 1]), 44100, 0.02, 0.015, 128, 500; ...
28 |         'rand_02'    , rand([1, 9600 - 1]), 96000, 0.01, 0.005, 256, 20; ...
29 |         'rand_03'    , rand([1, 4800 - 1]), 48000, 0.01, 0.010, 256, 20; ...
30 |     };
31 |     
32 |     % Mocked intermediate results for unit testing
33 |     gammatonegram_mocks = {};
34 |     
35 |     % Actual results
36 |     gammatonegram_results = {};
37 |     
38 |     for tnum=1:size(gammatonegram_inputs)(1)
39 |         [name, wave, fs, twin, thop, chs, fmin] = deal(gammatonegram_inputs{tnum,:});
40 |         res = gammatonegram( ...
41 |                   wave, ...
42 |                   fs, ...
43 |                   twin, ...
44 |                   thop, ...
45 |                   chs, ...
46 |                   fmin, ...
47 |                   0, ... % fmax is ignored
48 |                   0 ...  % Don't use FFT method
49 |               );
50 |     
51 |         % This is for mocking the output of the equivalent Python functions
52 |         nwin     = round(twin * fs);    
53 |         hopsamps = round(thop * fs);
54 |         f_coefs  = flipud(MakeERBFilters(fs, chs, fmin));
55 |         x_f      = ERBFilterBank(wave, f_coefs);
56 |         x_e      = [x_f .^ 2];
57 |         x_e_cols = size(x_e, 2);
58 |         ncols    = 1 + floor((x_e_cols - nwin) / hopsamps);
59 |        
60 |         % Mock out the ERB filter functions too
61 |         fcoefs = flipud(MakeERBFilters(fs, chs, fmin));
62 |         erb_fb_output = ERBFilterBank(wave, fcoefs);
63 |     
64 |         gammatonegram_mocks(tnum, :) = { ...
65 |             erb_fb_output, ...
66 |             x_e_cols ...
67 |         };
68 |     
69 |         gammatonegram_results(tnum, :) = { ...
70 |             res, ...
71 |             nwin, ...
72 |             hopsamps, ...
73 |             ncols ...
74 |         };
75 |     
76 |     end;
77 |     
78 |     results_file = fullfile('..', 'tests', 'data', 'test_gammatonegram_data.mat');
79 |     save(results_file, 'gammatonegram_inputs', 'gammatonegram_mocks', 'gammatonegram_results');
80 | end;
81 | 
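
The stride bookkeeping that test_gammatonegram.m stores as `nwin`, `hopsamps` and `ncols` can be sketched in Python as follows. This is only an illustration of the arithmetic in the script above, not the package's own `gtgram_strides`; the names `round_half_up` and `stride_counts` are hypothetical. One assumption is made explicit: MATLAB/Octave `round` rounds halves away from zero, whereas Python's built-in `round` rounds halves to even, so the sketch uses `floor(x + 0.5)` for the non-negative values involved.

```python
import math


def round_half_up(x):
    """Round a non-negative value half away from zero, like MATLAB/Octave round()."""
    return int(math.floor(x + 0.5))


def stride_counts(fs, twin, thop, filterbank_cols):
    """Return (nwin, hopsamps, ncols) as computed in the generator script.

    fs              -- sample rate in Hz
    twin, thop      -- window length and hop length in seconds
    filterbank_cols -- number of samples (columns) in the filterbank output
    """
    nwin = round_half_up(twin * fs)
    hopsamps = round_half_up(thop * fs)
    # Integer floor division mirrors 1 + floor((x_e_cols - nwin) / hopsamps)
    ncols = 1 + (filterbank_cols - nwin) // hopsamps
    return nwin, hopsamps, ncols


# Parameters of the 'sawtooth_01' case, with a 22050-sample signal
print(stride_counts(22050, 0.025, 0.010, 22050))  # (551, 221, 98)
```

With Python's built-in `round`, the hop of 220.5 samples would come out as 220 rather than 221, which is why the rounding rule matters when comparing against reference data.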


--------------------------------------------------------------------------------
/gammatone/test_generation/test_specgram.m:
--------------------------------------------------------------------------------
 1 | % Copyright 2014 Jason Heeris, jason.heeris@gmail.com
 2 | % 
 3 | % This file is part of the gammatone toolkit, and is licensed under the 3-clause
 4 | % BSD license: https://github.com/detly/gammatone/blob/master/COPYING
 5 | function test_specgram()
 6 |     % Need:
 7 |     %  wave
 8 |     %  nfft
 9 |     %  fs
10 |     %  window_size
11 |     %  hop (technically the function takes the overlap, but only to recalculate this)
12 |     
13 |     % Ensure reproducible tests
14 |     rand('state', [3 1 4 1 5 9 2 7]);
15 |     
16 |     specgram_inputs = {
17 |         'sawtooth_01', sawtooth(2*pi*10100*[0:22050 - 1]'/22050, 0.5), 2048, 22050, 551, 221; ...
18 |         'sin220_01'  , sin(2*pi*220*[0:4800 - 1]'/48000), 1024, 48000, 480, 480; ...
19 |         'sin220_02'  , sin(2*pi*220*[0:4800 - 1]'/48000), 4096, 48000, 1200, 480; ...
20 |         'rand_01'    , rand([1, 4410 - 1]), 2048, 44100, 882, 662; ...
21 |         'rand_02'    , rand([1, 9600 - 1]), 2048, 96000, 960, 480; ...
22 |         'rand_03'    , rand([1, 4800 - 1]), 1024, 48000, 480, 480; ...
23 |     };
24 |     
25 |     % Mocked intermediate results for unit testing
26 |     specgram_mocks = {};
27 |     
28 |     % Actual results
29 |     specgram_results = {};
30 |     
31 |     for tnum=1:size(specgram_inputs, 1)
32 |         [name, wave, nfft, fs, nwin, nhop] = deal(specgram_inputs{tnum,:});
33 | 
34 |         % Mock out windowing function
35 |         window = gtgram_window(nfft, nwin);
36 | 
37 |         res = specgram( ...
38 |             wave, ...
39 |             nfft, ...
40 |             fs, ...
41 |             nwin, ...
42 |             nwin - nhop ...
43 |         );
44 |         
45 |         specgram_mocks(tnum, :) = { ...
46 |             window, ...
47 |         };
48 |     
49 |         specgram_results(tnum, :) = { ...
50 |             res, ...
51 |         };
52 |     
53 |     end;
54 |     
55 |     results_file = fullfile('..', 'tests', 'data', 'test_specgram_data.mat');
56 |     save(results_file, 'specgram_inputs', 'specgram_mocks', 'specgram_results');
57 | end;
58 | 
59 | 
60 | function win = gtgram_window(n, w)
61 |     % Reproduction of Dan Ellis' windowing function built in to specgram.m
62 |     halflen = w/2;
63 |     halff = n/2;   % midpoint of win
64 |     acthalflen = min(halff, halflen);
65 | 
66 |     halfwin = 0.5 * ( 1 + cos( pi * (0:halflen)/halflen));
67 |     win = zeros(1, n);
68 |     win((halff+1):(halff+acthalflen)) = halfwin(1:acthalflen);
69 |     win((halff+1):-1:(halff-acthalflen+2)) = halfwin(1:acthalflen);
70 | end;
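
The `gtgram_window` helper above builds a raised-cosine window centred in an `n`-point buffer. A minimal Python sketch of the same index arithmetic is given below, using only the standard library. The function name `raised_cosine_window` is hypothetical, and the sketch assumes even, positive `n` and `w` (the MATLAB code also tolerates odd sizes through fractional ranges, which this version does not attempt to reproduce).

```python
import math


def raised_cosine_window(n, w):
    """Raised-cosine window of width w, centred in a zero-padded buffer of length n.

    Assumption of this sketch: n and w are even, positive integers.
    """
    if n <= 0 or w <= 0 or n % 2 or w % 2:
        raise ValueError("this sketch only handles even, positive n and w")

    halflen = w // 2
    halff = n // 2                      # midpoint of the buffer
    acthalflen = min(halff, halflen)    # truncate if the window exceeds the buffer

    # Falling half of the window: 1 at the centre, 0 at the edge
    halfwin = [0.5 * (1 + math.cos(math.pi * i / halflen))
               for i in range(halflen + 1)]

    win = [0.0] * n
    for i in range(acthalflen):
        win[halff + i] = halfwin[i]     # right half, starting at the centre
        win[halff - i] = halfwin[i]     # mirrored left half
    return win


print(raised_cosine_window(8, 4))
```

The MATLAB indices `(halff+1):(halff+acthalflen)` are 1-based, so they correspond to the 0-based positions `halff` through `halff + acthalflen - 1` used here.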


--------------------------------------------------------------------------------
/gammatone/tests/__init__.py:
--------------------------------------------------------------------------------
1 | # Copyright 2014 Jason Heeris, jason.heeris@gmail.com
2 | # 
3 | # This file is part of the gammatone toolkit, and is licensed under the 3-clause
4 | # BSD license: https://github.com/detly/gammatone/blob/master/COPYING
5 | 
6 | # Designate as module
7 | 


--------------------------------------------------------------------------------
/gammatone/tests/data/test_erb_filter_data.mat:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/janaal1/DCASE2020-Task3/bc7d5e87faa2fbe014dc47cccb48d9927b4ed3ec/gammatone/tests/data/test_erb_filter_data.mat


--------------------------------------------------------------------------------
/gammatone/tests/data/test_erbspace_data.mat:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/janaal1/DCASE2020-Task3/bc7d5e87faa2fbe014dc47cccb48d9927b4ed3ec/gammatone/tests/data/test_erbspace_data.mat


--------------------------------------------------------------------------------
/gammatone/tests/data/test_fft2gtmx_data.mat:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/janaal1/DCASE2020-Task3/bc7d5e87faa2fbe014dc47cccb48d9927b4ed3ec/gammatone/tests/data/test_fft2gtmx_data.mat


--------------------------------------------------------------------------------
/gammatone/tests/data/test_fft_gammatonegram_data.mat:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/janaal1/DCASE2020-Task3/bc7d5e87faa2fbe014dc47cccb48d9927b4ed3ec/gammatone/tests/data/test_fft_gammatonegram_data.mat


--------------------------------------------------------------------------------
/gammatone/tests/data/test_filterbank_data.mat:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/janaal1/DCASE2020-Task3/bc7d5e87faa2fbe014dc47cccb48d9927b4ed3ec/gammatone/tests/data/test_filterbank_data.mat


--------------------------------------------------------------------------------
/gammatone/tests/data/test_gammatonegram_data.mat:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/janaal1/DCASE2020-Task3/bc7d5e87faa2fbe014dc47cccb48d9927b4ed3ec/gammatone/tests/data/test_gammatonegram_data.mat


--------------------------------------------------------------------------------
/gammatone/tests/data/test_specgram_data.mat:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/janaal1/DCASE2020-Task3/bc7d5e87faa2fbe014dc47cccb48d9927b4ed3ec/gammatone/tests/data/test_specgram_data.mat


--------------------------------------------------------------------------------
/gammatone/tests/test_cfs.py:
--------------------------------------------------------------------------------
 1 | #!/usr/bin/env python3
 2 | # Copyright 2014 Jason Heeris, jason.heeris@gmail.com
 3 | # 
 4 | # This file is part of the gammatone toolkit, and is licensed under the 3-clause
 5 | # BSD license: https://github.com/detly/gammatone/blob/master/COPYING
 6 | import nose
 7 | from mock import patch
 8 | 
 9 | import gammatone.filters
10 | 
11 | EXPECTED_PARAMS = (
12 |     ((0, 0, 0), (0, 0, 0)),
13 |     ((22050, 100, 100), (100, 11025, 100)),
14 |     ((44100, 100, 100), (100, 22050, 100)),
15 |     ((44100, 100, 20), (20, 22050, 100)),
16 |     ((88200, 100, 20), (20, 44100, 100)),
17 |     ((22050, 100, 10), (10, 11025, 100)),
18 |     ((22050, 1000, 100), (100, 11025, 1000)),
19 |     ((160000, 500, 200), (200, 80000, 500)),
20 | )
21 | 
22 | 
23 | def test_centre_freqs():
24 |     for args, params in EXPECTED_PARAMS:
25 |         yield CentreFreqsTester(args, params)
26 | 
27 | 
28 | class CentreFreqsTester:
29 | 
30 |     def __init__(self, args, params):
31 |         self.args = args
32 |         self.params = params
33 |         self.description = "Centre freqs for {:g} {:d} {:g}".format(*args)
34 | 
35 | 
36 |     @patch('gammatone.filters.erb_space')
37 |     def __call__(self, erb_space_mock):
38 |         gammatone.filters.centre_freqs(*self.args)
39 |         erb_space_mock.assert_called_with(*self.params)
40 | 
41 | 
42 | if __name__ == '__main__':
43 |     nose.main()
44 | 
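
The test above checks only how `centre_freqs` forwards its arguments: `EXPECTED_PARAMS` pairs `(fs, num_freqs, cutoff)` with the call `(cutoff, fs / 2, num_freqs)` expected on `erb_space`. The sketch below reproduces that pattern in a self-contained way. Everything in it is a stand-in: the `filters` namespace and both functions are hypothetical replacements for `gammatone.filters`, inferred from the expected parameters rather than taken from the library source, and it uses the standard library's `unittest.mock` instead of the separate `mock` package imported by the test.

```python
from types import SimpleNamespace
from unittest.mock import patch


def _erb_space(low_freq, high_freq, num):
    # Stand-in; it is replaced by a mock before it can be called.
    raise NotImplementedError


def _centre_freqs(fs, num_freqs, cutoff):
    # Assumed behaviour: delegate to erb_space with the Nyquist frequency as the top.
    return filters.erb_space(cutoff, fs / 2, num_freqs)


# Hypothetical stand-in for the gammatone.filters module
filters = SimpleNamespace(erb_space=_erb_space, centre_freqs=_centre_freqs)

with patch.object(filters, "erb_space") as erb_space_mock:
    filters.centre_freqs(22050, 100, 100)
    # Raises AssertionError if the call arguments differ
    erb_space_mock.assert_called_with(100, 11025, 100)

print("centre_freqs forwarded the expected arguments")
```

Because the mock replaces `erb_space` entirely, the check stays independent of the numerical ERB calculation, which has its own test against reference data.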


--------------------------------------------------------------------------------
/gammatone/tests/test_erb_space.py:
--------------------------------------------------------------------------------
 1 | #!/usr/bin/env python3
 2 | # Copyright 2014 Jason Heeris, jason.heeris@gmail.com
 3 | # 
 4 | # This file is part of the gammatone toolkit, and is licensed under the 3-clause
 5 | # BSD license: https://github.com/detly/gammatone/blob/master/COPYING
 6 | import nose
 7 | import numpy as np
 8 | import scipy.io
 9 | from pkg_resources import resource_stream
10 | 
11 | import gammatone.filters
12 | 
13 | REF_DATA_FILENAME = 'data/test_erbspace_data.mat'
14 | 
15 | INPUT_KEY  = 'erbspace_inputs'
16 | RESULT_KEY = 'erbspace_results'
17 | 
18 | INPUT_COLS  = ('f_low', 'f_high', 'num_f')
19 | RESULT_COLS = ('cfs',)
20 | 
21 | 
22 | def load_reference_data():
23 |     """ Load test data generated from the reference code """
24 |     # Load test data
25 |     with resource_stream(__name__, REF_DATA_FILENAME) as test_data:
26 |         data = scipy.io.loadmat(test_data, squeeze_me=False)
27 |     
28 |     zipped_data = zip(data[INPUT_KEY], data[RESULT_KEY])
29 |     
30 |     for inputs, refs in zipped_data:
31 |         input_dict = dict(zip(INPUT_COLS, map(np.squeeze, inputs)))
32 |         ref_dict = dict(zip(RESULT_COLS, map(np.squeeze, refs)))
33 |         yield (input_dict, ref_dict)
34 |     
35 | 
36 | def test_ERB_space_known_values():
37 |     for inputs, refs in load_reference_data():
38 |         args = (
39 |             inputs['f_low'],
40 |             inputs['f_high'],
41 |             inputs['num_f'],
42 |         )
43 |         
44 |         expected = (refs['cfs'],)
45 |         
46 |         yield ERBSpaceTester(args, expected)
47 | 
48 | 
49 | class ERBSpaceTester:
50 |     
51 |     def __init__(self, args, expected):
52 |         self.args = args
53 |         self.expected = expected[0]
54 |         self.description = (
55 |             "ERB space for {:.1f} {:.1f} {:d}".format(
56 |                 float(self.args[0]),
57 |                 float(self.args[1]),
58 |                 int(self.args[2]),
59 |             )
60 |         )
61 |     
62 |     def __call__(self):
63 |         result = gammatone.filters.erb_space(*self.args)
64 |         assert np.allclose(result, self.expected, rtol=1e-6, atol=1e-10)
65 | 
66 | if __name__ == '__main__':
67 |     nose.main()
68 | 
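
For orientation, the quantity compared in the test above is a vector of centre frequencies spaced on an ERB scale between `f_low` and `f_high`. The sketch below follows the formula from Slaney's Auditory Toolbox `ERBSpace.m` with the Glasberg and Moore constants; those constants and the function name `erb_space_sketch` are assumptions of this example, and the result is not guaranteed to match `gammatone.filters.erb_space` to the tolerances used in the test.

```python
import math

# Glasberg and Moore parameters (assumed, as in Slaney's ERBSpace.m)
EAR_Q = 9.26449
MIN_BW = 24.7


def erb_space_sketch(low_freq, high_freq, num):
    """Return num centre frequencies, from just below high_freq down to low_freq."""
    c = EAR_Q * MIN_BW
    # Equal steps on the log(f + c) axis, walking down from high_freq
    step = (math.log(low_freq + c) - math.log(high_freq + c)) / num
    return [-c + math.exp(i * step) * (high_freq + c)
            for i in range(1, num + 1)]


cfs = erb_space_sketch(100, 11025, 100)
print(len(cfs), cfs[0], cfs[-1])
```

The last element lands on `low_freq` (up to rounding), while the first sits one step below `high_freq`, so the returned frequencies are in descending order.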


--------------------------------------------------------------------------------
/gammatone/tests/test_fft_gtgram.py:
--------------------------------------------------------------------------------
  1 | #!/usr/bin/env python3
  2 | # Copyright 2014 Jason Heeris, jason.heeris@gmail.com
  3 | #
  4 | # This file is part of the gammatone toolkit, and is licensed under the 3-clause
  5 | # BSD license: https://github.com/detly/gammatone/blob/master/COPYING
  6 | from mock import patch
  7 | import nose
  8 | import numpy as np
  9 | import scipy.io
 10 | from pkg_resources import resource_stream
 11 | 
 12 | import gammatone.fftweight
 13 | 
 14 | REF_DATA_FILENAME = 'data/test_fft_gammatonegram_data.mat'
 15 | 
 16 | INPUT_KEY  = 'fft_gammatonegram_inputs'
 17 | MOCK_KEY   = 'fft_gammatonegram_mocks'
 18 | RESULT_KEY = 'fft_gammatonegram_results'
 19 | 
 20 | INPUT_COLS  = ('name', 'wave', 'fs', 'twin', 'thop', 'channels', 'fmin')
 21 | MOCK_COLS   = ('wts',)
 22 | RESULT_COLS = ('res', 'window', 'nfft', 'nwin', 'nhop')
 23 | 
 24 | 
 25 | def load_reference_data():
 26 |     """ Load test data generated from the reference code """
 27 |     # Load test data
 28 |     with resource_stream(__name__, REF_DATA_FILENAME) as test_data:
 29 |         data = scipy.io.loadmat(test_data, squeeze_me=False)
 30 | 
 31 |     zipped_data = zip(data[INPUT_KEY], data[MOCK_KEY], data[RESULT_KEY])
 32 |     for inputs, mocks, refs in zipped_data:
 33 |         input_dict = dict(zip(INPUT_COLS, inputs))
 34 |         mock_dict  = dict(zip(MOCK_COLS, mocks))
 35 |         ref_dict = dict(zip(RESULT_COLS, refs))
 36 | 
 37 |         yield (input_dict, mock_dict, ref_dict)
 38 | 
 39 | 
 40 | def test_fft_specgram_window():
 41 |     for inputs, mocks, refs in load_reference_data():
 42 |         args = (
 43 |             refs['nfft'],
 44 |             refs['nwin'],
 45 |         )
 46 | 
 47 |         expected = (
 48 |             refs['window'],
 49 |         )
 50 | 
 51 |         yield FFTGtgramWindowTester(inputs['name'], args, expected)
 52 | 
 53 | class FFTGtgramWindowTester:
 54 | 
 55 |     def __init__(self, name, args, expected):
 56 |         self.nfft = args[0].squeeze()
 57 |         self.nwin = args[1].squeeze()
 58 |         self.expected = expected[0].squeeze()
 59 | 
 60 |         self.description = (
 61 |             "FFT gammatonegram window for nfft = {:f}, nwin = {:f}".format(
 62 |                 float(self.nfft), float(self.nwin)
 63 |             ))
 64 | 
 65 |     def __call__(self):
 66 |         result = gammatone.fftweight.specgram_window(self.nfft, self.nwin)
 67 |         max_diff = np.max(np.abs(result - self.expected))
 68 |         diagnostic = "Maximum difference: {:6e}".format(max_diff)
 69 |         assert np.allclose(result, self.expected, rtol=1e-6, atol=1e-12), diagnostic
 70 | 
 71 | 
 72 | def test_fft_gtgram():
 73 |     for inputs, mocks, refs in load_reference_data():
 74 |         args = (
 75 |             inputs['fs'],
 76 |             inputs['twin'],
 77 |             inputs['thop'],
 78 |             inputs['channels'],
 79 |             inputs['fmin']
 80 |         )
 81 | 
 82 |         yield FFTGammatonegramTester(
 83 |             inputs['name'][0],
 84 |             args,
 85 |             inputs['wave'],
 86 |             mocks['wts'],
 87 |             refs['window'],
 88 |             refs['res']
 89 |         )
 90 | 
 91 | class FFTGammatonegramTester:
 92 |     """ Testing class for gammatonegram calculation """
 93 | 
 94 |     def __init__(self, name, args, sig, fft_weights, window, expected):
 95 |         self.signal = np.asarray(sig).squeeze()
 96 |         self.expected = np.asarray(expected).squeeze()
 97 |         self.fft_weights = np.asarray(fft_weights)
 98 |         self.args = args
 99 |         self.window = window.squeeze()
100 | 
101 |         self.description = "FFT gammatonegram for {:s}".format(name)
102 | 
103 |     def __call__(self):
104 |         # Note that the second return value from fft_weights isn't actually used
105 |         with patch(
106 |                 'gammatone.fftweight.fft_weights',
107 |                 return_value=(self.fft_weights, None)), \
108 |             patch(
109 |                 'gammatone.fftweight.specgram_window',
110 |                 return_value=self.window):
111 | 
112 |             result = gammatone.fftweight.fft_gtgram(self.signal, *self.args)
113 | 
114 |             max_diff = np.max(np.abs(result - self.expected))
115 |             diagnostic = "Maximum difference: {:6e}".format(max_diff)
116 | 
117 |             assert np.allclose(result, self.expected, rtol=1e-6, atol=1e-12), diagnostic
118 | 
119 | if __name__ == '__main__':
120 |     nose.main()
121 | 
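
The assertions in these tests rely on `np.allclose(result, expected, rtol=..., atol=...)` and report the maximum absolute difference on failure. A minimal pure-Python sketch of that comparison rule for flat sequences is shown below. It follows the documented NumPy criterion `|a - b| <= atol + rtol * |b|`, but the helper names are hypothetical and it does not handle broadcasting, NaN or infinities.

```python
def all_close(result, expected, rtol=1e-6, atol=1e-12):
    """Element-wise closeness check for two equal-length sequences.

    The tolerance scales with the *expected* value, so the test is not symmetric.
    """
    if len(result) != len(expected):
        raise ValueError("sequences must have the same length")
    return all(abs(r - e) <= atol + rtol * abs(e)
               for r, e in zip(result, expected))


def max_abs_diff(result, expected):
    """Largest absolute element-wise difference, used as a failure diagnostic."""
    return max(abs(r - e) for r, e in zip(result, expected))


result = [1.0000005, 2.0]
expected = [1.0, 2.0]
diagnostic = "Maximum difference: {:6e}".format(max_abs_diff(result, expected))
assert all_close(result, expected), diagnostic
print(diagnostic)
```

With `atol=1e-12`, expected values of exactly zero leave almost no slack, which is worth keeping in mind when reference data contains zero-padded regions.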


--------------------------------------------------------------------------------
/gammatone/tests/test_fft_weights.py:
--------------------------------------------------------------------------------
 1 | #!/usr/bin/env python3
 2 | # Copyright 2014 Jason Heeris, jason.heeris@gmail.com
 3 | # 
 4 | # This file is part of the gammatone toolkit, and is licensed under the 3-clause
 5 | # BSD license: https://github.com/detly/gammatone/blob/master/COPYING
 6 | from __future__ import division
 7 | import nose
 8 | import numpy as np
 9 | import scipy.io
10 | from pkg_resources import resource_stream
11 | 
12 | import gammatone.fftweight
13 | 
14 | REF_DATA_FILENAME = 'data/test_fft2gtmx_data.mat'
15 | 
16 | INPUT_KEY  = 'fft2gtmx_inputs'
17 | RESULT_KEY = 'fft2gtmx_results'
18 | 
19 | INPUT_COLS  = ('nfft', 'sr', 'nfilts', 'width', 'fmin', 'fmax', 'maxlen')
20 | RESULT_COLS = ('weights', 'gain',)
21 | 
22 | def load_reference_data():
23 |     """ Load test data generated from the reference code """
24 |     # Load test data
25 |     with resource_stream(__name__, REF_DATA_FILENAME) as test_data:
26 |         data = scipy.io.loadmat(test_data, squeeze_me=False)
27 |     
28 |     zipped_data = zip(data[INPUT_KEY], data[RESULT_KEY])
29 |     
30 |     for inputs, refs in zipped_data:
31 |         input_dict = dict(zip(INPUT_COLS, map(np.squeeze, inputs)))
32 |         ref_dict = dict(zip(RESULT_COLS, map(np.squeeze, refs)))
33 |         yield (input_dict, ref_dict)
34 | 
35 | 
36 | def fft_weights_funcs(args, expected):
37 |     """
38 |     Construct a pair of unit tests for the gains and weights of the FFT to
39 |     gammatonegram calculation. Returns two functions: test_gains, test_weights.
40 |     """
41 |     args = list(args)
42 |     expected_weights = expected[0]
43 |     expected_gains = expected[1]
44 |     
45 |     # Convert nfft, nfilts, maxlen to ints
46 |     args[0] = int(args[0])
47 |     args[2] = int(args[2])
48 |     args[6] = int(args[6])
49 |     
50 |     weights, gains = gammatone.fftweight.fft_weights(*args)
51 |     
52 |     (test_weights_desc, test_gains_desc) = (
53 |         "FFT weights {:s} for nfft = {:d}, fs = {:d}, nfilts = {:d}".format(
54 |             label,
55 |             int(args[0]),
56 |             int(args[1]),
57 |             int(args[2]),
58 |     ) for label in ("weights", "gains"))
59 |     
60 |     def test_gains():
61 |         assert gains.shape == expected_gains.shape 
62 |         assert np.allclose(gains, expected_gains, rtol=1e-6, atol=1e-12)
63 |  
64 |     def test_weights():
65 |         assert weights.shape == expected_weights.shape
66 |         assert np.allclose(weights, expected_weights, rtol=1e-6, atol=1e-12)
67 |  
68 |     test_gains.description = test_gains_desc
69 |     test_weights.description = test_weights_desc
70 |     
71 |     return test_gains, test_weights
72 | 
73 | 
74 | def test_fft_weights():
75 |     for inputs, refs in load_reference_data():
76 |         args = tuple(inputs[col] for col in INPUT_COLS)        
77 |         expected = (refs['weights'], refs['gain'])
78 |         test_gains, test_weights = fft_weights_funcs(args, expected)
79 |         yield test_gains
80 |         yield test_weights
81 | 
82 | 
83 | if __name__ == '__main__':
84 |     nose.main()
85 | 


--------------------------------------------------------------------------------
/gammatone/tests/test_filterbank.py:
--------------------------------------------------------------------------------
 1 | #!/usr/bin/env python3
 2 | # Copyright 2014 Jason Heeris, jason.heeris@gmail.com
 3 | # 
 4 | # This file is part of the gammatone toolkit, and is licensed under the 3-clause
 5 | # BSD license: https://github.com/detly/gammatone/blob/master/COPYING
 6 | import nose
 7 | import numpy as np
 8 | import scipy.io
 9 | from pkg_resources import resource_stream
10 | 
11 | import gammatone.filters
12 | 
13 | REF_DATA_FILENAME = 'data/test_filterbank_data.mat'
14 | 
15 | INPUT_KEY  = 'erb_filterbank_inputs'
16 | RESULT_KEY = 'erb_filterbank_results'
17 | 
18 | INPUT_COLS  = ('fcoefs', 'wave')
19 | RESULT_COLS = ('filterbank',)
20 | 
21 | def load_reference_data():
22 |     """ Load test data generated from the reference code """
23 |     # Load test data
24 |     with resource_stream(__name__, REF_DATA_FILENAME) as test_data:
25 |         data = scipy.io.loadmat(test_data, squeeze_me=False)
26 |     
27 |     zipped_data = zip(data[INPUT_KEY], data[RESULT_KEY])
28 |     
29 |     for inputs, refs in zipped_data:
30 |         input_dict = dict(zip(INPUT_COLS, map(np.squeeze, inputs)))
31 |         ref_dict = dict(zip(RESULT_COLS, map(np.squeeze, refs)))
32 |         yield (input_dict, ref_dict)
33 | 
34 | 
35 | def test_ERB_filterbank_known_values():
36 |     for inputs, refs in load_reference_data():
37 |         args = (
38 |             inputs['wave'],
39 |             inputs['fcoefs'],
40 |         )
41 |         
42 |         expected = (refs['filterbank'],)
43 |         
44 |         yield ERBFilterBankTester(args, expected)
45 | 
46 | 
47 | class ERBFilterBankTester:
48 |     
49 |     def __init__(self, args, expected):
50 |         self.signal = args[0]
51 |         self.fcoefs = args[1]
52 |         self.expected = expected[0]
53 |         
54 |         self.description = (
55 |             "Gammatone filterbank result for {:.1f} ... {:.1f}".format(
56 |                 self.fcoefs[0][0],
57 |                 self.fcoefs[0][1]
58 |         ))
59 |     
60 |     def __call__(self):
61 |         result = gammatone.filters.erb_filterbank(self.signal, self.fcoefs)
62 |         assert np.allclose(result, self.expected, rtol=1e-5, atol=1e-12)
63 | 
64 | 
65 | if __name__ == '__main__':
66 |     nose.main()
67 | 


--------------------------------------------------------------------------------
/gammatone/tests/test_gammatone_filters.py:
--------------------------------------------------------------------------------
 1 | #!/usr/bin/env python3
 2 | # Copyright 2014 Jason Heeris, jason.heeris@gmail.com
 3 | # 
 4 | # This file is part of the gammatone toolkit, and is licensed under the 3-clause
 5 | # BSD license: https://github.com/detly/gammatone/blob/master/COPYING
 6 | import nose
 7 | import numpy as np
 8 | import scipy.io
 9 | from pkg_resources import resource_stream
10 | 
11 | import gammatone.filters
12 | 
13 | REF_DATA_FILENAME = 'data/test_erb_filter_data.mat'
14 | 
15 | INPUT_KEY  = 'erb_filter_inputs'
16 | RESULT_KEY = 'erb_filter_results'
17 | 
18 | INPUT_COLS  = ('fs', 'cfs')
19 | RESULT_COLS = ('fcoefs',)
20 | 
21 | def load_reference_data():
22 |     """ Load test data generated from the reference code """
23 |     # Load test data
24 |     with resource_stream(__name__, REF_DATA_FILENAME) as test_data:
25 |         data = scipy.io.loadmat(test_data, squeeze_me=False)
26 |     
27 |     zipped_data = zip(data[INPUT_KEY], data[RESULT_KEY])
28 |     
29 |     for inputs, refs in zipped_data:
30 |         input_dict = dict(zip(INPUT_COLS, map(np.squeeze, inputs)))
31 |         ref_dict = dict(zip(RESULT_COLS, map(np.squeeze, refs)))
32 |         yield (input_dict, ref_dict)
33 | 
34 | 
35 | def test_make_ERB_filters_known_values():
36 |     for inputs, refs in load_reference_data():
37 |         args = (
38 |             inputs['fs'],
39 |             inputs['cfs'],
40 |         )
41 |         
42 |         expected = (refs['fcoefs'],)
43 |         
44 |         yield MakeERBFiltersTester(args, expected)
45 | 
46 | 
47 | class MakeERBFiltersTester:
48 |     
49 |     def __init__(self, args, expected):
50 |         self.fs = args[0]
51 |         self.cfs = args[1]
52 |         self.expected = expected[0]
53 |         self.description = (
54 |             "Gammatone filters for {:f}, {:.1f} ... {:.1f}".format(
55 |                 float(self.fs),
56 |                 float(self.cfs[0]),
57 |                 float(self.cfs[-1])
58 |         ))
59 |     
60 |     def __call__(self):
61 |         result = gammatone.filters.make_erb_filters(self.fs, self.cfs)
62 |         assert np.allclose(result, self.expected, rtol=1e-6, atol=1e-12)
63 | 
64 | if __name__ == '__main__':
65 |     nose.main()
66 | 


--------------------------------------------------------------------------------
/gammatone/tests/test_gammatonegram.py:
--------------------------------------------------------------------------------
  1 | #!/usr/bin/env python3
  2 | # Copyright 2014 Jason Heeris, jason.heeris@gmail.com
  3 | #
  4 | # This file is part of the gammatone toolkit, and is licensed under the 3-clause
  5 | # BSD license: https://github.com/detly/gammatone/blob/master/COPYING
  6 | from mock import patch
  7 | import nose
  8 | import numpy as np
  9 | import scipy.io
 10 | from pkg_resources import resource_stream
 11 | 
 12 | import gammatone.gtgram
 13 | 
 14 | REF_DATA_FILENAME = 'data/test_gammatonegram_data.mat'
 15 | 
 16 | INPUT_KEY  = 'gammatonegram_inputs'
 17 | MOCK_KEY   = 'gammatonegram_mocks'
 18 | RESULT_KEY = 'gammatonegram_results'
 19 | 
 20 | INPUT_COLS  = ('name', 'wave', 'fs', 'twin', 'thop', 'channels', 'fmin')
 21 | MOCK_COLS   = ('erb_fb', 'erb_fb_cols')
 22 | RESULT_COLS = ('gtgram', 'nwin', 'hopsamps', 'ncols')
 23 | 
 24 | 
 25 | def load_reference_data():
 26 |     """ Load test data generated from the reference code """
 27 |     # Load test data
 28 |     with resource_stream(__name__, REF_DATA_FILENAME) as test_data:
 29 |         data = scipy.io.loadmat(test_data, squeeze_me=True)
 30 | 
 31 |     zipped_data = zip(data[INPUT_KEY], data[MOCK_KEY], data[RESULT_KEY])
 32 |     for inputs, mocks, refs in zipped_data:
 33 |         input_dict = dict(zip(INPUT_COLS, inputs))
 34 |         mock_dict  = dict(zip(MOCK_COLS, mocks))
 35 |         ref_dict = dict(zip(RESULT_COLS, refs))
 36 |         yield (input_dict, mock_dict, ref_dict)
 37 | 
 38 | 
 39 | def test_nstrides():
 40 |     """ Test gammatonegram stride calculations """
 41 |     for inputs, mocks, refs in load_reference_data():
 42 |         args = (
 43 |             inputs['fs'],
 44 |             inputs['twin'],
 45 |             inputs['thop'],
 46 |             mocks['erb_fb_cols']
 47 |         )
 48 | 
 49 |         expected = (
 50 |             refs['nwin'],
 51 |             refs['hopsamps'],
 52 |             refs['ncols']
 53 |         )
 54 | 
 55 |         yield GTGramStrideTester(inputs['name'], args, expected)
 56 | 
 57 | 
 58 | class GTGramStrideTester:
 59 |     """ Testing class for gammatonegram stride calculation """
 60 | 
 61 |     def __init__(self, name, inputs, expected):
 62 |         self.inputs      = inputs
 63 |         self.expected    = expected
 64 |         self.description = "Gammatonegram strides for {:s}".format(name)
 65 | 
 66 |     def __call__(self):
 67 |         results = gammatone.gtgram.gtgram_strides(*self.inputs)
 68 | 
 69 |         diagnostic = (
 70 |             "result: {:s}, expected: {:s}".format(
 71 |                 str(results),
 72 |                 str(self.expected)
 73 |             )
 74 |         )
 75 | 
 76 |         # These are integer values, so use direct equality
 77 |         assert results == self.expected, diagnostic
 78 | 
 79 | 
 80 | # TODO: possibly mock out gtgram_strides
 81 | 
 82 | def test_gtgram():
 83 |     for inputs, mocks, refs in load_reference_data():
 84 |         args = (
 85 |             inputs['fs'],
 86 |             inputs['twin'],
 87 |             inputs['thop'],
 88 |             inputs['channels'],
 89 |             inputs['fmin']
 90 |         )
 91 | 
 92 |         yield GammatonegramTester(
 93 |             inputs['name'],
 94 |             args,
 95 |             inputs['wave'],
 96 |             mocks['erb_fb'],
 97 |             refs['gtgram']
 98 |         )
 99 | 
100 | class GammatonegramTester:
101 |     """ Testing class for gammatonegram calculation """
102 | 
103 |     def __init__(self, name, args, sig, erb_fb_out, expected):
104 |         self.signal = np.asarray(sig)
105 |         self.expected = np.asarray(expected)
106 |         self.erb_fb_out = np.asarray(erb_fb_out)
107 |         self.args = args
108 | 
109 |         self.description = "Gammatonegram for {:s}".format(name)
110 | 
111 |     def __call__(self):
112 |         with patch(
113 |             'gammatone.gtgram.erb_filterbank',
114 |             return_value=self.erb_fb_out):
115 | 
116 |             result = gammatone.gtgram.gtgram(self.signal, *self.args)
117 | 
118 |             max_diff = np.max(np.abs(result - self.expected))
119 |             diagnostic = "Maximum difference: {:6e}".format(max_diff)
120 | 
121 |             assert np.allclose(result, self.expected, rtol=1e-6, atol=1e-12), diagnostic
122 | 
123 | if __name__ == '__main__':
124 |     nose.main()
125 | 


--------------------------------------------------------------------------------
/gammatone/tests/test_specgram.py:
--------------------------------------------------------------------------------
 1 | #!/usr/bin/env python3
 2 | # Copyright 2014 Jason Heeris, jason.heeris@gmail.com
 3 | #
 4 | # This file is part of the gammatone toolkit, and is licensed under the 3-clause
 5 | # BSD license: https://github.com/detly/gammatone/blob/master/COPYING
 6 | from mock import patch
 7 | import nose
 8 | import numpy as np
 9 | import scipy.io
10 | from pkg_resources import resource_stream
11 | 
12 | import gammatone.fftweight
13 | 
14 | REF_DATA_FILENAME = 'data/test_specgram_data.mat'
15 | 
16 | INPUT_KEY  = 'specgram_inputs'
17 | MOCK_KEY   = 'specgram_mocks'
18 | RESULT_KEY = 'specgram_results'
19 | 
20 | INPUT_COLS  = ('name', 'wave', 'nfft', 'fs', 'nwin', 'nhop')
21 | MOCK_COLS   = ('window',)
22 | RESULT_COLS = ('res',)
23 | 
24 | 
25 | def load_reference_data():
26 |     """ Load test data generated from the reference code """
27 |     # Load test data
28 |     with resource_stream(__name__, REF_DATA_FILENAME) as test_data:
29 |         data = scipy.io.loadmat(test_data, squeeze_me=False)
30 | 
31 |     zipped_data = zip(data[INPUT_KEY], data[MOCK_KEY], data[RESULT_KEY])
32 |     for inputs, mocks, refs in zipped_data:
33 |         input_dict = dict(zip(INPUT_COLS, inputs))
34 |         mock_dict  = dict(zip(MOCK_COLS, mocks))
35 |         ref_dict = dict(zip(RESULT_COLS, refs))
36 | 
37 |         yield (input_dict, mock_dict, ref_dict)
38 | 
39 | 
40 | def test_specgram():
41 |     for inputs, mocks, refs in load_reference_data():
42 |         args = (
43 |             inputs['nfft'],
44 |             inputs['fs'],
45 |             inputs['nwin'],
46 |             inputs['nhop'],
47 |         )
48 | 
49 |         yield SpecgramTester(
50 |             inputs['name'][0],
51 |             args,
52 |             inputs['wave'],
53 |             mocks['window'],
54 |             refs['res']
55 |         )
56 | 
57 | class SpecgramTester:
58 |     """ Testing class for specgram replacement calculation """
59 | 
60 |     def __init__(self, name, args, sig, window, expected):
61 |         self.signal = np.asarray(sig).squeeze()
62 |         self.expected = np.asarray(expected).squeeze()
63 |         self.args = [int(a.squeeze()) for a in args]
64 |         self.window = window.squeeze()
65 |         self.description = "Specgram for {:s}".format(name)
66 | 
67 | 
68 |     def __call__(self):
69 |         with patch(
70 |                 'gammatone.fftweight.specgram_window',
71 |                 return_value=self.window):
72 |             result = gammatone.fftweight.specgram(self.signal, *self.args)
73 | 
74 |             max_diff = np.max(np.abs(result - self.expected))
75 |             diagnostic = "Maximum difference: {:6e}".format(max_diff)
76 | 
77 |             assert np.allclose(result, self.expected, rtol=1e-6, atol=1e-12), diagnostic
78 | 
79 | if __name__ == '__main__':
80 |     nose.main()
81 | 


--------------------------------------------------------------------------------
/images/CRNN_SELDT_DCASE2020.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/janaal1/DCASE2020-Task3/bc7d5e87faa2fbe014dc47cccb48d9927b4ed3ec/images/CRNN_SELDT_DCASE2020.png


--------------------------------------------------------------------------------
/images/SELDnet_output.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/janaal1/DCASE2020-Task3/bc7d5e87faa2fbe014dc47cccb48d9927b4ed3ec/images/SELDnet_output.jpg


--------------------------------------------------------------------------------
/images/scse_cropped.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/janaal1/DCASE2020-Task3/bc7d5e87faa2fbe014dc47cccb48d9927b4ed3ec/images/scse_cropped.pdf


--------------------------------------------------------------------------------
/images/seld-squeeze-structure.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/janaal1/DCASE2020-Task3/bc7d5e87faa2fbe014dc47cccb48d9927b4ed3ec/images/seld-squeeze-structure.pdf


--------------------------------------------------------------------------------
/images/seld_squeeze_structure_image.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/janaal1/DCASE2020-Task3/bc7d5e87faa2fbe014dc47cccb48d9927b4ed3ec/images/seld_squeeze_structure_image.jpg


--------------------------------------------------------------------------------
/keras_model.py:
--------------------------------------------------------------------------------
  1 | #
  2 | # The SELDnet architecture
  3 | #
  4 | 
  5 | from keras.layers import (Bidirectional, Conv2D, MaxPooling2D, Input, Concatenate,
  6 |                         Dense, Activation, Dropout, Reshape, Permute,
  7 |                         GlobalAveragePooling2D, add, Flatten, Lambda,
  8 |                         GlobalAveragePooling1D, ELU, multiply)
  9 | #from keras.layers.core import Dense, Activation, Dropout, Reshape, Permute
 10 | from keras.layers.recurrent import GRU
 11 | from keras.layers.normalization import BatchNormalization
 12 | from keras.models import Model
 13 | from keras.layers.wrappers import TimeDistributed
 14 | from keras.optimizers import Adam
 15 | from keras.models import load_model
 16 | import keras
 17 | keras.backend.set_image_data_format('channels_first')
 18 | from IPython import embed
 19 | import numpy as np
 20 | 
 21 | import  keras.backend as K
 22 | import warnings # added
 23 | 
 24 | # From https://github.com/keras-team/keras-applications/blob/e52c477/keras_applications/imagenet_utils.py#L235-L331
 25 | def _obtain_input_shape(input_shape,
 26 |                         default_size,
 27 |                         min_size,
 28 |                         data_format,
 29 |                         require_flatten,
 30 |                         weights=None):
 31 |     """Internal utility to compute/validate a model's tensor shape.
 32 |     # Arguments
 33 |         input_shape: Either None (will return the default network input shape),
 34 |             or a user-provided shape to be validated.
 35 |         default_size: Default input width/height for the model.
 36 |         min_size: Minimum input width/height accepted by the model.
 37 |         data_format: Image data format to use.
 38 |         require_flatten: Whether the model is expected to
 39 |             be linked to a classifier via a Flatten layer.
 40 |         weights: One of `None` (random initialization)
 41 |             or 'imagenet' (pre-training on ImageNet).
 42 |             If weights='imagenet' input channels must be equal to 3.
 43 |     # Returns
 44 |         An integer shape tuple (may include None entries).
 45 |     # Raises
 46 |         ValueError: In case of invalid argument values.
 47 |     """
 48 |     if weights != 'imagenet' and input_shape and len(input_shape) == 3:
 49 |         if data_format == 'channels_first':
 50 |             if input_shape[0] not in {1, 3}:
 51 |                 warnings.warn(
 52 |                     'This model usually expects 1 or 3 input channels. '
 53 |                     'However, it was passed an input_shape with {input_shape}'
 54 |                     ' input channels.'.format(input_shape=input_shape[0]))
 55 |             default_shape = (input_shape[0], default_size, default_size)
 56 |         else:
 57 |             if input_shape[-1] not in {1, 3}:
 58 |                 warnings.warn(
 59 |                     'This model usually expects 1 or 3 input channels. '
 60 |                     'However, it was passed an input_shape with {n_input_channels}'
 61 |                     ' input channels.'.format(n_input_channels=input_shape[-1]))
 62 |             default_shape = (default_size, default_size, input_shape[-1])
 63 |     else:
 64 |         if data_format == 'channels_first':
 65 |             default_shape = (3, default_size, default_size)
 66 |         else:
 67 |             default_shape = (default_size, default_size, 3)
 68 |     if weights == 'imagenet' and require_flatten:
 69 |         if input_shape is not None:
 70 |             if input_shape != default_shape:
 71 |                 raise ValueError('When setting `include_top=True` '
 72 |                                  'and loading `imagenet` weights, '
 73 |                                  '`input_shape` should be {default_shape}.'.format(default_shape=default_shape))
 74 |         return default_shape
 75 |     if input_shape:
 76 |         if data_format == 'channels_first':
 77 |             if input_shape is not None:
 78 |                 if len(input_shape) != 3:
 79 |                     raise ValueError(
 80 |                         '`input_shape` must be a tuple of three integers.')
 81 |                 if input_shape[0] != 3 and weights == 'imagenet':
 82 |                     raise ValueError('The input must have 3 channels; got '
 83 |                                      '`input_shape={input_shape}`'.format(input_shape=input_shape))
 84 |                 if ((input_shape[1] is not None and input_shape[1] < min_size) or
 85 |                     (input_shape[2] is not None and input_shape[2] < min_size)):
 86 |                     raise ValueError('Input size must be at least {min_size}x{min_size};'
 87 |                                      ' got `input_shape={input_shape}`'.format(min_size=min_size,
 88 |                                                                                input_shape=input_shape))
 89 |         else:
 90 |             if input_shape is not None:
 91 |                 if len(input_shape) != 3:
 92 |                     raise ValueError(
 93 |                         '`input_shape` must be a tuple of three integers.')
 94 |                 if input_shape[-1] != 3 and weights == 'imagenet':
 95 |                     raise ValueError('The input must have 3 channels; got '
 96 |                                      '`input_shape={input_shape}`'.format(input_shape=input_shape))
 97 |                 if ((input_shape[0] is not None and input_shape[0] < min_size) or
 98 |                     (input_shape[1] is not None and input_shape[1] < min_size)):
 99 |                     raise ValueError('Input size must be at least {min_size}x{min_size};'
100 |                                      ' got `input_shape={input_shape}`'.format(min_size=min_size,
101 |                                                                                input_shape=input_shape))
102 |     else:
103 |         if require_flatten:
104 |             input_shape = default_shape
105 |         else:
106 |             if data_format == 'channels_first':
107 |                 input_shape = (3, None, None)
108 |             else:
109 |                 input_shape = (None, None, 3)
110 |     if require_flatten:
111 |         if None in input_shape:
112 |             raise ValueError('If `include_top` is True, '
113 |                              'you should specify a static `input_shape`. '
114 |                              'Got `input_shape={input_shape}`'.format(input_shape=input_shape))
115 |     return input_shape
116 | 
117 | 
118 | def squeeze_excite_block(input_tensor, ratio=16):
119 |     """ Create a channel-wise squeeze-excite block
120 |     Args:
121 |         input_tensor: input Keras tensor
122 |         ratio: reduction ratio; the bottleneck Dense layer has filters // ratio units
123 |     Returns: a Keras tensor
124 |     References
125 |     -   [Squeeze and Excitation Networks](https://arxiv.org/abs/1709.01507)
126 |     """
127 |     init = input_tensor
128 |     channel_axis = 1 if K.image_data_format() == "channels_first" else -1
129 |     filters = _tensor_shape(init)[channel_axis]
130 |     se_shape = (1, 1, filters)
131 | 
132 |     se = GlobalAveragePooling2D()(init)
133 |     se = Reshape(se_shape)(se)
134 |     se = Dense(filters // ratio, activation='relu', kernel_initializer='he_normal', use_bias=False)(se)
135 |     se = Dense(filters, activation='sigmoid', kernel_initializer='he_normal', use_bias=False)(se)
136 | 
137 |     if K.image_data_format() == 'channels_first':
138 |         se = Permute((3, 1, 2))(se)
139 | 
140 |     x = multiply([init, se])
141 |     return x
142 | 
143 | 
144 | def spatial_squeeze_excite_block(input_tensor):
145 |     """ Create a spatial squeeze-excite block
146 |     Args:
147 |         input_tensor: input Keras tensor
148 |     Returns: a Keras tensor
149 |     References
150 |     -   [Concurrent Spatial and Channel Squeeze & Excitation in Fully Convolutional Networks](https://arxiv.org/abs/1803.02579)
151 |     """
152 | 
153 |     se = Conv2D(1, (1, 1), activation='sigmoid', use_bias=False,
154 |                 kernel_initializer='he_normal')(input_tensor)
155 | 
156 |     x = multiply([input_tensor, se])
157 |     return x
158 | 
159 | 
160 | def channel_spatial_squeeze_excite(input_tensor, ratio=16):
161 |     """ Create a concurrent channel and spatial squeeze-excite (scSE) block
162 |     Args:
163 |         input_tensor: input Keras tensor
164 |         ratio: reduction ratio of the channel squeeze-excite bottleneck
165 |     Returns: a Keras tensor
166 |     References
167 |     -   [Squeeze and Excitation Networks](https://arxiv.org/abs/1709.01507)
168 |     -   [Concurrent Spatial and Channel Squeeze & Excitation in Fully Convolutional Networks](https://arxiv.org/abs/1803.02579)
169 |     """
170 | 
171 |     cse = squeeze_excite_block(input_tensor, ratio)
172 |     sse = spatial_squeeze_excite_block(input_tensor)
173 | 
174 |     x = add([cse, sse])
175 |     return x
176 | 
177 | def _tensor_shape(tensor):
178 |     return getattr(tensor, '_keras_shape')
179 | 
180 | def get_model(data_in, data_out, dropout_rate, nb_cnn2d_filt, f_pool_size, t_pool_size,
181 |               rnn_size, fnn_size, weights, doa_objective, baseline, ratio):
182 |     # model definition
183 |     spec_start = Input(shape=(data_in[-3], data_in[-2], data_in[-1]))
184 | 
185 |     # CNN
186 |     spec_cnn = spec_start
187 |     for i, convCnt in enumerate(f_pool_size):
188 |         
189 |         if baseline is False:
190 | 
191 |             spec_aux = spec_cnn
192 |             spec_cnn = Conv2D(nb_cnn2d_filt, 3, padding='same')(spec_cnn)
193 |             spec_cnn = BatchNormalization()(spec_cnn)
194 |             spec_cnn = ELU()(spec_cnn) 
195 |             spec_cnn = Conv2D(nb_cnn2d_filt, 3, padding='same')(spec_cnn)
196 |             spec_cnn = BatchNormalization()(spec_cnn) 
197 |         
198 |             spec_aux = Conv2D(nb_cnn2d_filt, 1, padding='same')(spec_aux)
199 |             spec_aux = BatchNormalization()(spec_aux)
200 | 
201 |             spec_cnn = add([spec_cnn,spec_aux])
202 |             spec_cnn = ELU()(spec_cnn)
203 |             
204 |             if ratio != 0:
205 | 
206 |                 spec_cnn = channel_spatial_squeeze_excite(spec_cnn,ratio=ratio)
207 | 
208 |                 spec_cnn = add([spec_cnn, spec_aux])
209 | 
210 |         else:
211 | 
212 |             spec_cnn = Conv2D(filters=nb_cnn2d_filt, kernel_size=(3, 3), padding='same')(spec_cnn)
213 |             spec_cnn = BatchNormalization()(spec_cnn)
214 |             spec_cnn = Activation('relu')(spec_cnn)
215 |         spec_cnn = MaxPooling2D(pool_size=(t_pool_size[i], f_pool_size[i]))(spec_cnn)
216 |         spec_cnn = Dropout(dropout_rate)(spec_cnn)
217 |     spec_cnn = Permute((2, 1, 3))(spec_cnn)
218 | 
219 |     # RNN
220 |     spec_rnn = Reshape((data_out[0][-2], -1))(spec_cnn)
221 |     for nb_rnn_filt in rnn_size:
222 |         spec_rnn = Bidirectional(
223 |             GRU(nb_rnn_filt, activation='tanh', dropout=dropout_rate, recurrent_dropout=dropout_rate,
224 |                 return_sequences=True),
225 |             merge_mode='mul'
226 |         )(spec_rnn)
227 | 
228 |     # FC - DOA
229 |     doa = spec_rnn
230 |     for nb_fnn_filt in fnn_size:
231 |         doa = TimeDistributed(Dense(nb_fnn_filt))(doa)
232 |         doa = Dropout(dropout_rate)(doa)
233 | 
234 |     doa = TimeDistributed(Dense(data_out[1][-1]))(doa)
235 |     doa = Activation('tanh', name='doa_out')(doa)
236 | 
237 |     # FC - SED
238 |     sed = spec_rnn
239 |     for nb_fnn_filt in fnn_size:
240 |         sed = TimeDistributed(Dense(nb_fnn_filt))(sed)
241 |         sed = Dropout(dropout_rate)(sed)
242 |     sed = TimeDistributed(Dense(data_out[0][-1]))(sed)
243 |     sed = Activation('sigmoid', name='sed_out')(sed)
244 | 
245 |     model = None
246 |     if doa_objective == 'mse':
247 |         model = Model(inputs=spec_start, outputs=[sed, doa])
248 |         model.compile(optimizer=Adam(), loss=['binary_crossentropy', 'mse'], loss_weights=weights)
249 |     elif doa_objective == 'masked_mse':
250 |         doa_concat = Concatenate(axis=-1, name='doa_concat')([sed, doa])
251 |         model = Model(inputs=spec_start, outputs=[sed, doa_concat])
252 |         model.compile(optimizer=Adam(), loss=['binary_crossentropy', masked_mse], loss_weights=weights)
253 |     else:
254 |         print('ERROR: Unknown doa_objective: {}'.format(doa_objective))
255 |         exit()
256 |     model.summary()
257 |     return model
258 | 
259 | 
260 | def masked_mse(y_gt, model_out):
261 |     # SED mask: Use only the predicted DOAs when gt SED > 0.5
262 |     sed_out = y_gt[:, :, :14] >= 0.5 #TODO fix this hardcoded value of number of classes
263 |     sed_out = keras.backend.repeat_elements(sed_out, 3, -1)
264 |     sed_out = keras.backend.cast(sed_out, 'float32')
265 | 
266 |     # Use the mask to compute the masked error now. Normalize with the mask weights #TODO fix this hardcoded value of number of classes
267 |     return keras.backend.sqrt(keras.backend.sum(keras.backend.square(y_gt[:, :, 14:] - model_out[:, :, 14:]) * sed_out))/keras.backend.sum(sed_out)
268 | 
269 | 
270 | def load_seld_model(model_file, doa_objective):
271 |     if doa_objective == 'mse':
272 |         return load_model(model_file)
273 |     elif doa_objective == 'masked_mse':
274 |         return load_model(model_file, custom_objects={'masked_mse': masked_mse})
275 |     else:
276 |         print('ERROR: Unknown doa objective: {}'.format(doa_objective))
277 |         exit()
278 | 
279 | 
280 | 
281 | 

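The squeeze-excite helpers in keras_model.py gate a feature map along channels, along spatial positions, and then sum both results. As a rough illustration only, the same arithmetic can be sketched in plain NumPy for a single channels-first feature map. The function names, weight shapes, and random values below are assumptions made for this sketch and are not part of the repository.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_se(x, w1, w2):
    """Channel gate. x: (C, H, W); w1: (C, C // ratio); w2: (C // ratio, C)."""
    squeezed = x.mean(axis=(1, 2))            # global average pool -> (C,)
    hidden = np.maximum(squeezed @ w1, 0.0)   # bottleneck Dense + ReLU, no bias
    gate = sigmoid(hidden @ w2)               # one weight per channel -> (C,)
    return x * gate[:, None, None]


def spatial_se(x, w):
    """Spatial gate. w: (C,) weights of a 1x1 conv with one output channel."""
    gate = sigmoid(np.tensordot(w, x, axes=([0], [0])))  # (H, W)
    return x * gate[None, :, :]


def sc_se(x, w1, w2, w):
    """Sum of the channel-gated and spatially gated feature maps."""
    return channel_se(x, w1, w2) + spatial_se(x, w)


# Hypothetical sizes: 8 channels, reduction ratio 4.
rng = np.random.default_rng(0)
C, H, W, ratio = 8, 4, 5, 4
x = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C, C // ratio))
w2 = rng.standard_normal((C // ratio, C))
w = rng.standard_normal(C)

y = sc_se(x, w1, w2, w)
print(y.shape)
```

With all weights set to zero both gates equal 0.5, so the two halves add back up to the input, which is a convenient sanity check for the sketch.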

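The `masked_mse` loss in keras_model.py only counts DOA errors where the ground-truth SED activity is at least 0.5. A minimal NumPy sketch of the same arithmetic is shown below, assuming the number of classes is passed in as a parameter instead of the hardcoded 14. The helper name `masked_loss_np` and the toy tensors are hypothetical. The mask is expanded with `np.repeat`, mirroring `repeat_elements` in the Keras code.

```python
import numpy as np


def masked_loss_np(y_gt, model_out, nb_classes):
    """NumPy mirror of the Keras loss: sqrt(sum(masked squared error)) / sum(mask)."""
    # Active classes according to the ground-truth SED part of the target.
    mask = y_gt[:, :, :nb_classes] >= 0.5
    # Repeat each class flag 3 times to cover the DOA outputs (as in the Keras code).
    mask = np.repeat(mask, 3, axis=-1).astype(np.float32)
    sq_err = np.square(y_gt[:, :, nb_classes:] - model_out[:, :, nb_classes:])
    return np.sqrt(np.sum(sq_err * mask)) / np.sum(mask)


# Toy example: batch of 1, one frame, two classes, only the first class is active.
nb_classes = 2
y_gt = np.array([[[1, 0, 1, 0, 0, 0, 0, 0]]], dtype=np.float32)
model_out = np.array([[[0.9, 0.1, 0, 0, 0, 1, 1, 1]]], dtype=np.float32)

loss = masked_loss_np(y_gt, model_out, nb_classes)
print(loss)
```

Note that the sketch, like the original, divides by the mask sum, so a batch without any active class would divide by zero.
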
--------------------------------------------------------------------------------
/metrics/LICENSE.md:
--------------------------------------------------------------------------------
 1 | -----------COPYRIGHT NOTICE STARTS WITH THIS LINE------------
 2 | Copyright (c) 2020 Tampere University and its licensors
 3 | 
 4 | Permission is hereby granted, free of charge, to any person obtaining a copy
 5 | of this script, SELD_evaluation_metrics.py (the "Software"), to deal
 6 | in the Software without restriction, including without limitation the rights
 7 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 8 | copies of the Software, and to permit persons to whom the Software is
 9 | furnished to do so, subject to the following conditions:
10 | 
11 | The above copyright notice and this permission notice shall be included in all
12 | copies or substantial portions of the Software.
13 | 
14 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
15 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
16 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
17 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
18 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
19 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
20 | SOFTWARE.
21 | 
22 | -----------COPYRIGHT NOTICE ENDS WITH THIS LINE------------
23 | 


--------------------------------------------------------------------------------
/metrics/SELD_evaluation_metrics.py:
--------------------------------------------------------------------------------
  1 | #
  2 | # Implements the localization and detection metrics proposed in the paper
  3 | #
  4 | # Joint Measurement of Localization and Detection of Sound Events
  5 | # Annamaria Mesaros, Sharath Adavanne, Archontis Politis, Toni Heittola, Tuomas Virtanen
  6 | # WASPAA 2019
  7 | #
  8 | #
  9 | # This script has MIT license
 10 | #
 11 | 
 12 | import numpy as np
 13 | from IPython import  embed
 14 | eps = np.finfo(float).eps
 15 | from scipy.optimize import linear_sum_assignment
 16 | 
 17 | 
 18 | class SELDMetrics(object):
 19 |     def __init__(self, doa_threshold=20, nb_classes=11):
 20 |         '''
 21 |             This class implements both the class-sensitive localization and location-sensitive detection metrics.
 22 |             Additionally, based on the user input, the corresponding averaging is performed within the segment.
 23 | 
 24 |         :param nb_classes: Number of sound classes. In the paper, nb_classes = 11
 25 |         :param doa_threshold: DOA threshold (in degrees) for location-sensitive detection.
 26 |         '''
 27 | 
 28 |         self._TP = 0
 29 |         self._FP = 0
 30 |         self._TN = 0
 31 |         self._FN = 0
 32 | 
 33 |         self._S = 0
 34 |         self._D = 0
 35 |         self._I = 0
 36 | 
 37 |         self._Nref = 0
 38 |         self._Nsys = 0
 39 | 
 40 |         self._total_DE = 0
 41 |         self._DE_TP = 0
 42 | 
 43 |         self._spatial_T = doa_threshold
 44 |         self._nb_classes = nb_classes
 45 | 
 46 |     def compute_seld_scores(self):
 47 |         '''
 48 |         Collect the final SELD scores
 49 | 
 50 |         :return: returns both location-sensitive detection scores and class-sensitive localization scores
 51 |         '''
 52 | 
 53 |         # Location-sensitive detection performance
 54 |         ER = (self._S + self._D + self._I) / float(self._Nref + eps)
 55 | 
 56 |         prec = float(self._TP) / float(self._Nsys + eps)
 57 |         recall = float(self._TP) / float(self._Nref + eps)
 58 |         F = 2 * prec * recall / (prec + recall + eps)
 59 | 
 60 |         # Class-sensitive localization performance
 61 |         if self._DE_TP:
 62 |             DE = self._total_DE / float(self._DE_TP + eps)
 63 |         else:
 64 |             # When the total number of predictions is zero
 65 |             DE = 180
 66 | 
 67 |         DE_prec = float(self._DE_TP) / float(self._Nsys + eps)
 68 |         DE_recall = float(self._DE_TP) / float(self._Nref + eps)
 69 |         DE_F = 2 * DE_prec * DE_recall / (DE_prec + DE_recall + eps)
 70 | 
 71 |         return ER, F, DE, DE_F
 72 | 
 73 |     def update_seld_scores_xyz(self, pred, gt):
 74 |         '''
 75 |         Implements the spatial error averaging according to equation [5] in the paper, using Cartesian distance
 76 | 
 77 |         :param pred: dictionary containing class-wise prediction results for each N-seconds segment block
 78 |         :param gt: dictionary containing class-wise groundtruth for each N-seconds segment block
 79 |         '''
 80 |         for block_cnt in range(len(gt.keys())):
 81 |             # print('\nblock_cnt', block_cnt, end='')
 82 |             loc_FN, loc_FP = 0, 0
 83 |             for class_cnt in range(self._nb_classes):
 84 |                 # print('\tclass:', class_cnt, end='')
 85 |                 # Counting the number of ref and sys outputs should include the number of tracks for each class in the segment
 86 |                 if class_cnt in gt[block_cnt]:
 87 |                     self._Nref += 1
 88 |                 if class_cnt in pred[block_cnt]:
 89 |                     self._Nsys += 1
 90 | 
 91 |                 if class_cnt in gt[block_cnt] and class_cnt in pred[block_cnt]:
 92 |                     # True positives or False negative case
 93 | 
 94 |                     # NOTE: For multiple tracks per class, identify multiple tracks using hungarian algorithm and then
 95 |                     # calculate the spatial distance using the following code. In the current code, if there are multiple 
 96 |                     # tracks of the same class in a frame we are calculating the least cost between the groundtruth and predicted and using it.
 97 | 
 98 |                     total_spatial_dist = 0
 99 |                     total_framewise_matching_doa = 0
100 |                     gt_ind_list = gt[block_cnt][class_cnt][0][0]
101 |                     pred_ind_list = pred[block_cnt][class_cnt][0][0]
102 |                     for gt_ind, gt_val in enumerate(gt_ind_list):
103 |                         if gt_val in pred_ind_list:
104 |                             total_framewise_matching_doa += 1
105 |                             pred_ind = pred_ind_list.index(gt_val)
106 | 
107 |                             gt_arr = np.array(gt[block_cnt][class_cnt][0][1][gt_ind])
108 |                             pred_arr = np.array(pred[block_cnt][class_cnt][0][1][pred_ind])
109 | 
110 |                             if gt_arr.shape[0]==1 and pred_arr.shape[0]==1:
111 |                                 total_spatial_dist += distance_between_cartesian_coordinates(gt_arr[0][0], gt_arr[0][1], gt_arr[0][2], pred_arr[0][0], pred_arr[0][1], pred_arr[0][2])
112 |                             else:
113 |                                 total_spatial_dist += least_distance_between_gt_pred(gt_arr, pred_arr)
114 | 
115 |                     if total_spatial_dist == 0 and total_framewise_matching_doa == 0:
116 |                         loc_FN += 1
117 |                         self._FN += 1
118 |                     else:
119 |                         avg_spatial_dist = (total_spatial_dist / total_framewise_matching_doa)
120 | 
121 |                         self._total_DE += avg_spatial_dist
122 |                         self._DE_TP += 1
123 | 
124 |                         if avg_spatial_dist <= self._spatial_T:
125 |                             self._TP += 1
126 |                         else:
127 |                             loc_FN += 1
128 |                             self._FN += 1
129 |                 elif class_cnt in gt[block_cnt] and class_cnt not in pred[block_cnt]:
130 |                     # False negative
131 |                     loc_FN += 1
132 |                     self._FN += 1
133 |                 elif class_cnt not in gt[block_cnt] and class_cnt in pred[block_cnt]:
134 |                     # False positive
135 |                     loc_FP += 1
136 |                     self._FP += 1
137 |                 elif class_cnt not in gt[block_cnt] and class_cnt not in pred[block_cnt]:
138 |                     # True negative
139 |                     self._TN += 1
140 | 
141 |             self._S += np.minimum(loc_FP, loc_FN)
142 |             self._D += np.maximum(0, loc_FN - loc_FP)
143 |             self._I += np.maximum(0, loc_FP - loc_FN)
144 |         return
145 | 
146 |     def update_seld_scores(self, pred_deg, gt_deg):
147 |         '''
148 |         Implements the spatial error averaging according to equation [5] in the paper, using Polar distance
149 |         Expects the angles in degrees
150 | 
151 |         :param pred_deg: dictionary containing class-wise prediction results for each N-seconds segment block
152 |         :param gt_deg: dictionary containing class-wise groundtruth for each N-seconds segment block
153 |         '''
154 |         for block_cnt in range(len(gt_deg.keys())):
155 |             # print('\nblock_cnt', block_cnt, end='')
156 |             loc_FN, loc_FP = 0, 0
157 |             for class_cnt in range(self._nb_classes):
158 |                 # print('\tclass:', class_cnt, end='')
159 |                 # Counting the number of ref and sys outputs should include the number of tracks for each class in the segment
160 |                 if class_cnt in gt_deg[block_cnt]:
161 |                     self._Nref += 1
162 |                 if class_cnt in pred_deg[block_cnt]:
163 |                     self._Nsys += 1
164 | 
165 |                 if class_cnt in gt_deg[block_cnt] and class_cnt in pred_deg[block_cnt]:
166 |                     # True positives or False negative case
167 | 
168 |                     # NOTE: For multiple tracks per class, identify multiple tracks using hungarian algorithm and then
169 |                     # calculate the spatial distance using the following code. In the current code, if there are multiple 
170 |                     # tracks of the same class in a frame we are calculating the least cost between the groundtruth and predicted and using it.
171 |                     total_spatial_dist = 0
172 |                     total_framewise_matching_doa = 0
173 |                     gt_ind_list = gt_deg[block_cnt][class_cnt][0][0]
174 |                     pred_ind_list = pred_deg[block_cnt][class_cnt][0][0]
175 |                     for gt_ind, gt_val in enumerate(gt_ind_list):
176 |                         if gt_val in pred_ind_list:
177 |                             total_framewise_matching_doa += 1
178 |                             pred_ind = pred_ind_list.index(gt_val)
179 | 
180 |                             gt_arr = np.array(gt_deg[block_cnt][class_cnt][0][1][gt_ind]) * np.pi / 180
181 |                             pred_arr = np.array(pred_deg[block_cnt][class_cnt][0][1][pred_ind]) * np.pi / 180
182 |                             if gt_arr.shape[0]==1 and pred_arr.shape[0]==1:
183 |                                 total_spatial_dist += distance_between_spherical_coordinates_rad(gt_arr[0][0], gt_arr[0][1], pred_arr[0][0], pred_arr[0][1])
184 |                             else:
185 |                                 total_spatial_dist += least_distance_between_gt_pred(gt_arr, pred_arr)
186 | 
187 |                     if total_spatial_dist == 0 and total_framewise_matching_doa == 0:
188 |                         loc_FN += 1
189 |                         self._FN += 1
190 |                     else:
191 |                         avg_spatial_dist = (total_spatial_dist / total_framewise_matching_doa)
192 | 
193 |                         self._total_DE += avg_spatial_dist
194 |                         self._DE_TP += 1
195 | 
196 |                         if avg_spatial_dist <= self._spatial_T:
197 |                             self._TP += 1
198 |                         else:
199 |                             loc_FN += 1
200 |                             self._FN += 1
201 |                 elif class_cnt in gt_deg[block_cnt] and class_cnt not in pred_deg[block_cnt]:
202 |                     # False negative
203 |                     loc_FN += 1
204 |                     self._FN += 1
205 |                 elif class_cnt not in gt_deg[block_cnt] and class_cnt in pred_deg[block_cnt]:
206 |                     # False positive
207 |                     loc_FP += 1
208 |                     self._FP += 1
209 |                 elif class_cnt not in gt_deg[block_cnt] and class_cnt not in pred_deg[block_cnt]:
210 |                     # True negative
211 |                     self._TN += 1
212 | 
213 |             self._S += np.minimum(loc_FP, loc_FN)
214 |             self._D += np.maximum(0, loc_FN - loc_FP)
215 |             self._I += np.maximum(0, loc_FP - loc_FN)
216 |         return
217 | 
218 | 
219 | def distance_between_spherical_coordinates_rad(az1, ele1, az2, ele2):
220 |     """
221 |     Angular distance between two spherical coordinates
222 |     MORE: https://en.wikipedia.org/wiki/Great-circle_distance
223 | 
224 |     :return: angular distance in degrees
225 |     """
226 |     dist = np.sin(ele1) * np.sin(ele2) + np.cos(ele1) * np.cos(ele2) * np.cos(np.abs(az1 - az2))
227 |     # Keep the dist values within the -1 to 1 range; values outside it make np.arccos return NaN
228 |     dist = np.clip(dist, -1, 1)
229 |     dist = np.arccos(dist) * 180 / np.pi
230 |     return dist
231 | 
232 | 
233 | def distance_between_cartesian_coordinates(x1, y1, z1, x2, y2, z2):
234 |     """
235 |     Angular distance between two cartesian coordinates
236 |     MORE: https://en.wikipedia.org/wiki/Great-circle_distance
237 |     Check 'From chord length' section
238 | 
239 |     :return: angular distance in degrees
240 |     """
241 |     # Normalize the Cartesian vectors
242 |     N1 = np.sqrt(x1**2 + y1**2 + z1**2 + 1e-10)
243 |     N2 = np.sqrt(x2**2 + y2**2 + z2**2 + 1e-10)
244 |     x1, y1, z1, x2, y2, z2 = x1/N1, y1/N1, z1/N1, x2/N2, y2/N2, z2/N2
245 | 
246 |     #Compute the distance
247 |     dist = x1*x2 + y1*y2 + z1*z2
248 |     dist = np.clip(dist, -1, 1)
249 |     dist = np.arccos(dist) * 180 / np.pi
250 |     return dist
251 | 
252 | 
253 | def least_distance_between_gt_pred(gt_list, pred_list):
254 |     """
255 |         Shortest distance between two sets of DOA coordinates. Given a set of groundtruth coordinates,
256 |         and its respective predicted coordinates, we calculate the distance between each of the 
257 |         coordinate pairs resulting in a matrix of distances, where one axis represents the number of groundtruth
258 |         coordinates and the other the predicted coordinates. The number of estimated peaks need not be the same as in
259 |         groundtruth, thus the distance matrix is not always a square matrix. We use the hungarian algorithm to find the
260 |         least cost in this distance matrix.
261 |         :param gt_list: array of ground-truth Cartesian coordinates, or polar coordinates in radians
262 |         :param pred_list: array of predicted Cartesian coordinates, or polar coordinates in radians
263 |         :return: cost - sum of the angular distances (in degrees) over the matched pairs.
264 |             Unmatched coordinates (missed or over-estimated DOAs) do not
265 |             contribute to the cost.
266 |     """
267 |     gt_len, pred_len = gt_list.shape[0], pred_list.shape[0]
268 |     ind_pairs = np.array([[x, y] for y in range(pred_len) for x in range(gt_len)])
269 |     cost_mat = np.zeros((gt_len, pred_len))
270 | 
271 |     if gt_len and pred_len:
272 |         if len(gt_list[0]) == 3: #Cartesian
273 |             x1, y1, z1, x2, y2, z2 = gt_list[ind_pairs[:, 0], 0], gt_list[ind_pairs[:, 0], 1], gt_list[ind_pairs[:, 0], 2], pred_list[ind_pairs[:, 1], 0], pred_list[ind_pairs[:, 1], 1], pred_list[ind_pairs[:, 1], 2]
274 |             cost_mat[ind_pairs[:, 0], ind_pairs[:, 1]] = distance_between_cartesian_coordinates(x1, y1, z1, x2, y2, z2)
275 |         else:
276 |             az1, ele1, az2, ele2 = gt_list[ind_pairs[:, 0], 0], gt_list[ind_pairs[:, 0], 1], pred_list[ind_pairs[:, 1], 0], pred_list[ind_pairs[:, 1], 1]
277 |             cost_mat[ind_pairs[:, 0], ind_pairs[:, 1]] = distance_between_spherical_coordinates_rad(az1, ele1, az2, ele2)
278 | 
279 |     row_ind, col_ind = linear_sum_assignment(cost_mat)
280 |     cost = cost_mat[row_ind, col_ind].sum()
281 |     return cost
282 | 
283 | 
284 | def early_stopping_metric(sed_error, doa_error):
285 |     """
286 |     Compute early stopping metric from sed and doa errors.
287 | 
288 |     :param sed_error: [error rate (0 to 1 range), f score (0 to 1 range)]
289 |     :param doa_error: [doa error (in degrees), frame recall (0 to 1 range)]
290 |     :return: early stopping metric result
291 |     """
292 |     seld_metric = np.mean([
293 |         sed_error[0],
294 |         1 - sed_error[1],
295 |         doa_error[0]/180,
296 |         1 - doa_error[1]]
297 |         )
298 |     return seld_metric
299 | 

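The matching step in `least_distance_between_gt_pred` builds a ground-truth by prediction distance matrix and lets the Hungarian algorithm pick the cheapest pairing. A small self-contained sketch of that idea for Cartesian DOAs follows. The helper `angular_distance_deg` and the example vectors are assumptions for illustration, and broadcasting is used here in place of the index-pair construction in the script.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def angular_distance_deg(u, v):
    """Angle in degrees between (broadcastable) arrays of 3-D vectors."""
    u = u / np.linalg.norm(u, axis=-1, keepdims=True)
    v = v / np.linalg.norm(v, axis=-1, keepdims=True)
    cos = np.clip(np.sum(u * v, axis=-1), -1.0, 1.0)  # guard arccos against rounding
    return np.degrees(np.arccos(cos))


# Two ground-truth DOAs and three predicted DOAs (hypothetical values).
gt = np.array([[1.0, 0.0, 0.0],
               [0.0, 1.0, 0.0]])
pred = np.array([[0.0, 1.0, 0.0],
                 [1.0, 0.0, 0.0],
                 [0.0, 0.0, 1.0]])

# cost[i, j] = angle between ground truth i and prediction j -> shape (2, 3)
cost = angular_distance_deg(gt[:, None, :], pred[None, :, :])

# Rectangular matrices are allowed: only min(2, 3) = 2 pairs are matched.
rows, cols = linear_sum_assignment(cost)
total = cost[rows, cols].sum()
print(rows, cols, total)
```

Because the third prediction stays unmatched, it adds nothing to the returned cost, which matches how the script only sums over the assigned pairs.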

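As a worked example of the bookkeeping in SELD_evaluation_metrics.py, the sketch below turns per-segment false positive and false negative counts into substitutions, deletions and insertions, and then averages four scores the same way `early_stopping_metric` does. All numbers are made up for illustration and the function names are hypothetical.

```python
import numpy as np


def segment_errors(loc_fp, loc_fn):
    """Split one segment's FP/FN counts into substitutions, deletions, insertions."""
    substitutions = min(loc_fp, loc_fn)
    deletions = max(0, loc_fn - loc_fp)
    insertions = max(0, loc_fp - loc_fn)
    return substitutions, deletions, insertions


def seld_score(error_rate, f_score, doa_error_deg, recall):
    """Mean of four terms, each scaled so that 0 is best and 1 is worst."""
    return np.mean([error_rate, 1 - f_score, doa_error_deg / 180, 1 - recall])


# One segment with 3 false positives and 1 false negative.
s, d, i = segment_errors(loc_fp=3, loc_fn=1)
print(s, d, i)

# Hypothetical scores: ER = 0.5, F = 0.6, DOA error = 18 degrees, recall = 0.7.
score = seld_score(0.5, 0.6, 18.0, 0.7)
print(score)
```

Lower is better for the combined score, since every term is expressed as an error in the 0 to 1 range.
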
--------------------------------------------------------------------------------
/parameter.py:
--------------------------------------------------------------------------------
  1 | # Parameters used in the feature extraction, neural network model, and training the SELDnet can be changed here.
  2 | #
  3 | # Ideally, do not change the values of the default parameters. Create separate cases with a unique <task-id>, as seen in
  4 | # the code below (if-elif chain), and use them. This way you can easily reproduce a configuration at a later time.
  5 | 
  6 | 
  7 | def get_params(argv='1'):
  8 |     print("SET: {}".format(argv))
  9 |     # ########### default parameters ##############
 10 |     params = dict(
 11 |         quick_test=False,     # Quick test: trains/tests on a small subset of the dataset for a small number of epochs
 12 | 
 13 |         # INPUT PATH
 14 |         dataset_dir='C:\\JAVIER\\code\\DCASE2020-Task3\\base_folder',  # Base folder containing the foa/mic and metadata folders
 15 |         #dataset_dir='/content/gdrive/My Drive/DCASE2020-Task3/base_folder',
 16 | 
 17 |         # OUTPUT PATH
 18 |         feat_label_dir='C:\\JAVIER\\code\\DCASE2020-Task3\\input_feature\\baseline_log_mel',  # Directory to dump extracted features and labels
 19 |         #feat_label_dir='/content/gdrive/My Drive/DCASE2020-Task3/input_feature/gammatone_nomax_gcclogmel',
 20 |         model_dir='C:\\JAVIER\\code\\DCASE2020-Task3\\outputs\\ratio-1\\models',   # Dumps the trained models and training curves in this folder
 21 |         dcase_output=True,     # If true, dumps the results recording-wise in 'dcase_dir' path.
 22 |                                # Set this true after you have finalized your model, save the output, and submit
 23 |         dcase_dir='C:\\JAVIER\\code\\DCASE2020-Task3\\outputs\\ratio-1\\results',  # Dumps the recording-wise network output in this folder
 24 | 
 25 |         # DATASET LOADING PARAMETERS
 26 |         mode='eval',         # 'dev' - development or 'eval' - evaluation dataset
 27 |         dataset='mic',       # 'foa' - ambisonic or 'mic' - microphone signals
 28 | 
 29 |         #FEATURE PARAMS
 30 |         fs=24000,
 31 |         hop_len_s=0.02,
 32 |         label_hop_len_s=0.1,
 33 |         max_audio_len_s=60,
 34 |         nb_mel_bins=64,
 35 | 
 36 |         #AUDIO REPRESENTATION TYPE (+)
 37 |         is_gammatone=False, # if set to True, extracts gammatone representation instead of Log-Mel
 38 |         fmin=.0,
 39 | 
 40 |         # DNN MODEL PARAMETERS
 41 |         label_sequence_length=60,        # Feature sequence length
 42 |         batch_size=64,              # Batch size
 43 |         dropout_rate=0,             # Dropout rate, constant for all layers
 44 |         nb_cnn2d_filt=64,           # Number of CNN nodes, constant for each layer
 45 |         f_pool_size=[4, 4, 2],      # CNN frequency pooling, length of list = number of CNN layers, list value = pooling per layer
 46 | 
 47 |         # CNN squeeze-excitation parameter (+)
 48 |         do_baseline=False,
 49 |         ratio=16,
 50 | 
 51 |         # Get dataset
 52 |         folder='normalized',
 53 | 
 54 |         rnn_size=[128, 128],        # RNN contents, length of list = number of layers, list value = number of nodes
 55 |         fnn_size=[128],             # FNN contents, length of list = number of layers, list value = number of nodes
 56 |         loss_weights=[1., 1000.],     # [sed, doa] weight for scaling the DNN outputs
 57 |         nb_epochs=50,               # Train for maximum epochs
 58 |         epochs_per_fit=5,           # Number of epochs per fit
 59 |         doa_objective='masked_mse',     # supports: mse, masked_mse. mse- original seld approach; masked_mse - dcase 2020 approach
 60 |         
 61 |         #METRIC PARAMETERS
 62 |         lad_doa_thresh=20
 63 |        
 64 |     )
 65 |     feature_label_resolution = int(params['label_hop_len_s'] // params['hop_len_s'])
 66 |     params['feature_sequence_length'] = params['label_sequence_length'] * feature_label_resolution
 67 |     params['t_pool_size'] = [feature_label_resolution, 1, 1]     # CNN time pooling
 68 |     params['patience'] = int(params['nb_epochs'])     # Stop training if patience is reached
 69 | 
 70 |     params['unique_classes'] = {
 71 |             'alarm': 0,
 72 |             'baby': 1,
 73 |             'crash': 2,
 74 |             'dog': 3,
 75 |             'engine': 4,
 76 |             'female_scream': 5,
 77 |             'female_speech': 6,
 78 |             'fire': 7,
 79 |             'footsteps': 8,
 80 |             'knock': 9,
 81 |             'male_scream': 10,
 82 |             'male_speech': 11,
 83 |             'phone': 12,
 84 |             'piano': 13
 85 |         }
 86 | 
 87 | 
 88 |     # ########### User defined parameters ##############
 89 |     # if argv == '1':
 90 |     #     print("USING DEFAULT PARAMETERS\n")
 91 | 
 92 |     # elif argv == '2':
 93 |     #     params['mode'] = 'dev'
 94 |     #     params['dataset'] = 'mic'
 95 | 
 96 |     # elif argv == '3':
 97 |     #     params['mode'] = 'eval'
 98 |     #     params['dataset'] = 'mic'
 99 | 
100 |     # elif argv == '4':
101 |     #     params['mode'] = 'dev'
102 |     #     params['dataset'] = 'foa'
103 | 
104 |     # elif argv == '5':
105 |     #     params['mode'] = 'eval'
106 |     #     params['dataset'] = 'foa'
107 | 
108 |     # elif argv == '999':
109 |     #     print("QUICK TEST MODE\n")
110 |     #     params['quick_test'] = True
111 |     #     params['epochs_per_fit'] = 1
112 | 
113 |     # else:
114 |     #     print('ERROR: unknown argument {}'.format(argv))
115 |     #     exit()
116 | 
117 |     for key, value in params.items():
118 |         print("\t{}: {}".format(key, value))
119 |     return params
120 | 
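
The derived parameters at the end of `get_params` can be checked by hand. The sketch below repeats that arithmetic for the default hop lengths. It is an illustration, not the file's code: it uses `round()` instead of `//`, because floor division on floats can under-count in general (for example, `1 // 0.1` evaluates to `9.0`), and the name `resolution` is chosen here for brevity.

```python
# Default values copied from the params dict above.
params = dict(hop_len_s=0.02, label_hop_len_s=0.1, label_sequence_length=60)

# Number of feature frames per label frame (assumption: round() rather than //).
resolution = int(round(params['label_hop_len_s'] / params['hop_len_s']))

# Feature frames the network sees per training sequence.
params['feature_sequence_length'] = params['label_sequence_length'] * resolution

# Time pooling in the first CNN layer brings the feature rate down to the label rate.
params['t_pool_size'] = [resolution, 1, 1]

print(resolution)                         # 5
print(params['feature_sequence_length'])  # 300
print(params['t_pool_size'])              # [5, 1, 1]
```

With these defaults, 60 label frames at 0.1 s and 300 feature frames at 0.02 s both cover the same 6 s of audio.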


--------------------------------------------------------------------------------
/visualize_SELD_output.py:
--------------------------------------------------------------------------------
 1 | # Script for visualising the SELD output.
 2 | #
 3 | # NOTE: Make sure to use the appropriate backend for the matplotlib based on your OS
 4 | 
 5 | import os
 6 | import numpy as np
 7 | import librosa.display
 8 | import cls_feature_class
 9 | import parameter
10 | import matplotlib.gridspec as gridspec
11 | import matplotlib.pyplot as plot
12 | plot.switch_backend('agg')
13 | plot.rcParams.update({'font.size': 22})
14 | 
15 | 
16 | def collect_classwise_data(_in_dict):
17 |     _out_dict = {}
18 |     for _key in _in_dict.keys():
19 |         for _seld in _in_dict[_key]:
20 |             if _seld[0] not in _out_dict:
21 |                 _out_dict[_seld[0]] = []
22 |             _out_dict[_seld[0]].append([_key, _seld[0], _seld[1], _seld[2]])
23 |     return _out_dict
24 | 
25 | 
26 | def plot_func(plot_data, hop_len_s, ind, plot_x_ax=False, plot_y_ax=False):
27 |     cmap = ['b', 'r', 'g', 'y', 'k', 'c', 'm', 'b', 'r', 'g', 'y', 'k', 'c', 'm']
28 |     for class_ind in plot_data.keys():
29 |         time_ax = np.array(plot_data[class_ind])[:, 0] * hop_len_s
30 |         y_ax = np.array(plot_data[class_ind])[:, ind]
31 |         plot.plot(time_ax, y_ax, marker='.', color=cmap[class_ind], linestyle='None', markersize=4)
32 |     plot.grid()
33 |     plot.xlim([0, 60])
34 |     if not plot_x_ax:
35 |         plot.gca().axes.set_xticklabels([])
36 | 
37 |     if not plot_y_ax:
38 |         plot.gca().axes.set_yticklabels([])
39 | # --------------------------------- MAIN SCRIPT STARTS HERE -----------------------------------------
40 | params = parameter.get_params()
41 | 
42 | # output format file to visualize
43 | pred = os.path.join(params['dcase_dir'], '2_mic_dev/fold1_room1_mix006_ov1.csv')
44 | 
45 | # path of reference audio directory for visualizing the spectrogram and description directory for
46 | # visualizing the reference
47 | # Note: The code finds out the audio filename from the predicted filename automatically
48 | ref_dir = os.path.join(params['dataset_dir'], 'metadata_dev')
49 | aud_dir = os.path.join(params['dataset_dir'], 'mic_dev')
50 | 
51 | # load the predicted output format
52 | feat_cls = cls_feature_class.FeatureClass(params)
53 | pred_dict = feat_cls.load_output_format_file(pred)
54 | pred_dict_polar = feat_cls.convert_output_format_cartesian_to_polar(pred_dict)
55 | 
56 | # load the reference output format
57 | ref_filename = os.path.basename(pred)
58 | ref_dict_polar = feat_cls.load_output_format_file(os.path.join(ref_dir, ref_filename))
59 | 
60 | pred_data = collect_classwise_data(pred_dict_polar)
61 | ref_data = collect_classwise_data(ref_dict_polar)
62 | 
63 | nb_classes = len(feat_cls.get_classes())
64 | 
65 | # load the audio and extract spectrogram
66 | ref_filename = os.path.basename(pred).replace('.csv', '.wav')
67 | audio, fs = feat_cls._load_audio(os.path.join(aud_dir, ref_filename))
68 | stft = np.abs(np.squeeze(feat_cls._spectrogram(audio[:, :1])))
69 | stft = librosa.amplitude_to_db(stft, ref=np.max)
70 | 
71 | plot.figure(figsize=(20, 15))
72 | gs = gridspec.GridSpec(4, 4)
73 | ax0 = plot.subplot(gs[0, 1:3]), librosa.display.specshow(stft.T, sr=fs, x_axis='s', y_axis='linear'), plot.xlim([0, 60]), plot.xticks([]), plot.xlabel(''), plot.title('Spectrogram')
74 | ax1 = plot.subplot(gs[1, :2]), plot_func(ref_data, params['label_hop_len_s'], ind=1, plot_y_ax=True), plot.ylim([-1, nb_classes + 1]), plot.title('SED reference')
75 | ax2 = plot.subplot(gs[1, 2:]), plot_func(pred_data, params['label_hop_len_s'], ind=1), plot.ylim([-1, nb_classes + 1]), plot.title('SED predicted')
76 | ax3 = plot.subplot(gs[2, :2]), plot_func(ref_data, params['label_hop_len_s'], ind=2, plot_y_ax=True), plot.ylim([-180, 180]), plot.title('Azimuth reference')
77 | ax4 = plot.subplot(gs[2, 2:]), plot_func(pred_data, params['label_hop_len_s'], ind=2), plot.ylim([-180, 180]), plot.title('Azimuth predicted')
78 | ax5 = plot.subplot(gs[3, :2]), plot_func(ref_data, params['label_hop_len_s'], ind=3, plot_y_ax=True), plot.ylim([-90, 90]), plot.title('Elevation reference')
79 | ax6 = plot.subplot(gs[3, 2:]), plot_func(pred_data, params['label_hop_len_s'], ind=3), plot.ylim([-90, 90]), plot.title('Elevation predicted')
80 | ax_lst = [ax0, ax1, ax2, ax3, ax4, ax5, ax6]
81 | plot.savefig(os.path.join(params['dcase_dir'], ref_filename.replace('.wav', '.jpg')), dpi=300, bbox_inches="tight")
82 | 
83 | 
84 | 
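
To make the data flow in `visualize_SELD_output.py` easier to follow, the sketch below runs `collect_classwise_data` on a toy dictionary. The input layout (frame index mapped to a list of `[class, azimuth, elevation]` entries) is an assumption inferred from how the function indexes `_seld`, and the values are invented.

```python
def collect_classwise_data(_in_dict):
    # Regroup frame-wise entries into per-class lists of [frame, class, azimuth, elevation].
    _out_dict = {}
    for _key in _in_dict.keys():
        for _seld in _in_dict[_key]:
            if _seld[0] not in _out_dict:
                _out_dict[_seld[0]] = []
            _out_dict[_seld[0]].append([_key, _seld[0], _seld[1], _seld[2]])
    return _out_dict


# Hypothetical frame-wise input: class 3 active in frames 0 and 1, class 9 in frame 1.
frames = {
    0: [[3, 30, 10]],
    1: [[3, 32, 10], [9, -90, 0]],
}
by_class = collect_classwise_data(frames)
print(by_class)
# {3: [[0, 3, 30, 10], [1, 3, 32, 10]], 9: [[1, 9, -90, 0]]}
```

In each output row, column 0 is the frame index, which `plot_func` multiplies by the hop length to get time, and columns 1 to 3 are what the `ind` argument selects: class, azimuth, or elevation.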


--------------------------------------------------------------------------------