├── .gitignore
├── LICENSE.md
├── README.md
├── batch_feature_extraction.py
├── calculate_dev_results_from_dcase_output.py
├── cls_data_generator.py
├── cls_feature_class.py
├── display_specs.py
├── fold1_room1_mix050_ov2.png
├── gammatone
│ ├── COPYING
│ ├── README.md
│ ├── auditory_toolkit
│ │ ├── COPYING
│ │ ├── ERBFilterBank.m
│ │ ├── ERBSpace.m
│ │ ├── MakeERBFilters.m
│ │ ├── README
│ │ ├── demo_gammatone.m
│ │ ├── fft2gammatonemx.m
│ │ ├── gammatone_demo.m
│ │ ├── gammatonegram.m
│ │ └── specgram.m
│ ├── doc
│ │ ├── FurElise.png
│ │ ├── Makefile
│ │ ├── conf.py
│ │ ├── details.rst
│ │ ├── fftweight.rst
│ │ ├── filters.rst
│ │ ├── gtgram.rst
│ │ ├── index.rst
│ │ ├── make.bat
│ │ └── plot.rst
│ ├── gammatone
│ │ ├── __init__.py
│ │ ├── __main__.py
│ │ ├── fftweight.py
│ │ ├── filters.py
│ │ ├── gtgram.py
│ │ └── plot.py
│ ├── setup.py
│ ├── test_generation
│ │ ├── README
│ │ ├── test_ERBFilterBank.m
│ │ ├── test_ERBSpace.m
│ │ ├── test_MakeERBFilters.m
│ │ ├── test_fft2gammatonemx.m
│ │ ├── test_fft_gammatonegram.m
│ │ ├── test_gammatonegram.m
│ │ └── test_specgram.m
│ └── tests
│   ├── __init__.py
│   ├── data
│   │ ├── test_erb_filter_data.mat
│   │ ├── test_erbspace_data.mat
│   │ ├── test_fft2gtmx_data.mat
│   │ ├── test_fft_gammatonegram_data.mat
│   │ ├── test_filterbank_data.mat
│   │ ├── test_gammatonegram_data.mat
│   │ └── test_specgram_data.mat
│   ├── test_cfs.py
│   ├── test_erb_space.py
│   ├── test_fft_gtgram.py
│   ├── test_fft_weights.py
│   ├── test_filterbank.py
│   ├── test_gammatone_filters.py
│   ├── test_gammatonegram.py
│   └── test_specgram.py
├── images
│ ├── CRNN_SELDT_DCASE2020.png
│ ├── SELDnet_output.jpg
│ ├── scse_cropped.pdf
│ ├── seld-squeeze-structure.pdf
│ └── seld_squeeze_structure_image.jpg
├── keras_model.py
├── metrics
│ ├── LICENSE.md
│ ├── SELD_evaluation_metrics.py
│ └── evaluation_metrics.py
├── parameter.py
├── seld.py
└── visualize_SELD_output.py
/.gitignore:
--------------------------------------------------------------------------------
1 | /home/jose/DCASE2020_Task3/base_folder/*
2 | /home/jose/DCASE2020_Task3/input_feature/*
3 | base_folder/*
4 | input_feature/*
5 | .vscode/*
6 | __pycache__/*
7 | metrics/__pycache__/*
8 | gammatone/gammatone/__pycache__/*
9 |
--------------------------------------------------------------------------------
/LICENSE.md:
--------------------------------------------------------------------------------
1 | -----------COPYRIGHT NOTICE STARTS WITH THIS LINE------------
2 | Copyright (c) 2020 Tampere University and its licensors
3 | All rights reserved.
4 |
5 | Permission is hereby granted, without written agreement and without
6 | license or royalty fees, to use and copy the code for the Sound Event
7 | Localization and Detection using Convolutional Recurrent Neural Network
8 | method/architecture, present in the GitHub repository with the handle
9 | seld-dcase2020, (“Work”) described in the paper with title "Sound event
10 | localization and detection of overlapping sources using
11 | convolutional recurrent neural network" and composed of files with
12 | code in the Python programming language. This grant is only for experimental and
13 | non-commercial purposes, provided that the copyright notice in its entirety
14 | appear in all copies of this Work, and the original source of this Work,
15 | Audio Research Group at Tampere University, is acknowledged in any publication
16 | that reports research using this Work.
17 |
18 | Any commercial use of the Work or any part thereof is strictly prohibited.
19 | Commercial use include, but is not limited to:
20 | - selling or reproducing the Work
21 | - selling or distributing the results or content achieved by use of the Work
22 | - providing services by using the Work.
23 |
24 | IN NO EVENT SHALL TAMPERE UNIVERSITY OR ITS LICENSORS BE LIABLE TO
25 | ANY PARTY FOR DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES
26 | ARISING OUT OF THE USE OF THIS WORK AND ITS DOCUMENTATION, EVEN IF TAMPERE
27 | UNIVERSITY OR ITS LICENSORS HAS BEEN ADVISED OF THE POSSIBILITY
28 | OF SUCH DAMAGE.
29 |
30 | TAMPERE UNIVERSITY AND ALL ITS LICENSORS SPECIFICALLY DISCLAIMS
31 | ANY WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
32 | MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE WORK PROVIDED HEREUNDER
33 | IS ON AN "AS IS" BASIS, AND THE TAMPERE UNIVERSITY HAS NO OBLIGATION
34 | TO PROVIDE MAINTENANCE, SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS.
35 |
36 | -----------COPYRIGHT NOTICE ENDS WITH THIS LINE------------
37 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 |
2 | # DCASE 2020: SELD using squeeze-excitation residual networks
3 | [Please visit the official webpage of the DCASE 2020 Challenge for comparison with other submissions](http://dcase.community/challenge2020/task-sound-event-localization-and-detection-results).
4 |
5 | The main objective of this submission was to study how squeeze-excitation techniques can improve the behavior of sound event detection and localization (SELD) systems. To do so, we start from the CRNN presented as the challenge baseline and replace its convolutional layers with Conv-StandardPOST blocks. This block was presented in:
6 |
7 | > Naranjo-Alcazar, J., Perez-Castanos, S., Zuccarello, P., & Cobos, M. (2020). Acoustic Scene Classification with Squeeze-Excitation Residual Networks. IEEE Access.
8 |
9 | This repo implementation is presented in:
10 |
11 | > Naranjo-Alcazar, Javier, et al. "Sound Event Localization and Detection using Squeeze-Excitation Residual CNNs." arXiv preprint arXiv:2006.14436 (2020).
12 |
13 | Please consider citing these works if you use this code or any of the ideas presented in them.
14 |
15 | ## BASELINE METHOD
16 |
17 | Compared with the original SELDnet, the challenge baseline introduces the following changes to improve its performance and to evaluate it more reliably.
18 | * **Features**: The original SELDnet employed naive phase and magnitude components of the spectrogram as the input feature for all input formats of audio. In this baseline method, we use separate features for the first-order Ambisonic (FOA) and microphone array (MIC) datasets. As the interaural level difference feature, we employ 64-band mel energies extracted from each channel of the input audio for both FOA and MIC. To encode the interaural time difference features, we employ intensity vector features for FOA and generalized cross-correlation (GCC) features for MIC.
19 | * **Loss/Objective**: The original SELDnet employed mean squared error (MSE) as the DOA loss, and this was computed irrespective of the presence or absence of the sound event. In the current baseline, we use a masked MSE, which computes the MSE only when the sound event is active in the reference.
20 | * **Evaluation metrics**: The performance of the original SELDnet was evaluated with stand-alone metrics for detection and localization, mainly because there was no suitable metric that could jointly evaluate both. Since then, a new metric that jointly evaluates localization and detection has been proposed (described in the metrics section below), and it is used for evaluation here.
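The masked-MSE idea can be sketched in NumPy as follows. This is a minimal illustration, not the Keras loss in `keras_model.py`. It assumes that the DOA targets are stored as all x values, then all y values, then all z values (one column per class), which is why the SED activity mask is tiled three times. It also averages only over active entries. Averaging over all entries, with inactive ones zeroed, is another common variant.

```python
import numpy as np


def masked_mse(sed_ref, doa_ref, doa_pred):
    """Masked MSE between reference and predicted DOA vectors.

    sed_ref:  (frames, classes) binary activity of each class in the reference
    doa_ref:  (frames, 3 * classes) reference DOA, assumed layout [x..., y..., z...]
    doa_pred: (frames, 3 * classes) predicted DOA, same layout
    Entries whose class is inactive in the reference do not contribute.
    """
    # Repeat the activity mask once per axis to match the assumed [x..., y..., z...] layout
    mask = np.tile(sed_ref, (1, 3))
    sq_err = mask * (doa_pred - doa_ref) ** 2
    n_active = mask.sum()
    # Average over the active entries only; return 0 when nothing is active
    return float(sq_err.sum() / n_active) if n_active > 0 else 0.0
```

Large errors on inactive classes (the `5.0` values in a quick test) are ignored entirely, which is the point of masking.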
21 |
22 | The baseline SELDnet works as follows. The input is multichannel audio, from which different acoustic features are extracted depending on the input format. Based on the chosen dataset (FOA or MIC), the baseline method takes a sequence of consecutive feature frames and predicts all the active sound event classes for each input frame, along with their respective spatial locations, producing the temporal activity and DOA trajectory for each sound event class. In particular, a convolutional recurrent neural network (CRNN) is used to map the frame sequence to the two outputs in parallel. At the first output, SED is performed as a multi-label multi-class classification task, allowing the network to simultaneously estimate the presence of multiple sound events in each frame. At the second output, DOA estimates in continuous 3D space are obtained as a multi-output regression task, where each sound event class is associated with three regressors that estimate the x, y and z Cartesian coordinates of the DOA on a unit sphere around the microphone.
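Because the network regresses Cartesian x, y, z values while the reference metadata and metrics can also use azimuth and elevation, it helps to see the conversion between the two. The sketch below assumes one common convention: azimuth is measured in the x-y plane from +x towards +y, and elevation is measured from the x-y plane towards +z, both in degrees. The repository's own `convert_output_format_cartesian_to_polar` helpers may use a different convention.

```python
import numpy as np


def cartesian_to_polar(x, y, z):
    """Convert a DOA vector to (azimuth, elevation) in degrees (assumed convention)."""
    azimuth = np.degrees(np.arctan2(y, x))
    # arctan2 against the horizontal norm works even if the vector is not unit length
    elevation = np.degrees(np.arctan2(z, np.sqrt(x ** 2 + y ** 2)))
    return azimuth, elevation


def polar_to_cartesian(azimuth, elevation):
    """Map (azimuth, elevation) in degrees onto the unit sphere."""
    azi, ele = np.radians(azimuth), np.radians(elevation)
    return np.cos(ele) * np.cos(azi), np.cos(ele) * np.sin(azi), np.sin(ele)
```

For example, the vector (0, 1, 0) maps to an azimuth of 90° and an elevation of 0°. Converting back and forth reproduces the original angles up to floating-point error.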
23 |
24 |
25 |
26 |
27 |
28 | The SED output of the network is in the continuous range [0, 1] for each sound event class in the dataset, and this value is thresholded to obtain a binary decision on the respective sound event's activity. Finally, the DOA estimates of the active sound event classes provide their spatial locations.
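The thresholding-and-pairing step can be sketched as follows. This is a hypothetical helper, not a function from this repository. It assumes a threshold of 0.5 and a DOA output laid out as all x values, then all y values, then all z values (one column per class). Check `seld.py` and `keras_model.py` for the threshold and layout actually used.

```python
import numpy as np


def active_events(sed_out, doa_out, threshold=0.5):
    """Return (frame, class, x, y, z) tuples for every class above the threshold.

    sed_out: (frames, classes) SED probabilities in [0, 1]
    doa_out: (frames, 3 * classes) DOA regression output, assumed layout [x..., y..., z...]
    """
    nb_classes = sed_out.shape[1]
    events = []
    # np.nonzero yields the (frame, class) indices of active classes in row-major order
    for frame, cls in zip(*np.nonzero(sed_out > threshold)):
        x = doa_out[frame, cls]
        y = doa_out[frame, cls + nb_classes]
        z = doa_out[frame, cls + 2 * nb_classes]
        events.append((int(frame), int(cls), float(x), float(y), float(z)))
    return events
```

Only the DOA values of classes that pass the SED threshold are reported. The regressor outputs of inactive classes are ignored.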
29 |
30 | ## SUBMISSION MODIFICATION
31 |
32 | This image shows the submission architecture:
33 |
34 |
35 |
36 |
37 |
38 |
39 |
40 | ## DATASET
41 |
42 | The dataset used is:
43 |
44 | * **TAU-NIGENS Spatial Sound Events 2020 - Microphone Array**
45 |
46 | **TAU-NIGENS Spatial Sound Events 2020 - Microphone Array** provides four-channel directional microphone recordings from a tetrahedral array configuration. The Ambisonic (FOA) version of the dataset is extracted from the same microphone array; more information on the spatial characteristics of each format is available on the DCASE 2020 task webpage. The dataset consists of a development set and an evaluation set. The development set consists of 600 one-minute recordings sampled at 24000 Hz. We use 400 recordings for training (folds 3 to 6), 100 for validation (fold 2), and 100 for testing (fold 1). The evaluation set consists of 200 one-minute recordings and will be released at a later point.
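The fold-based split can be reproduced from the file names alone. `cls_data_generator.py` reads the fold digit with `int(filename[4])`, because every file name starts with `fold` followed by a single-digit fold number, as in `fold1_room1_mix001_ov1.npy`. The helper below is a hypothetical sketch of that grouping. It assumes single-digit folds and the train/validation/test assignment described in the preceding paragraph.

```python
def split_files(filenames, train_folds=(3, 4, 5, 6), val_folds=(2,), test_folds=(1,)):
    """Group file names such as 'fold1_room1_mix001_ov1.npy' by fold (hypothetical helper)."""
    splits = {'train': [], 'val': [], 'test': []}
    for name in filenames:
        fold = int(name[4])  # 'fold' has 4 characters, so index 4 is the fold digit
        if fold in train_folds:
            splits['train'].append(name)
        elif fold in val_folds:
            splits['val'].append(name)
        elif fold in test_folds:
            splits['test'].append(name)
    return splits
```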
47 |
48 | More details on the recording procedure and dataset can be read on the [DCASE 2020 task webpage](http://dcase.community/challenge2020/task-sound-event-localization-and-detection).
49 |
50 | The two development datasets can be downloaded from the following link: [**TAU-NIGENS Spatial Sound Events 2020 - Ambisonic and Microphone Array**, Development dataset](https://doi.org/10.5281/zenodo.3740236)
51 |
52 |
53 | ## Getting Started
54 |
55 | This repository consists of multiple Python scripts that together form the pipeline used to train SELDnet.
56 | * The `batch_feature_extraction.py` script is a standalone wrapper that extracts the features and labels and normalizes the training and test split features for a given dataset. Make sure you update the location of the downloaded datasets before running it.
57 | * The `parameter.py` script contains all the training, model, and feature parameters. To change some parameters, create a sub-task with a unique ID in this script. Check the code for examples.
58 | * The `cls_feature_class.py` script has routines for labels creation, features extraction and normalization.
59 | * The `cls_data_generator.py` script provides feature + label data in generator mode for training.
60 | * The `keras_model.py` script implements the SELDnet architecture.
61 | * The `evaluation_metrics.py` script implements the core metrics from the sound event detection evaluation module http://tut-arg.github.io/sed_eval/ and the DOA metrics explained in the paper. These were used in the DCASE 2019 SELD task; we use them here only for legacy comparison.
62 | * The `SELD_evaluation_metrics.py` script implements the metrics for joint evaluation of detection and localization.
63 | * The `seld.py` is a wrapper script that trains the SELDnet. The training stops when the SELD error (check paper) stops improving.
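The SELD error used for early stopping combines the four metrics into a single number, where lower is better. The sketch below assumes the DCASE 2020 baseline definition: the mean of ER, 1 - F, DE/180° and 1 - DE_F, with both F-scores expressed as fractions. The exact formula used here is the one in `early_stopping_metric` in `metrics/SELD_evaluation_metrics.py`.

```python
import numpy as np


def seld_score(er, f, de, de_f):
    """Single SELD error for early stopping (assumed DCASE 2020 baseline definition).

    er:   location-aware error rate
    f:    location-aware F-score as a fraction (0-1)
    de:   class-aware DOA error in degrees
    de_f: class-aware F-score as a fraction (0-1)
    """
    # Each term is an error in [0, 1] for typical values; DE is normalized by 180 degrees
    return float(np.mean([er, 1 - f, de / 180.0, 1 - de_f]))
```

Take the baseline MIC results reported in this README: ER 0.78, F 31.4 %, DE 27.3° and DE_F 59.0 %. Plugging them in as `seld_score(0.78, 0.314, 27.3, 0.59)` gives about 0.51. A perfect system scores 0.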
64 |
65 | Additionally, we also provide supporting scripts that help analyse the results.
66 | * `visualize_SELD_output.py` script to visualize the SELDnet output
67 |
68 |
69 | ### Prerequisites
70 |
71 | The provided codebase has been tested on Python 3.6.9/3.7.3 and Keras 2.2.4/2.3.1.
72 |
73 |
74 | ### Training the SELDnet
75 |
76 | To quickly train SELDnet, follow the steps below.
77 |
78 | * For the chosen dataset (Ambisonic or Microphone), download the respective zip file. This contains both the audio files and the respective metadata. Unzip the files under the same 'base_folder/'; i.e., if you are using the Ambisonic dataset, 'base_folder/' should contain two folders, 'foa_dev/' and 'metadata_dev/', after unzipping.
79 |
80 | * Now update the respective dataset name and its path in `parameter.py` script. For the above example, you will change `dataset='foa'` and `dataset_dir='base_folder/'`. Also provide a directory path `feat_label_dir` in the same `parameter.py` script where all the features and labels will be dumped.
81 |
82 | * Extract features from the downloaded dataset by running the `batch_feature_extraction.py` script. Run the script as shown below. This will dump the normalized features and labels in the `feat_label_dir` folder.
83 |
84 | ```
85 | python3 batch_feature_extraction.py
86 | ```
87 |
88 | You can now train SELDnet with the submission modifications. The `--baseline` and `--ratio` arguments MUST be provided:
89 | ```
90 | python3 seld.py --baseline False --ratio 4
91 | ```
92 |
93 | This runs the Conv-StandardPOST modules with ratio = 4. To run the baseline code instead, set `--baseline` to True. To run residual learning without squeeze-excitation:
94 |
95 | ```
96 | python3 seld.py --baseline False --ratio 0
97 | ```
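The `--ratio` argument appears to control the reduction ratio of the squeeze-excitation (SE) step, with `--ratio 0` disabling SE as described in the preceding paragraph. In a standard SE block, each channel is squeezed to one value by global average pooling. The resulting vector passes through a bottleneck of `channels // ratio` units and an expansion back to `channels`, and a sigmoid output then rescales every channel. The sketch below is a generic NumPy illustration with random weights and no biases, not the Keras implementation in `keras_model.py`.

```python
import numpy as np


def squeeze_excitation(x, w1, w2):
    """Channel-wise squeeze-and-excitation on a (channels, time, freq) feature map.

    w1: (channels // ratio, channels) bottleneck weights
    w2: (channels, channels // ratio) expansion weights
    """
    z = x.mean(axis=(1, 2))               # squeeze: global average pool per channel
    s = np.maximum(w1 @ z, 0.0)           # excitation: bottleneck + ReLU
    s = 1.0 / (1.0 + np.exp(-(w2 @ s)))   # expansion + sigmoid -> one gate per channel
    return x * s[:, None, None]           # rescale each channel by its gate


rng = np.random.default_rng(0)
channels, ratio = 64, 4
x = rng.standard_normal((channels, 10, 32))
w1 = rng.standard_normal((channels // ratio, channels))
w2 = rng.standard_normal((channels, channels // ratio))
y = squeeze_excitation(x, w1, w2)
```

A larger ratio gives a narrower bottleneck and fewer SE parameters. With `channels = 64` and `ratio = 4`, the bottleneck has 16 units.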
98 |
99 |
100 | * By default, the code runs in `quick_test = False` mode. Setting `quick_test = True` in `parameter.py` trains the network for 2 epochs on only 2 mini-batches.
101 |
102 | * The code also plots training curves, intermediate results and saves models in the `model_dir` path provided by the user in `parameter.py` file.
103 |
104 | * To visualize the output of SELDnet and to prepare results for submission, set `dcase_output=True` and provide a `dcase_dir` directory. This dumps file-wise results into that directory, which can be individually visualized using the `visualize_SELD_output.py` script.
105 |
106 | ## Results on development dataset (baseline)
107 |
108 | As evaluation metrics, we use the two approaches discussed in the following paper:
109 |
110 | > Annamaria Mesaros, Sharath Adavanne, Archontis Politis, Toni Heittola, and Tuomas Virtanen. Joint measurement of localization and detection of sound events. In IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA). New Paltz, NY, Oct 2019.
111 |
112 | The first metric is more focused on the detection part. It is also referred to as location-aware detection, and it gives the error rate (ER) and F-score (F) in one-second non-overlapping segments. A prediction is considered correct if the predicted and reference classes are the same and the angular distance between them is below 20°.
113 | The second metric is more focused on the localization part. It is also referred to as class-aware localization, and it gives the DOA error (DE) and F-score (DE_F) in one-second non-overlapping segments. Unlike location-aware detection, no distance threshold is used; instead, the distance between each correct prediction and its reference is measured.
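The per-pair check behind location-aware detection can be sketched as follows. The helpers below are hypothetical and only cover comparing one prediction with one reference. The full metrics in `metrics/SELD_evaluation_metrics.py` also match predictions to references within each one-second segment and accumulate ER, F, DE and DE_F, which is not shown. Using a strict `<` for "below 20°" is an assumption.

```python
import numpy as np


def angular_distance_deg(u, v):
    """Great-circle angle in degrees between two DOA vectors."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    # Clip to guard against rounding slightly outside [-1, 1] before arccos
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))


def is_location_aware_hit(pred_class, ref_class, pred_doa, ref_doa, threshold=20.0):
    """A correct detection needs the same class and an angular error below the threshold."""
    return pred_class == ref_class and angular_distance_deg(pred_doa, ref_doa) < threshold
```

Class-aware localization drops the threshold and accumulates `angular_distance_deg` over correctly classified pairs to obtain DE.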
114 |
115 | The evaluation metric scores for the test split of the development dataset are given below.
116 |
117 | ### Baseline results
118 |
119 | | Dataset | ER | F | DE | DE_F |
120 | | ----| --- | --- | --- | --- |
121 | | Microphone Array (MIC) | 0.78 | 31.4 % | 27.3° | 59.0 % |
122 |
123 |
124 | **Note:** The reported baseline system performance is not exactly reproducible due to varying setups. However, you should be able to obtain very similar results.
125 |
126 | ## Submission results
127 |
128 | ### Development stage (results on testing folder)
129 |
130 | To reproduce these results, set the mode to `dev`.
131 |
132 | | ratio | ER | F (%) | DE (°) | DE_F (%) |
133 | | ----| --- | --- | --- | --- |
134 | | 0 | 0.68 | 42.3 | 22.5 | 65.1 |
135 | | 1 | 0.70 | 39.2 | 23.5 | 63.6 |
136 | | 2 | 0.69 | 40.4 | 23.2 | 62.1 |
137 | | 4 | 0.68 | 40.9 | 23.3 | 65.0 |
138 | | 8 | 0.69 | 40.8 | 23.5 | 63.8 |
139 | | 16 | 0.69 | 40.7 | 23.3 | 62.8 |
140 |
141 | ### Challenge results
142 |
143 | * The team ranked 11th out of 15 teams.
144 |
145 | * Our best system (ratio = 1) ranked 30th out of 43 systems.
146 |
147 | | ratio | ER | F (%) | DE (°) | DE_F (%) |
148 | | :----:| --- | --- | --- | --- |
149 | | Organizers' baseline | 0.70 | 39.5 | 23.2 | 62.1 |
150 | | 0 | 0.61 | 48.3 | 19.2 | 65.9 |
151 | | 1 | 0.61 | 49.1 | 19.5 | 67.1 |
152 | | 8 | 0.64 | 46.7 | 20.0 | 64.5 |
153 | | 16 | 0.63 | 47.3 | 19.5 | 65.5 |
154 |
155 |
156 |
157 |
--------------------------------------------------------------------------------
/batch_feature_extraction.py:
--------------------------------------------------------------------------------
1 | # Extracts the features, labels, and normalizes the development and evaluation split features.
2 |
3 | import cls_feature_class
4 | import parameter
5 |
6 | process_str = 'eval'  # 'dev' or 'eval' extracts features for the respective set accordingly;
7 |                       # 'dev, eval' extracts features of both sets together
8 |
9 | params = parameter.get_params()
10 |
11 |
12 | if 'dev' in process_str:
13 |     # -------------- Extract features and labels for development set -----------------------------
14 |     dev_feat_cls = cls_feature_class.FeatureClass(params, is_eval=False)
15 |
16 |     # Extract features and normalize them
17 |     dev_feat_cls.extract_all_feature()
18 |     dev_feat_cls.preprocess_features()
19 |
20 |     # Extract labels in regression mode
21 |     dev_feat_cls.extract_all_labels()
22 |
23 |
24 | if 'eval' in process_str:
25 |     # -----------------------------Extract ONLY features for evaluation set-----------------------------
26 |     eval_feat_cls = cls_feature_class.FeatureClass(params, is_eval=True)
27 |
28 |     # Extract features and normalize them
29 |     eval_feat_cls.extract_all_feature()
30 |     eval_feat_cls.preprocess_features()
31 |
32 |
--------------------------------------------------------------------------------
/calculate_dev_results_from_dcase_output.py:
--------------------------------------------------------------------------------
1 | import os
2 | from metrics import SELD_evaluation_metrics
3 | import cls_feature_class
4 | import parameter
5 | import numpy as np
6 |
7 |
8 | def get_nb_files(_pred_file_list, _group='split'):
9 |     _group_ind = {'ir': 4, 'ov': 21}  # character index of the group digit; in names like 'fold1_room1_mix001_ov1', index 4 is the fold digit
10 |     _cnt_dict = {}
11 |     for _filename in _pred_file_list:
12 |
13 |         if _group == 'all':
14 |             _ind = 0
15 |         else:
16 |             _ind = int(_filename[_group_ind[_group]])
17 |
18 |         if _ind not in _cnt_dict:
19 |             _cnt_dict[_ind] = []
20 |         _cnt_dict[_ind].append(_filename)
21 |
22 |     return _cnt_dict
23 |
24 |
25 | # --------------------------- MAIN SCRIPT STARTS HERE -------------------------------------------
26 |
27 |
28 | # INPUT DIRECTORY
29 | ref_desc_files = '/scratch/asignal/sharath/DCASE2020_SELD_dataset/metadata_dev' # reference description directory location
30 | pred_output_format_files = 'results/2_mic_dev' # predicted output format directory location
31 | use_polar_format = True # Compute SELD metrics using polar or Cartesian coordinates
32 |
33 | # Load feature class
34 | params = parameter.get_params()
35 | feat_cls = cls_feature_class.FeatureClass(params)
36 |
37 | # collect reference files info
38 | ref_files = os.listdir(ref_desc_files)
39 | nb_ref_files = len(ref_files)
40 |
41 | # collect predicted files info
42 | pred_files = os.listdir(pred_output_format_files)
43 | nb_pred_files = len(pred_files)
44 |
45 | # Calculate scores for different splits, overlapping sound events, and impulse responses (reverberant scenes)
46 | score_type_list = ['all', 'ov', 'ir']
47 | print('Number of predicted files: {}\nNumber of reference files: {}'.format(nb_pred_files, nb_ref_files))
48 | print('\nCalculating {} scores for {}'.format(score_type_list, os.path.basename(pred_output_format_files)))
49 |
50 | for score_type in score_type_list:
51 |     print('\n\n---------------------------------------------------------------------------------------------------')
52 |     print('------------------------------------ {} ---------------------------------------------'.format('Total score' if score_type=='all' else 'score per {}'.format(score_type)))
53 |     print('---------------------------------------------------------------------------------------------------')
54 |
55 |     split_cnt_dict = get_nb_files(pred_files, _group=score_type)  # collect files corresponding to score_type
56 |     # Calculate scores across files for a given score_type
57 |     for split_key in np.sort(list(split_cnt_dict)):
58 |         # Load evaluation metric class (named seld_eval to avoid shadowing the built-in eval)
59 |         seld_eval = SELD_evaluation_metrics.SELDMetrics(nb_classes=feat_cls.get_nb_classes(), doa_threshold=params['lad_doa_thresh'])
60 |         for pred_cnt, pred_file in enumerate(split_cnt_dict[split_key]):
61 |             # Load predicted output format file
62 |             pred_dict = feat_cls.load_output_format_file(os.path.join(pred_output_format_files, pred_file))
63 |             if use_polar_format:
64 |                 pred_dict_polar = feat_cls.convert_output_format_cartesian_to_polar(pred_dict)
65 |                 pred_labels = feat_cls.segment_labels(pred_dict_polar, feat_cls.get_nb_frames())
66 |             else:
67 |                 pred_labels = feat_cls.segment_labels(pred_dict, feat_cls.get_nb_frames())
68 |
69 |             # Load reference description file
70 |             gt_dict_polar = feat_cls.load_output_format_file(os.path.join(ref_desc_files, pred_file.replace('.npy', '.csv')))
71 |             if use_polar_format:
72 |                 gt_labels = feat_cls.segment_labels(gt_dict_polar, feat_cls.get_nb_frames())
73 |             else:
74 |                 gt_dict = feat_cls.convert_output_format_polar_to_cartesian(gt_dict_polar)
75 |                 gt_labels = feat_cls.segment_labels(gt_dict, feat_cls.get_nb_frames())
76 |
77 |             # Calculate scores
78 |             if use_polar_format:
79 |                 seld_eval.update_seld_scores(pred_labels, gt_labels)
80 |             else:
81 |                 seld_eval.update_seld_scores_xyz(pred_labels, gt_labels)
82 |
83 |
84 |         # Overall SED and DOA scores
85 |         er, f, de, de_f = seld_eval.compute_seld_scores()
86 |         seld_scr = SELD_evaluation_metrics.early_stopping_metric([er, f], [de, de_f])
87 |
88 |         print('\nAverage score for {} {} data using {} coordinates'.format(score_type, 'fold' if score_type=='all' else split_key, 'Polar' if use_polar_format else 'Cartesian' ))
89 |         print('SELD score (early stopping metric): {:0.2f}'.format(seld_scr))
90 |         print('SED metrics: Error rate: {:0.2f}, F-score:{:0.1f}'.format(er, 100*f))
91 |         print('DOA metrics: DOA error: {:0.1f}, F-score:{:0.1f}'.format(de, 100*de_f))
92 |
--------------------------------------------------------------------------------
/cls_data_generator.py:
--------------------------------------------------------------------------------
1 | #
2 | # Data generator for training the SELDnet
3 | #
4 |
5 | import os
6 | import numpy as np
7 | import cls_feature_class
8 | from IPython import embed
9 | from collections import deque
10 | import random
11 |
12 |
13 | class DataGenerator(object):
14 | def __init__(
15 | self, params, split=1, shuffle=True, per_file=False, is_eval=False
16 | ):
17 | self._per_file = per_file
18 | self._is_eval = is_eval
19 | self._splits = np.array(split)
20 | self._batch_size = params['batch_size']
21 | self._feature_seq_len = params['feature_sequence_length']
22 | self._label_seq_len = params['label_sequence_length']
23 | self._shuffle = shuffle
24 | self._feat_cls = cls_feature_class.FeatureClass(params=params, is_eval=self._is_eval)
25 | self._label_dir = self._feat_cls.get_label_dir()
26 | self._feat_dir = self._feat_cls.get_normalized_feat_dir()
27 |
28 | self._filenames_list = list()
29 | self._nb_frames_file = 0 # Using a fixed number of frames in feat files. Updated in _get_label_filenames_sizes()
30 | self._nb_mel_bins = self._feat_cls.get_nb_mel_bins()
31 | self._nb_ch = None
32 | self._label_len = None # total length of label - DOA + SED
33 | self._doa_len = None # DOA label length
34 | self._class_dict = self._feat_cls.get_classes()
35 | self._nb_classes = self._feat_cls.get_nb_classes()
36 | self._get_filenames_list_and_feat_label_sizes()
37 |
38 | self._feature_batch_seq_len = self._batch_size*self._feature_seq_len
39 | self._label_batch_seq_len = self._batch_size*self._label_seq_len
40 | self._circ_buf_feat = None
41 | self._circ_buf_label = None
42 |
43 | if self._per_file:
44 | self._nb_total_batches = len(self._filenames_list)
45 | else:
46 | self._nb_total_batches = int(np.floor((len(self._filenames_list) * self._nb_frames_file /
47 | float(self._feature_batch_seq_len))))
48 |
49 | # self._dummy_feat_vec = np.ones(self._feat_len.shape) *
50 |
51 | print(
52 | '\tDatagen_mode: {}, nb_files: {}, nb_classes:{}\n'
53 | '\tnb_frames_file: {}, feat_len: {}, nb_ch: {}, label_len:{}\n'.format(
54 | 'eval' if self._is_eval else 'dev', len(self._filenames_list), self._nb_classes,
55 | self._nb_frames_file, self._nb_mel_bins, self._nb_ch, self._label_len
56 | )
57 | )
58 |
59 | print(
60 | '\tDataset: {}, split: {}\n'
61 | '\tbatch_size: {}, feat_seq_len: {}, label_seq_len: {}, shuffle: {}\n'
62 | '\tTotal batches in dataset: {}\n'
63 | '\tlabel_dir: {}\n '
64 | '\tfeat_dir: {}\n'.format(
65 | params['dataset'], split,
66 | self._batch_size, self._feature_seq_len, self._label_seq_len, self._shuffle,
67 | self._nb_total_batches,
68 | self._label_dir, self._feat_dir
69 | )
70 | )
71 |
72 | def get_data_sizes(self):
73 | feat_shape = (self._batch_size, self._nb_ch, self._feature_seq_len, self._nb_mel_bins)
74 | if self._is_eval:
75 | label_shape = None
76 | else:
77 | label_shape = [
78 | (self._batch_size, self._label_seq_len, self._nb_classes),
79 | (self._batch_size, self._label_seq_len, self._nb_classes*3)
80 | ]
81 | return feat_shape, label_shape
82 |
83 | def get_total_batches_in_data(self):
84 | return self._nb_total_batches
85 |
86 | def _get_filenames_list_and_feat_label_sizes(self):
87 |
88 | for filename in os.listdir(self._feat_dir):
89 | if self._is_eval:
90 | self._filenames_list.append(filename)
91 | else:
92 | if int(filename[4]) in self._splits: # check which split the file belongs to
93 | self._filenames_list.append(filename)
94 |
95 | temp_feat = np.load(os.path.join(self._feat_dir, self._filenames_list[0]))
96 | self._nb_frames_file = temp_feat.shape[0]
97 | self._nb_ch = temp_feat.shape[1] // self._nb_mel_bins
98 |
99 | if not self._is_eval:
100 | temp_label = np.load(os.path.join(self._label_dir, self._filenames_list[0]))
101 | self._label_len = temp_label.shape[-1]
102 | self._doa_len = (self._label_len - self._nb_classes)//self._nb_classes
103 |
104 | if self._per_file:
105 | self._batch_size = int(np.ceil(temp_feat.shape[0]/float(self._feature_seq_len)))
106 |
107 | return
108 |
109 | def generate(self):
110 | """
111 | Generates batches of samples
112 | :return:
113 | """
114 |
115 | while 1:
116 | if self._shuffle:
117 | random.shuffle(self._filenames_list)
118 |
119 | # Ideally this should have been outside the while loop. But while generating the test data we want the data
120 | # to be the same exactly for all epoch's hence we keep it here.
121 | self._circ_buf_feat = deque()
122 | self._circ_buf_label = deque()
123 |
124 | file_cnt = 0
125 | if self._is_eval:
126 | for i in range(self._nb_total_batches):
127 | # load feat and label to circular buffer. Always maintain atleast one batch worth feat and label in the
128 | # circular buffer. If not keep refilling it.
129 | while len(self._circ_buf_feat) < self._feature_batch_seq_len:
130 | temp_feat = np.load(os.path.join(self._feat_dir, self._filenames_list[file_cnt]))
131 |
132 | for row_cnt, row in enumerate(temp_feat):
133 | self._circ_buf_feat.append(row)
134 |
135 | # If self._per_file is True, this returns the sequences belonging to a single audio recording
136 | if self._per_file:
137 | extra_frames = self._feature_batch_seq_len - temp_feat.shape[0]
138 | extra_feat = np.ones((extra_frames, temp_feat.shape[1])) * 1e-6
139 |
140 | for row_cnt, row in enumerate(extra_feat):
141 | self._circ_buf_feat.append(row)
142 |
143 | file_cnt = file_cnt + 1
144 |
145 | # Read one batch size from the circular buffer
146 | feat = np.zeros((self._feature_batch_seq_len, self._nb_mel_bins * self._nb_ch))
147 | for j in range(self._feature_batch_seq_len):
148 | feat[j, :] = self._circ_buf_feat.popleft()
149 | feat = np.reshape(feat, (self._feature_batch_seq_len, self._nb_mel_bins, self._nb_ch))
150 |
151 | # Split to sequences
152 | feat = self._split_in_seqs(feat, self._feature_seq_len)
153 | feat = np.transpose(feat, (0, 3, 1, 2))
154 |
155 | yield feat
156 |
157 | else:
158 | for i in range(self._nb_total_batches):
159 |
160 | # load feat and label to circular buffer. Always maintain atleast one batch worth feat and label in the
161 | # circular buffer. If not keep refilling it.
162 | while len(self._circ_buf_feat) < self._feature_batch_seq_len:
163 | temp_feat = np.load(os.path.join(self._feat_dir, self._filenames_list[file_cnt]))
164 | temp_label = np.load(os.path.join(self._label_dir, self._filenames_list[file_cnt]))
165 |
166 | for f_row in temp_feat:
167 | self._circ_buf_feat.append(f_row)
168 | for l_row in temp_label:
169 | self._circ_buf_label.append(l_row)
170 |
171 | # If self._per_file is True, this returns the sequences belonging to a single audio recording
172 | if self._per_file:
173 | feat_extra_frames = self._feature_batch_seq_len - temp_feat.shape[0]
174 | extra_feat = np.ones((feat_extra_frames, temp_feat.shape[1])) * 1e-6
175 |
176 | label_extra_frames = self._label_batch_seq_len - temp_label.shape[0]
177 | extra_labels = np.zeros((label_extra_frames, temp_label.shape[1]))
178 |
179 | for f_row in extra_feat:
180 | self._circ_buf_feat.append(f_row)
181 | for l_row in extra_labels:
182 | self._circ_buf_label.append(l_row)
183 |
184 | file_cnt = file_cnt + 1
185 |
186 | # Read one batch size from the circular buffer
187 | feat = np.zeros((self._feature_batch_seq_len, self._nb_mel_bins * self._nb_ch))
188 | label = np.zeros((self._label_batch_seq_len, self._label_len))
189 | for j in range(self._feature_batch_seq_len):
190 | feat[j, :] = self._circ_buf_feat.popleft()
191 | for j in range(self._label_batch_seq_len):
192 | label[j, :] = self._circ_buf_label.popleft()
193 | feat = np.reshape(feat, (self._feature_batch_seq_len, self._nb_mel_bins, self._nb_ch))
194 |
195 | # Split to sequences
196 | feat = self._split_in_seqs(feat, self._feature_seq_len)
197 | feat = np.transpose(feat, (0, 3, 1, 2))
198 | label = self._split_in_seqs(label, self._label_seq_len)
199 |
200 | label = [
201 | label[:, :, :self._nb_classes], # SED labels
202 | label # SED + DOA labels
203 | ]
204 | yield feat, label
205 |
206 | def _split_in_seqs(self, data, _seq_len):
207 | if len(data.shape) == 1:
208 | if data.shape[0] % _seq_len:
209 | data = data[:-(data.shape[0] % _seq_len), :]
210 | data = data.reshape((data.shape[0] // _seq_len, _seq_len, 1))
211 | elif len(data.shape) == 2:
212 | if data.shape[0] % _seq_len:
213 | data = data[:-(data.shape[0] % _seq_len), :]
214 | data = data.reshape((data.shape[0] // _seq_len, _seq_len, data.shape[1]))
215 | elif len(data.shape) == 3:
216 | if data.shape[0] % _seq_len:
217 | data = data[:-(data.shape[0] % _seq_len), :, :]
218 | data = data.reshape((data.shape[0] // _seq_len, _seq_len, data.shape[1], data.shape[2]))
219 | else:
220 | print('ERROR: Unknown data dimensions: {}'.format(data.shape))
221 | exit()
222 | return data
223 |
224 | @staticmethod
225 | def split_multi_channels(data, num_channels):
226 | tmp = None
227 | in_shape = data.shape
228 | if len(in_shape) == 3:
229 | hop = in_shape[2] / num_channels
230 | tmp = np.zeros((in_shape[0], num_channels, in_shape[1], hop))
231 | for i in range(num_channels):
232 | tmp[:, i, :, :] = data[:, :, i * hop:(i + 1) * hop]
233 | elif len(in_shape) == 4 and num_channels == 1:
234 | tmp = np.zeros((in_shape[0], 1, in_shape[1], in_shape[2], in_shape[3]))
235 | tmp[:, 0, :, :, :] = data
236 | else:
237 | print('ERROR: The input should be a 3D matrix but it seems to have dimensions: {}'.format(in_shape))
238 | exit()
239 | return tmp
240 |
241 | def get_default_elevation(self):
242 | return self._default_ele
243 |
244 | def get_azi_ele_list(self):
245 | return self._feat_cls.get_azi_ele_list()
246 |
247 | def get_nb_classes(self):
248 | return self._nb_classes
249 |
250 | def nb_frames_1s(self):
251 | return self._feat_cls.nb_frames_1s()
252 |
253 | def get_hop_len_sec(self):
254 | return self._feat_cls.get_hop_len_sec()
255 |
256 | def get_classes(self):
257 | return self._feat_cls.get_classes()
258 |
259 | def get_filelist(self):
260 | return self._filenames_list
261 |
262 | def get_frame_per_file(self):
263 | return self._label_batch_seq_len
264 |
265 | def get_nb_frames(self):
266 | return self._feat_cls.get_nb_frames()
267 |
268 | def get_data_gen_mode(self):
269 | return self._is_eval
270 |
271 | def write_output_format_file(self, _out_file, _out_dict):
272 | return self._feat_cls.write_output_format_file(_out_file, _out_dict)
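The generator code above reshapes each file's features to `(frames, mel_bins, channels)`, cuts them into fixed-length sequences with `_split_in_seqs`, and then moves the channel axis forward for the network. A minimal standalone sketch of that trim-and-reshape logic, assuming NumPy only; the function name `split_in_seqs` and the sizes below are illustrative and not taken from the repository's parameter files. Slicing only the first axis (`data[:n_full]`) works for any rank, whereas a two-index slice such as `data[:k, :]` raises an `IndexError` on a 1D array.

```python
import numpy as np


def split_in_seqs(data, seq_len):
    """Drop trailing frames that do not fill a whole sequence, then reshape
    (frames, ...) into (n_seqs, seq_len, ...). Handles 1D, 2D and 3D input."""
    if data.ndim not in (1, 2, 3):
        raise ValueError('Unknown data dimensions: {}'.format(data.shape))
    n_full = (data.shape[0] // seq_len) * seq_len
    data = data[:n_full]  # trim along the frame axis only, valid for any rank
    if data.ndim == 1:
        return data.reshape(-1, seq_len, 1)
    return data.reshape((-1, seq_len) + data.shape[1:])


# Illustrative sizes: 610 frames, 64 mel bins, 4 channels, 300-frame sequences
feat = np.random.rand(610, 64, 4)
seqs = split_in_seqs(feat, 300)          # the 10 trailing frames are dropped
seqs = np.transpose(seqs, (0, 3, 1, 2))  # -> (n_seqs, channels, frames, mel_bins)
print(seqs.shape)  # (2, 4, 300, 64)
```

Trimming rather than zero-padding keeps every sequence made of real frames, at the cost of discarding up to `seq_len - 1` frames at the end of each file.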
--------------------------------------------------------------------------------
/display_specs.py:
--------------------------------------------------------------------------------
1 | import librosa.display
2 | import numpy as np
3 | import matplotlib.pyplot as plt
4 |
5 | # Machine-specific root of the extracted features; adjust to your setup.
6 | FEAT_ROOT = '/home/javier/repos/DCASE2020-Task3/input_feature'
7 | CLIP = 'fold1_room1_mix001_ov1'
8 |
9 | # Gammatone features scaled to max: first 64 feature columns
10 | gamma = np.load('{}/gammatone_gcclogmel/mic_dev/{}.npy'.format(FEAT_ROOT, CLIP))
11 | gamma_ch1 = gamma[:, 0:64].T
12 |
13 | plt.subplot(2, 2, 1)
14 | # np.flip(..., 1) reverses axis 1, which after the transpose is the frame (time) axis
15 | librosa.display.specshow(np.flip(gamma_ch1, 1))
16 | plt.colorbar()
17 | plt.title('{} gammatone scale to max'.format(CLIP))
18 |
19 | # Gammatone features without scaling to max
20 | gamma_norm = np.load('{}/gammatone_nomax_gcclogmel/mic_dev/{}.npy'.format(FEAT_ROOT, CLIP))
21 | gamma_norm_ch1 = gamma_norm[:, 0:64]
22 |
23 | plt.subplot(2, 2, 2)
24 | librosa.display.specshow(gamma_norm_ch1.T)
25 | plt.colorbar()
26 | plt.title('{} gammatone no scale to max'.format(CLIP))
27 |
28 | # Baseline log-mel features
29 | spec = np.load('{}/baseline_log_mel/mic_dev/{}.npy'.format(FEAT_ROOT, CLIP))
30 | spec_ch1 = spec[:, 0:64]
31 |
32 | plt.subplot(2, 2, 3)
33 | librosa.display.specshow(spec_ch1.T)
34 | plt.colorbar()
35 | plt.title('{} mel spectrogram'.format(CLIP))
36 |
37 | # Normalized baseline log-mel features
38 | spec_norm = np.load('{}/baseline_log_mel/mic_dev_norm/{}.npy'.format(FEAT_ROOT, CLIP))
39 | spec_norm_ch1 = spec_norm[:, 0:64]
40 |
41 | plt.subplot(2, 2, 4)
42 | librosa.display.specshow(spec_norm_ch1.T)
43 | plt.colorbar()
44 | plt.title('{} mel norm spectrogram'.format(CLIP))
45 |
46 | plt.show()
--------------------------------------------------------------------------------
/fold1_room1_mix050_ov2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/janaal1/DCASE2020-Task3/bc7d5e87faa2fbe014dc47cccb48d9927b4ed3ec/fold1_room1_mix050_ov2.png
--------------------------------------------------------------------------------
/gammatone/COPYING:
--------------------------------------------------------------------------------
1 | Copyright (c) 1998, Malcolm Slaney
2 | Copyright (c) 2009, Dan Ellis
3 | Copyright (c) 2014, Jason Heeris
4 | All rights reserved.
5 |
6 | Redistribution and use in source and binary forms, with or without
7 | modification, are permitted provided that the following conditions are met:
8 | * Redistributions of source code must retain the above copyright
9 | notice, this list of conditions and the following disclaimer.
10 | * Redistributions in binary form must reproduce the above copyright
11 | notice, this list of conditions and the following disclaimer in the
12 | documentation and/or other materials provided with the distribution.
13 | * Neither the name of the copyright holder nor the names of its contributors
14 | may be used to endorse or promote products derived from this software
15 | without specific prior written permission.
16 |
17 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
18 | ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
19 | WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
20 | DISCLAIMED. IN NO EVENT SHALL <COPYRIGHT HOLDER> BE LIABLE FOR ANY
21 | DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
22 | (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
23 | LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
24 | ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
25 | (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
26 | SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
27 |
--------------------------------------------------------------------------------
/gammatone/README.md:
--------------------------------------------------------------------------------
1 | Gammatone Filterbank Toolkit
2 | ============================
3 |
4 | *Utilities for analysing sound using perceptual models of human hearing.*
5 |
6 | Jason Heeris, 2013
7 |
8 | Summary
9 | -------
10 |
11 | This is a port of Malcolm Slaney's and Dan Ellis' gammatone filterbank MATLAB
12 | code, detailed below, to Python 2 and 3 using Numpy and Scipy. It analyses signals by
13 | running them through banks of gammatone filters, similar to Fourier-based
14 | spectrogram analysis.
15 |
16 | 
17 |
18 | Installation
19 | ------------
20 |
21 | You can install directly from this git repository using:
22 |
23 | ```text
24 | pip install git+https://github.com/detly/gammatone.git
25 | ```
26 |
27 | ...or you can clone the git repository however you prefer, and do:
28 |
29 | ```text
30 | pip install .
31 | ```
32 |
33 | ...or:
34 |
35 | ```text
36 | python setup.py install
37 | ```
38 |
39 | ...from the cloned tree.
40 |
41 | ### Dependencies
42 |
43 | - numpy
44 | - scipy
45 | - nose
46 | - mock
47 | - matplotlib
48 |
49 | Using the Code
50 | --------------
51 |
52 | See the [API documentation](http://detly.github.io/gammatone/). For a
53 | demonstration, find a `.wav` file (for example,
54 | [Für Elise](http://heeris.id.au/samples/FurElise.wav)) and run:
55 |
56 | ```text
57 | python -m gammatone FurElise.wav -d 10
58 | ```
59 |
60 | ...to see a gammatone-gram of the first ten seconds of the track. If you've
61 | installed via `pip` or `setup.py install`, you should also be able to just run:
62 |
63 | ```text
64 | gammatone FurElise.wav -d 10
65 | ```
66 |
67 | Basis
68 | -----
69 |
70 | This project is based on research into how humans perceive audio, originally
71 | published by Malcolm Slaney:
72 |
73 | [Malcolm Slaney (1998) "Auditory Toolbox Version 2", Technical Report #1998-010,
74 | Interval Research Corporation, 1998.](
75 | http://cobweb.ecn.purdue.edu/~malcolm/interval/1998-010/
76 | )
77 |
78 | Slaney's report describes a way of modelling how the human ear perceives,
79 | emphasises and separates different frequencies of sound. A series of gammatone
80 | filters are constructed whose width increases with increasing centre frequency,
81 | and this bank of filters is applied to a time-domain signal. The result of this
82 | is a spectrum that should represent the human experience of sound better than,
83 | say, a Fourier-domain spectrum would.
84 |
85 | A gammatone filter has an impulse response that is a sine wave multiplied by a
86 | gamma distribution function. It is a common approach to modelling the auditory
87 | system.
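As a rough illustration, here is a NumPy sketch of a sampled fourth-order gammatone impulse response. It assumes the commonly used form g(t) = t^(n-1) e^(-2πbt) cos(2πft) with order n = 4 and bandwidth b = 1.019 × ERB(f), where ERB(f) = f/9.26449 + 24.7 Hz (the Glasberg and Moore parameters used in the MATLAB toolkit code). The function name `gammatone_ir` is hypothetical and not part of this package's API.

```python
import numpy as np

EAR_Q = 9.26449   # Glasberg and Moore parameters, as in the MATLAB toolkit
MIN_BW = 24.7


def gammatone_ir(cf, fs, order=4, duration=0.05):
    """Sampled gammatone impulse response: gamma envelope times a cosine carrier."""
    erb = cf / EAR_Q + MIN_BW
    b = 1.019 * erb                                # bandwidth parameter in Hz
    t = np.arange(int(round(duration * fs))) / fs
    envelope = t ** (order - 1) * np.exp(-2 * np.pi * b * t)
    return t, envelope, envelope * np.cos(2 * np.pi * cf * t)


t, env, g = gammatone_ir(cf=1000.0, fs=16000)
# The envelope peaks at t = (order - 1) / (2*pi*b), roughly 3.5 ms for a 1 kHz channel
print(t[np.argmax(env)])
```

Because b grows with the centre frequency, higher channels have shorter, broader-band responses, which is the "width increases with increasing centre frequency" behaviour described above.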
88 |
89 | The gammatone filterbank approach can be considered analogous (but not
90 | equivalent) to a discrete Fourier transform where the frequency axis is
91 | logarithmic. For example, a series of notes spaced an octave apart would appear
92 | to be roughly linearly spaced; or a sound that was distributed across the same
93 | linear frequency range would appear to have more spread at lower frequencies.
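To make the octave example concrete, a small sketch that maps frequencies onto an ERB-number scale. It assumes the ERB-number function obtained by integrating 1/ERB(f) with the Glasberg and Moore parameters used in the MATLAB code (EarQ = 9.26449, minBW = 24.7 Hz), namely E(f) = EarQ · ln(1 + f / (EarQ · minBW)). The scale is only approximately logarithmic: octave steps approach EarQ · ln 2 ≈ 6.42 at high frequencies and shrink towards low frequencies. The name `erb_number` is illustrative.

```python
import numpy as np

EAR_Q = 9.26449
MIN_BW = 24.7


def erb_number(f_hz):
    """ERB-number of a frequency: the integral of 1 / ERB(f) from 0 to f."""
    return EAR_Q * np.log(1.0 + np.asarray(f_hz) / (EAR_Q * MIN_BW))


octaves = 110.0 * 2.0 ** np.arange(6)   # A2 ... A7: 110, 220, ..., 3520 Hz
steps = np.diff(erb_number(octaves))    # distance between successive octaves
print(np.round(steps, 2))
```

The steps grow from about 2.6 to about 5.9 ERB units across these six notes, so octaves look nearly evenly spaced in the upper range and compressed at the bottom.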
94 |
95 | The real goal of this toolkit is to allow easy computation of the gammatone
96 | equivalent of a spectrogram — a time-varying spectrum of energy over audible
97 | frequencies based on a gammatone filterbank.
98 |
99 | Slaney demonstrated his research with an initial implementation in MATLAB. This
100 | implementation was later extended by Dan Ellis, who found a way to approximate a
101 | "gammatone-gram" by using the fast Fourier transform. Ellis' code calculates a
102 | matrix of weights that can be applied to the output of an FFT so that a
103 | Fourier-based spectrogram can easily be transformed into such an approximation.
104 |
105 | Ellis' code and documentation are here: [Gammatone-like spectrograms](
106 | http://labrosa.ee.columbia.edu/matlab/gammatonegram/
107 | )
108 |
109 | Interest
110 | --------
111 |
112 | I became interested in this because of my background in science communication
113 | and my general interest in the teaching of signal processing. I find that the
114 | spectrogram approach to visualising signals is adequate for illustrating
115 | abstract systems or the mathematical properties of transforms, but bears little
116 | correspondence to a person's own experience of sound. If someone wants to see
117 | what their favourite piece of music "looks like," a normal Fourier transform
118 | based spectrogram is actually quite a poor way to visualise it. Features of the
119 | audio seem to be oddly spaced or unnaturally emphasised or de-emphasised
120 | depending on where they are in the frequency domain.
121 |
122 | The gammatone filterbank approach seems to be closer to what someone might
123 | intuitively expect a visualisation of sound to look like, and can help develop
124 | an intuition about alternative representations of signals.
125 |
126 | Verifying the port
127 | ------------------
128 |
129 | Since this is a port of existing MATLAB code, I've written tests to verify the
130 | Python implementation against the original code. These tests aren't unit tests,
131 | but they do generally test single functions. Running the tests follows this
132 | workflow:
133 |
134 | 1. Run the scripts in the `test_generation` directory. This will create a
135 | `.mat` file containing test data in `tests/data`.
136 |
137 | 2. Run `nosetests3` in the top level directory. This will find and run all the
138 | tests in the `tests` directory.
139 |
140 | Although I'm usually loath to check in generated files to version control, I'm
141 | willing to make an exception for the `.mat` files containing the test data. My
142 | reasoning is that they represent the decoupling of my code from the MATLAB code,
143 | and if the two projects were separated, they would be considered a part of the
144 | Python code, not the original MATLAB code.
145 |
146 |
--------------------------------------------------------------------------------
/gammatone/auditory_toolkit/COPYING:
--------------------------------------------------------------------------------
1 | Copyright (c) 1998, Malcolm Slaney
2 | Copyright (c) 2009, Dan Ellis
3 | All rights reserved.
4 |
5 | Redistribution and use in source and binary forms, with or without
6 | modification, are permitted provided that the following conditions are met:
7 | * Redistributions of source code must retain the above copyright
8 | notice, this list of conditions and the following disclaimer.
9 | * Redistributions in binary form must reproduce the above copyright
10 | notice, this list of conditions and the following disclaimer in the
11 | documentation and/or other materials provided with the distribution.
12 | * Neither the name of the copyright holder nor the names of its contributors
13 | may be used to endorse or promote products derived from this software
14 | without specific prior written permission.
15 |
16 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
17 | ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
18 | WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
19 | DISCLAIMED. IN NO EVENT SHALL <COPYRIGHT HOLDER> BE LIABLE FOR ANY
20 | DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
21 | (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
22 | LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
23 | ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
24 | (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
25 | SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
26 |
--------------------------------------------------------------------------------
/gammatone/auditory_toolkit/ERBFilterBank.m:
--------------------------------------------------------------------------------
1 | function output = ERBFilterBank(x, fcoefs)
2 | % function output = ERBFilterBank(x, fcoefs)
3 | % Process an input waveform with a gammatone filter bank. This function
4 | % takes a single sound vector, and returns an array of filter outputs, one
5 | % channel per row.
6 | %
7 | % The fcoefs parameter, which completely specifies the Gammatone filterbank,
8 | % should be designed with the MakeERBFilters function. If it is omitted,
9 | % the filter coefficients are computed for you assuming a 22050Hz sampling
10 | % rate and 64 filters regularly spaced on an ERB scale from fs/2 down to 100Hz.
11 | %
12 |
13 | % Malcolm Slaney @ Interval, June 11, 1998.
14 | % (c) 1998 Interval Research Corporation
15 | % Thanks to Alain de Cheveigne' for his suggestions and improvements.
16 |
17 | if nargin < 1
18 | error('Syntax: output_array = ERBFilterBank(input_vector[, fcoefs]);');
19 | end
20 |
21 | if nargin < 2
22 | fcoefs = MakeERBFilters(22050,64,100);
23 | end
24 |
25 | if size(fcoefs,2) ~= 10
26 | error('fcoefs parameter passed to ERBFilterBank is the wrong size.');
27 | end
28 |
29 | if size(x,2) < size(x,1)
30 | x = x';
31 | end
32 |
33 | A0 = fcoefs(:,1);
34 | A11 = fcoefs(:,2);
35 | A12 = fcoefs(:,3);
36 | A13 = fcoefs(:,4);
37 | A14 = fcoefs(:,5);
38 | A2 = fcoefs(:,6);
39 | B0 = fcoefs(:,7);
40 | B1 = fcoefs(:,8);
41 | B2 = fcoefs(:,9);
42 | gain= fcoefs(:,10);
43 |
44 | output = zeros(size(gain,1), length(x));
45 | for chan = 1: size(gain,1)
46 | y1=filter([A0(chan)/gain(chan) A11(chan)/gain(chan) ...
47 | A2(chan)/gain(chan)], ...
48 | [B0(chan) B1(chan) B2(chan)], x);
49 | y2=filter([A0(chan) A12(chan) A2(chan)], ...
50 | [B0(chan) B1(chan) B2(chan)], y1);
51 | y3=filter([A0(chan) A13(chan) A2(chan)], ...
52 | [B0(chan) B1(chan) B2(chan)], y2);
53 | y4=filter([A0(chan) A14(chan) A2(chan)], ...
54 | [B0(chan) B1(chan) B2(chan)], y3);
55 | output(chan, :) = y4;
56 | end
57 |
58 | if 0
59 | semilogx((0:(length(x)-1))*(fs/length(x)),20*log10(abs(fft(output))));
60 | end
61 |
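For readers porting this routine to Python: each channel of `ERBFilterBank` is a cascade of four second-order IIR sections that share the denominator `[B0 B1 B2]` and differ only in the middle numerator coefficient, with the gain normalisation folded into the first section. A NumPy-only sketch of that cascade, assuming an `fcoefs` array with the same ten columns that `MakeERBFilters` produces; the helper names `biquad` and `erb_filterbank` are hypothetical, and in practice `scipy.signal.lfilter` would usually replace the explicit loop.

```python
import numpy as np


def biquad(b, a, x):
    """Second-order IIR filter, equivalent to MATLAB filter(b, a, x) for
    length-3 numerator b and denominator a (both normalised by a[0])."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float) / a[0]
    a = a / a[0]
    y = np.zeros(len(x))
    x1 = x2 = y1 = y2 = 0.0
    for n, xn in enumerate(x):
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y[n] = yn
    return y


def erb_filterbank(x, fcoefs):
    """Filter the 1D signal x with every row of fcoefs; one output row per channel.
    In Slaney's naming, A* are numerator (zero) terms and B* denominator (pole) terms."""
    x = np.asarray(x, dtype=float).ravel()
    out = np.zeros((fcoefs.shape[0], len(x)))
    for ch, (a0, a11, a12, a13, a14, a2, b0, b1, b2, gain) in enumerate(fcoefs):
        den = [b0, b1, b2]
        y = biquad([a0 / gain, a11 / gain, a2 / gain], den, x)  # gain applied once
        for a1x in (a12, a13, a14):
            y = biquad([a0, a1x, a2], den, y)
        out[ch] = y
    return out


# Sanity check: identity coefficients (A0 = B0 = gain = 1, rest 0) pass x through
identity = np.array([[1, 0, 0, 0, 0, 0, 1, 0, 0, 1.0]])
sig = np.random.default_rng(0).standard_normal(32)
print(np.allclose(erb_filterbank(sig, identity)[0], sig))  # True
```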
--------------------------------------------------------------------------------
/gammatone/auditory_toolkit/ERBSpace.m:
--------------------------------------------------------------------------------
1 | function cfArray = ERBSpace(lowFreq, highFreq, N)
2 | % function cfArray = ERBSpace(lowFreq, highFreq, N)
3 | % This function computes an array of N frequencies uniformly spaced between
4 | % highFreq and lowFreq on an ERB scale. N is set to 100 if not specified.
5 | %
6 | % See also linspace, logspace, MakeERBCoeffs, MakeERBFilters.
7 | %
8 | % For a definition of ERB, see Moore, B. C. J., and Glasberg, B. R. (1983).
9 | % "Suggested formulae for calculating auditory-filter bandwidths and
10 | % excitation patterns," J. Acoust. Soc. Am. 74, 750-753.
11 |
12 | if nargin < 1
13 | lowFreq = 100;
14 | end
15 |
16 | if nargin < 2
17 | highFreq = 44100/4;
18 | end
19 |
20 | if nargin < 3
21 | N = 100;
22 | end
23 |
24 | % Change the following three parameters if you wish to use a different
25 | % ERB scale. Must change in MakeERBCoeffs too.
26 | EarQ = 9.26449; % Glasberg and Moore Parameters
27 | minBW = 24.7;
28 | order = 1;
29 |
30 | % All of the following expressions are derived in Apple TR #35, "An
31 | % Efficient Implementation of the Patterson-Holdsworth Cochlear
32 | % Filter Bank." See pages 33-34.
33 | cfArray = -(EarQ*minBW) + exp((1:N)'*(-log(highFreq + EarQ*minBW) + ...
34 | log(lowFreq + EarQ*minBW))/N) * (highFreq + EarQ*minBW);
35 |
36 |
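A NumPy sketch of the same computation, assuming the Glasberg and Moore parameters above; the name `erb_space` is illustrative. Because `log(cf + EarQ*minBW)` is linear in the channel index, the centre frequencies are uniformly spaced on the ERB scale, starting a little below `highFreq` and ending exactly at `lowFreq`.

```python
import numpy as np

EAR_Q = 9.26449   # Glasberg and Moore parameters, as in ERBSpace.m
MIN_BW = 24.7


def erb_space(low_freq=100.0, high_freq=44100 / 4, n=100):
    """Port of ERBSpace.m: n centre frequencies uniformly spaced on the ERB
    scale, ordered from just below high_freq down to low_freq."""
    c = EAR_Q * MIN_BW
    i = np.arange(1, n + 1)
    step = (np.log(low_freq + c) - np.log(high_freq + c)) / n
    return -c + np.exp(i * step) * (high_freq + c)


cf = erb_space(100.0, 8000.0, 64)
print(cf[0], cf[-1])  # highest (roughly 7.6 kHz here) and lowest (100 Hz) centre frequency
```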
--------------------------------------------------------------------------------
/gammatone/auditory_toolkit/MakeERBFilters.m:
--------------------------------------------------------------------------------
1 | function [fcoefs,cf]=MakeERBFilters(fs,numChannels,lowFreq)
2 | % function [fcoefs,cf]=MakeERBFilters(fs,numChannels,lowFreq)
3 | % This function computes the filter coefficients for a bank of
4 | % Gammatone filters. These filters were defined by Patterson and
5 | % Holdsworth for simulating the cochlea.
6 | %
7 | % The result is returned as an array of filter coefficients. Each row
8 | % of the filter arrays contains the coefficients for four second order
9 | % filters. The transfer function for these four filters share the same
10 | % denominator (poles) but have different numerators (zeros). All of these
11 | % coefficients are assembled into one vector that the ERBFilterBank
12 | % can take apart to implement the filter.
13 | %
14 | % The filter bank contains "numChannels" channels that extend from
15 | % half the sampling rate (fs) to "lowFreq". Alternatively, if the numChannels
16 | % input argument is a vector, then the values of this vector are taken to
17 | % be the center frequency of each desired filter. (The lowFreq argument is
18 | % ignored in this case.)
19 |
20 | % Note this implementation fixes a problem in the original code by
21 | % computing four separate second order filters. This avoids a big
22 | % problem with round off errors in cases of very small cfs (100Hz) and
23 | % large sample rates (44kHz). The problem is caused by roundoff error
24 | % when a number of poles are combined, all very close to the unit
25 | % circle. Small errors in the eighth order coefficients are multiplied
26 | % when the eighth root is taken to give the pole location. These small
27 | % errors lead to poles outside the unit circle and instability. Thanks
28 | % to Julius Smith for leading me to the proper explanation.
29 |
30 | % Execute the following code to evaluate the frequency
31 | % response of a 10 channel filterbank.
32 | % fcoefs = MakeERBFilters(16000,10,100);
33 | % y = ERBFilterBank([1 zeros(1,511)], fcoefs);
34 | % resp = 20*log10(abs(fft(y')));
35 | % freqScale = (0:511)/512*16000;
36 | % semilogx(freqScale(1:255),resp(1:255,:));
37 | % axis([100 16000 -60 0])
38 | % xlabel('Frequency (Hz)'); ylabel('Filter Response (dB)');
39 |
40 | % Rewritten by Malcolm Slaney@Interval. June 11, 1998.
41 | % (c) 1998 Interval Research Corporation
42 |
43 | T = 1/fs;
44 | if length(numChannels) == 1
45 | cf = ERBSpace(lowFreq, fs/2, numChannels);
46 | else
47 | cf = numChannels(1:end);
48 | if size(cf,2) > size(cf,1)
49 | cf = cf';
50 | end
51 | end
52 |
53 | % Change the following three parameters if you wish to use a different
54 | % ERB scale. Must change in ERBSpace too.
55 | EarQ = 9.26449; % Glasberg and Moore Parameters
56 | minBW = 24.7;
57 | order = 1;
58 |
59 | ERB = ((cf/EarQ).^order + minBW^order).^(1/order);
60 | B=1.019*2*pi*ERB;
61 |
62 | A0 = T;
63 | A2 = 0;
64 | B0 = 1;
65 | B1 = -2*cos(2*cf*pi*T)./exp(B*T);
66 | B2 = exp(-2*B*T);
67 |
68 | A11 = -(2*T*cos(2*cf*pi*T)./exp(B*T) + 2*sqrt(3+2^1.5)*T*sin(2*cf*pi*T)./ ...
69 | exp(B*T))/2;
70 | A12 = -(2*T*cos(2*cf*pi*T)./exp(B*T) - 2*sqrt(3+2^1.5)*T*sin(2*cf*pi*T)./ ...
71 | exp(B*T))/2;
72 | A13 = -(2*T*cos(2*cf*pi*T)./exp(B*T) + 2*sqrt(3-2^1.5)*T*sin(2*cf*pi*T)./ ...
73 | exp(B*T))/2;
74 | A14 = -(2*T*cos(2*cf*pi*T)./exp(B*T) - 2*sqrt(3-2^1.5)*T*sin(2*cf*pi*T)./ ...
75 | exp(B*T))/2;
76 |
77 | gain = abs((-2*exp(4*i*cf*pi*T)*T + ...
78 | 2*exp(-(B*T) + 2*i*cf*pi*T).*T.* ...
79 | (cos(2*cf*pi*T) - sqrt(3 - 2^(3/2))* ...
80 | sin(2*cf*pi*T))) .* ...
81 | (-2*exp(4*i*cf*pi*T)*T + ...
82 | 2*exp(-(B*T) + 2*i*cf*pi*T).*T.* ...
83 | (cos(2*cf*pi*T) + sqrt(3 - 2^(3/2)) * ...
84 | sin(2*cf*pi*T))).* ...
85 | (-2*exp(4*i*cf*pi*T)*T + ...
86 | 2*exp(-(B*T) + 2*i*cf*pi*T).*T.* ...
87 | (cos(2*cf*pi*T) - ...
88 | sqrt(3 + 2^(3/2))*sin(2*cf*pi*T))) .* ...
89 | (-2*exp(4*i*cf*pi*T)*T + 2*exp(-(B*T) + 2*i*cf*pi*T).*T.* ...
90 | (cos(2*cf*pi*T) + sqrt(3 + 2^(3/2))*sin(2*cf*pi*T))) ./ ...
91 | (-2 ./ exp(2*B*T) - 2*exp(4*i*cf*pi*T) + ...
92 | 2*(1 + exp(4*i*cf*pi*T))./exp(B*T)).^4);
93 |
94 | allfilts = ones(length(cf),1);
95 | fcoefs = [A0*allfilts A11 A12 A13 A14 A2*allfilts B0*allfilts B1 B2 gain];
96 |
97 | if (0) % Test Code
98 | A0 = fcoefs(:,1);
99 | A11 = fcoefs(:,2);
100 | A12 = fcoefs(:,3);
101 | A13 = fcoefs(:,4);
102 | A14 = fcoefs(:,5);
103 | A2 = fcoefs(:,6);
104 | B0 = fcoefs(:,7);
105 | B1 = fcoefs(:,8);
106 | B2 = fcoefs(:,9);
107 | gain= fcoefs(:,10);
108 | chan=1;
109 | x = [1 zeros(1, 511)];
110 | y1=filter([A0(chan)/gain(chan) A11(chan)/gain(chan) ...
111 | A2(chan)/gain(chan)],[B0(chan) B1(chan) B2(chan)], x);
112 | y2=filter([A0(chan) A12(chan) A2(chan)], ...
113 | [B0(chan) B1(chan) B2(chan)], y1);
114 | y3=filter([A0(chan) A13(chan) A2(chan)], ...
115 | [B0(chan) B1(chan) B2(chan)], y2);
116 | y4=filter([A0(chan) A14(chan) A2(chan)], ...
117 | [B0(chan) B1(chan) B2(chan)], y3);
118 | semilogx((0:(length(x)-1))*(fs/length(x)),20*log10(abs(fft(y4))));
119 | end
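The round-off note at the top of this file is easier to see from the pole positions. Each second-order section has denominator 1 + B1 z^-1 + B2 z^-2 with B1 = -2 cos(2π cf T) e^(-BT) and B2 = e^(-2BT), so its poles lie at radius e^(-BT) and angle ±2π cf T. A short NumPy sketch, assuming the same Glasberg and Moore parameters, that computes the shared denominator for a few channels at 44.1 kHz and compares the pole radius recovered from one second-order section with that recovered from the equivalent eighth-order polynomial; `erb_denominator` is a hypothetical helper name.

```python
import numpy as np

EAR_Q = 9.26449   # Glasberg and Moore parameters (order = 1)
MIN_BW = 24.7


def erb_denominator(cf, fs):
    """Shared denominator [B0, B1, B2] of the four sections, as in MakeERBFilters.m,
    plus the bandwidth term B (rad/s)."""
    T = 1.0 / fs
    erb = cf / EAR_Q + MIN_BW
    B = 1.019 * 2 * np.pi * erb
    b1 = -2 * np.cos(2 * cf * np.pi * T) / np.exp(B * T)
    b2 = np.exp(-2 * B * T)
    return np.array([1.0, b1, b2]), B


fs = 44100.0
for cf in (100.0, 1000.0, 10000.0):
    den, B = erb_denominator(cf, fs)
    radius = np.exp(-B / fs)
    # One second-order section: roots of z^2 + B1 z + B2, all at radius exp(-B T)
    err2 = np.max(np.abs(np.abs(np.roots(den)) - radius))
    # The same poles combined into one eighth-order polynomial (repeated 4 times)
    den8 = den
    for _ in range(3):
        den8 = np.convolve(den8, den)
    err8 = np.max(np.abs(np.abs(np.roots(den8)) - radius))
    print('cf={:7.1f} Hz  radius={:.6f}  err2={:.1e}  err8={:.1e}'.format(cf, radius, err2, err8))
```

For the 100 Hz channel the radius is about 0.995, very close to the unit circle. Roots of multiplicity four are typically far more sensitive to coefficient round-off than simple roots, which is consistent with the instability the comment describes.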
120 |
--------------------------------------------------------------------------------
/gammatone/auditory_toolkit/README:
--------------------------------------------------------------------------------
1 | These files are the original auditory toolkit/gammatone filterbank code created
2 | by Malcolm Slaney and Dan Ellis, published at:
3 |
4 | http://labrosa.ee.columbia.edu/matlab/gammatonegram/
5 | https://engineering.purdue.edu/~malcolm/interval/1998-010/
6 |
7 | Any non-code assets (ie. the sample WAV file and associated graphs) have been
8 | removed.
9 |
--------------------------------------------------------------------------------
/gammatone/auditory_toolkit/demo_gammatone.m:
--------------------------------------------------------------------------------
1 | %% Gammatone-like spectrograms
2 | % Gammatone filters are a popular linear approximation to the
3 | % filtering performed by the ear. This routine provides a simple
4 | % wrapper for generating time-frequency surfaces based on a
5 | % gammatone analysis, which can be used as a replacement for a
6 | % conventional spectrogram. It also provides a fast approximation
7 | % to this surface based on weighting the output of a conventional
8 | % FFT.
9 |
10 | %% Introduction
11 | % It is very natural to visualize sound as a time-varying
12 | % distribution of energy in frequency - not least because this is
13 | % one way of describing the information our brains get from our
14 | % ears via the auditory nerve. The spectrogram is the traditional
15 | % time-frequency visualization, but it actually has some important
16 | % differences from how sound is analyzed by the ear, most
17 | % significantly that the ear's frequency subbands get wider for
18 | % higher frequencies, whereas the spectrogram has a constant
19 | % bandwidth across all frequency channels.
20 | %
21 | % There have been many signal-processing approximations proposed
22 | % for the frequency analysis performed by the ear; one of the most
23 | % popular is the Gammatone filterbank originally proposed by
24 | % Roy Patterson and colleagues in 1992. Gammatone filters were
25 | % conceived as a simple fit to experimental observations of
26 | % the mammalian cochlea, and have a repeated pole structure leading
27 | % to an impulse response that is the product of a Gamma envelope
28 | % g(t) = t^n e^{-t} and a sinusoid (tone).
29 | %
30 | % One reason for the popularity of this approach is the
31 | % availability of an implementation by Malcolm Slaney, as
32 | % described in:
33 | %
34 | % Malcolm Slaney (1998) "Auditory Toolbox Version 2",
35 | % Technical Report #1998-010, Interval Research Corporation, 1998.
36 | % http://cobweb.ecn.purdue.edu/~malcolm/interval/1998-010/
37 | %
38 | % Malcolm's toolbox includes routines to design a Gammatone
39 | % filterbank and to process a signal by every filter in a bank,
40 | % but in order to convert this into a time-frequency visualization
41 | % it is necessary to sum up the energy within regular time bins.
42 | % While this is not complicated, the function here provides a
43 | % convenient wrapper to achieve this final step, for applications
44 | % that are content to work with time-frequency magnitude
45 | % distributions instead of going down to the waveform levels. In
46 | % this mode of operation, the routine uses Malcolm's MakeERBFilters
47 | % and ERBFilterBank routines.
48 | %
49 | % This is, however, quite a computationally expensive approach, so
50 | % we also provide an alternative algorithm that gives very similar
51 | % results. In this mode, the Gammatone-based spectrogram is
52 | % constructed by first calculating a conventional, fixed-bandwidth
53 | % spectrogram, then combining the fine frequency resolution of the
54 | % FFT-based spectra into the coarser, smoother Gammatone responses
55 | % via a weighting function. This calculates the time-frequency
56 | % distribution some 30-40x faster than the full approach.
57 |
58 | %% Routines
59 | % The code consists of a main routine, gammatonegram,
60 | % which takes a waveform and other parameters and returns a
61 | % spectrogram-like time-frequency matrix, and a helper function
62 | % fft2gammatonemx, which constructs the
63 | % weighting matrix to convert FFT output spectra into gammatone
64 | % approximations.
65 |
66 | %% Example usage
67 | % First, we calculate a Gammatone-based spectrogram-like image of
68 | % a speech waveform using the fast approximation. Then we do the
69 | % same thing using the full filtering approach, for comparison.
70 |
71 | % Load a waveform, calculate its gammatone spectrogram, then display:
72 | [d,sr] = audioread('sa2.wav');
73 | tic; [D,F] = gammatonegram(d,sr); toc
74 | %Elapsed time is 0.140742 seconds.
75 | subplot(211)
76 | imagesc(20*log10(D)); axis xy
77 | caxis([-90 -30])
78 | colorbar
79 | % F returns the center frequencies of each band;
80 | % display whichever elements were shown by the autoscaling
81 | set(gca,'YTickLabel',round(F(get(gca,'YTick'))));
82 | ylabel('freq / Hz');
83 | xlabel('time / 10 ms steps');
84 | title('Gammatonegram - fast method')
85 |
86 | % Now repeat with flag to use actual subband filters.
87 | % Since it's the last argument, we have to include all the other
88 | % arguments. These are the default values for: summation window
89 | % (0.025 sec), hop between successive windows (0.010 sec),
90 | % number of gammatone channels (64), lowest frequency (50 Hz),
91 | % and highest frequency (sr/2). The last argument as zero
92 | % means not to use the FFT approach.
93 | tic; [D2,F2] = gammatonegram(d,sr,0.025,0.010,64,50,sr/2,0); toc
94 | %Elapsed time is 3.165083 seconds.
95 | subplot(212)
96 | imagesc(20*log10(D2)); axis xy
97 | caxis([-90 -30])
98 | colorbar
99 | set(gca,'YTickLabel',round(F(get(gca,'YTick'))));
100 | ylabel('freq / Hz');
101 | xlabel('time / 10 ms steps');
102 | title('Gammatonegram - accurate method')
103 | % Actual gammatone filters appear somewhat narrower. The fast
104 | % version assumes coherence of addition of amplitude from
105 | % different channels, whereas the actual subband energies will
106 | % depend on how the energy in different frequencies combines.
107 | % Also notice the visible time smearing in the low frequency
108 | % channels that does not occur in the fast version.
109 |
110 | %% Validation
111 | % We can check the frequency responses of the filterbank
112 | % simulated with the fast method against the actual filters
113 | % from Malcolm's toolbox. They match very closely, but of
114 | % course this still doesn't mean the two approaches will give
115 | % identical results - because the fast method ignores the phase
116 | % of each frequency channel when summing up.
117 |
118 | % Check the frequency responses to see that they match:
119 | % Put an impulse through the Slaney ERB filters, then take the
120 | % frequency response of each impulse response.
121 | fcfs = flipud(MakeERBFilters(16000,64,50));
122 | gtir = ERBFilterBank([1, zeros(1,1000)],fcfs);
123 | H = zeros(64,512);
124 | for i = 1:64; H(i,:) = abs(freqz(gtir(i,:),1,512)); end
125 | % The weighting matrix for the FFT is the frequency response
126 | % of each output filter
127 | gtm = fft2gammatonemx(1024,16000,64,1,50,8000,512);
128 | % Plot every 5th channel from both. Offset by 3 dB just so we can
129 | % see both
130 | fs = [0:511]/512*8000;
131 | figure
132 | plot(fs,20*log10(H(5:5:64,:))','b',fs, -3 + 20*log10(gtm(5:5:64,:))','r')
133 | axis([0 8000 -150 0])
134 | grid
135 | % Line up pretty well, apart from wiggles below -100 dB
136 | % (from truncating the impulse response at 1000 samples?)
137 |
138 | %% Download
139 | % You can download all the code and data for these examples here:
140 | % .
141 |
142 | %% Referencing
143 | % If you use this work in a publication, I would be grateful
144 | % if you referenced this page as follows:
145 | %
146 | % D. P. W. Ellis (2009). "Gammatone-like spectrograms", web resource.
147 | % http://www.ee.columbia.edu/~dpwe/resources/matlab/gammatonegram/
148 |
149 | %% Acknowledgment
150 | % This project was supported in part by the NSF under
151 | % grant IIS-0535168. Any opinions, findings and conclusions
152 | % or recommendations expressed in this material are those of the
153 | % authors and do not necessarily reflect the views of the Sponsors.
154 |
155 | % Last updated: $Date: 2009/07/07 14:14:11 $
156 | % Dan Ellis
157 |
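The Introduction describes turning per-channel filter outputs into a time-frequency image by summing energy within regular time bins. A minimal NumPy sketch of that step, assuming `outputs` is a `(channels, samples)` array such as the one `ERBFilterBank` returns, and taking the RMS over 25 ms windows with a 10 ms hop (the default window and hop quoted in the demo). The framing convention here (whole windows only, no padding) and the name `binned_rms` are assumptions and need not match `gammatonegram.m` exactly.

```python
import numpy as np


def binned_rms(outputs, sr, win_sec=0.025, hop_sec=0.010):
    """RMS of each channel over sliding windows: (channels, samples) -> (channels, frames).
    Only whole windows are used; trailing samples that do not fill one are ignored."""
    win = int(round(win_sec * sr))
    hop = int(round(hop_sec * sr))
    n_frames = max(0, 1 + (outputs.shape[1] - win) // hop)
    frames = np.empty((outputs.shape[0], n_frames))
    for k in range(n_frames):
        seg = outputs[:, k * hop:k * hop + win]
        frames[:, k] = np.sqrt(np.mean(seg ** 2, axis=1))
    return frames


sr = 16000
outputs = 2.0 * np.ones((64, sr))  # stand-in for one second of filterbank output
D = binned_rms(outputs, sr)
print(D.shape)  # (64, 98)
```

Plotting `20*np.log10(D)` would give a dB image comparable to the one the demo draws with `imagesc`.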
--------------------------------------------------------------------------------
/gammatone/auditory_toolkit/fft2gammatonemx.m:
--------------------------------------------------------------------------------
1 | function [wts,gain] = fft2gammatonemx(nfft, sr, nfilts, width, minfreq, maxfreq, maxlen)
2 | % wts = fft2gammatonemx(nfft, sr, nfilts, width, minfreq, maxfreq, maxlen)
3 | % Generate a matrix of weights to combine FFT bins into
4 | % Gammatone bins. nfft defines the source FFT size at
5 | % sampling rate sr. Optional nfilts specifies the number of
6 | % output bands required (default 64), and width scales the
7 | % ERB bandwidth of each band (default 1).
8 | % minfreq, maxfreq specify range covered in Hz (100, sr/2).
9 | % While wts has nfft columns, the second half are all zero.
10 | % Hence, aud spectrum is
11 | % fft2gammatonemx(nfft,sr)*abs(fft(xincols,nfft));
12 | % maxlen truncates the rows to this many bins
13 | %
14 | % 2004-09-05 Dan Ellis dpwe@ee.columbia.edu based on rastamat/audspec.m
15 | % Last updated: $Date: 2009/02/22 02:29:25 $
16 |
17 | if nargin < 2; sr = 16000; end
18 | if nargin < 3; nfilts = 64; end
19 | if nargin < 4; width = 1.0; end
20 | if nargin < 5; minfreq = 100; end
21 | if nargin < 6; maxfreq = sr/2; end
22 | if nargin < 7; maxlen = nfft; end
23 |
24 | wts = zeros(nfilts, nfft);
25 |
26 | % after Slaney's MakeERBFilters
27 | EarQ = 9.26449;
28 | minBW = 24.7;
29 | order = 1;
30 |
31 | cfreqs = -(EarQ*minBW) + exp((1:nfilts)'*(-log(maxfreq + EarQ*minBW) + ...
32 | log(minfreq + EarQ*minBW))/nfilts) * (maxfreq + EarQ*minBW);
33 | cfreqs = flipud(cfreqs);
34 |
35 | GTord = 4;
36 |
37 | ucirc = exp(j*2*pi*[0:(nfft/2)]/nfft);
38 |
39 | justpoles = 0;
40 |
41 | for i = 1:nfilts
42 | cf = cfreqs(i);
43 | ERB = width*((cf/EarQ).^order + minBW^order).^(1/order);
44 | B = 1.019*2*pi*ERB;
45 | r = exp(-B/sr);
46 | theta = 2*pi*cf/sr;
47 | pole = r*exp(j*theta);
48 |
49 | if justpoles == 1
50 | % point on unit circle of maximum gain, from differentiating magnitude
51 | cosomegamax = (1+r*r)/(2*r)*cos(theta);
52 | if abs(cosomegamax) > 1
53 | if theta < pi/2; omegamax = 0;
54 | else omegamax = pi; end
55 | else
56 | omegamax = acos(cosomegamax);
57 | end
58 | center = exp(j*omegamax);
59 | gain = abs((pole-center).*(pole'-center)).^GTord;
60 | wts(i,1:(nfft/2+1)) = gain * (abs((pole-ucirc).*(pole'- ...
61 | ucirc)).^-GTord);
62 | else
63 | % poles and zeros, following Malcolm's MakeERBFilter
64 | T = 1/sr;
65 | A11 = -(2*T*cos(2*cf*pi*T)./exp(B*T) + 2*sqrt(3+2^1.5)*T*sin(2* ...
66 | cf*pi*T)./exp(B*T))/2;
67 | A12 = -(2*T*cos(2*cf*pi*T)./exp(B*T) - 2*sqrt(3+2^1.5)*T*sin(2* ...
68 | cf*pi*T)./exp(B*T))/2;
69 | A13 = -(2*T*cos(2*cf*pi*T)./exp(B*T) + 2*sqrt(3-2^1.5)*T*sin(2* ...
70 | cf*pi*T)./exp(B*T))/2;
71 | A14 = -(2*T*cos(2*cf*pi*T)./exp(B*T) - 2*sqrt(3-2^1.5)*T*sin(2* ...
72 | cf*pi*T)./exp(B*T))/2;
73 | zros = -[A11 A12 A13 A14]/T;
74 |
75 | gain(i) = abs((-2*exp(4*j*cf*pi*T)*T + ...
76 | 2*exp(-(B*T) + 2*j*cf*pi*T).*T.* ...
77 | (cos(2*cf*pi*T) - sqrt(3 - 2^(3/2))* ...
78 | sin(2*cf*pi*T))) .* ...
79 | (-2*exp(4*j*cf*pi*T)*T + ...
80 | 2*exp(-(B*T) + 2*j*cf*pi*T).*T.* ...
81 | (cos(2*cf*pi*T) + sqrt(3 - 2^(3/2)) * ...
82 | sin(2*cf*pi*T))).* ...
83 | (-2*exp(4*j*cf*pi*T)*T + ...
84 | 2*exp(-(B*T) + 2*j*cf*pi*T).*T.* ...
85 | (cos(2*cf*pi*T) - ...
86 | sqrt(3 + 2^(3/2))*sin(2*cf*pi*T))) .* ...
87 | (-2*exp(4*j*cf*pi*T)*T + 2*exp(-(B*T) + 2*j*cf*pi*T).*T.* ...
88 | (cos(2*cf*pi*T) + sqrt(3 + 2^(3/2))*sin(2*cf*pi*T))) ./ ...
89 | (-2 ./ exp(2*B*T) - 2*exp(4*j*cf*pi*T) + ...
90 | 2*(1 + exp(4*j*cf*pi*T))./exp(B*T)).^4);
91 | wts(i,1:(nfft/2+1)) = ((T^4)/gain(i)) ...
92 | * abs(ucirc-zros(1)).*abs(ucirc-zros(2))...
93 | .*abs(ucirc-zros(3)).*abs(ucirc-zros(4))...
94 | .*(abs((pole-ucirc).*(pole'-ucirc)).^-GTord);
95 | end
96 | end
97 |
98 | wts = wts(:,1:maxlen);
99 |
100 |
--------------------------------------------------------------------------------
/gammatone/auditory_toolkit/gammatone_demo.m:
--------------------------------------------------------------------------------
1 | %% Gammatone-like spectrograms
2 | % Gammatone filters are a popular linear approximation to the
3 | % filtering performed by the ear. This routine provides a simple
4 | % wrapper for generating time-frequency surfaces based on a
5 | % gammatone analysis, which can be used as a replacement for a
6 | % conventional spectrogram. It also provides a fast approximation
7 | % to this surface based on weighting the output of a conventional
8 | % FFT.
9 |
10 | %% Introduction
11 | % It is very natural to visualize sound as a time-varying
12 | % distribution of energy in frequency - not least because this is
13 | % one way of describing the information our brains get from our
14 | % ears via the auditory nerve. The spectrogram is the traditional
15 | % time-frequency visualization, but it actually has some important
16 | % differences from how sound is analyzed by the ear, most
17 | % significantly that the ear's frequency subbands get wider for
18 | % higher frequencies, whereas the spectrogram has a constant
19 | % bandwidth across all frequency channels.
20 | %
21 | % There have been many signal-processing approximations proposed
22 | % for the frequency analysis performed by the ear; one of the most
23 | % popular is the Gammatone filterbank originally proposed by
24 | % Roy Patterson and colleagues in 1992. Gammatone filters were
25 | % conceived as a simple fit to experimental observations of
26 | % the mammalian cochlea, and have a repeated pole structure leading
27 | % to an impulse response that is the product of a Gamma envelope
28 | % g(t) = t^n e^{-t} and a sinusoid (tone).
29 | %
30 | % One reason for the popularity of this approach is the
31 | % availability of an implementation by Malcolm Slaney, as
32 | % described in:
33 | %
34 | % Malcolm Slaney (1998) "Auditory Toolbox Version 2",
35 | % Technical Report #1998-010, Interval Research Corporation, 1998.
36 | % http://cobweb.ecn.purdue.edu/~malcolm/interval/1998-010/
37 | %
38 | % Malcolm's toolbox includes routines to design a Gammatone
39 | % filterbank and to process a signal by every filter in a bank,
40 | % but in order to convert this into a time-frequency visualization
41 | % it is necessary to sum up the energy within regular time bins.
42 | % While this is not complicated, the function here provides a
43 | % convenient wrapper to achieve this final step, for applications
44 | % that are content to work with time-frequency magnitude
45 | % distributions instead of going down to the waveform levels. In
46 | % this mode of operation, the routine uses Malcolm's MakeERBFilters
47 | % and ERBFilterBank routines.
48 | %
49 | % This is, however, quite a computationally expensive approach, so
50 | % we also provide an alternative algorithm that gives very similar
51 | % results. In this mode, the Gammatone-based spectrogram is
52 | % constructed by first calculating a conventional, fixed-bandwidth
53 | % spectrogram, then combining the fine frequency resolution of the
54 | % FFT-based spectra into the coarser, smoother Gammatone responses
55 | % via a weighting function. This calculates the time-frequency
56 | % distribution some 30-40x faster than the full approach.
57 |
58 | %% Routines
59 | % The code consists of a main routine, gammatonegram,
60 | % which takes a waveform and other parameters and returns a
61 | % spectrogram-like time-frequency matrix, and a helper function,
62 | % fft2gammatonemx, which constructs the
63 | % weighting matrix to convert FFT output spectra into gammatone
64 | % approximations.
65 |
66 | %% Example usage
67 | % First, we calculate a Gammatone-based spectrogram-like image of
68 | % a speech waveform using the fast approximation. Then we do the
69 | % same thing using the full filtering approach, for comparison.
70 |
71 | % Load a waveform, calculate its gammatone spectrogram, then display:
72 | [d,sr] = audioread('sa2.wav');
73 | tic; D = gammatonegram(d,sr); toc
74 | %Elapsed time is 0.140742 seconds.
75 | subplot(211)
76 | imagesc(20*log10(D)); axis xy
77 | caxis([-90 -30])
78 | colorbar
79 | title('Gammatonegram - fast method')
80 |
81 | % Now repeat with flag to use actual subband filters.
82 | % Since it's the last argument, we have to include all the other
83 | % arguments. These are the default values for: summation window
84 | % (0.025 sec), hop between successive windows (0.010 sec),
85 | % number of gammatone channels (64), lowest frequency (50 Hz),
86 | % and highest frequency (sr/2). The last argument as zero
87 | % means not to use the FFT approach.
88 | tic; D2 = gammatonegram(d,sr,0.025,0.010,64,50,sr/2,0); toc
89 | %Elapsed time is 3.165083 seconds.
90 | subplot(212)
91 | imagesc(20*log10(D2)); axis xy
92 | caxis([-90 -30])
93 | colorbar
94 | title('Gammatonegram - accurate method')
95 | % Actual gammatone filters appear somewhat narrower. The fast
96 | % version assumes that amplitudes from different channels add
97 | % coherently, whereas the actual subband energies will
98 | % depend on how the energy in different frequencies combines.
99 | % Also notice the visible time smearing in the low frequency
100 | % channels that does not occur in the fast version.
101 |
102 | %% Validation
103 | % We can check the frequency responses of the filterbank
104 | % simulated with the fast method against the actual filters
105 | % from Malcolm's toolbox. They match very closely, but of
106 | % course this still doesn't mean the two approaches will give
107 | % identical results - because the fast method ignores the phase
108 | % of each frequency channel when summing up.
109 |
110 | % Check the frequency responses to see that they match:
111 | % Put an impulse through the Slaney ERB filters, then take the
112 | % frequency response of each impulse response.
113 | fcfs = flipud(MakeERBFilters(16000,64,50));
114 | gtir = ERBFilterBank([1, zeros(1,1000)],fcfs);
115 | H = zeros(64,512);
116 | for i = 1:64; H(i,:) = abs(freqz(gtir(i,:),1,512)); end
117 | % The weighting matrix for the FFT is the frequency response
118 | % of each output filter
119 | gtm = fft2gammatonemx(1024,16000,64,1,50,8000,512);
120 | % Plot every 5th channel from both. Offset by 3 dB just so we can
121 | % see both
122 | fs = [0:511]/512*8000;
123 | figure
124 | plot(fs,20*log10(H(5:5:64,:))','b',fs, -3 + 20*log10(gtm(5:5:64,:))','r')
125 | axis([0 8000 -150 0])
126 | grid
127 | % Line up pretty well, apart from wiggles below -100 dB
128 | % (from truncating the impulse response at 1000 samples?)
129 |
130 | %% Download
131 | % You can download all the code and data for these examples here:
132 | % .
133 |
134 | %% Referencing
135 | % If you use this work in a publication, I would be grateful
136 | % if you referenced this page as follows:
137 | %
138 | % D. P. W. Ellis (2009). "Gammatone-like spectrograms", web resource, http://www.ee.columbia.edu/~dpwe/resources/matlab/gammatonegram/ .
139 |
140 | %% Acknowledgment
141 | % This project was supported in part by the NSF under
142 | % grant IIS-0535168. Any opinions, findings and conclusions
143 | % or recommendations expressed in this material are those of the
144 | % authors and do not necessarily reflect the views of the Sponsors.
145 |
146 | % Last updated: $Date: 2009/02/22 01:46:42 $
147 | % Dan Ellis
148 |
--------------------------------------------------------------------------------
/gammatone/auditory_toolkit/gammatonegram.m:
--------------------------------------------------------------------------------
1 | function [Y,F] = gammatonegram(X,SR,TWIN,THOP,N,FMIN,FMAX,USEFFT,WIDTH)
2 | % [Y,F] = gammatonegram(X,SR,TWIN,THOP,N,FMIN,FMAX,USEFFT,WIDTH)
3 | % Calculate a spectrogram-like time frequency magnitude array
4 | % based on Gammatone subband filters. Waveform X (at sample
5 | % rate SR) is passed through an N (default 64) channel gammatone
6 | % auditory model filterbank, with lowest frequency FMIN (50)
7 | % and highest frequency FMAX (SR/2). The outputs of each band
8 | % then have their energy integrated over windows of TWIN secs
9 | % (0.025), advancing by THOP secs (0.010) for successive
10 | % columns. These magnitudes are returned as an N-row
11 | % nonnegative real matrix, Y.
12 | % If USEFFT is present and zero, revert to actual filtering and
13 | % summing energy within windows.
14 | % WIDTH (default 1.0) is how to scale bandwidth of filters
15 | % relative to ERB default (for fast method only).
16 | % F returns the center frequencies in Hz of each row of Y
17 | % (uniformly spaced on a Bark scale).
18 | %
19 | % 2009-02-18 Dan Ellis dpwe@ee.columbia.edu
20 | % Last updated: $Date: 2009/02/23 21:07:09 $
21 |
22 | if nargin < 2; SR = 16000; end
23 | if nargin < 3; TWIN = 0.025; end
24 | if nargin < 4; THOP = 0.010; end
25 | if nargin < 5; N = 64; end
26 | if nargin < 6; FMIN = 50; end
27 | if nargin < 7; FMAX = SR/2; end
28 | if nargin < 8; USEFFT = 1; end
29 | if nargin < 9; WIDTH = 1.0; end
30 |
31 |
32 | if USEFFT == 0
33 |
34 | % Use malcolm's function to filter into subbands
35 | %%%% IGNORES FMAX! *****
36 | [fcoefs,F] = MakeERBFilters(SR, N, FMIN);
37 | fcoefs = flipud(fcoefs);
38 |
39 | XF = ERBFilterBank(X,fcoefs);
40 |
41 | nwin = round(TWIN*SR);
42 | % Always use rectangular window for now; the Hann window below is unused
43 | % if USEHANN == 1
44 | window = hann(nwin)';
45 | % else
46 | % window = ones(1,nwin);
47 | % end
48 | % window = window/sum(window);
49 | % XE = [zeros(N,round(nwin/2)),XF.^2,zeros(N,round(nwin/2))];
50 | XE = [XF.^2];
51 |
52 | hopsamps = round(THOP*SR);
53 |
54 | ncols = 1 + floor((size(XE,2)-nwin)/hopsamps);
55 |
56 | Y = zeros(N,ncols);
57 |
58 | % winmx = repmat(window,N,1);
59 |
60 | for i = 1:ncols
61 | % Y(:,i) = sqrt(sum(winmx.*XE(:,(i-1)*hopsamps + [1:nwin]),2));
62 | Y(:,i) = sqrt(mean(XE(:,(i-1)*hopsamps + [1:nwin]),2));
63 | end
64 |
65 | else
66 | % USEFFT version
67 | % How long a window to use relative to the integration window requested
68 | winext = 1;
69 | twinmod = winext * TWIN;
70 | % first spectrogram
71 | nfft = 2^(ceil(log(2*twinmod*SR)/log(2)));
72 | nhop = round(THOP*SR);
73 | nwin = round(twinmod*SR);
74 | [gtm,F] = fft2gammatonemx(nfft, SR, N, WIDTH, FMIN, FMAX, nfft/2+1);
75 | % perform FFT and weighting in amplitude domain
76 | Y = 1/nfft*gtm*abs(specgram(X,nfft,SR,nwin,nwin-nhop));
77 | % or the power domain? doesn't match nearly as well
78 | %Y = 1/nfft*sqrt(gtm*abs(specgram(X,nfft,SR,nwin,nwin-nhop).^2));
79 | end
80 |
81 |
82 |
83 |
--------------------------------------------------------------------------------
/gammatone/auditory_toolkit/specgram.m:
--------------------------------------------------------------------------------
1 | function y = specgram(x,n,sr,w,ov)
2 | % Y = specgram(X,NFFT,SR,W,OV)
3 | % Substitute for Matlab's specgram: calculates a spectrogram and displays it if no output is requested
4 | % $Header: /homes/dpwe/tmp/e6820/RCS/myspecgram.m,v 1.1 2002/08/04 19:20:27 dpwe Exp $
5 |
6 | if (size(x,1) > size(x,2))
7 | x = x';
8 | end
9 |
10 | s = length(x);
11 |
12 | if nargin < 2
13 | n = 256;
14 | end
15 | if nargin < 3
16 | sr = 1;
17 | end
18 | if nargin < 4
19 | w = n;
20 | end
21 | if nargin < 5
22 | ov = w/2;
23 | end
24 | h = w - ov;
25 |
26 | halflen = w/2;
27 | halff = n/2; % midpoint of win
28 | acthalflen = min(halff, halflen);
29 |
30 | halfwin = 0.5 * ( 1 + cos( pi * (0:halflen)/halflen));
31 | win = zeros(1, n);
32 | win((halff+1):(halff+acthalflen)) = halfwin(1:acthalflen);
33 | win((halff+1):-1:(halff-acthalflen+2)) = halfwin(1:acthalflen);
34 |
35 | c = 1;
36 |
37 | % pre-allocate output array
38 | ncols = 1+fix((s-n)/h);
39 | d = zeros((1+n/2), ncols);
40 |
41 | for b = 0:h:(s-n)
42 | u = win.*x((b+1):(b+n));
43 | t = fft(u);
44 | d(:,c) = t([1:(1+n/2)]');
45 | c = c+1;
46 | end;
47 |
48 | tt = [0:h:(s-n)]/sr;
49 | ff = [0:(n/2)]*sr/n;
50 |
51 | if nargout < 1
52 | imagesc(tt,ff,20*log10(abs(d)));
53 | axis xy
54 | xlabel('Time / s');
55 | ylabel('Frequency / Hz');
56 | else
57 | y = d;
58 | end
59 |
--------------------------------------------------------------------------------
/gammatone/doc/FurElise.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/janaal1/DCASE2020-Task3/bc7d5e87faa2fbe014dc47cccb48d9927b4ed3ec/gammatone/doc/FurElise.png
--------------------------------------------------------------------------------
/gammatone/doc/Makefile:
--------------------------------------------------------------------------------
1 | # Makefile for Sphinx documentation
2 | #
3 |
4 | # You can set these variables from the command line.
5 | SPHINXOPTS =
6 | SPHINXBUILD = sphinx-build
7 | PAPER =
8 | BUILDDIR = _build
9 |
10 | # Internal variables.
11 | PAPEROPT_a4 = -D latex_paper_size=a4
12 | PAPEROPT_letter = -D latex_paper_size=letter
13 | ALLSPHINXOPTS = -d $(BUILDDIR)/doctrees $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) .
14 | # the i18n builder cannot share the environment and doctrees with the others
15 | I18NSPHINXOPTS = $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) .
16 |
17 | .PHONY: help clean html dirhtml singlehtml pickle json htmlhelp qthelp devhelp epub latex latexpdf text man texinfo info changes linkcheck doctest gettext
18 |
19 | help:
20 | @echo "Please use \`make <target>' where <target> is one of"
21 | @echo " html to make standalone HTML files"
22 | @echo " dirhtml to make HTML files named index.html in directories"
23 | @echo " singlehtml to make a single large HTML file"
24 | @echo " pickle to make pickle files"
25 | @echo " json to make JSON files"
26 | @echo " htmlhelp to make HTML files and a HTML help project"
27 | @echo " qthelp to make HTML files and a qthelp project"
28 | @echo " devhelp to make HTML files and a Devhelp project"
29 | @echo " epub to make an epub"
30 | @echo " latex to make LaTeX files, you can set PAPER=a4 or PAPER=letter"
31 | @echo " latexpdf to make LaTeX files and run them through pdflatex"
32 | @echo " text to make text files"
33 | @echo " man to make manual pages"
34 | @echo " texinfo to make Texinfo files"
35 | @echo " info to make Texinfo files and run them through makeinfo"
36 | @echo " gettext to make PO message catalogs"
37 | @echo " changes to make an overview of all changed/added/deprecated items"
38 | @echo " linkcheck to check all external links for integrity"
39 | @echo " doctest to run all doctests embedded in the documentation (if enabled)"
40 |
41 | clean:
42 | -rm -rf $(BUILDDIR)/*
43 |
44 | html:
45 | $(SPHINXBUILD) -b html $(ALLSPHINXOPTS) $(BUILDDIR)/html
46 | @echo
47 | @echo "Build finished. The HTML pages are in $(BUILDDIR)/html."
48 |
49 | dirhtml:
50 | $(SPHINXBUILD) -b dirhtml $(ALLSPHINXOPTS) $(BUILDDIR)/dirhtml
51 | @echo
52 | @echo "Build finished. The HTML pages are in $(BUILDDIR)/dirhtml."
53 |
54 | singlehtml:
55 | $(SPHINXBUILD) -b singlehtml $(ALLSPHINXOPTS) $(BUILDDIR)/singlehtml
56 | @echo
57 | @echo "Build finished. The HTML page is in $(BUILDDIR)/singlehtml."
58 |
59 | pickle:
60 | $(SPHINXBUILD) -b pickle $(ALLSPHINXOPTS) $(BUILDDIR)/pickle
61 | @echo
62 | @echo "Build finished; now you can process the pickle files."
63 |
64 | json:
65 | $(SPHINXBUILD) -b json $(ALLSPHINXOPTS) $(BUILDDIR)/json
66 | @echo
67 | @echo "Build finished; now you can process the JSON files."
68 |
69 | htmlhelp:
70 | $(SPHINXBUILD) -b htmlhelp $(ALLSPHINXOPTS) $(BUILDDIR)/htmlhelp
71 | @echo
72 | @echo "Build finished; now you can run HTML Help Workshop with the" \
73 | ".hhp project file in $(BUILDDIR)/htmlhelp."
74 |
75 | qthelp:
76 | $(SPHINXBUILD) -b qthelp $(ALLSPHINXOPTS) $(BUILDDIR)/qthelp
77 | @echo
78 | @echo "Build finished; now you can run "qcollectiongenerator" with the" \
79 | ".qhcp project file in $(BUILDDIR)/qthelp, like this:"
80 | @echo "# qcollectiongenerator $(BUILDDIR)/qthelp/gammatone.qhcp"
81 | @echo "To view the help file:"
82 | @echo "# assistant -collectionFile $(BUILDDIR)/qthelp/gammatone.qhc"
83 |
84 | devhelp:
85 | $(SPHINXBUILD) -b devhelp $(ALLSPHINXOPTS) $(BUILDDIR)/devhelp
86 | @echo
87 | @echo "Build finished."
88 | @echo "To view the help file:"
89 | @echo "# mkdir -p $$HOME/.local/share/devhelp/gammatone"
90 | @echo "# ln -s $(BUILDDIR)/devhelp $$HOME/.local/share/devhelp/gammatone"
91 | @echo "# devhelp"
92 |
93 | epub:
94 | $(SPHINXBUILD) -b epub $(ALLSPHINXOPTS) $(BUILDDIR)/epub
95 | @echo
96 | @echo "Build finished. The epub file is in $(BUILDDIR)/epub."
97 |
98 | latex:
99 | $(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
100 | @echo
101 | @echo "Build finished; the LaTeX files are in $(BUILDDIR)/latex."
102 | @echo "Run \`make' in that directory to run these through (pdf)latex" \
103 | "(use \`make latexpdf' here to do that automatically)."
104 |
105 | latexpdf:
106 | $(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
107 | @echo "Running LaTeX files through pdflatex..."
108 | $(MAKE) -C $(BUILDDIR)/latex all-pdf
109 | @echo "pdflatex finished; the PDF files are in $(BUILDDIR)/latex."
110 |
111 | text:
112 | $(SPHINXBUILD) -b text $(ALLSPHINXOPTS) $(BUILDDIR)/text
113 | @echo
114 | @echo "Build finished. The text files are in $(BUILDDIR)/text."
115 |
116 | man:
117 | $(SPHINXBUILD) -b man $(ALLSPHINXOPTS) $(BUILDDIR)/man
118 | @echo
119 | @echo "Build finished. The manual pages are in $(BUILDDIR)/man."
120 |
121 | texinfo:
122 | $(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo
123 | @echo
124 | @echo "Build finished. The Texinfo files are in $(BUILDDIR)/texinfo."
125 | @echo "Run \`make' in that directory to run these through makeinfo" \
126 | "(use \`make info' here to do that automatically)."
127 |
128 | info:
129 | $(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo
130 | @echo "Running Texinfo files through makeinfo..."
131 | $(MAKE) -C $(BUILDDIR)/texinfo info
132 | @echo "makeinfo finished; the Info files are in $(BUILDDIR)/texinfo."
133 |
134 | gettext:
135 | $(SPHINXBUILD) -b gettext $(I18NSPHINXOPTS) $(BUILDDIR)/locale
136 | @echo
137 | @echo "Build finished. The message catalogs are in $(BUILDDIR)/locale."
138 |
139 | changes:
140 | $(SPHINXBUILD) -b changes $(ALLSPHINXOPTS) $(BUILDDIR)/changes
141 | @echo
142 | @echo "The overview file is in $(BUILDDIR)/changes."
143 |
144 | linkcheck:
145 | $(SPHINXBUILD) -b linkcheck $(ALLSPHINXOPTS) $(BUILDDIR)/linkcheck
146 | @echo
147 | @echo "Link check complete; look for any errors in the above output " \
148 | "or in $(BUILDDIR)/linkcheck/output.txt."
149 |
150 | doctest:
151 | $(SPHINXBUILD) -b doctest $(ALLSPHINXOPTS) $(BUILDDIR)/doctest
152 | @echo "Testing of doctests in the sources finished, look at the " \
153 | "results in $(BUILDDIR)/doctest/output.txt."
154 |
--------------------------------------------------------------------------------
/gammatone/doc/conf.py:
--------------------------------------------------------------------------------
1 | # -*- coding: utf-8 -*-
2 | #
3 | # gammatone documentation build configuration file, created by
4 | # sphinx-quickstart on Sat Dec 8 23:21:49 2012.
5 | #
6 | # This file is execfile()d with the current directory set to its containing dir.
7 | #
8 | # Note that not all possible configuration values are present in this
9 | # autogenerated file.
10 | #
11 | # All configuration values have a default; values that are commented out
12 | # serve to show the default.
13 |
14 | import sys, os
15 |
16 | # If extensions (or modules to document with autodoc) are in another directory,
17 | # add these directories to sys.path here. If the directory is relative to the
18 | # documentation root, use os.path.abspath to make it absolute, like shown here.
19 | #sys.path.insert(0, os.path.abspath('.'))
20 |
21 | # -- General configuration -----------------------------------------------------
22 |
23 | # If your documentation needs a minimal Sphinx version, state it here.
24 | #needs_sphinx = '1.0'
25 |
26 | # Add any Sphinx extension module names here, as strings. They can be extensions
27 | # coming with Sphinx (named 'sphinx.ext.*') or your custom ones.
28 | extensions = ['sphinx.ext.autodoc']
29 |
30 | # Add any paths that contain templates here, relative to this directory.
31 | templates_path = ['_templates']
32 |
33 | # The suffix of source filenames.
34 | source_suffix = '.rst'
35 |
36 | # The encoding of source files.
37 | #source_encoding = 'utf-8-sig'
38 |
39 | # The master toctree document.
40 | master_doc = 'index'
41 |
42 | # General information about the project.
43 | project = u'Gammatone Filterbank Toolkit'
44 | copyright = u'2014, Jason Heeris'
45 |
46 | # The version info for the project you're documenting, acts as replacement for
47 | # |version| and |release|, also used in various other places throughout the
48 | # built documents.
49 | #
50 | # The short X.Y version.
51 | version = '1.0'
52 | # The full version, including alpha/beta/rc tags.
53 | release = '1.0'
54 |
55 | # The language for content autogenerated by Sphinx. Refer to documentation
56 | # for a list of supported languages.
57 | #language = None
58 |
59 | # There are two options for replacing |today|: either, you set today to some
60 | # non-false value, then it is used:
61 | #today = ''
62 | # Else, today_fmt is used as the format for a strftime call.
63 | #today_fmt = '%B %d, %Y'
64 |
65 | # List of patterns, relative to source directory, that match files and
66 | # directories to ignore when looking for source files.
67 | exclude_patterns = ['_build']
68 |
69 | # The reST default role (used for this markup: `text`) to use for all documents.
70 | #default_role = None
71 |
72 | # If true, '()' will be appended to :func: etc. cross-reference text.
73 | #add_function_parentheses = True
74 |
75 | # If true, the current module name will be prepended to all description
76 | # unit titles (such as .. function::).
77 | #add_module_names = True
78 |
79 | # If true, sectionauthor and moduleauthor directives will be shown in the
80 | # output. They are ignored by default.
81 | #show_authors = False
82 |
83 | # The name of the Pygments (syntax highlighting) style to use.
84 | pygments_style = 'sphinx'
85 |
86 | # A list of ignored prefixes for module index sorting.
87 | #modindex_common_prefix = []
88 |
89 |
90 | # -- Options for HTML output ---------------------------------------------------
91 |
92 | # The theme to use for HTML and HTML Help pages. See the documentation for
93 | # a list of builtin themes.
94 | html_theme = 'haiku'
95 |
96 | # Theme options are theme-specific and customize the look and feel of a theme
97 | # further. For a list of options available for each theme, see the
98 | # documentation.
99 | #html_theme_options = {}
100 |
101 | # Add any paths that contain custom themes here, relative to this directory.
102 | #html_theme_path = []
103 |
104 | # The name for this set of Sphinx documents. If None, it defaults to
105 | # "<project> v<release> documentation".
106 | html_title = u"%s %s" % (project, release)
107 |
108 | # A shorter title for the navigation bar. Default is the same as html_title.
109 | #html_short_title = None
110 |
111 | # The name of an image file (relative to this directory) to place at the top
112 | # of the sidebar.
113 | #html_logo = None
114 |
115 | # The name of an image file (within the static path) to use as favicon of the
116 | # docs. This file should be a Windows icon file (.ico) being 16x16 or 32x32
117 | # pixels large.
118 | #html_favicon = None
119 |
120 | # Add any paths that contain custom static files (such as style sheets) here,
121 | # relative to this directory. They are copied after the builtin static files,
122 | # so a file named "default.css" will overwrite the builtin "default.css".
123 | html_static_path = ['_static']
124 |
125 | # If not '', a 'Last updated on:' timestamp is inserted at every page bottom,
126 | # using the given strftime format.
127 | #html_last_updated_fmt = '%b %d, %Y'
128 |
129 | # If true, SmartyPants will be used to convert quotes and dashes to
130 | # typographically correct entities.
131 | html_use_smartypants = True
132 |
133 | # Custom sidebar templates, maps document names to template names.
134 | html_sidebars = {
135 | '**' : [
136 | 'localtoc.html',
137 | 'globaltoc.html',
138 | 'relations.html',
139 | 'searchbox.html'
140 | ],
141 | }
142 |
143 | # Additional templates that should be rendered to pages, maps page names to
144 | # template names.
145 | #html_additional_pages = {}
146 |
147 | # If false, no module index is generated.
148 | #html_domain_indices = True
149 |
150 | # If false, no index is generated.
151 | #html_use_index = True
152 |
153 | # If true, the index is split into individual pages for each letter.
154 | #html_split_index = False
155 |
156 | # If true, links to the reST sources are added to the pages.
157 | html_show_sourcelink = False
158 |
159 | # If true, "Created using Sphinx" is shown in the HTML footer. Default is True.
160 | #html_show_sphinx = True
161 |
162 | # If true, "(C) Copyright ..." is shown in the HTML footer. Default is True.
163 | #html_show_copyright = True
164 |
165 | # If true, an OpenSearch description file will be output, and all pages will
166 | # contain a tag referring to it. The value of this option must be the
167 | # base URL from which the finished HTML is served.
168 | #html_use_opensearch = ''
169 |
170 | # This is the file name suffix for HTML files (e.g. ".xhtml").
171 | #html_file_suffix = None
172 |
173 | # Output file base name for HTML help builder.
174 | htmlhelp_basename = 'gammatonedoc'
175 |
176 |
177 | # -- Options for LaTeX output --------------------------------------------------
178 |
179 | latex_elements = {
180 | # The paper size ('letterpaper' or 'a4paper').
181 | #'papersize': 'letterpaper',
182 |
183 | # The font size ('10pt', '11pt' or '12pt').
184 | #'pointsize': '10pt',
185 |
186 | # Additional stuff for the LaTeX preamble.
187 | #'preamble': '',
188 | }
189 |
190 | # Grouping the document tree into LaTeX files. List of tuples
191 | # (source start file, target name, title, author, documentclass [howto/manual]).
192 | latex_documents = [
193 | ('index', 'gammatone.tex', u'Gammatone Documentation',
194 | u'Jason Heeris', 'manual'),
195 | ]
196 |
197 | # The name of an image file (relative to this directory) to place at the top of
198 | # the title page.
199 | #latex_logo = None
200 |
201 | # For "manual" documents, if this is true, then toplevel headings are parts,
202 | # not chapters.
203 | #latex_use_parts = False
204 |
205 | # If true, show page references after internal links.
206 | #latex_show_pagerefs = False
207 |
208 | # If true, show URL addresses after external links.
209 | #latex_show_urls = False
210 |
211 | # Documents to append as an appendix to all manuals.
212 | #latex_appendices = []
213 |
214 | # If false, no module index is generated.
215 | #latex_domain_indices = True
216 |
217 |
218 | # -- Options for manual page output --------------------------------------------
219 |
220 | # One entry per manual page. List of tuples
221 | # (source start file, name, description, authors, manual section).
222 | man_pages = [
223 | ('index', 'gammatone', u'Gammatone Documentation',
224 | [u'Jason Heeris'], 1)
225 | ]
226 |
227 | # If true, show URL addresses after external links.
228 | #man_show_urls = False
229 |
230 |
231 | # -- Options for Texinfo output ------------------------------------------------
232 |
233 | # Grouping the document tree into Texinfo files. List of tuples
234 | # (source start file, target name, title, author,
235 | # dir menu entry, description, category)
236 | texinfo_documents = [
237 | ('index', 'gammatone', u'Gammatone Documentation',
238 | u'Jason Heeris', 'gammatone', 'Gammatone filterbank construction tools.',
239 | 'Miscellaneous'),
240 | ]
241 |
242 | # Documents to append as an appendix to all manuals.
243 | #texinfo_appendices = []
244 |
245 | # If false, no module index is generated.
246 | #texinfo_domain_indices = True
247 |
248 | # How to display URL addresses: 'footnote', 'no', or 'inline'.
249 | #texinfo_show_urls = 'footnote'
250 |
251 | # -- Autodoc configuration -----------------------------------------------------
252 |
253 | # autodoc_default_flags = ['members']
254 |
--------------------------------------------------------------------------------
/gammatone/doc/details.rst:
--------------------------------------------------------------------------------
1 | About the Gammatone Filterbank Toolkit
2 | --------------------------------------
3 |
4 | Summary
5 | ~~~~~~~
6 |
7 | This is a port of Malcolm Slaney's and Dan Ellis' gammatone filterbank
8 | MATLAB code, detailed below, to Python 2 and 3 using NumPy and SciPy. It
9 | analyses signals by running them through banks of gammatone filters,
10 | similar to Fourier-based spectrogram analysis.
11 |
12 | .. figure:: FurElise.png
13 | :align: center
14 | :alt: Gammatone-based spectrogram of Für Elise
15 |
16 | Gammatone-based spectrogram of Für Elise
17 |
18 | Dependencies
19 | ~~~~~~~~~~~~
20 |
21 | - numpy
22 | - scipy
23 | - nose
24 | - mock
25 | - matplotlib
26 |
27 | Using the Code
28 | ~~~~~~~~~~~~~~
29 |
30 | For a demonstration, find a ``.wav`` file (for example, a recording of
31 | Für Elise) and run::
32 |
33 | python -m gammatone FurElise.wav -d 10
34 |
35 | ...to see a gammatone-gram of the first ten seconds of Beethoven's "Für
36 | Elise." If you've installed via
37 | ``pip`` or ``setup.py install``, you should also be able to just run::
38 |
39 | gammatone FurElise.wav -d 10
40 |
41 | Basis
42 | ~~~~~
43 |
44 | This project is based on research into how humans perceive audio,
45 | originally published by Malcolm Slaney:
46 |
47 | Malcolm Slaney (1998) "Auditory Toolbox Version 2", Technical Report
48 | #1998-010, Interval Research Corporation,
49 | 1998.
50 |
51 | Slaney's report describes a way of modelling how the human ear
52 | perceives, emphasises and separates different frequencies of sound. A
53 | series of gammatone filters is constructed whose width increases with
54 | increasing centre frequency, and this bank of filters is applied to a
55 | time-domain signal. The result of this is a spectrum that should
56 | represent the human experience of sound better than, say, a
57 | Fourier-domain spectrum would.
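As a rough illustration of how channel width grows with centre frequency, the sketch below evaluates the equivalent rectangular bandwidth (ERB) using the constants that appear in ``fft2gammatonemx.m`` (``EarQ = 9.26449``, ``minBW = 24.7``, first-order form, with ``width`` as a scale factor). The function name ``erb_bandwidth`` is hypothetical and is not part of this toolkit's API.

```python
import numpy as np

# ERB constants as used in fft2gammatonemx.m (after Slaney's MakeERBFilters)
EAR_Q = 9.26449
MIN_BW = 24.7


def erb_bandwidth(cf_hz, width=1.0):
    """Equivalent rectangular bandwidth (Hz) of a channel centred at cf_hz.

    Hypothetical helper: first-order ERB formula, optionally scaled by
    `width` in the same way as the WIDTH argument of gammatonegram.m.
    """
    return width * (np.asarray(cf_hz, dtype=float) / EAR_Q + MIN_BW)
```

With these constants, a channel centred at 100 Hz is roughly 35 Hz wide, while one centred at 1 kHz is roughly 133 Hz wide.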
58 |
59 | A gammatone filter has an impulse response that is a sine wave
60 | multiplied by a gamma distribution function. It is a common approach to
61 | modelling the auditory system.
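A minimal sketch of that definition, assuming a fourth-order filter and the decay rate ``1.019 * 2π * ERB`` used in ``fft2gammatonemx.m``: a gamma-shaped envelope ``t**(n-1) * exp(-b*t)`` multiplied by a cosine at the centre frequency. ``gammatone_ir`` is a hypothetical helper written for this explanation; it is not how the toolkit itself filters signals.

```python
import numpy as np

EAR_Q = 9.26449
MIN_BW = 24.7


def gammatone_ir(cf_hz, fs, duration_s=0.05, order=4):
    """Sampled gammatone impulse response, normalised to a peak of 1.

    Hypothetical helper: envelope t^(order-1) * exp(-b t) times cos(2 pi cf t),
    with b = 2 pi * 1.019 * ERB(cf) (assumed, following fft2gammatonemx.m).
    """
    t = np.arange(int(round(duration_s * fs))) / fs
    erb = cf_hz / EAR_Q + MIN_BW
    b = 2 * np.pi * 1.019 * erb                     # envelope decay rate (1/s)
    envelope = t ** (order - 1) * np.exp(-b * t)    # gamma-distribution shape
    ir = envelope * np.cos(2 * np.pi * cf_hz * t)   # modulated by a tone at cf
    return ir / np.max(np.abs(ir))
```

The magnitude spectrum of such a response peaks at the centre frequency, and its width follows the ERB of that channel.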
62 |
63 | The gammatone filterbank approach can be considered analogous (but not
64 | equivalent) to a discrete Fourier transform where the frequency axis is
65 | approximately logarithmic. For example, a series of notes spaced an
66 | octave apart would appear roughly evenly spaced, and a sound occupying
67 | a fixed bandwidth in Hz would appear more spread out at lower
68 | frequencies.
69 |
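To make the frequency-axis claim concrete, the sketch below re-implements the ERB-scale mapping from ``gammatone.filters.erb_point`` in plain Python, together with its inverse. In that mapping, f + c falls geometrically as the axis fraction goes from 0 to 1, with c = 9.26449 × 24.7 Hz. The names ``erb_frequencies`` and ``erb_fraction`` are hypothetical stand-alone helpers, not imports from the package.

```python
import math

EAR_Q = 9.26449  # Glasberg and Moore parameters, as in gammatone.filters
MIN_BW = 24.7
C = EAR_Q * MIN_BW  # offset (Hz); well above it the scale is close to logarithmic


def erb_frequencies(low_freq, high_freq, num):
    """num frequencies uniformly spaced on the ERB scale, descending from one
    step below high_freq to low_freq (same ordering as erb_space)."""
    span = math.log((high_freq + C) / (low_freq + C))
    return [
        (high_freq + C) * math.exp(-(k / num) * span) - C
        for k in range(1, num + 1)
    ]


def erb_fraction(freq, low_freq, high_freq):
    """Inverse mapping: position of freq on the [0, 1] axis (0 = high_freq)."""
    return (math.log((high_freq + C) / (freq + C))
            / math.log((high_freq + C) / (low_freq + C)))


cfs = erb_frequencies(100, 8000, 16)
# Octaves well above C occupy nearly equal steps on the ERB axis...
high_steps = [erb_fraction(f, 100, 8000) - erb_fraction(2 * f, 100, 8000)
              for f in (2000, 4000)]
# ...while an octave down near 100 Hz occupies a much smaller step
low_step = erb_fraction(125, 100, 8000) - erb_fraction(250, 100, 8000)
```

Below roughly c ≈ 229 Hz, the mapping becomes closer to linear. This is why the gaps between successive centre frequencies keep shrinking but never collapse at the bottom of the range.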
70 | The real goal of this toolkit is to allow easy computation of the
71 | gammatone equivalent of a spectrogram — a time-varying spectrum of
72 | energy over audible frequencies based on a gammatone filterbank.
73 |
74 | Slaney demonstrated his research with an initial implementation in
75 | MATLAB. This implementation was later extended by Dan Ellis, who found a
76 | way to approximate a "gammatone-gram" by using the fast Fourier
77 | transform. Ellis' code calculates a matrix of weights that can be
78 | applied to the output of an FFT so that a Fourier-based spectrogram can
79 | easily be transformed into such an approximation.
80 |
81 | Ellis' code and documentation are here: `Gammatone-like
82 | spectrograms `_
83 |
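The idea behind the FFT approximation can be sketched in a few lines. Build a weight matrix with one row per gammatone channel and one column per FFT bin, then multiply it by the FFT magnitudes. This is a simplification, not Ellis' exact formula (which ``gammatone.fftweight.fft_weights`` ports). It assumes the common approximation that an order-n gammatone's magnitude response is proportional to (1 + ((f - f_c)/b)^2)^(-n/2) near f_c, with b = 1.019 × ERB(f_c). All function names below are hypothetical.

```python
import cmath
import math


def erb(fc):
    """Glasberg and Moore ERB in Hz (same constants as gammatone.filters)."""
    return fc / 9.26449 + 24.7


def gammatone_weights(centre_freqs, nfft, fs, order=4):
    """Hypothetical helper: one row per channel, one column per non-negative
    FFT bin, using the approximate gammatone magnitude response."""
    bin_freqs = [k * fs / nfft for k in range(nfft // 2 + 1)]
    weights = []
    for fc in centre_freqs:
        b = 1.019 * erb(fc)
        weights.append([(1 + ((f - fc) / b) ** 2) ** (-order / 2) for f in bin_freqs])
    return weights


def dft_magnitudes(frame):
    """Magnitudes of the non-negative frequency bins of a plain DFT."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * m / n) for m, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def approx_gammatone_column(frame, centre_freqs, fs):
    """One gammatone-gram column: weight matrix (channels x bins) times |DFT|."""
    mags = dft_magnitudes(frame)
    weights = gammatone_weights(centre_freqs, len(frame), fs)
    return [sum(w * m for w, m in zip(row, mags)) for row in weights]


# A 1 kHz tone sampled at 8 kHz lands exactly on bin 32 of a 256-point DFT
frame = [math.sin(2 * math.pi * 1000 * m / 8000) for m in range(256)]
column = approx_gammatone_column(frame, [250.0, 1000.0, 3000.0], 8000)
```

Because the weights are computed once per FFT size, applying them to every frame of a spectrogram is a single matrix product. That is where the speed of this approach comes from, compared with running a filterbank over the waveform.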
84 | Interest
85 | ~~~~~~~~
86 |
87 | I became interested in this because of my background in science
88 | communication and my general interest in the teaching of signal
89 | processing. I find that the spectrogram approach to visualising signals
90 | is adequate for illustrating abstract systems or the mathematical
91 | properties of transforms, but bears little correspondence to a person's
92 | own experience of sound. If someone wants to see what their favourite
93 | piece of music "looks like," a normal Fourier transform based
94 | spectrogram is actually quite a poor way to visualise it. Features of
95 | the audio seem to be oddly spaced or unnaturally emphasised or
96 | de-emphasised depending on where they are in the frequency domain.
97 |
98 | The gammatone filterbank approach seems to be closer to what someone
99 | might intuitively expect a visualisation of sound to look like, and can
100 | help develop an intuition about alternative representations of signals.
101 |
102 | Verifying the port
103 | ~~~~~~~~~~~~~~~~~~
104 |
105 | Since this is a port of existing MATLAB code, I've written tests to
106 | verify the Python implementation against the original code. These tests
107 | aren't unit tests, but they do generally test single functions. Running
108 | any of the tests follows the same workflow:
109 |
110 | 1. Run the scripts in the ``test_generation`` directory. Each script
111 |    creates a ``.mat`` file containing test data in ``tests/data``.
112 |
113 | 2. Run ``nosetests3`` in the top-level directory. This will find and run
114 |    all the tests in the ``tests`` directory.
115 |
116 | Although I'm usually loath to check in generated files to version
117 | control, I'm willing to make an exception for the ``.mat`` files
118 | containing the test data. My reasoning is that they represent the
119 | decoupling of my code from the MATLAB code, and if the two projects were
120 | separated, they would be considered a part of the Python code, not the
121 | original MATLAB code.
122 |
--------------------------------------------------------------------------------
/gammatone/doc/fftweight.rst:
--------------------------------------------------------------------------------
1 | :mod:`gammatone.fftweight` -- FFT weightings for spectrogram-like gammatone analysis
2 | ====================================================================================
3 |
4 | .. automodule:: gammatone.fftweight
5 |    :members:
6 |
--------------------------------------------------------------------------------
/gammatone/doc/filters.rst:
--------------------------------------------------------------------------------
1 | :mod:`gammatone.filters` -- gammatone filterbank construction
2 | =============================================================
3 |
4 | .. automodule:: gammatone.filters
5 |    :members:
6 |
--------------------------------------------------------------------------------
/gammatone/doc/gtgram.rst:
--------------------------------------------------------------------------------
1 | :mod:`gammatone.gtgram` -- spectrogram-like gammatone analysis
2 | ==============================================================
3 |
4 | .. automodule:: gammatone.gtgram
5 |    :members:
6 |
--------------------------------------------------------------------------------
/gammatone/doc/index.rst:
--------------------------------------------------------------------------------
1 | .. gammatone documentation master file, created by
2 |    sphinx-quickstart on Sat Dec 8 23:21:49 2012.
3 |
4 | Index
5 | =====
6 |
7 | Modules
8 | -------
9 |
10 | .. toctree::
11 |    :maxdepth: 2
12 |
13 |    filters
14 |    gtgram
15 |    fftweight
16 |    plot
17 |
18 | .. include:: details.rst
19 |
20 | Indices and tables
21 | ------------------
22 |
23 | * :ref:`genindex`
24 | * :ref:`modindex`
25 | * :ref:`search`
26 |
27 |
--------------------------------------------------------------------------------
/gammatone/doc/make.bat:
--------------------------------------------------------------------------------
1 | @ECHO OFF
2 |
3 | REM Command file for Sphinx documentation
4 |
5 | if "%SPHINXBUILD%" == "" (
6 | set SPHINXBUILD=sphinx-build
7 | )
8 | set BUILDDIR=_build
9 | set ALLSPHINXOPTS=-d %BUILDDIR%/doctrees %SPHINXOPTS% .
10 | set I18NSPHINXOPTS=%SPHINXOPTS% .
11 | if NOT "%PAPER%" == "" (
12 | set ALLSPHINXOPTS=-D latex_paper_size=%PAPER% %ALLSPHINXOPTS%
13 | set I18NSPHINXOPTS=-D latex_paper_size=%PAPER% %I18NSPHINXOPTS%
14 | )
15 |
16 | if "%1" == "" goto help
17 |
18 | if "%1" == "help" (
19 | :help
20 | echo.Please use `make ^<target^>` where ^<target^> is one of
21 | echo. html to make standalone HTML files
22 | echo. dirhtml to make HTML files named index.html in directories
23 | echo. singlehtml to make a single large HTML file
24 | echo. pickle to make pickle files
25 | echo. json to make JSON files
26 | echo. htmlhelp to make HTML files and a HTML help project
27 | echo. qthelp to make HTML files and a qthelp project
28 | echo. devhelp to make HTML files and a Devhelp project
29 | echo. epub to make an epub
30 | echo. latex to make LaTeX files, you can set PAPER=a4 or PAPER=letter
31 | echo. text to make text files
32 | echo. man to make manual pages
33 | echo. texinfo to make Texinfo files
34 | echo. gettext to make PO message catalogs
35 | echo. changes to make an overview over all changed/added/deprecated items
36 | echo. linkcheck to check all external links for integrity
37 | echo. doctest to run all doctests embedded in the documentation if enabled
38 | goto end
39 | )
40 |
41 | if "%1" == "clean" (
42 | for /d %%i in (%BUILDDIR%\*) do rmdir /q /s %%i
43 | del /q /s %BUILDDIR%\*
44 | goto end
45 | )
46 |
47 | if "%1" == "html" (
48 | %SPHINXBUILD% -b html %ALLSPHINXOPTS% %BUILDDIR%/html
49 | if errorlevel 1 exit /b 1
50 | echo.
51 | echo.Build finished. The HTML pages are in %BUILDDIR%/html.
52 | goto end
53 | )
54 |
55 | if "%1" == "dirhtml" (
56 | %SPHINXBUILD% -b dirhtml %ALLSPHINXOPTS% %BUILDDIR%/dirhtml
57 | if errorlevel 1 exit /b 1
58 | echo.
59 | echo.Build finished. The HTML pages are in %BUILDDIR%/dirhtml.
60 | goto end
61 | )
62 |
63 | if "%1" == "singlehtml" (
64 | %SPHINXBUILD% -b singlehtml %ALLSPHINXOPTS% %BUILDDIR%/singlehtml
65 | if errorlevel 1 exit /b 1
66 | echo.
67 | echo.Build finished. The HTML pages are in %BUILDDIR%/singlehtml.
68 | goto end
69 | )
70 |
71 | if "%1" == "pickle" (
72 | %SPHINXBUILD% -b pickle %ALLSPHINXOPTS% %BUILDDIR%/pickle
73 | if errorlevel 1 exit /b 1
74 | echo.
75 | echo.Build finished; now you can process the pickle files.
76 | goto end
77 | )
78 |
79 | if "%1" == "json" (
80 | %SPHINXBUILD% -b json %ALLSPHINXOPTS% %BUILDDIR%/json
81 | if errorlevel 1 exit /b 1
82 | echo.
83 | echo.Build finished; now you can process the JSON files.
84 | goto end
85 | )
86 |
87 | if "%1" == "htmlhelp" (
88 | %SPHINXBUILD% -b htmlhelp %ALLSPHINXOPTS% %BUILDDIR%/htmlhelp
89 | if errorlevel 1 exit /b 1
90 | echo.
91 | echo.Build finished; now you can run HTML Help Workshop with the ^
92 | .hhp project file in %BUILDDIR%/htmlhelp.
93 | goto end
94 | )
95 |
96 | if "%1" == "qthelp" (
97 | %SPHINXBUILD% -b qthelp %ALLSPHINXOPTS% %BUILDDIR%/qthelp
98 | if errorlevel 1 exit /b 1
99 | echo.
100 | echo.Build finished; now you can run "qcollectiongenerator" with the ^
101 | .qhcp project file in %BUILDDIR%/qthelp, like this:
102 | echo.^> qcollectiongenerator %BUILDDIR%\qthelp\gammatone.qhcp
103 | echo.To view the help file:
104 | echo.^> assistant -collectionFile %BUILDDIR%\qthelp\gammatone.qhc
105 | goto end
106 | )
107 |
108 | if "%1" == "devhelp" (
109 | %SPHINXBUILD% -b devhelp %ALLSPHINXOPTS% %BUILDDIR%/devhelp
110 | if errorlevel 1 exit /b 1
111 | echo.
112 | echo.Build finished.
113 | goto end
114 | )
115 |
116 | if "%1" == "epub" (
117 | %SPHINXBUILD% -b epub %ALLSPHINXOPTS% %BUILDDIR%/epub
118 | if errorlevel 1 exit /b 1
119 | echo.
120 | echo.Build finished. The epub file is in %BUILDDIR%/epub.
121 | goto end
122 | )
123 |
124 | if "%1" == "latex" (
125 | %SPHINXBUILD% -b latex %ALLSPHINXOPTS% %BUILDDIR%/latex
126 | if errorlevel 1 exit /b 1
127 | echo.
128 | echo.Build finished; the LaTeX files are in %BUILDDIR%/latex.
129 | goto end
130 | )
131 |
132 | if "%1" == "text" (
133 | %SPHINXBUILD% -b text %ALLSPHINXOPTS% %BUILDDIR%/text
134 | if errorlevel 1 exit /b 1
135 | echo.
136 | echo.Build finished. The text files are in %BUILDDIR%/text.
137 | goto end
138 | )
139 |
140 | if "%1" == "man" (
141 | %SPHINXBUILD% -b man %ALLSPHINXOPTS% %BUILDDIR%/man
142 | if errorlevel 1 exit /b 1
143 | echo.
144 | echo.Build finished. The manual pages are in %BUILDDIR%/man.
145 | goto end
146 | )
147 |
148 | if "%1" == "texinfo" (
149 | %SPHINXBUILD% -b texinfo %ALLSPHINXOPTS% %BUILDDIR%/texinfo
150 | if errorlevel 1 exit /b 1
151 | echo.
152 | echo.Build finished. The Texinfo files are in %BUILDDIR%/texinfo.
153 | goto end
154 | )
155 |
156 | if "%1" == "gettext" (
157 | %SPHINXBUILD% -b gettext %I18NSPHINXOPTS% %BUILDDIR%/locale
158 | if errorlevel 1 exit /b 1
159 | echo.
160 | echo.Build finished. The message catalogs are in %BUILDDIR%/locale.
161 | goto end
162 | )
163 |
164 | if "%1" == "changes" (
165 | %SPHINXBUILD% -b changes %ALLSPHINXOPTS% %BUILDDIR%/changes
166 | if errorlevel 1 exit /b 1
167 | echo.
168 | echo.The overview file is in %BUILDDIR%/changes.
169 | goto end
170 | )
171 |
172 | if "%1" == "linkcheck" (
173 | %SPHINXBUILD% -b linkcheck %ALLSPHINXOPTS% %BUILDDIR%/linkcheck
174 | if errorlevel 1 exit /b 1
175 | echo.
176 | echo.Link check complete; look for any errors in the above output ^
177 | or in %BUILDDIR%/linkcheck/output.txt.
178 | goto end
179 | )
180 |
181 | if "%1" == "doctest" (
182 | %SPHINXBUILD% -b doctest %ALLSPHINXOPTS% %BUILDDIR%/doctest
183 | if errorlevel 1 exit /b 1
184 | echo.
185 | echo.Testing of doctests in the sources finished, look at the ^
186 | results in %BUILDDIR%/doctest/output.txt.
187 | goto end
188 | )
189 |
190 | :end
191 |
--------------------------------------------------------------------------------
/gammatone/doc/plot.rst:
--------------------------------------------------------------------------------
1 | :mod:`gammatone.plot` -- Plotting utilities for gammatone analysis
2 | ==================================================================
3 |
4 | .. automodule:: gammatone.plot
5 |    :members:
6 |
--------------------------------------------------------------------------------
/gammatone/gammatone/__init__.py:
--------------------------------------------------------------------------------
1 | # Copyright 2014 Jason Heeris, jason.heeris@gmail.com
2 | #
3 | # This file is part of the gammatone toolkit, and is licensed under the 3-clause
4 | # BSD license: https://github.com/detly/gammatone/blob/master/COPYING
5 |
6 | # Designate gammatone module
7 | """
8 | Gammatone filterbank toolkit
9 | """
10 |
--------------------------------------------------------------------------------
/gammatone/gammatone/__main__.py:
--------------------------------------------------------------------------------
1 | # Copyright 2014 Jason Heeris, jason.heeris@gmail.com
2 | #
3 | # This file is part of the gammatone toolkit, and is licensed under the 3-clause
4 | # BSD license: https://github.com/detly/gammatone/blob/master/COPYING
5 | from gammatone.plot import main
6 | main()
7 |
--------------------------------------------------------------------------------
/gammatone/gammatone/fftweight.py:
--------------------------------------------------------------------------------
1 | # Copyright 2014 Jason Heeris, jason.heeris@gmail.com
2 | #
3 | # This file is part of the gammatone toolkit, and is licensed under the 3-clause
4 | # BSD license: https://github.com/detly/gammatone/blob/master/COPYING
5 | """
6 | This module contains functions for calculating weights to approximate a
7 | gammatone filterbank-like "spectrogram" from a Fourier transform.
8 | """
9 | from __future__ import division
10 | import numpy as np
11 |
12 | import gammatone.filters as filters
13 | import gammatone.gtgram as gtgram
14 |
15 | def specgram_window(
16 |         nfft,
17 |         nwin,
18 |     ):
19 |     """
20 |     Window calculation used in specgram replacement function. Hann window of
21 |     width ``nwin`` centred in an array of width ``nfft``.
22 |     """
23 |     halflen = nwin // 2
24 |     halff = nfft // 2 # midpoint of win
25 |     acthalflen = int(np.floor(min(halff, halflen)))
26 |     halfwin = 0.5 * ( 1 + np.cos(np.pi * np.arange(0, halflen+1)/halflen))
27 |     win = np.zeros((nfft,))
28 |     win[halff:halff+acthalflen] = halfwin[0:acthalflen]
29 |     win[halff:halff-acthalflen:-1] = halfwin[0:acthalflen]
30 |     return win
31 |
32 |
33 | def specgram(x, n, sr, w, h):
34 |     """ Substitute for MATLAB's specgram, calculates a simple spectrogram.
35 |
36 |     :param x: The signal to analyse
37 |     :param n: The FFT length
38 |     :param sr: The sampling rate
39 |     :param w: The window length (see :func:`specgram_window`)
40 |     :param h: The hop size (must be greater than zero)
41 |     """
42 |     # Based on Dan Ellis' myspecgram.m,v 1.1 2002/08/04
43 |     assert h > 0, "Must have a hop size greater than 0"
44 |
45 |     s = x.shape[0]
46 |     win = specgram_window(n, w)
47 |
48 |     c = 0
49 |
50 |     # pre-allocate output array
51 |     ncols = 1 + int(np.floor((s - n)/h))
52 |     d = np.zeros(((1 + n // 2), ncols), np.dtype(complex))
53 |     # Frames start at 0, h, 2h, ... up to and including s - n (MATLAB's 0:h:(s-n))
54 |     for b in range(0, s - n + 1, h):
55 |         u = win * x[b : b + n]
56 |         t = np.fft.fft(u)
57 |         d[:, c] = t[0 : (1 + n // 2)].T
58 |         c = c + 1
59 |
60 |     return d
61 |
62 |
63 | def fft_weights(
64 |         nfft,
65 |         fs,
66 |         nfilts,
67 |         width,
68 |         fmin,
69 |         fmax,
70 |         maxlen):
71 |     """
72 |     :param nfft: the source FFT size
73 |     :param fs: sampling rate (Hz)
74 |     :param nfilts: the number of output bands (the MATLAB default is 64)
75 |     :param width: band width as a multiple of the ERB (MATLAB default 1)
76 |     :param fmin: lower limit of frequencies (Hz)
77 |     :param fmax: upper limit of frequencies (Hz)
78 |     :param maxlen: number of bins to truncate the rows to
79 |
80 |     :return: a tuple ``weights``, ``gain`` with the calculated weight matrix
81 |         (one row per band) and gain vector
82 |
83 |     Generate a matrix of weights to combine FFT bins into gammatone bins.
84 |
85 |     Note about ``maxlen``: while the full weight matrix has ``nfft`` columns,
86 |     the second half are all zero. Hence, in MATLAB, the auditory spectrum is::
87 |
88 |         fft2gammatonemx(nfft,sr)*abs(fft(xincols,nfft))
89 |
90 |     ``maxlen`` truncates the rows to this many bins.
91 |
92 |     | (c) 2004-2009 Dan Ellis dpwe@ee.columbia.edu based on rastamat/audspec.m
93 |     | (c) 2012 Jason Heeris (Python implementation)
94 |     """
95 |     ucirc = np.exp(1j * 2 * np.pi * np.arange(0, nfft / 2 + 1) / nfft)[None, ...]
96 |
97 |     # Common ERB filter code factored out
98 |     cf_array = filters.erb_space(fmin, fmax, nfilts)[::-1]
99 |
100 |     _, A11, A12, A13, A14, _, _, _, B2, gain = (
101 |         filters.make_erb_filters(fs, cf_array, width).T
102 |     )
103 |
104 |     A11, A12, A13, A14 = A11[..., None], A12[..., None], A13[..., None], A14[..., None]
105 |
106 |     r = np.sqrt(B2)
107 |     theta = 2 * np.pi * cf_array / fs
108 |     pole = (r * np.exp(1j * theta))[..., None]
109 |
110 |     GTord = 4
111 |
112 |     weights = np.zeros((nfilts, nfft))
113 |
114 |     weights[:, 0:ucirc.shape[1]] = (
115 |         np.abs(ucirc + A11 * fs) * np.abs(ucirc + A12 * fs)
116 |         * np.abs(ucirc + A13 * fs) * np.abs(ucirc + A14 * fs)
117 |         * np.abs(fs * (pole - ucirc) * (pole.conj() - ucirc)) ** (-GTord)
118 |         / gain[..., None]
119 |     )
120 |
121 |     weights = weights[:, 0:int(maxlen)]
122 |
123 |     return weights, gain
124 |
125 |
126 | def fft_gtgram(
127 |         wave,
128 |         fs,
129 |         window_time, hop_time,
130 |         channels,
131 |         f_min):
132 |     """
133 |     Calculate a spectrogram-like time frequency magnitude array based on
134 |     an FFT-based approximation to gammatone subband filters.
135 |
136 |     A matrix of weightings is calculated (using :func:`fft_weights`), and
137 |     applied to the FFT of the input signal (``wave``, using sample rate ``fs``).
138 |     The result is an approximation of full filtering using an ERB gammatone
139 |     filterbank (as per :func:`gtgram.gtgram`).
140 |
141 |     ``f_min`` sets the lower frequency cutoff of the corresponding gammatone
142 |     filterbank. ``window_time`` and ``hop_time`` (both in seconds) are the size
143 |     and step (hop) of the spectrogram columns.
144 |
145 |     | 2009-02-23 Dan Ellis dpwe@ee.columbia.edu
146 |     |
147 |     | (c) 2013 Jason Heeris (Python implementation)
148 |     """
149 |     width = 1 # Was a parameter in the MATLAB code
150 |
151 |     nfft = int(2 ** (np.ceil(np.log2(2 * window_time * fs))))
152 |     nwin, nhop, _ = gtgram.gtgram_strides(fs, window_time, hop_time, 0)
153 |
154 |     gt_weights, _ = fft_weights(
155 |             nfft,
156 |             fs,
157 |             channels,
158 |             width,
159 |             f_min,
160 |             fs / 2,
161 |             nfft / 2 + 1
162 |         )
163 |
164 |     sgram = specgram(wave, nfft, fs, nwin, nhop)
165 |
166 |     result = gt_weights.dot(np.abs(sgram)) / nfft
167 |
168 |     return result
169 |
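The frame bookkeeping in ``specgram`` and ``fft_gtgram`` can be checked with plain integer arithmetic. The sketch below mirrors ``fft_gtgram``'s FFT-length rule, nfft = 2^ceil(log2(2 · window_time · fs)). It then lists the frame start positions for a signal of s samples. The starts run from 0 to s - n inclusive in steps of h, which yields exactly 1 + floor((s - n)/h) frames, the column count ``specgram`` pre-allocates. The helper names ``fft_length`` and ``frame_starts`` are hypothetical.

```python
import math


def fft_length(window_time, fs):
    """Next power of two at or above twice the window length in samples."""
    return int(2 ** math.ceil(math.log2(2 * window_time * fs)))


def frame_starts(s, n, h):
    """Start index of every length-n frame that fits in s samples, hop h."""
    assert h > 0, "hop size must be positive"
    # Include s - n itself: a frame starting there ends exactly at the last sample
    return list(range(0, s - n + 1, h))


nfft = fft_length(0.08, 44100)  # 2 * 0.08 * 44100 = 7056 samples, rounded up to 8192
starts = frame_starts(s=16000, n=1024, h=512)
```

The inclusive upper bound matters when s - n is an exact multiple of h. In that case, an exclusive bound drops the last frame and leaves the final pre-allocated column empty.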
--------------------------------------------------------------------------------
/gammatone/gammatone/filters.py:
--------------------------------------------------------------------------------
1 | # Copyright 2014 Jason Heeris, jason.heeris@gmail.com
2 | #
3 | # This file is part of the gammatone toolkit, and is licensed under the 3-clause
4 | # BSD license: https://github.com/detly/gammatone/blob/master/COPYING
5 | """
6 | This module contains functions for constructing sets of equivalent rectangular
7 | bandwidth gammatone filters.
8 | """
9 | from __future__ import division
10 | from collections import namedtuple
11 |
12 | import numpy as np
13 | import scipy as sp
14 | from scipy import signal as sgn
15 |
16 | DEFAULT_FILTER_NUM = 100
17 | DEFAULT_LOW_FREQ = 100
18 | DEFAULT_HIGH_FREQ = 44100 / 4
19 |
20 |
21 | def erb_point(low_freq, high_freq, fraction):
22 |     """
23 |     Calculates a single point on an ERB scale between ``low_freq`` and
24 |     ``high_freq``, determined by ``fraction``. When ``fraction`` is ``1``,
25 |     ``low_freq`` will be returned. When ``fraction`` is ``0``, ``high_freq``
26 |     will be returned.
27 |
28 |     ``fraction`` can actually be outside the range ``[0, 1]``, which in general
29 |     isn't very meaningful, but might be useful when ``fraction`` is rounded a
30 |     little above or below ``[0, 1]`` (e.g. for plot axis labels).
31 |     """
32 |     # Change the following three parameters if you wish to use a different ERB
33 |     # scale. Must change in make_erb_filters too.
34 |     # TODO: Factor these parameters out
35 |     ear_q = 9.26449 # Glasberg and Moore Parameters
36 |     min_bw = 24.7
37 |     order = 1
38 |
39 |     # All of the following expressions are derived in Apple TR #35, "An
40 |     # Efficient Implementation of the Patterson-Holdsworth Cochlear Filter
41 |     # Bank." See pages 33-34.
42 |     erb_point = (
43 |         -ear_q * min_bw
44 |         + np.exp(
45 |             fraction * (
46 |                 -np.log(high_freq + ear_q * min_bw)
47 |                 + np.log(low_freq + ear_q * min_bw)
48 |             )
49 |         ) *
50 |         (high_freq + ear_q * min_bw)
51 |     )
52 |
53 |     return erb_point
54 |
55 |
56 | def erb_space(
57 |         low_freq=DEFAULT_LOW_FREQ,
58 |         high_freq=DEFAULT_HIGH_FREQ,
59 |         num=DEFAULT_FILTER_NUM):
60 |     """
61 |     Computes an array of ``num`` frequencies uniformly spaced on an ERB
62 |     scale, descending from one step below ``high_freq`` to ``low_freq``.
63 |
64 |     For a definition of ERB, see Moore, B. C. J., and Glasberg, B. R. (1983).
65 |     "Suggested formulae for calculating auditory-filter bandwidths and
66 |     excitation patterns," J. Acoust. Soc. Am. 74, 750-753.
67 |     """
68 |     return erb_point(
69 |         low_freq,
70 |         high_freq,
71 |         np.arange(1, num + 1) / num
72 |     )
73 |
74 |
75 | def centre_freqs(fs, num_freqs, cutoff):
76 |     """
77 |     Calculates an array of centre frequencies (for :func:`make_erb_filters`)
78 |     from a sampling frequency, lower cutoff frequency and the desired number of
79 |     filters.
80 |
81 |     :param fs: sampling rate
82 |     :param num_freqs: number of centre frequencies to calculate
83 |     :type num_freqs: int
84 |     :param cutoff: lower cutoff frequency
85 |     :return: same as :func:`erb_space`
86 |     """
87 |     return erb_space(cutoff, fs / 2, num_freqs)
88 |
89 |
90 | def make_erb_filters(fs, centre_freqs, width=1.0):
91 |     """
92 |     This function computes the filter coefficients for a bank of
93 |     Gammatone filters. These filters were defined by Patterson and Holdsworth for
94 |     simulating the cochlea.
95 |
96 |     The result is returned as a NumPy array with one row per filter. Each row
97 |     holds the coefficients ``A0, A11, A12, A13, A14, A2, B0, B1, B2, gain``
98 |     for four second order filters. The transfer functions of these four
99 |     filters share the same denominator (poles) but have different numerators
100 |     (zeros). :func:`erb_filterbank` takes each row apart to implement the
101 |     filter.
102 |
103 |     In the original MATLAB code, the filter bank contains "numChannels"
104 |     channels that extend from half the sampling rate (fs) to "lowFreq", or,
105 |     if numChannels is a vector, one filter per centre frequency in it. In this
106 |     Python port, ``centre_freqs`` always gives the centre frequencies directly;
107 |     :func:`centre_freqs` computes a suitable set from ``fs`` and a cutoff.
108 |
109 |     Note this implementation fixes a problem in the original code by
110 |     computing four separate second order filters. This avoids a big problem with
111 |     round off errors in cases of very small cfs (100Hz) and large sample rates
112 |     (44kHz). The problem is caused by roundoff error when a number of poles are
113 |     combined, all very close to the unit circle. Small errors in the eighth order
114 |     coefficient are multiplied when the eighth root is taken to give the pole
115 |     location. These small errors lead to poles outside the unit circle and
116 |     instability. Thanks to Julius Smith for leading me to the proper
117 |     explanation.
118 |
119 |     Execute the following MATLAB code (from the original toolbox) to evaluate
120 |     the frequency response of a 10 channel filterbank::
121 |
122 |         fcoefs = MakeERBFilters(16000,10,100);
123 |         y = ERBFilterBank([1 zeros(1,511)], fcoefs);
124 |         resp = 20*log10(abs(fft(y')));
125 |         freqScale = (0:511)/512*16000;
126 |         semilogx(freqScale(1:255),resp(1:255,:));
127 |         axis([100 16000 -60 0])
128 |         xlabel('Frequency (Hz)'); ylabel('Filter Response (dB)');
129 |
130 |     | Rewritten by Malcolm Slaney@Interval. June 11, 1998.
131 |     | (c) 1998 Interval Research Corporation
132 |     |
133 |     | (c) 2012 Jason Heeris (Python implementation)
134 |     """
135 |     T = 1 / fs
136 |     # Change the following three parameters if you wish to use a different
137 |     # ERB scale. Must change in erb_point too.
138 |     # TODO: factor these out
139 |     ear_q = 9.26449 # Glasberg and Moore Parameters
140 |     min_bw = 24.7
141 |     order = 1
142 |
143 |     erb = width*((centre_freqs / ear_q) ** order + min_bw ** order) ** ( 1 /order)
144 |     B = 1.019 * 2 * np.pi * erb
145 |
146 |     arg = 2 * centre_freqs * np.pi * T
147 |     vec = np.exp(2j * arg)
148 |
149 |     A0 = T
150 |     A2 = 0
151 |     B0 = 1
152 |     B1 = -2 * np.cos(arg) / np.exp(B * T)
153 |     B2 = np.exp(-2 * B * T)
154 |
155 |     rt_pos = np.sqrt(3 + 2 ** 1.5)
156 |     rt_neg = np.sqrt(3 - 2 ** 1.5)
157 |
158 |     common = -T * np.exp(-(B * T))
159 |
160 |     # TODO: This could be simplified to a matrix calculation involving the
161 |     # constant first term and the alternating rt_pos/rt_neg and +/-1 second
162 |     # terms
163 |     k11 = np.cos(arg) + rt_pos * np.sin(arg)
164 |     k12 = np.cos(arg) - rt_pos * np.sin(arg)
165 |     k13 = np.cos(arg) + rt_neg * np.sin(arg)
166 |     k14 = np.cos(arg) - rt_neg * np.sin(arg)
167 |
168 |     A11 = common * k11
169 |     A12 = common * k12
170 |     A13 = common * k13
171 |     A14 = common * k14
172 |
173 |     gain_arg = np.exp(1j * arg - B * T)
174 |
175 |     gain = np.abs(
176 |         (vec - gain_arg * k11)
177 |         * (vec - gain_arg * k12)
178 |         * (vec - gain_arg * k13)
179 |         * (vec - gain_arg * k14)
180 |         * ( T * np.exp(B * T)
181 |             / (-1 / np.exp(B * T) + 1 + vec * (1 - np.exp(B * T)))
182 |         )**4
183 |     )
184 |
185 |     allfilts = np.ones_like(centre_freqs)
186 |
187 |     fcoefs = np.column_stack([
188 |         A0 * allfilts, A11, A12, A13, A14, A2*allfilts,
189 |         B0 * allfilts, B1, B2,
190 |         gain
191 |     ])
192 |
193 |     return fcoefs
194 |
195 |
196 | def erb_filterbank(wave, coefs):
197 |     """
198 |     :param wave: input data (one dimensional sequence)
199 |     :param coefs: gammatone filter coefficients
200 |
201 |     Process an input waveform with a gammatone filter bank. This function takes
202 |     a single sound vector, and returns an array of filter outputs, one channel
203 |     per row.
204 |
205 |     The ``coefs`` parameter, which completely specifies the Gammatone filterbank,
206 |     should be designed with the :func:`make_erb_filters` function.
207 |
208 |     | Malcolm Slaney @ Interval, June 11, 1998.
209 |     | (c) 1998 Interval Research Corporation
210 |     | Thanks to Alain de Cheveigne' for his suggestions and improvements.
211 |     |
212 |     | (c) 2013 Jason Heeris (Python implementation)
213 |     """
214 |     output = np.zeros((coefs[:,9].shape[0], wave.shape[0]))
215 |
216 |     gain = coefs[:, 9]
217 |     # A0, A11, A2
218 |     As1 = coefs[:, (0, 1, 5)]
219 |     # A0, A12, A2
220 |     As2 = coefs[:, (0, 2, 5)]
221 |     # A0, A13, A2
222 |     As3 = coefs[:, (0, 3, 5)]
223 |     # A0, A14, A2
224 |     As4 = coefs[:, (0, 4, 5)]
225 |     # B0, B1, B2
226 |     Bs = coefs[:, 6:9]
227 |
228 |     # Loop over channels
229 |     for idx in range(0, coefs.shape[0]):
230 |         # These seem to be reversed (in the sense of A/B order), but that's what
231 |         # the original code did...
232 |         # Replacing these with polynomial multiplications reduces both accuracy
233 |         # and speed.
234 |         y1 = sgn.lfilter(As1[idx], Bs[idx], wave)
235 |         y2 = sgn.lfilter(As2[idx], Bs[idx], y1)
236 |         y3 = sgn.lfilter(As3[idx], Bs[idx], y2)
237 |         y4 = sgn.lfilter(As4[idx], Bs[idx], y3)
238 |         output[idx, :] = y4 / gain[idx]
239 |
240 |     return output
241 |
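``erb_filterbank`` runs each channel through four second-order sections in series. All four share the denominator ``(B0, B1, B2)``, and each has its own numerator ``(A0, A1k, A2)`` for k = 1..4. The sketch below shows the same cascade structure in plain Python. A hand-written direct-form difference equation (the recurrence documented for ``scipy.signal.lfilter``) stands in for the SciPy call. The names ``biquad`` and ``cascade`` are hypothetical, and the coefficients in the usage lines are toy values rather than a real gammatone design.

```python
def biquad(b, a, x):
    """Second-order IIR filter with zero initial conditions:
    a0*y[n] = b0*x[n] + b1*x[n-1] + b2*x[n-2] - a1*y[n-1] - a2*y[n-2]."""
    b0, b1, b2 = b
    a0, a1, a2 = a
    x1 = x2 = y1 = y2 = 0.0  # previous inputs and outputs
    y = []
    for xn in x:
        yn = (b0 * xn + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2) / a0
        y.append(yn)
        x2, x1 = x1, xn
        y2, y1 = y1, yn
    return y


def cascade(numerators, denominator, x, gain=1.0):
    """Apply each second-order section in turn, then divide by the channel
    gain, mirroring the y1..y4 chain in erb_filterbank."""
    y = list(x)
    for b in numerators:
        y = biquad(b, denominator, y)
    return [v / gain for v in y]


impulse = [1.0] + [0.0] * 7
# One pole at z = 0.5 and no extra zeros: impulse response 0.5 ** n
single = biquad((1.0, 0.0, 0.0), (1.0, -0.5, 0.0), impulse)
# Two identical sections in series: impulse response (n + 1) * 0.5 ** n
double = cascade([(1.0, 0.0, 0.0)] * 2, (1.0, -0.5, 0.0), impulse)
```

Keeping the sections separate avoids the eighth-order polynomial whose rounding errors the ``make_erb_filters`` docstring warns about.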
--------------------------------------------------------------------------------
/gammatone/gammatone/gtgram.py:
--------------------------------------------------------------------------------
1 | # Copyright 2014 Jason Heeris, jason.heeris@gmail.com
2 | #
3 | # This file is part of the gammatone toolkit, and is licensed under the 3-clause
4 | # BSD license: https://github.com/detly/gammatone/blob/master/COPYING
5 | """
6 | This module contains functions for rendering "spectrograms" which use gammatone
7 | filterbanks instead of Fourier transforms.
8 | """
9 | from __future__ import division
10 | import numpy as np
11 |
12 | from .filters import make_erb_filters, centre_freqs, erb_filterbank
13 |
14 |
15 | def round_half_away_from_zero(num):
16 |     """ Implement the round-half-away-from-zero rule, where fractional parts of
17 |     0.5 are rounded away from zero: up for positive numbers and down for
18 |     negative numbers (e.g. 2.5 becomes 3 and -2.5 becomes -3).
19 |     """
20 |     return np.sign(num) * np.floor(np.abs(num) + 0.5)
21 |
22 |
23 | def gtgram_strides(fs, window_time, hop_time, filterbank_cols):
24 |     """
25 |     Calculates the window size, hop size and number of output columns for a
26 |     gammatonegram whose filterbank output has ``filterbank_cols`` samples.
27 |     Returns a tuple of ``(window_size, hop_samples, output_columns)``.
28 |     """
29 |     nwin = int(round_half_away_from_zero(window_time * fs))
30 |     hop_samples = int(round_half_away_from_zero(hop_time * fs))
31 |     columns = (1
32 |                 + int(
33 |                     np.floor(
34 |                         (filterbank_cols - nwin)
35 |                         / hop_samples
36 |                     )
37 |                 )
38 |               )
39 |
40 |     return (nwin, hop_samples, columns)
41 |
42 |
43 | def gtgram_xe(wave, fs, channels, f_min):
44 |     """ Calculate the intermediate ERB filterbank processed matrix """
45 |     cfs = centre_freqs(fs, channels, f_min)
46 |     fcoefs = np.flipud(make_erb_filters(fs, cfs))
47 |     xf = erb_filterbank(wave, fcoefs)
48 |     xe = np.power(xf, 2)
49 |     return xe
50 |
51 |
52 | def gtgram(
53 |         wave,
54 |         fs,
55 |         window_time, hop_time,
56 |         channels,
57 |         f_min):
58 |     """
59 |     Calculate a spectrogram-like time frequency magnitude array based on
60 |     gammatone subband filters. The waveform ``wave`` (at sample rate ``fs``) is
61 |     passed through a multi-channel gammatone auditory model filterbank, with
62 |     lowest frequency ``f_min`` and highest frequency ``fs / 2``. The outputs of
63 |     each band then have their energy integrated over windows of ``window_time``
64 |     seconds, advancing by ``hop_time`` secs for successive columns. These
65 |     magnitudes are returned as a nonnegative real matrix with ``channels`` rows.
66 |
67 |     | 2009-02-23 Dan Ellis dpwe@ee.columbia.edu
68 |     |
69 |     | (c) 2013 Jason Heeris (Python implementation)
70 |     """
71 |     xe = gtgram_xe(wave, fs, channels, f_min)
72 |
73 |     nwin, hop_samples, ncols = gtgram_strides(
74 |         fs,
75 |         window_time,
76 |         hop_time,
77 |         xe.shape[1]
78 |     )
79 |
80 |     y = np.zeros((channels, ncols))
81 |
82 |     for cnum in range(ncols):
83 |         segment = xe[:, cnum * hop_samples + np.arange(nwin)]
84 |         y[:, cnum] = np.sqrt(segment.mean(1))
85 |
86 |     return y
87 |
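Two details of ``gtgram`` are easy to reproduce by hand:

- ``round_half_away_from_zero`` is needed because Python 3's built-in ``round`` sends exact halves to the nearest even integer, while MATLAB's ``round`` rounds them away from zero.
- Each output column is the square root of the mean energy over one window.

The sketch below restates both in plain Python. The names ``strides`` and ``window_rms`` are hypothetical stand-ins for ``gtgram_strides`` and the column loop, and the constant energy signal in the usage lines is a toy input.

```python
import math


def round_half_away_from_zero(num):
    """MATLAB-style rounding: 2.5 -> 3.0, -2.5 -> -3.0."""
    return math.copysign(math.floor(abs(num) + 0.5), num)


def strides(fs, window_time, hop_time, n_samples):
    """Window length and hop (in samples) plus the number of whole windows."""
    nwin = int(round_half_away_from_zero(window_time * fs))
    hop = int(round_half_away_from_zero(hop_time * fs))
    columns = 1 + (n_samples - nwin) // hop
    return nwin, hop, columns


def window_rms(energy, nwin, hop, columns):
    """Square root of the mean of one channel's energy (squared filter
    output) over each window, one value per output column."""
    return [
        math.sqrt(sum(energy[c * hop : c * hop + nwin]) / nwin)
        for c in range(columns)
    ]


nwin, hop, columns = strides(fs=8000, window_time=0.025, hop_time=0.010, n_samples=16000)
rms = window_rms([4.0] * 16000, nwin, hop, columns)
```

With a 25 ms window and 10 ms hop at 8 kHz, this gives 200-sample windows, 80-sample hops and 198 columns for two seconds of audio.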
--------------------------------------------------------------------------------
/gammatone/gammatone/plot.py:
--------------------------------------------------------------------------------
1 | # Copyright 2014 Jason Heeris, jason.heeris@gmail.com
2 | #
3 | # This file is part of the gammatone toolkit, and is licensed under the 3-clause
4 | # BSD license: https://github.com/detly/gammatone/blob/master/COPYING
5 | """
6 | Plotting utilities related to gammatone analysis, primarily for use with
7 | ``matplotlib``.
8 | """
9 | from __future__ import division
10 | import argparse
11 | import os.path
12 |
13 | import matplotlib.pyplot
14 | import matplotlib.ticker
15 | import numpy as np
16 | import scipy.constants
17 | import scipy.io.wavfile
18 |
19 | from .filters import erb_point
20 | import gammatone.gtgram
21 | import gammatone.fftweight
22 |
23 |
24 | class ERBFormatter(matplotlib.ticker.EngFormatter):
25 |     """
26 |     Axis formatter for gammatone filterbank analysis. This formatter calculates
27 |     the ERB spaced frequencies used for analysis, and renders them similarly to
28 |     the engineering axis formatter.
29 |
30 |     The scale is changed so that ``[0, 1]`` corresponds to ERB spaced frequencies
31 |     from ``high_freq`` to ``low_freq`` (note the reversal). It should be used
32 |     with ``imshow`` where the ``extent`` argument is ``[a, b, 1, 0]`` (again,
33 |     note the inversion).
34 |     """
35 |
36 |     def __init__(self, low_freq, high_freq, *args, **kwargs):
37 |         """
38 |         Creates a new :class:`ERBFormatter` for use with ``matplotlib`` plots.
39 |         Note that this class does not supply the ``unit`` or ``places``
40 |         arguments; typically these would be ``'Hz'`` and ``0``.
41 |
42 |         :param low_freq: the low end of the gammatone filterbank frequency range
43 |         :param high_freq: the high end of the gammatone filterbank frequency
44 |             range
45 |         """
46 |         self.low_freq = low_freq
47 |         self.high_freq = high_freq
48 |         super().__init__(*args, **kwargs)
49 |
50 |     def _erb_axis_scale(self, fraction):
51 |         return erb_point(self.low_freq, self.high_freq, fraction)
52 |
53 |     def __call__(self, val, pos=None):
54 |         newval = self._erb_axis_scale(val)
55 |         return super().__call__(newval, pos)
56 |
57 |
58 | def gtgram_plot(
59 | gtgram_function,
60 | axes, x, fs,
61 | window_time, hop_time, channels, f_min,
62 | imshow_args=None
63 | ):
64 | """
65 | Plots a spectrogram-like time frequency magnitude array based on gammatone
66 | subband filters.
67 |
68 | :param gtgram_function: A function with signature::
69 |
70 | fft_gtgram(
71 | wave,
72 | fs,
73 | window_time, hop_time,
74 | channels,
75 | f_min)
76 |
77 | See :func:`gammatone.gtgram.gtgram` for details of the parameters.
78 | """
79 | # Set a nice formatter for the y-axis
80 | formatter = ERBFormatter(f_min, fs/2, unit='Hz', places=0)
81 | axes.yaxis.set_major_formatter(formatter)
82 |
83 | # Figure out time axis scaling
84 | duration = len(x) / fs
85 |
86 | # Pick an aspect so the image is golden-ratio shaped (width:height = golden:1)
87 | aspect_ratio = duration/scipy.constants.golden
88 |
89 | gtg = gtgram_function(x, fs, window_time, hop_time, channels, f_min)
90 | Z = np.flipud(20 * np.log10(gtg))
91 |
92 | img = axes.imshow(Z, extent=[0, duration, 1, 0], aspect=aspect_ratio)
93 |
94 |
95 | # Entry point for CLI script
96 |
97 | HELP_TEXT = """\
98 | Plots the gammatone filterbank analysis of a WAV file.
99 |
100 | If the file contains more than one channel, all channels are averaged before
101 | performing analysis.
102 | """
103 |
104 |
105 | def render_audio_from_file(path, duration, function):
106 | """
107 | Renders the given ``duration`` of audio from the audio file at ``path``
108 | using the gammatone spectrogram function ``function``.
109 | """
110 | samplerate, data = scipy.io.wavfile.read(path)
111 |
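112 | # Keep only the first `duration` seconds of audio, if requested
113 | if duration:
114 | nframes = duration * samplerate
115 | data = data[0 : nframes]
116 | # Average all channels; mono files are already one-dimensional
117 | signal = data.mean(1) if data.ndim > 1 else data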
118 |
119 | # Default gammatone-based spectrogram parameters
120 | twin = 0.08
121 | thop = twin / 2
122 | channels = 1024
123 | fmin = 20
124 |
125 | # Set up the plot
126 | fig = matplotlib.pyplot.figure()
127 | axes = fig.add_axes([0.1, 0.1, 0.8, 0.8])
128 |
129 | gtgram_plot(
130 | function,
131 | axes,
132 | signal,
133 | samplerate,
134 | twin, thop, channels, fmin)
135 |
136 | axes.set_title(os.path.basename(path))
137 | axes.set_xlabel("Time (s)")
138 | axes.set_ylabel("Frequency")
139 |
140 | matplotlib.pyplot.show()
141 |
142 |
143 | def main():
144 | """
145 | Entry point for CLI application to plot gammatonegrams of sound files.
146 | """
147 | parser = argparse.ArgumentParser(description=HELP_TEXT)
148 |
149 | parser.add_argument(
150 | 'sound_file',
151 | help="The WAV file to graph.")
152 |
153 | parser.add_argument(
154 | '-d', '--duration', type=int,
155 | help="The time in seconds from the start of the audio to use for the "
156 | "graph (default is to use the whole file)."
157 | )
158 |
159 | parser.add_argument(
160 | '-a', '--accurate', action='store_const', dest='function',
161 | const=gammatone.gtgram.gtgram, default=gammatone.fftweight.fft_gtgram,
162 | help="Use the full filterbank approach instead of the weighted FFT "
163 | "approximation. This is much slower, and uses a lot of memory, but"
164 | " is more accurate."
165 | )
166 |
167 | args = parser.parse_args()
168 |
169 | return render_audio_from_file(args.sound_file, args.duration, args.function)
170 |
--------------------------------------------------------------------------------
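Before calling `imshow`, `gtgram_plot` converts the gammatonegram to decibels, flips it vertically, and sets the aspect to `duration / golden`. The sketch below isolates that arithmetic using only NumPy, to show why the resulting image is golden-ratio wide: `imshow` draws one y unit `aspect` times as tall as one x unit, the y extent is 1 (the `[1, 0]` range) and the x extent is the duration. `prepare_gtgram_image` is a hypothetical helper written for illustration; it is not part of the gammatone package.

```python
import numpy as np

# Golden ratio; same value as scipy.constants.golden, defined here to avoid the dependency
GOLDEN = (1 + 5 ** 0.5) / 2


def prepare_gtgram_image(gtg, n_samples, fs):
    """Hypothetical helper mirroring the pre-imshow steps of gtgram_plot.

    gtg is a (channels x frames) array of positive magnitudes. Returns the
    dB image, the imshow aspect, and the displayed height/width ratio.
    """
    duration = n_samples / fs
    # Amplitude to decibels, then reverse the channel (row) order
    z = np.flipud(20 * np.log10(gtg))
    aspect = duration / GOLDEN
    # One y unit is drawn `aspect` times as tall as one x unit;
    # the y extent is 1 and the x extent is `duration`
    height_over_width = aspect * 1.0 / duration
    return z, aspect, height_over_width


gtg = np.array([[1.0, 10.0], [100.0, 1000.0]])
z, aspect, ratio = prepare_gtgram_image(gtg, n_samples=48000, fs=16000)
print(z)          # approximately [[40, 60], [0, 20]] (dB, rows flipped)
print(1 / ratio)  # width/height, approximately 1.618
```

Because the duration cancels out, the plot keeps the same proportions for any clip length; only the time-axis labels change.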
/gammatone/setup.py:
--------------------------------------------------------------------------------
1 | # Copyright 2014 Jason Heeris, jason.heeris@gmail.com
2 | #
3 | # This file is part of the gammatone toolkit, and is licensed under the 3-clause
4 | # BSD license: https://github.com/detly/gammatone/blob/master/COPYING
5 | from setuptools import setup, find_packages
6 |
7 | setup(
8 | name = "Gammatone",
9 | version = "1.0",
10 | packages = find_packages(),
11 |
12 | install_requires = [
13 | 'numpy',
14 | 'scipy',
15 | 'nose',
16 | 'mock',
17 | 'matplotlib',
18 | ],
19 |
20 | entry_points = {
21 | 'console_scripts': [
22 | 'gammatone = gammatone.plot:main',
23 | ]
24 | }
25 | )
26 |
--------------------------------------------------------------------------------
/gammatone/test_generation/README:
--------------------------------------------------------------------------------
1 | These are Octave/MATLAB scripts that create test data for the Python
2 | implementation of the gammatone library.
3 |
4 | You must add both this directory and the top-level 'auditory_toolkit' directory
5 | to your search path. Run the scripts from this directory: each one saves its results under '../tests/data'.
6 |
7 | The scripts are designed to run under MATLAB and Octave (using '--traditional').
8 |
--------------------------------------------------------------------------------
/gammatone/test_generation/test_ERBFilterBank.m:
--------------------------------------------------------------------------------
1 | % Copyright 2014 Jason Heeris, jason.heeris@gmail.com
2 | %
3 | % This file is part of the gammatone toolkit, and is licensed under the 3-clause
4 | % BSD license: https://github.com/detly/gammatone/blob/master/COPYING
5 | function test_ERBFilterBank()
6 |
7 | erb_space_inputs = { ...
8 | 100, 11025, 10, sin(2*pi*220*[0:22050/100]'/22050); ...
9 | 20, 22050, 10, square(2*pi*150*[0:44100/200]'/44100); ...
10 | 20, 44100, 40, square(2*pi*12000*[0:88200/400]'/88200); ...
11 | 100, 11025, 1000, sawtooth(2*pi*10100*[0:22050/100]'/22050, 0.5); ...
12 | 500, 80000, 200, sawtooth(2*pi*3333*[0:160000/400]'/160000, 0.5); ...
13 | };
14 |
15 | erb_filter_inputs = { ...
16 | 44100, [22050; 2205; 220], square(2*pi*220*[0:44100/200]'/44100); ...
17 | 16000, [8000; 7000; 6000; 5000; 4000; 3000; 2000; 1000], square(2*pi*2000*[0:16000/50]'/16000); ...
18 | 16000, [16000; 8000; 1], square(2*pi*880*[0:16000/50]'/16000); ...
19 | };
20 |
21 | num_tests = size(erb_space_inputs)(1) ...
22 | + size(erb_filter_inputs)(1);
23 |
24 | erb_filterbank_inputs = {};
25 |
26 | erb_filterbank_results = {};
27 |
28 | % This will ONLY generate tests that use the centre frequency inputs
29 |
30 | % ERBSpace generated inputs
31 | for tnum=1:size(erb_space_inputs)(1)
32 | [f_low, f_high, num_f, wave] = deal(erb_space_inputs{tnum,:});
33 | fs = f_high*2;
34 | f_arr = ERBSpace(f_low, f_high, num_f);
35 | fcoefs = MakeERBFilters(fs, f_arr, 0);
36 | erb_filterbank_inputs(tnum, :) = {fcoefs, wave};
37 | end
38 |
39 | % MakeERBFilters generated inputs
40 | for tnum=1:size(erb_filter_inputs)
41 | [fs, f_arr, wave] = deal(erb_filter_inputs{tnum,:});
42 | fcoefs = MakeERBFilters(fs, f_arr, 0);
43 | offset = size(erb_space_inputs)(1);
44 | erb_filterbank_inputs(offset+tnum, :) = {fcoefs, wave};
45 | end
46 |
47 | for tnum=1:num_tests
48 | fcoefs = erb_filterbank_inputs{tnum, 1};
49 | wave = erb_filterbank_inputs{tnum, 2};
50 | erb_filterbank_results(tnum, :) = ERBFilterBank(wave, fcoefs);
51 | end
52 |
53 | results_file = fullfile('..', 'tests', 'data', 'test_filterbank_data.mat');
54 | save(results_file, 'erb_filterbank_inputs', 'erb_filterbank_results');
55 | end
56 |
--------------------------------------------------------------------------------
/gammatone/test_generation/test_ERBSpace.m:
--------------------------------------------------------------------------------
1 | % Copyright 2014 Jason Heeris, jason.heeris@gmail.com
2 | %
3 | % This file is part of the gammatone toolkit, and is licensed under the 3-clause
4 | % BSD license: https://github.com/detly/gammatone/blob/master/COPYING
5 | function test_ERBSpace()
6 |
7 | % Low freq, high freq, N
8 | erbspace_inputs = { ...
9 | 100, 11025, 100; ...
10 | 100, 22050, 100; ...
11 | 20, 22050, 100; ...
12 | 20, 44100, 100; ...
13 | 100, 11025, 10; ...
14 | 100, 11025, 1000; ...
15 | 500, 80000, 200; ...
16 | };
17 |
18 | erbspace_results = {};
19 |
20 | num_tests = size(erbspace_inputs)(1);
21 |
22 | for tnum=1:num_tests
23 | [f_low, f_high, num_f] = deal(erbspace_inputs{tnum,:});
24 | erbspace_results(tnum, :) = ERBSpace(f_low, f_high, num_f);
25 | end
26 |
27 | results_file = fullfile('..', 'tests', 'data', 'test_erbspace_data.mat');
28 | save(results_file, 'erbspace_inputs', 'erbspace_results');
29 | end
30 |
--------------------------------------------------------------------------------
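test_ERBSpace.m records the output of the Auditory Toolbox `ERBSpace` for several `(f_low, f_high, N)` triples. The spacing is a closed-form expression: after offsetting each frequency by `EarQ * minBW`, the points are spaced uniformly on a log scale. A NumPy sketch follows, assuming the Glasberg and Moore constants used in Slaney's toolbox (`EarQ = 9.26449`, `minBW = 24.7`). `erb_point_sketch` and `erb_space_sketch` are illustrative names, not the package's own functions.

```python
import numpy as np

# Glasberg & Moore parameters as used in Slaney's Auditory Toolbox (assumed)
EAR_Q = 9.26449
MIN_BW = 24.7


def erb_point_sketch(low_freq, high_freq, fraction):
    """Frequency at `fraction` of the way from high_freq (0) to low_freq (1)
    on the ERB scale."""
    offset = EAR_Q * MIN_BW
    log_span = np.log(low_freq + offset) - np.log(high_freq + offset)
    return -offset + np.exp(fraction * log_span) * (high_freq + offset)


def erb_space_sketch(low_freq, high_freq, num):
    """`num` ERB-spaced frequencies at fractions 1/num, 2/num, ..., 1,
    i.e. from just below high_freq down to low_freq."""
    return erb_point_sketch(low_freq, high_freq, np.arange(1, num + 1) / num)


cfs = erb_space_sketch(100, 11025, 10)
print(cfs[0], cfs[-1])  # highest centre frequency first; the last is 100 Hz up to rounding
```

Note the descending order: fraction 0 maps to `high_freq`, which is why `ERBFormatter` in plot.py describes the `[0, 1]` axis as running from high to low frequency.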
/gammatone/test_generation/test_MakeERBFilters.m:
--------------------------------------------------------------------------------
1 | % Copyright 2014 Jason Heeris, jason.heeris@gmail.com
2 | %
3 | % This file is part of the gammatone toolkit, and is licensed under the 3-clause
4 | % BSD license: https://github.com/detly/gammatone/blob/master/COPYING
5 | function test_MakeERBFilters()
6 |
7 | erb_space_inputs = { ...
8 | 100, 11025, 100; ...
9 | 100, 22050, 100; ...
10 | 20, 22050, 100; ...
11 | 20, 44100, 100; ...
12 | 100, 11025, 10; ...
13 | 100, 11025, 1000; ...
14 | 500, 80000, 200; ...
15 | };
16 |
17 | extra_inputs = { ...
18 | 44100, [22050; 2205; 220]; ...
19 | 16000, [8000; 7000; 6000; 5000; 4000; 3000; 2000; 1000]; ...
20 | 16000, [16000; 8000; 1]; ...
21 | };
22 |
23 | num_tests = size(erb_space_inputs)(1) + size(extra_inputs)(1);
24 |
25 | erb_filter_inputs = {};
26 |
27 | erb_filter_results = {};
28 |
29 | % This will ONLY generate tests that use the centre frequency inputs
30 |
31 | % ERBSpace generated inputs
32 | for tnum=1:size(erb_space_inputs)(1)
33 | [f_low, f_high, num_f] = deal(erb_space_inputs{tnum,:});
34 | fs = f_high*2;
35 | cfs = ERBSpace(f_low, f_high, num_f);
36 | erb_filter_inputs(tnum, :) = {fs, cfs};
37 | end
38 |
39 | erb_filter_inputs = cat(1, erb_filter_inputs, extra_inputs);
40 |
41 | for tnum=1:num_tests
42 | fs = erb_filter_inputs{tnum, 1};
43 | cfs = erb_filter_inputs{tnum, 2};
44 | fcoefs = MakeERBFilters(fs, cfs, 0);
45 | erb_filter_results(tnum, :) = fcoefs;
46 | end
47 |
48 | results_file = fullfile('..', 'tests', 'data', 'test_erb_filter_data.mat');
49 | save(results_file, 'erb_filter_inputs', 'erb_filter_results');
50 | end
51 |
--------------------------------------------------------------------------------
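The coefficients that test_MakeERBFilters.m saves start from each channel's equivalent rectangular bandwidth (ERB). The sketch below covers only that first step, assuming the order-1 Glasberg and Moore formula from Slaney's Auditory Toolbox (`ERB = cf / EarQ + minBW`) and its bandwidth parameter `B = 1.019 * 2 * pi * ERB`. The pole and zero computations that follow in `MakeERBFilters` are not reproduced, and the function names are hypothetical.

```python
import numpy as np

EAR_Q = 9.26449   # asymptotic filter quality (assumed Glasberg & Moore value)
MIN_BW = 24.7     # minimum bandwidth in Hz
ORDER = 1


def erb_bandwidths(cfs):
    """Equivalent rectangular bandwidth (Hz) for each centre frequency."""
    cfs = np.asarray(cfs, dtype=float)
    return ((cfs / EAR_Q) ** ORDER + MIN_BW ** ORDER) ** (1 / ORDER)


def gammatone_b(cfs):
    """Bandwidth parameter B (rad/s) of the gammatone filter for each channel."""
    return 1.019 * 2 * np.pi * erb_bandwidths(cfs)


cfs = [8000, 4000, 1000]
print(erb_bandwidths(cfs))  # roughly [888.2, 456.5, 132.6] Hz
```

Bandwidth grows roughly linearly with centre frequency, so high-frequency channels are much wider than low-frequency ones; this is also why the extreme `[16000; 8000; 1]` input is a useful edge case.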
/gammatone/test_generation/test_fft2gammatonemx.m:
--------------------------------------------------------------------------------
1 | % Copyright 2014 Jason Heeris, jason.heeris@gmail.com
2 | %
3 | % This file is part of the gammatone toolkit, and is licensed under the 3-clause
4 | % BSD license: https://github.com/detly/gammatone/blob/master/COPYING
5 | function test_fft2gammatonemx()
6 | % Arguments:
7 | % nfft, sr, nfilts, width, minfreq, maxfreq, maxlen
8 |
9 | fft2gtmx_inputs = { ...
10 | 256 , 48000, 64 , 1 , 100, 48000/2 , 256; ...
11 | % Vary the width parameter
12 | 256 , 48000, 64 , 2 , 100, 48000/2 , 256; ...
13 | 256 , 48000, 64 , 4 , 100, 48000/2 , 256; ...
14 | 256 , 48000, 64 , 0.25, 100, 48000/2 , 256; ...
15 | % Vary sampling rate
16 | 256 , 96000, 64 , 1 , 100, 96000/2 , 256; ...
17 | % Vary upper frequency
18 | 256 , 48000, 64 , 1 , 100, 48000/2 , 256; ...
19 | 256 , 48000, 64 , 1 , 100, 48000/4 , 256; ...
20 | 256 , 48000, 64 , 1 , 100, 48000/10, 256; ...
21 | % Vary maxlen
22 | 256 , 48000, 64 , 1 , 100, 48000/2 , 128; ...
23 | 256 , 48000, 64 , 1 , 100, 48000/2 , 16; ...
24 | 256 , 48000, 64 , 1 , 100, 48000/2 , 99; ...
25 | % Vary FFT size and number of filters (the last row also changes sr and minfreq)
26 | 1024, 48000, 128, 1 , 100, 48000/2 , 512; ...
27 | 1024, 48000, 128, 1 , 100, 48000/2 , 128; ...
28 | 64 , 44100, 32 , 1 , 20 , 44100/2 , 64; ...
29 | };
30 |
31 | fft2gtmx_results = {};
32 |
33 | for tnum=1:size(fft2gtmx_inputs)(1)
34 | [nfft, sr, nfilts, width, minfreq, maxfreq, maxlen] = deal(fft2gtmx_inputs{tnum,:});
35 | [wts, gain] = fft2gammatonemx(nfft, sr, nfilts, width, minfreq, maxfreq, maxlen);
36 | fft2gtmx_results(tnum, :) = {wts, gain};
37 | end
38 |
39 | results_file = fullfile('..', 'tests', 'data', 'test_fft2gtmx_data.mat');
40 | save(results_file, 'fft2gtmx_inputs', 'fft2gtmx_results');
41 | end
42 |
--------------------------------------------------------------------------------
/gammatone/test_generation/test_fft_gammatonegram.m:
--------------------------------------------------------------------------------
1 | % Copyright 2014 Jason Heeris, jason.heeris@gmail.com
2 | %
3 | % This file is part of the gammatone toolkit, and is licensed under the 3-clause
4 | % BSD license: https://github.com/detly/gammatone/blob/master/COPYING
5 | function test_fft_gammatonegram()
6 | % Need:
7 | % wave
8 | % fs
9 | % window_time
10 | % hop_time
11 | % channels
12 | % f_min
13 | % f_max
14 |
15 | % Need to mock out:
16 | % make_erb_filters output (elide)
17 | % centre_freqs (elide)
18 | % erb_filterbank (depends on X, SR, N, FMIN)
19 |
20 | % Ensure reproducible tests
21 | rand('state', [3 1 4 1 5 9 2 7]);
22 |
23 | fft_gammatonegram_inputs = {
24 | 'sawtooth_01', sawtooth(2*pi*10100*[0:22050 - 1]'/22050, 0.5), 22050, 0.025, 0.010, 64, 50; ...
25 | 'sin220_01' , sin(2*pi*220*[0:4800 - 1]'/48000), 48000, 0.01, 0.01, 64, 50; ...
26 | 'sin220_02' , sin(2*pi*220*[0:4800 - 1]'/48000), 48000, 0.025, 0.01, 32, 50; ...
27 | 'rand_01' , rand([1, 4410 - 1]), 44100, 0.02, 0.015, 128, 500; ...
28 | 'rand_02' , rand([1, 9600 - 1]), 96000, 0.01, 0.005, 256, 20; ...
29 | 'rand_03' , rand([1, 4800 - 1]), 48000, 0.01, 0.010, 256, 20; ...
30 | };
31 |
32 | % Mocked intermediate results for unit testing
33 | fft_gammatonegram_mocks = {};
34 |
35 | % Actual results
36 | fft_gammatonegram_results = {};
37 |
38 | for tnum=1:size(fft_gammatonegram_inputs)(1)
39 | [name, wave, fs, twin, thop, chs, fmin] = deal(fft_gammatonegram_inputs{tnum,:});
40 |
41 | % This is for mocking the output of the equivalent Python functions
42 | nfft = 2^(ceil(log(2*twin*fs)/log(2)));
43 | nwin = round(twin * fs);
44 | nhop = round(thop * fs);
45 |
46 | % Mock out the FFT weights as well
47 | wts = fft2gammatonemx( ...
48 | nfft, ...
49 | fs, ...
50 | chs, ...
51 | 1, ... % width is always 1 in the Python implementation
52 | fmin, ...
53 | fs/2, ...
54 | nfft/2+1 ...
55 | );
56 |
57 | % Mock out windowing function
58 | window = gtgram_window(nfft, nwin);
59 |
60 | res = gammatonegram( ...
61 | wave, ...
62 | fs, ...
63 | twin, ...
64 | thop, ...
65 | chs, ...
66 | fmin, ...
67 | fs/2, ... % fmax is always fs/2 in the Python version
68 | 1 ... % Use FFT method
69 | );
70 |
71 | fft_gammatonegram_mocks(tnum, :) = { ...
72 | wts ...
73 | };
74 |
75 | fft_gammatonegram_results(tnum, :) = { ...
76 | res, ...
77 | window, ...
78 | nfft, ...
79 | nwin, ...
80 | nhop ...
81 | };
82 |
83 | end;
84 |
85 | results_file = fullfile('..', 'tests', 'data', 'test_fft_gammatonegram_data.mat');
86 | save(results_file, 'fft_gammatonegram_inputs', 'fft_gammatonegram_mocks', 'fft_gammatonegram_results');
87 | end;
88 |
89 |
90 | function win = gtgram_window(n, w)
91 | % Reproduction of Dan Ellis' windowing function built in to specgram.m
92 | halflen = w/2;
93 | halff = n/2; % midpoint of win
94 | acthalflen = min(halff, halflen);
95 |
96 | halfwin = 0.5 * ( 1 + cos( pi * (0:halflen)/halflen));
97 | win = zeros(1, n);
98 | win((halff+1):(halff+acthalflen)) = halfwin(1:acthalflen);
99 | win((halff+1):-1:(halff-acthalflen+2)) = halfwin(1:acthalflen);
100 | end;
--------------------------------------------------------------------------------
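The mocked quantities in test_fft_gammatonegram.m come from three small formulas (`nfft`, `nwin`, `nhop`) plus the `gtgram_window` helper. The Python sketch below reproduces them under two stated assumptions. First, MATLAB's `round` (halves away from zero) is emulated with `floor(x + 0.5)`, because Python's built-in `round` rounds halves to even and would turn `0.010 * 22050 = 220.5` into 220 instead of 221. Second, `gtgram_window_sketch` is a hypothetical port written for illustration, not `gammatone.fftweight.specgram_window`.

```python
import math

import numpy as np


def matlab_round(x):
    """MATLAB-style rounding for non-negative x (halves round up)."""
    return int(math.floor(x + 0.5))


def fft_gtgram_params(fs, twin, thop):
    """nfft, nwin and nhop as computed in test_fft_gammatonegram.m."""
    nfft = 2 ** math.ceil(math.log2(2 * twin * fs))  # smallest power of two >= two windows
    nwin = matlab_round(twin * fs)
    nhop = matlab_round(thop * fs)
    return nfft, nwin, nhop


def gtgram_window_sketch(n, w):
    """Hypothetical port of gtgram_window: a Hann-shaped window of about w
    samples centred at index n // 2 of a length-n zero array (n even)."""
    halflen = w / 2                      # fractional when w is odd, as in MATLAB
    halff = n // 2                       # midpoint of the buffer
    acthalflen = int(math.floor(min(halff, halflen)))
    k = np.arange(0, int(math.floor(halflen)) + 1)  # MATLAB's 0:halflen
    halfwin = 0.5 * (1 + np.cos(np.pi * k / halflen))
    win = np.zeros(n)
    # Right half: 1-based (halff+1):(halff+acthalflen)
    win[halff:halff + acthalflen] = halfwin[:acthalflen]
    # Left half, mirrored: 1-based (halff+1):-1:(halff-acthalflen+2)
    win[halff - acthalflen + 1:halff + 1] = halfwin[:acthalflen][::-1]
    return win


print(fft_gtgram_params(22050, 0.025, 0.010))  # (2048, 551, 221)
print(gtgram_window_sketch(8, 4))  # approximately [0, 0, 0, 0.5, 1, 0.5, 0, 0]
```

The parameters printed for the sawtooth case match the `nfft`, `nwin` and `nhop` columns of the 'sawtooth_01' row in test_specgram.m, which is consistent with both scripts sharing the same framing.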
/gammatone/test_generation/test_gammatonegram.m:
--------------------------------------------------------------------------------
1 | % Copyright 2014 Jason Heeris, jason.heeris@gmail.com
2 | %
3 | % This file is part of the gammatone toolkit, and is licensed under the 3-clause
4 | % BSD license: https://github.com/detly/gammatone/blob/master/COPYING
5 | function test_gammatonegram()
6 | % Need:
7 | % wave
8 | % fs
9 | % window_time
10 | % hop_time
11 | % channels
12 | % f_min
13 | % f_max
14 |
15 | % Need to mock out:
16 | % make_erb_filters output (elide)
17 | % centre_freqs (elide)
18 | % erb_filterbank (depends on X, SR, N, FMIN)
19 |
20 | % Ensure reproducible tests
21 | rand('state', [3 1 4 1 5 9 2 7]);
22 |
23 | gammatonegram_inputs = {
24 | 'sawtooth_01', sawtooth(2*pi*10100*[0:22050 - 1]'/22050, 0.5), 22050, 0.025, 0.010, 64, 50; ...
25 | 'sin220_01' , sin(2*pi*220*[0:4800 - 1]'/48000), 48000, 0.01, 0.01, 64, 50; ...
26 | 'sin220_02' , sin(2*pi*220*[0:4800 - 1]'/48000), 48000, 0.025, 0.01, 32, 50; ...
27 | 'rand_01' , rand([1, 4410 - 1]), 44100, 0.02, 0.015, 128, 500; ...
28 | 'rand_02' , rand([1, 9600 - 1]), 96000, 0.01, 0.005, 256, 20; ...
29 | 'rand_03' , rand([1, 4800 - 1]), 48000, 0.01, 0.010, 256, 20; ...
30 | };
31 |
32 | % Mocked intermediate results for unit testing
33 | gammatonegram_mocks = {};
34 |
35 | % Actual results
36 | gammatonegram_results = {};
37 |
38 | for tnum=1:size(gammatonegram_inputs)(1)
39 | [name, wave, fs, twin, thop, chs, fmin] = deal(gammatonegram_inputs{tnum,:});
40 | res = gammatonegram( ...
41 | wave, ...
42 | fs, ...
43 | twin, ...
44 | thop, ...
45 | chs, ...
46 | fmin, ...
47 | 0, ... % fmax is ignored
48 | 0 ... % Don't use FFT method
49 | );
50 |
51 | % This is for mocking the output of the equivalent Python functions
52 | nwin = round(twin * fs);
53 | hopsamps = round(thop * fs);
54 | f_coefs = flipud(MakeERBFilters(fs, chs, fmin));
55 | x_f = ERBFilterBank(wave, f_coefs);
56 | x_e = [x_f .^ 2];
57 | x_e_cols = size(x_e, 2);
58 | ncols = 1 + floor((x_e_cols - nwin) / hopsamps);
59 |
60 | % Mock out the ERB filter functions too
61 | fcoefs = flipud(MakeERBFilters(fs, chs, fmin));
62 | erb_fb_output = ERBFilterBank(wave, fcoefs);
63 |
64 | gammatonegram_mocks(tnum, :) = { ...
65 | erb_fb_output, ...
66 | x_e_cols ...
67 | };
68 |
69 | gammatonegram_results(tnum, :) = { ...
70 | res, ...
71 | nwin, ...
72 | hopsamps, ...
73 | ncols ...
74 | };
75 |
76 | end;
77 |
78 | results_file = fullfile('..', 'tests', 'data', 'test_gammatonegram_data.mat');
79 | save(results_file, 'gammatonegram_inputs', 'gammatonegram_mocks', 'gammatonegram_results');
80 | end;
81 |
--------------------------------------------------------------------------------
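test_gammatonegram.m squares the filterbank output (`x_e = x_f .^ 2`) and records `ncols = 1 + floor((x_e_cols - nwin) / hopsamps)` as the number of output frames. The sketch below shows one way those pieces combine into a gammatonegram, assuming (as in Ellis's gammatonegram.m) that each column is the square root of the mean energy over an `nwin`-sample window, i.e. the RMS filter output per hop. `frame_rms` is a hypothetical name, and the signal is assumed to be at least `nwin` samples long.

```python
import numpy as np


def frame_rms(xe, nwin, hop):
    """Collapse squared filterbank output xe (channels x samples) into
    per-frame RMS values, one column per hop (illustrative sketch)."""
    n_samples = xe.shape[1]
    # Same frame count as the MATLAB script: 1 + floor((cols - nwin) / hop)
    ncols = 1 + (n_samples - nwin) // hop
    y = np.zeros((xe.shape[0], ncols))
    for col in range(ncols):
        start = col * hop
        segment = xe[:, start:start + nwin]
        # Mean energy over the window, then square root gives the RMS amplitude
        y[:, col] = np.sqrt(segment.mean(axis=1))
    return y


xf = np.arange(10, dtype=float).reshape(1, 10)  # one channel of filter output
y = frame_rms(xf ** 2, nwin=4, hop=2)
print(y.shape)  # (1, 4): 1 + (10 - 4) // 2 frames
```

Trailing samples that do not fill a complete window are dropped, which is what the floor in the `ncols` formula implies.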
/gammatone/test_generation/test_specgram.m:
--------------------------------------------------------------------------------
1 | % Copyright 2014 Jason Heeris, jason.heeris@gmail.com
2 | %
3 | % This file is part of the gammatone toolkit, and is licensed under the 3-clause
4 | % BSD license: https://github.com/detly/gammatone/blob/master/COPYING
5 | function test_specgram()
6 | % Need:
7 | % wave
8 | % nfft
9 | % fs
10 | % window_size
11 | % hop (technically the function takes the overlap, but only to recalculate this)
12 |
13 | % Ensure reproducible tests
14 | rand('state', [3 1 4 1 5 9 2 7]);
15 |
16 | specgram_inputs = {
17 | 'sawtooth_01', sawtooth(2*pi*10100*[0:22050 - 1]'/22050, 0.5), 2048, 22050, 551, 221; ...
18 | 'sin220_01' , sin(2*pi*220*[0:4800 - 1]'/48000), 1024, 48000, 480, 480; ...
19 | 'sin220_02' , sin(2*pi*220*[0:4800 - 1]'/48000), 4096, 48000, 1200, 480; ...
20 | 'rand_01' , rand([1, 4410 - 1]), 2048, 44100, 882, 662; ...
21 | 'rand_02' , rand([1, 9600 - 1]), 2048, 96000, 960, 480; ...
22 | 'rand_03' , rand([1, 4800 - 1]), 1024, 48000, 480, 480; ...
23 | };
24 |
25 | % Mocked intermediate results for unit testing
26 | specgram_mocks = {};
27 |
28 | % Actual results
29 | specgram_results = {};
30 |
31 | for tnum=1:size(specgram_inputs)(1)
32 | [name, wave, nfft, fs, nwin, nhop] = deal(specgram_inputs{tnum,:});
33 |
34 | % Mock out windowing function
35 | window = gtgram_window(nfft, nwin);
36 |
37 | res = specgram( ...
38 | wave, ...
39 | nfft, ...
40 | fs, ...
41 | nwin, ...
42 | nwin - nhop ...
43 | );
44 |
45 | specgram_mocks(tnum, :) = { ...
46 | window, ...
47 | };
48 |
49 | specgram_results(tnum, :) = { ...
50 | res, ...
51 | };
52 |
53 | end;
54 |
55 | results_file = fullfile('..', 'tests', 'data', 'test_specgram_data.mat');
56 | save(results_file, 'specgram_inputs', 'specgram_mocks', 'specgram_results');
57 | end;
58 |
59 |
60 | function win = gtgram_window(n, w)
61 | % Reproduction of Dan Ellis' windowing function built into specgram.m
62 | halflen = w/2;
63 | halff = n/2; % midpoint of win
64 | acthalflen = min(halff, halflen);
65 |
66 | halfwin = 0.5 * ( 1 + cos( pi * (0:halflen)/halflen));
67 | win = zeros(1, n);
68 | win((halff+1):(halff+acthalflen)) = halfwin(1:acthalflen);
69 | win((halff+1):-1:(halff-acthalflen+2)) = halfwin(1:acthalflen);
70 | end;
--------------------------------------------------------------------------------
/gammatone/tests/__init__.py:
--------------------------------------------------------------------------------
1 | # Copyright 2014 Jason Heeris, jason.heeris@gmail.com
2 | #
3 | # This file is part of the gammatone toolkit, and is licensed under the 3-clause
4 | # BSD license: https://github.com/detly/gammatone/blob/master/COPYING
5 |
6 | # Designate as module
7 |
--------------------------------------------------------------------------------
/gammatone/tests/data/test_erb_filter_data.mat:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/janaal1/DCASE2020-Task3/bc7d5e87faa2fbe014dc47cccb48d9927b4ed3ec/gammatone/tests/data/test_erb_filter_data.mat
--------------------------------------------------------------------------------
/gammatone/tests/data/test_erbspace_data.mat:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/janaal1/DCASE2020-Task3/bc7d5e87faa2fbe014dc47cccb48d9927b4ed3ec/gammatone/tests/data/test_erbspace_data.mat
--------------------------------------------------------------------------------
/gammatone/tests/data/test_fft2gtmx_data.mat:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/janaal1/DCASE2020-Task3/bc7d5e87faa2fbe014dc47cccb48d9927b4ed3ec/gammatone/tests/data/test_fft2gtmx_data.mat
--------------------------------------------------------------------------------
/gammatone/tests/data/test_fft_gammatonegram_data.mat:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/janaal1/DCASE2020-Task3/bc7d5e87faa2fbe014dc47cccb48d9927b4ed3ec/gammatone/tests/data/test_fft_gammatonegram_data.mat
--------------------------------------------------------------------------------
/gammatone/tests/data/test_filterbank_data.mat:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/janaal1/DCASE2020-Task3/bc7d5e87faa2fbe014dc47cccb48d9927b4ed3ec/gammatone/tests/data/test_filterbank_data.mat
--------------------------------------------------------------------------------
/gammatone/tests/data/test_gammatonegram_data.mat:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/janaal1/DCASE2020-Task3/bc7d5e87faa2fbe014dc47cccb48d9927b4ed3ec/gammatone/tests/data/test_gammatonegram_data.mat
--------------------------------------------------------------------------------
/gammatone/tests/data/test_specgram_data.mat:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/janaal1/DCASE2020-Task3/bc7d5e87faa2fbe014dc47cccb48d9927b4ed3ec/gammatone/tests/data/test_specgram_data.mat
--------------------------------------------------------------------------------
/gammatone/tests/test_cfs.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python3
2 | # Copyright 2014 Jason Heeris, jason.heeris@gmail.com
3 | #
4 | # This file is part of the gammatone toolkit, and is licensed under the 3-clause
5 | # BSD license: https://github.com/detly/gammatone/blob/master/COPYING
6 | import nose
7 | from mock import patch
8 |
9 | import gammatone.filters
10 |
11 | EXPECTED_PARAMS = (
12 | ((0, 0, 0), (0, 0, 0)),
13 | ((22050, 100, 100), (100, 11025, 100)),
14 | ((44100, 100, 100), (100, 22050, 100)),
15 | ((44100, 100, 20), (20, 22050, 100)),
16 | ((88200, 100, 20), (20, 44100, 100)),
17 | ((22050, 100, 10), (10, 11025, 100)),
18 | ((22050, 1000, 100), (100, 11025, 1000)),
19 | ((160000, 500, 200), (200, 80000, 500)),
20 | )
21 |
22 |
23 | def test_centre_freqs():
24 | for args, params in EXPECTED_PARAMS:
25 | yield CentreFreqsTester(args, params)
26 |
27 |
28 | class CentreFreqsTester:
29 |
30 | def __init__(self, args, params):
31 | self.args = args
32 | self.params = params
33 | self.description = "Centre freqs for {:g} {:d} {:g}".format(*args)
34 |
35 |
36 | @patch('gammatone.filters.erb_space')
37 | def __call__(self, erb_space_mock):
38 | gammatone.filters.centre_freqs(*self.args)
39 | erb_space_mock.assert_called_with(*self.params)
40 |
41 |
42 | if __name__ == '__main__':
43 | nose.main()
44 |
--------------------------------------------------------------------------------
/gammatone/tests/test_erb_space.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python3
2 | # Copyright 2014 Jason Heeris, jason.heeris@gmail.com
3 | #
4 | # This file is part of the gammatone toolkit, and is licensed under the 3-clause
5 | # BSD license: https://github.com/detly/gammatone/blob/master/COPYING
6 | import nose
7 | import numpy as np
8 | import scipy.io
9 | from pkg_resources import resource_stream
10 |
11 | import gammatone.filters
12 |
13 | REF_DATA_FILENAME = 'data/test_erbspace_data.mat'
14 |
15 | INPUT_KEY = 'erbspace_inputs'
16 | RESULT_KEY = 'erbspace_results'
17 |
18 | INPUT_COLS = ('f_low', 'f_high', 'num_f')
19 | RESULT_COLS = ('cfs',)
20 |
21 |
22 | def load_reference_data():
23 | """ Load test data generated from the reference code """
24 | # Load test data
25 | with resource_stream(__name__, REF_DATA_FILENAME) as test_data:
26 | data = scipy.io.loadmat(test_data, squeeze_me=False)
27 |
28 | zipped_data = zip(data[INPUT_KEY], data[RESULT_KEY])
29 |
30 | for inputs, refs in zipped_data:
31 | input_dict = dict(zip(INPUT_COLS, map(np.squeeze, inputs)))
32 | ref_dict = dict(zip(RESULT_COLS, map(np.squeeze, refs)))
33 | yield (input_dict, ref_dict)
34 |
35 |
36 | def test_ERB_space_known_values():
37 | for inputs, refs in load_reference_data():
38 | args = (
39 | inputs['f_low'],
40 | inputs['f_high'],
41 | inputs['num_f'],
42 | )
43 |
44 | expected = (refs['cfs'],)
45 |
46 | yield ERBSpaceTester(args, expected)
47 |
48 |
49 | class ERBSpaceTester:
50 |
51 | def __init__(self, args, expected):
52 | self.args = args
53 | self.expected = expected[0]
54 | self.description = (
55 | "ERB space for {:.1f} {:.1f} {:d}".format(
56 | float(self.args[0]),
57 | float(self.args[1]),
58 | int(self.args[2]),
59 | )
60 | )
61 |
62 | def __call__(self):
63 | result = gammatone.filters.erb_space(*self.args)
64 | assert np.allclose(result, self.expected, rtol=1e-6, atol=1e-10)
65 |
66 | if __name__ == '__main__':
67 | nose.main()
68 |
--------------------------------------------------------------------------------
/gammatone/tests/test_fft_gtgram.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python3
2 | # Copyright 2014 Jason Heeris, jason.heeris@gmail.com
3 | #
4 | # This file is part of the gammatone toolkit, and is licensed under the 3-clause
5 | # BSD license: https://github.com/detly/gammatone/blob/master/COPYING
6 | from mock import patch
7 | import nose
8 | import numpy as np
9 | import scipy.io
10 | from pkg_resources import resource_stream
11 |
12 | import gammatone.fftweight
13 |
14 | REF_DATA_FILENAME = 'data/test_fft_gammatonegram_data.mat'
15 |
16 | INPUT_KEY = 'fft_gammatonegram_inputs'
17 | MOCK_KEY = 'fft_gammatonegram_mocks'
18 | RESULT_KEY = 'fft_gammatonegram_results'
19 |
20 | INPUT_COLS = ('name', 'wave', 'fs', 'twin', 'thop', 'channels', 'fmin')
21 | MOCK_COLS = ('wts',)
22 | RESULT_COLS = ('res', 'window', 'nfft', 'nwin', 'nhop')
23 |
24 |
25 | def load_reference_data():
26 | """ Load test data generated from the reference code """
27 | # Load test data
28 | with resource_stream(__name__, REF_DATA_FILENAME) as test_data:
29 | data = scipy.io.loadmat(test_data, squeeze_me=False)
30 |
31 | zipped_data = zip(data[INPUT_KEY], data[MOCK_KEY], data[RESULT_KEY])
32 | for inputs, mocks, refs in zipped_data:
33 | input_dict = dict(zip(INPUT_COLS, inputs))
34 | mock_dict = dict(zip(MOCK_COLS, mocks))
35 | ref_dict = dict(zip(RESULT_COLS, refs))
36 |
37 | yield (input_dict, mock_dict, ref_dict)
38 |
39 |
40 | def test_fft_specgram_window():
41 | for inputs, mocks, refs in load_reference_data():
42 | args = (
43 | refs['nfft'],
44 | refs['nwin'],
45 | )
46 |
47 | expected = (
48 | refs['window'],
49 | )
50 |
51 | yield FFTGtgramWindowTester(inputs['name'], args, expected)
52 |
53 | class FFTGtgramWindowTester:
54 |
55 | def __init__(self, name, args, expected):
56 | self.nfft = args[0].squeeze()
57 | self.nwin = args[1].squeeze()
58 | self.expected = expected[0].squeeze()
59 |
60 | self.description = (
61 | "FFT gammatonegram window for nfft = {:f}, nwin = {:f}".format(
62 | float(self.nfft), float(self.nwin)
63 | ))
64 |
65 | def __call__(self):
66 | result = gammatone.fftweight.specgram_window(self.nfft, self.nwin)
67 | max_diff = np.max(np.abs(result - self.expected))
68 | diagnostic = "Maximum difference: {:6e}".format(max_diff)
69 | assert np.allclose(result, self.expected, rtol=1e-6, atol=1e-12), diagnostic
70 |
71 |
72 | def test_fft_gtgram():
73 | for inputs, mocks, refs in load_reference_data():
74 | args = (
75 | inputs['fs'],
76 | inputs['twin'],
77 | inputs['thop'],
78 | inputs['channels'],
79 | inputs['fmin']
80 | )
81 |
82 | yield FFTGammatonegramTester(
83 | inputs['name'][0],
84 | args,
85 | inputs['wave'],
86 | mocks['wts'],
87 | refs['window'],
88 | refs['res']
89 | )
90 |
91 | class FFTGammatonegramTester:
92 | """ Testing class for gammatonegram calculation """
93 |
94 | def __init__(self, name, args, sig, fft_weights, window, expected):
95 | self.signal = np.asarray(sig).squeeze()
96 | self.expected = np.asarray(expected).squeeze()
97 | self.fft_weights = np.asarray(fft_weights)
98 | self.args = args
99 | self.window = window.squeeze()
100 |
101 | self.description = "FFT gammatonegram for {:s}".format(name)
102 |
103 | def __call__(self):
104 | # Note that the second return value from fft_weights isn't actually used
105 | with patch(
106 | 'gammatone.fftweight.fft_weights',
107 | return_value=(self.fft_weights, None)), \
108 | patch(
109 | 'gammatone.fftweight.specgram_window',
110 | return_value=self.window):
111 |
112 | result = gammatone.fftweight.fft_gtgram(self.signal, *self.args)
113 |
114 | max_diff = np.max(np.abs(result - self.expected))
115 | diagnostic = "Maximum difference: {:6e}".format(max_diff)
116 |
117 | assert np.allclose(result, self.expected, rtol=1e-6, atol=1e-12), diagnostic
118 |
119 | if __name__ == '__main__':
120 | nose.main()
121 |
--------------------------------------------------------------------------------
/gammatone/tests/test_fft_weights.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python3
2 | # Copyright 2014 Jason Heeris, jason.heeris@gmail.com
3 | #
4 | # This file is part of the gammatone toolkit, and is licensed under the 3-clause
5 | # BSD license: https://github.com/detly/gammatone/blob/master/COPYING
6 | from __future__ import division
7 | import nose
8 | import numpy as np
9 | import scipy.io
10 | from pkg_resources import resource_stream
11 |
12 | import gammatone.fftweight
13 |
14 | REF_DATA_FILENAME = 'data/test_fft2gtmx_data.mat'
15 |
16 | INPUT_KEY = 'fft2gtmx_inputs'
17 | RESULT_KEY = 'fft2gtmx_results'
18 |
19 | INPUT_COLS = ('nfft', 'sr', 'nfilts', 'width', 'fmin', 'fmax', 'maxlen')
20 | RESULT_COLS = ('weights', 'gain',)
21 |
22 | def load_reference_data():
23 | """ Load test data generated from the reference code """
24 | # Load test data
25 | with resource_stream(__name__, REF_DATA_FILENAME) as test_data:
26 | data = scipy.io.loadmat(test_data, squeeze_me=False)
27 |
28 | zipped_data = zip(data[INPUT_KEY], data[RESULT_KEY])
29 |
30 | for inputs, refs in zipped_data:
31 | input_dict = dict(zip(INPUT_COLS, map(np.squeeze, inputs)))
32 | ref_dict = dict(zip(RESULT_COLS, map(np.squeeze, refs)))
33 | yield (input_dict, ref_dict)
34 |
35 |
36 | def fft_weights_funcs(args, expected):
37 | """
38 | Construct a pair of unit tests for the gains and weights of the FFT to
39 | gammatonegram calculation. Returns two functions: test_gains, test_weights.
40 | """
41 | args = list(args)
42 | expected_weights = expected[0]
43 | expected_gains = expected[1]
44 |
45 | # Convert nfft, nfilts, maxlen to ints
46 | args[0] = int(args[0])
47 | args[2] = int(args[2])
48 | args[6] = int(args[6])
49 |
50 | weights, gains = gammatone.fftweight.fft_weights(*args)
51 |
52 | (test_weights_desc, test_gains_desc) = (
53 | "FFT weights {:s} for nfft = {:d}, fs = {:d}, nfilts = {:d}".format(
54 | label,
55 | int(args[0]),
56 | int(args[1]),
57 | int(args[2]),
58 | ) for label in ("weights", "gains"))
59 |
60 | def test_gains():
61 | assert gains.shape == expected_gains.shape
62 | assert np.allclose(gains, expected_gains, rtol=1e-6, atol=1e-12)
63 |
64 | def test_weights():
65 | assert weights.shape == expected_weights.shape
66 | assert np.allclose(weights, expected_weights, rtol=1e-6, atol=1e-12)
67 |
68 | test_gains.description = test_gains_desc
69 | test_weights.description = test_weights_desc
70 |
71 | return test_gains, test_weights
72 |
73 |
74 | def test_fft_weights():
75 | for inputs, refs in load_reference_data():
76 | args = tuple(inputs[col] for col in INPUT_COLS)
77 | expected = (refs['weights'], refs['gain'])
78 | test_gains, test_weights = fft_weights_funcs(args, expected)
79 | yield test_gains
80 | yield test_weights
81 |
82 |
83 | if __name__ == '__main__':
84 | nose.main()
85 |
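The reference comparisons in these tests rely on `np.allclose`, which is asymmetric. It checks `|actual - expected| <= atol + rtol * |expected|` elementwise, so with `rtol=1e-6, atol=1e-12` any entry whose reference value is near zero must match almost exactly. The sketch below restates that criterion to make the tolerance explicit. `within_tolerance` is a hypothetical helper, not part of the gammatone package, and unlike `np.allclose` it does not special-case NaN or infinity.

```python
import numpy as np


def within_tolerance(actual, expected, rtol=1e-6, atol=1e-12):
    """Elementwise |actual - expected| <= atol + rtol * |expected|.

    Mirrors the np.allclose criterion for finite inputs: the relative part
    of the tolerance scales with the *expected* (reference) value only.
    """
    actual = np.asarray(actual, dtype=float)
    expected = np.asarray(expected, dtype=float)
    bound = atol + rtol * np.abs(expected)
    return bool(np.all(np.abs(actual - expected) <= bound))


# Large reference value: a 5e-4 absolute error is within rtol * 1000 = 1e-3.
print(within_tolerance(1000.0005, 1000.0))  # True
# Tiny reference value: a 1e-9 error far exceeds atol + rtol * 1e-9 (~1e-12).
print(within_tolerance(2e-9, 1e-9))         # False
```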
--------------------------------------------------------------------------------
/gammatone/tests/test_filterbank.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python3
2 | # Copyright 2014 Jason Heeris, jason.heeris@gmail.com
3 | #
4 | # This file is part of the gammatone toolkit, and is licensed under the 3-clause
5 | # BSD license: https://github.com/detly/gammatone/blob/master/COPYING
6 | import nose
7 | import numpy as np
8 | import scipy.io
9 | from pkg_resources import resource_stream
10 |
11 | import gammatone.filters
12 |
13 | REF_DATA_FILENAME = 'data/test_filterbank_data.mat'
14 |
15 | INPUT_KEY = 'erb_filterbank_inputs'
16 | RESULT_KEY = 'erb_filterbank_results'
17 |
18 | INPUT_COLS = ('fcoefs', 'wave')
19 | RESULT_COLS = ('filterbank',)
20 |
21 | def load_reference_data():
22 | """ Load test data generated from the reference code """
23 | # Load test data
24 | with resource_stream(__name__, REF_DATA_FILENAME) as test_data:
25 | data = scipy.io.loadmat(test_data, squeeze_me=False)
26 |
27 | zipped_data = zip(data[INPUT_KEY], data[RESULT_KEY])
28 |
29 | for inputs, refs in zipped_data:
30 | input_dict = dict(zip(INPUT_COLS, map(np.squeeze, inputs)))
31 | ref_dict = dict(zip(RESULT_COLS, map(np.squeeze, refs)))
32 | yield (input_dict, ref_dict)
33 |
34 |
35 | def test_ERB_filterbank_known_values():
36 | for inputs, refs in load_reference_data():
37 | args = (
38 | inputs['wave'],
39 | inputs['fcoefs'],
40 | )
41 |
42 | expected = (refs['filterbank'],)
43 |
44 | yield ERBFilterBankTester(args, expected)
45 |
46 |
47 | class ERBFilterBankTester:
48 |
49 | def __init__(self, args, expected):
50 | self.signal = args[0]
51 | self.fcoefs = args[1]
52 | self.expected = expected[0]
53 |
54 | self.description = (
55 | "Gammatone filterbank result for {:.1f} ... {:.1f}".format(
56 | self.fcoefs[0][0],
57 | self.fcoefs[0][1]
58 | ))
59 |
60 | def __call__(self):
61 | result = gammatone.filters.erb_filterbank(self.signal, self.fcoefs)
62 | assert np.allclose(result, self.expected, rtol=1e-5, atol=1e-12)
63 |
64 |
65 | if __name__ == '__main__':
66 | nose.main()
67 |
--------------------------------------------------------------------------------
/gammatone/tests/test_gammatone_filters.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python3
2 | # Copyright 2014 Jason Heeris, jason.heeris@gmail.com
3 | #
4 | # This file is part of the gammatone toolkit, and is licensed under the 3-clause
5 | # BSD license: https://github.com/detly/gammatone/blob/master/COPYING
6 | import nose
7 | import numpy as np
8 | import scipy.io
9 | from pkg_resources import resource_stream
10 |
11 | import gammatone.filters
12 |
13 | REF_DATA_FILENAME = 'data/test_erb_filter_data.mat'
14 |
15 | INPUT_KEY = 'erb_filter_inputs'
16 | RESULT_KEY = 'erb_filter_results'
17 |
18 | INPUT_COLS = ('fs', 'cfs')
19 | RESULT_COLS = ('fcoefs',)
20 |
21 | def load_reference_data():
22 | """ Load test data generated from the reference code """
23 | # Load test data
24 | with resource_stream(__name__, REF_DATA_FILENAME) as test_data:
25 | data = scipy.io.loadmat(test_data, squeeze_me=False)
26 |
27 | zipped_data = zip(data[INPUT_KEY], data[RESULT_KEY])
28 |
29 | for inputs, refs in zipped_data:
30 | input_dict = dict(zip(INPUT_COLS, map(np.squeeze, inputs)))
31 | ref_dict = dict(zip(RESULT_COLS, map(np.squeeze, refs)))
32 | yield (input_dict, ref_dict)
33 |
34 |
35 | def test_make_ERB_filters_known_values():
36 | for inputs, refs in load_reference_data():
37 | args = (
38 | inputs['fs'],
39 | inputs['cfs'],
40 | )
41 |
42 | expected = (refs['fcoefs'],)
43 |
44 | yield MakeERBFiltersTester(args, expected)
45 |
46 |
47 | class MakeERBFiltersTester:
48 |
49 | def __init__(self, args, expected):
50 | self.fs = args[0]
51 | self.cfs = args[1]
52 | self.expected = expected[0]
53 | self.description = (
54 | "Gammatone filters for {:f}, {:.1f} ... {:.1f}".format(
55 | float(self.fs),
56 | float(self.cfs[0]),
57 | float(self.cfs[-1])
58 | ))
59 |
60 | def __call__(self):
61 | result = gammatone.filters.make_erb_filters(self.fs, self.cfs)
62 | assert np.allclose(result, self.expected, rtol=1e-6, atol=1e-12)
63 |
64 | if __name__ == '__main__':
65 | nose.main()
66 |
--------------------------------------------------------------------------------
/gammatone/tests/test_gammatonegram.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python3
2 | # Copyright 2014 Jason Heeris, jason.heeris@gmail.com
3 | #
4 | # This file is part of the gammatone toolkit, and is licensed under the 3-clause
5 | # BSD license: https://github.com/detly/gammatone/blob/master/COPYING
6 | from mock import patch
7 | import nose
8 | import numpy as np
9 | import scipy.io
10 | from pkg_resources import resource_stream
11 |
12 | import gammatone.gtgram
13 |
14 | REF_DATA_FILENAME = 'data/test_gammatonegram_data.mat'
15 |
16 | INPUT_KEY = 'gammatonegram_inputs'
17 | MOCK_KEY = 'gammatonegram_mocks'
18 | RESULT_KEY = 'gammatonegram_results'
19 |
20 | INPUT_COLS = ('name', 'wave', 'fs', 'twin', 'thop', 'channels', 'fmin')
21 | MOCK_COLS = ('erb_fb', 'erb_fb_cols')
22 | RESULT_COLS = ('gtgram', 'nwin', 'hopsamps', 'ncols')
23 |
24 |
25 | def load_reference_data():
26 | """ Load test data generated from the reference code """
27 | # Load test data
28 | with resource_stream(__name__, REF_DATA_FILENAME) as test_data:
29 | data = scipy.io.loadmat(test_data, squeeze_me=True)
30 |
31 | zipped_data = zip(data[INPUT_KEY], data[MOCK_KEY], data[RESULT_KEY])
32 | for inputs, mocks, refs in zipped_data:
33 | input_dict = dict(zip(INPUT_COLS, inputs))
34 | mock_dict = dict(zip(MOCK_COLS, mocks))
35 | ref_dict = dict(zip(RESULT_COLS, refs))
36 | yield (input_dict, mock_dict, ref_dict)
37 |
38 |
39 | def test_nstrides():
40 | """ Test gamamtonegram stride calculations """
41 | for inputs, mocks, refs in load_reference_data():
42 | args = (
43 | inputs['fs'],
44 | inputs['twin'],
45 | inputs['thop'],
46 | mocks['erb_fb_cols']
47 | )
48 |
49 | expected = (
50 | refs['nwin'],
51 | refs['hopsamps'],
52 | refs['ncols']
53 | )
54 |
55 | yield GTGramStrideTester(inputs['name'], args, expected)
56 |
57 |
58 | class GTGramStrideTester:
59 | """ Testing class for gammatonegram stride calculation """
60 |
61 | def __init__(self, name, inputs, expected):
62 | self.inputs = inputs
63 | self.expected = expected
64 | self.description = "Gammatonegram strides for {:s}".format(name)
65 |
66 | def __call__(self):
67 | results = gammatone.gtgram.gtgram_strides(*self.inputs)
68 |
69 | diagnostic = (
70 | "result: {:s}, expected: {:s}".format(
71 | str(results),
72 | str(self.expected)
73 | )
74 | )
75 |
76 | # These are integer values, so use direct equality
77 | assert results == self.expected, diagnostic
78 |
79 |
80 | # TODO: possibly mock out gtgram_strides
81 |
82 | def test_gtgram():
83 | for inputs, mocks, refs in load_reference_data():
84 | args = (
85 | inputs['fs'],
86 | inputs['twin'],
87 | inputs['thop'],
88 | inputs['channels'],
89 | inputs['fmin']
90 | )
91 |
92 | yield GammatonegramTester(
93 | inputs['name'],
94 | args,
95 | inputs['wave'],
96 | mocks['erb_fb'],
97 | refs['gtgram']
98 | )
99 |
100 | class GammatonegramTester:
101 | """ Testing class for gammatonegram calculation """
102 |
103 | def __init__(self, name, args, sig, erb_fb_out, expected):
104 | self.signal = np.asarray(sig)
105 | self.expected = np.asarray(expected)
106 | self.erb_fb_out = np.asarray(erb_fb_out)
107 | self.args = args
108 |
109 | self.description = "Gammatonegram for {:s}".format(name)
110 |
111 | def __call__(self):
112 | with patch(
113 | 'gammatone.gtgram.erb_filterbank',
114 | return_value=self.erb_fb_out):
115 |
116 | result = gammatone.gtgram.gtgram(self.signal, *self.args)
117 |
118 | max_diff = np.max(np.abs(result - self.expected))
119 | diagnostic = "Maximum difference: {:6e}".format(max_diff)
120 |
121 | assert np.allclose(result, self.expected, rtol=1e-6, atol=1e-12), diagnostic
122 |
123 | if __name__ == '__main__':
124 | nose.main()
125 |
--------------------------------------------------------------------------------
/gammatone/tests/test_specgram.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python3
2 | # Copyright 2014 Jason Heeris, jason.heeris@gmail.com
3 | #
4 | # This file is part of the gammatone toolkit, and is licensed under the 3-clause
5 | # BSD license: https://github.com/detly/gammatone/blob/master/COPYING
6 | from mock import patch
7 | import nose
8 | import numpy as np
9 | import scipy.io
10 | from pkg_resources import resource_stream
11 |
12 | import gammatone.fftweight
13 |
14 | REF_DATA_FILENAME = 'data/test_specgram_data.mat'
15 |
16 | INPUT_KEY = 'specgram_inputs'
17 | MOCK_KEY = 'specgram_mocks'
18 | RESULT_KEY = 'specgram_results'
19 |
20 | INPUT_COLS = ('name', 'wave', 'nfft', 'fs', 'nwin', 'nhop')
21 | MOCK_COLS = ('window',)
22 | RESULT_COLS = ('res',)
23 |
24 |
25 | def load_reference_data():
26 | """ Load test data generated from the reference code """
27 | # Load test data
28 | with resource_stream(__name__, REF_DATA_FILENAME) as test_data:
29 | data = scipy.io.loadmat(test_data, squeeze_me=False)
30 |
31 | zipped_data = zip(data[INPUT_KEY], data[MOCK_KEY], data[RESULT_KEY])
32 | for inputs, mocks, refs in zipped_data:
33 | input_dict = dict(zip(INPUT_COLS, inputs))
34 | mock_dict = dict(zip(MOCK_COLS, mocks))
35 | ref_dict = dict(zip(RESULT_COLS, refs))
36 |
37 | yield (input_dict, mock_dict, ref_dict)
38 |
39 |
40 | def test_specgram():
41 | for inputs, mocks, refs in load_reference_data():
42 | args = (
43 | inputs['nfft'],
44 | inputs['fs'],
45 | inputs['nwin'],
46 | inputs['nhop'],
47 | )
48 |
49 | yield SpecgramTester(
50 | inputs['name'][0],
51 | args,
52 | inputs['wave'],
53 | mocks['window'],
54 | refs['res']
55 | )
56 |
57 | class SpecgramTester:
58 | """ Testing class for specgram replacement calculation """
59 |
60 | def __init__(self, name, args, sig, window, expected):
61 | self.signal = np.asarray(sig).squeeze()
62 | self.expected = np.asarray(expected).squeeze()
63 | self.args = [int(a.squeeze()) for a in args]
64 | self.window = window.squeeze()
65 | self.description = "Specgram for {:s}".format(name)
66 |
67 |
68 | def __call__(self):
69 | with patch(
70 | 'gammatone.fftweight.specgram_window',
71 | return_value=self.window):
72 | result = gammatone.fftweight.specgram(self.signal, *self.args)
73 |
74 | max_diff = np.max(np.abs(result - self.expected))
75 | diagnostic = "Maximum difference: {:6e}".format(max_diff)
76 |
77 | assert np.allclose(result, self.expected, rtol=1e-6, atol=1e-12), diagnostic
78 |
79 | if __name__ == '__main__':
80 | nose.main()
81 |
--------------------------------------------------------------------------------
/images/CRNN_SELDT_DCASE2020.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/janaal1/DCASE2020-Task3/bc7d5e87faa2fbe014dc47cccb48d9927b4ed3ec/images/CRNN_SELDT_DCASE2020.png
--------------------------------------------------------------------------------
/images/SELDnet_output.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/janaal1/DCASE2020-Task3/bc7d5e87faa2fbe014dc47cccb48d9927b4ed3ec/images/SELDnet_output.jpg
--------------------------------------------------------------------------------
/images/scse_cropped.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/janaal1/DCASE2020-Task3/bc7d5e87faa2fbe014dc47cccb48d9927b4ed3ec/images/scse_cropped.pdf
--------------------------------------------------------------------------------
/images/seld-squeeze-structure.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/janaal1/DCASE2020-Task3/bc7d5e87faa2fbe014dc47cccb48d9927b4ed3ec/images/seld-squeeze-structure.pdf
--------------------------------------------------------------------------------
/images/seld_squeeze_structure_image.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/janaal1/DCASE2020-Task3/bc7d5e87faa2fbe014dc47cccb48d9927b4ed3ec/images/seld_squeeze_structure_image.jpg
--------------------------------------------------------------------------------
/keras_model.py:
--------------------------------------------------------------------------------
1 | #
2 | # The SELDnet architecture
3 | #
4 |
5 | from keras.layers import (Bidirectional, Conv2D, MaxPooling2D, Input, Concatenate,
6 | Dense, Activation, Dropout, Reshape, Permute,
7 | GlobalAveragePooling2D, add, Flatten, Lambda,
8 | GlobalAveragePooling1D, ELU, multiply)
9 | #from keras.layers.core import Dense, Activation, Dropout, Reshape, Permute
10 | from keras.layers.recurrent import GRU
11 | from keras.layers.normalization import BatchNormalization
12 | from keras.models import Model
13 | from keras.layers.wrappers import TimeDistributed
14 | from keras.optimizers import Adam
15 | from keras.models import load_model
16 | import keras
17 | keras.backend.set_image_data_format('channels_first')
18 | from IPython import embed
19 | import numpy as np
20 |
21 | import keras.backend as K
22 | import warnings # added
23 |
24 | # From https://github.com/keras-team/keras-applications/blob/e52c477/keras_applications/imagenet_utils.py#L235-L331
25 | def _obtain_input_shape(input_shape,
26 | default_size,
27 | min_size,
28 | data_format,
29 | require_flatten,
30 | weights=None):
31 | """Internal utility to compute/validate a model's tensor shape.
32 | # Arguments
33 | input_shape: Either None (will return the default network input shape),
34 | or a user-provided shape to be validated.
35 | default_size: Default input width/height for the model.
36 | min_size: Minimum input width/height accepted by the model.
37 | data_format: Image data format to use.
38 | require_flatten: Whether the model is expected to
39 | be linked to a classifier via a Flatten layer.
40 | weights: One of `None` (random initialization)
41 | or 'imagenet' (pre-training on ImageNet).
42 | If weights='imagenet' input channels must be equal to 3.
43 | # Returns
44 | An integer shape tuple (may include None entries).
45 | # Raises
46 | ValueError: In case of invalid argument values.
47 | """
48 | if weights != 'imagenet' and input_shape and len(input_shape) == 3:
49 | if data_format == 'channels_first':
50 | if input_shape[0] not in {1, 3}:
51 | warnings.warn(
52 | 'This model usually expects 1 or 3 input channels. '
53 | 'However, it was passed an input_shape with {input_shape}'
54 | ' input channels.'.format(input_shape=input_shape[0]))
55 | default_shape = (input_shape[0], default_size, default_size)
56 | else:
57 | if input_shape[-1] not in {1, 3}:
58 | warnings.warn(
59 | 'This model usually expects 1 or 3 input channels. '
60 | 'However, it was passed an input_shape with {n_input_channels}'
61 | ' input channels.'.format(n_input_channels=input_shape[-1]))
62 | default_shape = (default_size, default_size, input_shape[-1])
63 | else:
64 | if data_format == 'channels_first':
65 | default_shape = (3, default_size, default_size)
66 | else:
67 | default_shape = (default_size, default_size, 3)
68 | if weights == 'imagenet' and require_flatten:
69 | if input_shape is not None:
70 | if input_shape != default_shape:
71 | raise ValueError('When setting `include_top=True` '
72 | 'and loading `imagenet` weights, '
73 | '`input_shape` should be {default_shape}.'.format(default_shape=default_shape))
74 | return default_shape
75 | if input_shape:
76 | if data_format == 'channels_first':
77 | if input_shape is not None:
78 | if len(input_shape) != 3:
79 | raise ValueError(
80 | '`input_shape` must be a tuple of three integers.')
81 | if input_shape[0] != 3 and weights == 'imagenet':
82 | raise ValueError('The input must have 3 channels; got '
83 | '`input_shape={input_shape}`'.format(input_shape=input_shape))
84 | if ((input_shape[1] is not None and input_shape[1] < min_size) or
85 | (input_shape[2] is not None and input_shape[2] < min_size)):
86 | raise ValueError('Input size must be at least {min_size}x{min_size};'
87 | ' got `input_shape={input_shape}`'.format(min_size=min_size,
88 | input_shape=input_shape))
89 | else:
90 | if input_shape is not None:
91 | if len(input_shape) != 3:
92 | raise ValueError(
93 | '`input_shape` must be a tuple of three integers.')
94 | if input_shape[-1] != 3 and weights == 'imagenet':
95 | raise ValueError('The input must have 3 channels; got '
96 | '`input_shape={input_shape}`'.format(input_shape=input_shape))
97 | if ((input_shape[0] is not None and input_shape[0] < min_size) or
98 | (input_shape[1] is not None and input_shape[1] < min_size)):
99 | raise ValueError('Input size must be at least {min_size}x{min_size};'
100 | ' got `input_shape={input_shape}`'.format(min_size=min_size,
101 | input_shape=input_shape))
102 | else:
103 | if require_flatten:
104 | input_shape = default_shape
105 | else:
106 | if data_format == 'channels_first':
107 | input_shape = (3, None, None)
108 | else:
109 | input_shape = (None, None, 3)
110 | if require_flatten:
111 | if None in input_shape:
112 | raise ValueError('If `include_top` is True, '
113 | 'you should specify a static `input_shape`. '
114 | 'Got `input_shape={input_shape}`'.format(input_shape=input_shape))
115 | return input_shape
116 |
117 |
118 | def squeeze_excite_block(input_tensor, ratio=16):
119 | """ Create a channel-wise squeeze-excite block
120 | Args:
121 | input_tensor: input Keras tensor
122 | ratio: reduction ratio; the bottleneck Dense layer has filters // ratio units
123 | Returns: a Keras tensor
124 | References
125 | - [Squeeze and Excitation Networks](https://arxiv.org/abs/1709.01507)
126 | """
127 | init = input_tensor
128 | channel_axis = 1 if K.image_data_format() == "channels_first" else -1
129 | filters = _tensor_shape(init)[channel_axis]
130 | se_shape = (1, 1, filters)
131 |
132 | se = GlobalAveragePooling2D()(init)
133 | se = Reshape(se_shape)(se)
134 | se = Dense(filters // ratio, activation='relu', kernel_initializer='he_normal', use_bias=False)(se)
135 | se = Dense(filters, activation='sigmoid', kernel_initializer='he_normal', use_bias=False)(se)
136 |
137 | if K.image_data_format() == 'channels_first':
138 | se = Permute((3, 1, 2))(se)
139 |
140 | x = multiply([init, se])
141 | return x
142 |
143 |
144 | def spatial_squeeze_excite_block(input_tensor):
145 | """ Create a spatial squeeze-excite block
146 | Args:
147 | input_tensor: input Keras tensor
148 | Returns: a Keras tensor
149 | References
150 | - [Concurrent Spatial and Channel Squeeze & Excitation in Fully Convolutional Networks](https://arxiv.org/abs/1803.02579)
151 | """
152 |
153 | se = Conv2D(1, (1, 1), activation='sigmoid', use_bias=False,
154 | kernel_initializer='he_normal')(input_tensor)
155 |
156 | x = multiply([input_tensor, se])
157 | return x
158 |
159 |
160 | def channel_spatial_squeeze_excite(input_tensor, ratio=16):
161 | """ Create a spatial squeeze-excite block
162 | Args:
163 | input_tensor: input Keras tensor
164 | ratio: reduction ratio passed to the channel squeeze-excite block
165 | Returns: a Keras tensor
166 | References
167 | - [Squeeze and Excitation Networks](https://arxiv.org/abs/1709.01507)
168 | - [Concurrent Spatial and Channel Squeeze & Excitation in Fully Convolutional Networks](https://arxiv.org/abs/1803.02579)
169 | """
170 |
171 | cse = squeeze_excite_block(input_tensor, ratio)
172 | sse = spatial_squeeze_excite_block(input_tensor)
173 |
174 | x = add([cse, sse])
175 | return x
176 |
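For intuition, the three blocks above can be written out directly in NumPy for a single channels-first feature map of shape (C, H, W). This is a simplified sketch, not the Keras code path: there is no batch axis, and the weights `w1`, `w2` (standing in for the two bias-free `Dense` layers) and `w_s` (standing in for the bias-free 1x1 `Conv2D` with one output channel) are passed in explicitly rather than learned.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_se(x, w1, w2):
    # x: (C, H, W); w1: (C, C // ratio); w2: (C // ratio, C)
    z = x.mean(axis=(1, 2))        # squeeze: global average pool -> (C,)
    s = np.maximum(0.0, z @ w1)    # bottleneck Dense + ReLU, no bias
    s = sigmoid(s @ w2)            # Dense back to C + sigmoid -> (C,)
    return x * s[:, None, None]    # excite: rescale each channel


def spatial_se(x, w_s):
    # A 1x1 conv to one output channel is a per-pixel weighted sum over C.
    q = sigmoid(np.tensordot(w_s, x, axes=(0, 0)))  # (H, W)
    return x * q[None, :, :]       # rescale each time-frequency position


def channel_spatial_se(x, w1, w2, w_s):
    # scSE: sum of the channel and spatial recalibrations, as in add([cse, sse]).
    return channel_se(x, w1, w2) + spatial_se(x, w_s)


rng = np.random.default_rng(0)
C, H, W, ratio = 16, 5, 8, 4
x = rng.standard_normal((C, H, W))
out = channel_spatial_se(
    x,
    rng.standard_normal((C, C // ratio)),
    rng.standard_normal((C // ratio, C)),
    rng.standard_normal(C),
)
print(out.shape)  # (16, 5, 8)
```

With all-zero weights both gates evaluate to sigmoid(0) = 0.5, so the block returns `0.5 * x + 0.5 * x = x`, which makes a quick sanity check.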
177 | def _tensor_shape(tensor):
178 | return getattr(tensor, '_keras_shape')
179 |
180 | def get_model(data_in, data_out, dropout_rate, nb_cnn2d_filt, f_pool_size, t_pool_size,
181 | rnn_size, fnn_size, weights, doa_objective, baseline, ratio):
182 | # model definition
183 | spec_start = Input(shape=(data_in[-3], data_in[-2], data_in[-1]))
184 |
185 | # CNN
186 | spec_cnn = spec_start
187 | for i, convCnt in enumerate(f_pool_size):
188 |
189 | if baseline is False:
190 |
191 | spec_aux = spec_cnn
192 | spec_cnn = Conv2D(nb_cnn2d_filt, 3, padding='same')(spec_cnn)
193 | spec_cnn = BatchNormalization()(spec_cnn)
194 | spec_cnn = ELU()(spec_cnn)
195 | spec_cnn = Conv2D(nb_cnn2d_filt, 3, padding='same')(spec_cnn)
196 | spec_cnn = BatchNormalization()(spec_cnn)
197 |
198 | spec_aux = Conv2D(nb_cnn2d_filt, 1, padding='same')(spec_aux)
199 | spec_aux = BatchNormalization()(spec_aux)
200 |
201 | spec_cnn = add([spec_cnn,spec_aux])
202 | spec_cnn = ELU()(spec_cnn)
203 |
204 | if ratio != 0:
205 |
206 | spec_cnn = channel_spatial_squeeze_excite(spec_cnn,ratio=ratio)
207 |
208 | spec_cnn = add([spec_cnn, spec_aux])
209 |
210 | else:
211 |
212 | spec_cnn = Conv2D(filters=nb_cnn2d_filt, kernel_size=(3, 3), padding='same')(spec_cnn)
213 | spec_cnn = BatchNormalization()(spec_cnn)
214 | spec_cnn = Activation('relu')(spec_cnn)
215 | spec_cnn = MaxPooling2D(pool_size=(t_pool_size[i], f_pool_size[i]))(spec_cnn)
216 | spec_cnn = Dropout(dropout_rate)(spec_cnn)
217 | spec_cnn = Permute((2, 1, 3))(spec_cnn)
218 |
219 | # RNN
220 | spec_rnn = Reshape((data_out[0][-2], -1))(spec_cnn)
221 | for nb_rnn_filt in rnn_size:
222 | spec_rnn = Bidirectional(
223 | GRU(nb_rnn_filt, activation='tanh', dropout=dropout_rate, recurrent_dropout=dropout_rate,
224 | return_sequences=True),
225 | merge_mode='mul'
226 | )(spec_rnn)
227 |
228 | # FC - DOA
229 | doa = spec_rnn
230 | for nb_fnn_filt in fnn_size:
231 | doa = TimeDistributed(Dense(nb_fnn_filt))(doa)
232 | doa = Dropout(dropout_rate)(doa)
233 |
234 | doa = TimeDistributed(Dense(data_out[1][-1]))(doa)
235 | doa = Activation('tanh', name='doa_out')(doa)
236 |
237 | # FC - SED
238 | sed = spec_rnn
239 | for nb_fnn_filt in fnn_size:
240 | sed = TimeDistributed(Dense(nb_fnn_filt))(sed)
241 | sed = Dropout(dropout_rate)(sed)
242 | sed = TimeDistributed(Dense(data_out[0][-1]))(sed)
243 | sed = Activation('sigmoid', name='sed_out')(sed)
244 |
245 | model = None
246 | if doa_objective == 'mse':
247 | model = Model(inputs=spec_start, outputs=[sed, doa])
248 | model.compile(optimizer=Adam(), loss=['binary_crossentropy', 'mse'], loss_weights=weights)
249 | elif doa_objective == 'masked_mse':
250 | doa_concat = Concatenate(axis=-1, name='doa_concat')([sed, doa])
251 | model = Model(inputs=spec_start, outputs=[sed, doa_concat])
252 | model.compile(optimizer=Adam(), loss=['binary_crossentropy', masked_mse], loss_weights=weights)
253 | else:
254 | print('ERROR: Unknown doa_objective: {}'.format(doa_objective))
255 | exit()
256 | model.summary()
257 | return model
258 |
259 |
260 | def masked_mse(y_gt, model_out):
261 | # SED mask: Use only the predicted DOAs when gt SED > 0.5
262 | sed_out = y_gt[:, :, :14] >= 0.5 #TODO fix this hardcoded value of number of classes
263 | sed_out = keras.backend.repeat_elements(sed_out, 3, -1)
264 | sed_out = keras.backend.cast(sed_out, 'float32')
265 |
266 | # Use the mask to compute the DOA error only for active classes, normalized by the mask weights #TODO fix this hardcoded value of number of classes
267 | return keras.backend.sqrt(keras.backend.sum(keras.backend.square(y_gt[:, :, 14:] - model_out[:, :, 14:]) * sed_out))/keras.backend.sum(sed_out)
268 |
269 |
270 | def load_seld_model(model_file, doa_objective):
271 | if doa_objective == 'mse':
272 | return load_model(model_file)
273 | elif doa_objective == 'masked_mse':
274 | return load_model(model_file, custom_objects={'masked_mse': masked_mse})
275 | else:
276 | print('ERROR: Unknown doa objective: {}'.format(doa_objective))
277 | exit()
278 |
279 |
280 |
281 |
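In `get_model`, each CNN stage ends with `MaxPooling2D(pool_size=(t_pool_size[i], f_pool_size[i]))`, and the channels-first output (filters, time, freq) is permuted to (time, filters, freq) and flattened so the GRUs see one feature vector per frame. `Reshape((data_out[0][-2], -1))` therefore only works if the pooled time length equals the number of label frames. The helper below is a hypothetical sketch of that shape arithmetic; the input shape and pooling sizes in the example are illustrative values, not necessarily this repository's `parameter.py` settings.

```python
def crnn_rnn_input_shape(in_shape, nb_cnn2d_filt, t_pool_size, f_pool_size):
    """Return (frames, features) fed to the first GRU layer.

    in_shape: (channels, time, freq), channels-first as in keras_model.py.
    Conv2D(padding='same') keeps time/freq unchanged; MaxPooling2D (default
    'valid' padding, stride equal to pool size) floors each axis by its pool size.
    """
    _, t, f = in_shape
    for tp, fp in zip(t_pool_size, f_pool_size):
        t //= tp
        f //= fp
    # Permute((2, 1, 3)) gives (time, filters, freq); Reshape flattens the last two.
    return t, nb_cnn2d_filt * f


# Illustrative values only.
print(crnn_rnn_input_shape((7, 300, 64), 64, [5, 1, 1], [4, 4, 2]))  # (60, 128)
```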
--------------------------------------------------------------------------------
/metrics/LICENSE.md:
--------------------------------------------------------------------------------
1 | -----------COPYRIGHT NOTICE STARTS WITH THIS LINE------------
2 | Copyright (c) 2020 Tampere University and its licensors
3 |
4 | Permission is hereby granted, free of charge, to any person obtaining a copy
5 | of this script, SELD_evaluation_metrics.py (the "Software"), to deal
6 | in the Software without restriction, including without limitation the rights
7 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
8 | copies of the Software, and to permit persons to whom the Software is
9 | furnished to do so, subject to the following conditions:
10 |
11 | The above copyright notice and this permission notice shall be included in all
12 | copies or substantial portions of the Software.
13 |
14 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
15 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
16 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
17 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
18 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
19 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
20 | SOFTWARE.
21 |
22 | -----------COPYRIGHT NOTICE ENDS WITH THIS LINE------------
23 |
--------------------------------------------------------------------------------
/metrics/SELD_evaluation_metrics.py:
--------------------------------------------------------------------------------
1 | #
2 | # Implements the localization and detection metrics proposed in the paper
3 | #
4 | # Joint Measurement of Localization and Detection of Sound Events
5 | # Annamaria Mesaros, Sharath Adavanne, Archontis Politis, Toni Heittola, Tuomas Virtanen
6 | # WASPAA 2019
7 | #
8 | #
9 | # This script has MIT license
10 | #
11 |
12 | import numpy as np
13 | from IPython import embed
14 | eps = np.finfo(np.float64).eps
15 | from scipy.optimize import linear_sum_assignment
16 |
17 |
18 | class SELDMetrics(object):
19 | def __init__(self, doa_threshold=20, nb_classes=11):
20 | '''
21 | This class implements both the class-sensitive localization and location-sensitive detection metrics.
22 | Additionally, based on the user input, the corresponding averaging is performed within the segment.
23 |
24 | :param nb_classes: Number of sound classes. In the paper, nb_classes = 11.
25 | :param doa_threshold: DOA threshold, in degrees, for location-sensitive detection.
26 | '''
27 |
28 | self._TP = 0
29 | self._FP = 0
30 | self._TN = 0
31 | self._FN = 0
32 |
33 | self._S = 0
34 | self._D = 0
35 | self._I = 0
36 |
37 | self._Nref = 0
38 | self._Nsys = 0
39 |
40 | self._total_DE = 0
41 | self._DE_TP = 0
42 |
43 | self._spatial_T = doa_threshold
44 | self._nb_classes = nb_classes
45 |
46 | def compute_seld_scores(self):
47 | '''
48 | Collect the final SELD scores
49 |
50 | :return: returns both location-sensitive detection scores and class-sensitive localization scores
51 | '''
52 |
53 | # Location-sensitive detection performance
54 | ER = (self._S + self._D + self._I) / float(self._Nref + eps)
55 |
56 | prec = float(self._TP) / float(self._Nsys + eps)
57 | recall = float(self._TP) / float(self._Nref + eps)
58 | F = 2 * prec * recall / (prec + recall + eps)
59 |
60 | # Class-sensitive localization performance
61 | if self._DE_TP:
62 | DE = self._total_DE / float(self._DE_TP + eps)
63 | else:
64 | # When the total number of predictions is zero
65 | DE = 180
66 |
67 | DE_prec = float(self._DE_TP) / float(self._Nsys + eps)
68 | DE_recall = float(self._DE_TP) / float(self._Nref + eps)
69 | DE_F = 2 * DE_prec * DE_recall / (DE_prec + DE_recall + eps)
70 |
71 | return ER, F, DE, DE_F
72 |
73 | def update_seld_scores_xyz(self, pred, gt):
74 | '''
75 | Implements the spatial error averaging according to equation [5] in the paper, using Cartesian distance
76 |
77 | :param pred: dictionary containing class-wise prediction results for each N-seconds segment block
78 | :param gt: dictionary containing class-wise groundtruth for each N-seconds segment block
79 | '''
80 | for block_cnt in range(len(gt.keys())):
81 | # print('\nblock_cnt', block_cnt, end='')
82 | loc_FN, loc_FP = 0, 0
83 | for class_cnt in range(self._nb_classes):
84 | # print('\tclass:', class_cnt, end='')
85 | # Counting the number of ref and sys outputs should include the number of tracks for each class in the segment
86 | if class_cnt in gt[block_cnt]:
87 | self._Nref += 1
88 | if class_cnt in pred[block_cnt]:
89 | self._Nsys += 1
90 |
91 | if class_cnt in gt[block_cnt] and class_cnt in pred[block_cnt]:
92 | # True positives or False negative case
93 |
94 | # NOTE: To handle multiple tracks per class, first identify the individual tracks with the Hungarian algorithm and then
95 | # compute the spatial distance with the code below. Currently, if a frame contains multiple tracks of the same class,
96 | # the least-cost assignment between the ground-truth and predicted DOAs is used.
97 |
98 | total_spatial_dist = 0
99 | total_framewise_matching_doa = 0
100 | gt_ind_list = gt[block_cnt][class_cnt][0][0]
101 | pred_ind_list = pred[block_cnt][class_cnt][0][0]
102 | for gt_ind, gt_val in enumerate(gt_ind_list):
103 | if gt_val in pred_ind_list:
104 | total_framewise_matching_doa += 1
105 | pred_ind = pred_ind_list.index(gt_val)
106 |
107 | gt_arr = np.array(gt[block_cnt][class_cnt][0][1][gt_ind])
108 | pred_arr = np.array(pred[block_cnt][class_cnt][0][1][pred_ind])
109 |
110 | if gt_arr.shape[0]==1 and pred_arr.shape[0]==1:
111 | total_spatial_dist += distance_between_cartesian_coordinates(gt_arr[0][0], gt_arr[0][1], gt_arr[0][2], pred_arr[0][0], pred_arr[0][1], pred_arr[0][2])
112 | else:
113 | total_spatial_dist += least_distance_between_gt_pred(gt_arr, pred_arr)
114 |
115 | if total_spatial_dist == 0 and total_framewise_matching_doa == 0:
116 | loc_FN += 1
117 | self._FN += 1
118 | else:
119 | avg_spatial_dist = (total_spatial_dist / total_framewise_matching_doa)
120 |
121 | self._total_DE += avg_spatial_dist
122 | self._DE_TP += 1
123 |
124 | if avg_spatial_dist <= self._spatial_T:
125 | self._TP += 1
126 | else:
127 | loc_FN += 1
128 | self._FN += 1
129 | elif class_cnt in gt[block_cnt] and class_cnt not in pred[block_cnt]:
130 | # False negative
131 | loc_FN += 1
132 | self._FN += 1
133 | elif class_cnt not in gt[block_cnt] and class_cnt in pred[block_cnt]:
134 | # False positive
135 | loc_FP += 1
136 | self._FP += 1
137 | elif class_cnt not in gt[block_cnt] and class_cnt not in pred[block_cnt]:
138 | # True negative
139 | self._TN += 1
140 |
141 | self._S += np.minimum(loc_FP, loc_FN)
142 | self._D += np.maximum(0, loc_FN - loc_FP)
143 | self._I += np.maximum(0, loc_FP - loc_FN)
144 | return
145 |
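The segment-level bookkeeping above reduces to a few lines of arithmetic. Within each segment, a reference class that is missed, or detected but localized beyond `doa_threshold`, counts toward `loc_FN`, and a predicted class absent from the reference counts toward `loc_FP`. These are split into substitutions, deletions and insertions, and `compute_seld_scores` turns the accumulated counts into the error rate and F-score. The sketch below restates that arithmetic with hypothetical helper names and works through one hypothetical segment.

```python
import numpy as np

EPS = np.finfo(np.float64).eps


def segment_sdi(loc_fn, loc_fp):
    """Substitutions, deletions, insertions for one segment."""
    s = min(loc_fp, loc_fn)
    d = max(0, loc_fn - loc_fp)
    i = max(0, loc_fp - loc_fn)
    return s, d, i


def error_rate_and_f(S, D, I, TP, n_ref, n_sys):
    """Location-sensitive ER and F-score from accumulated counts."""
    er = (S + D + I) / (n_ref + EPS)
    prec = TP / (n_sys + EPS)
    recall = TP / (n_ref + EPS)
    f = 2 * prec * recall / (prec + recall + EPS)
    return er, f


# Hypothetical segment: reference classes {0, 1, 2}, predicted classes {0, 1, 3}.
#   class 0: detected, DOA error within threshold  -> TP
#   class 1: detected, DOA error above threshold   -> loc_FN
#   class 2: missed                                -> loc_FN
#   class 3: spurious prediction                   -> loc_FP
S, D, I = segment_sdi(loc_fn=2, loc_fp=1)
er, f = error_rate_and_f(S, D, I, TP=1, n_ref=3, n_sys=3)
print(S, D, I)                    # 1 1 0
print(round(er, 4), round(f, 4))  # 0.6667 0.3333
```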
146 | def update_seld_scores(self, pred_deg, gt_deg):
147 | '''
148 | Implements the spatial error averaging according to equation [5] in the paper, using Polar distance
149 | Expects the angles in degrees
150 |
151 | :param pred_deg: dictionary containing class-wise prediction results for each N-seconds segment block
152 | :param gt_deg: dictionary containing class-wise groundtruth for each N-seconds segment block
153 | '''
154 | for block_cnt in range(len(gt_deg.keys())):
155 | # print('\nblock_cnt', block_cnt, end='')
156 | loc_FN, loc_FP = 0, 0
157 | for class_cnt in range(self._nb_classes):
158 | # print('\tclass:', class_cnt, end='')
159 | # NOTE: the ref and sys counts should ideally include the number of tracks for each class in the segment; here each active class counts once
160 | if class_cnt in gt_deg[block_cnt]:
161 | self._Nref += 1
162 | if class_cnt in pred_deg[block_cnt]:
163 | self._Nsys += 1
164 |
165 | if class_cnt in gt_deg[block_cnt] and class_cnt in pred_deg[block_cnt]:
166 | # True positives or False negative case
167 |
168 | # NOTE: For multiple tracks per class, identify multiple tracks using hungarian algorithm and then
169 | # calculate the spatial distance using the following code. In the current code, if there are multiple
170 | # tracks of the same class in a frame, we compute the least-cost matching between ground truth and predictions and use its cost.
171 | total_spatial_dist = 0
172 | total_framewise_matching_doa = 0
173 | gt_ind_list = gt_deg[block_cnt][class_cnt][0][0]
174 | pred_ind_list = pred_deg[block_cnt][class_cnt][0][0]
175 | for gt_ind, gt_val in enumerate(gt_ind_list):
176 | if gt_val in pred_ind_list:
177 | total_framewise_matching_doa += 1
178 | pred_ind = pred_ind_list.index(gt_val)
179 |
180 | gt_arr = np.array(gt_deg[block_cnt][class_cnt][0][1][gt_ind]) * np.pi / 180
181 | pred_arr = np.array(pred_deg[block_cnt][class_cnt][0][1][pred_ind]) * np.pi / 180
182 | if gt_arr.shape[0]==1 and pred_arr.shape[0]==1:
183 | total_spatial_dist += distance_between_spherical_coordinates_rad(gt_arr[0][0], gt_arr[0][1], pred_arr[0][0], pred_arr[0][1])
184 | else:
185 | total_spatial_dist += least_distance_between_gt_pred(gt_arr, pred_arr)
186 |
187 | if total_spatial_dist == 0 and total_framewise_matching_doa == 0:
188 | loc_FN += 1
189 | self._FN += 1
190 | else:
191 | avg_spatial_dist = (total_spatial_dist / total_framewise_matching_doa)
192 |
193 | self._total_DE += avg_spatial_dist
194 | self._DE_TP += 1
195 |
196 | if avg_spatial_dist <= self._spatial_T:
197 | self._TP += 1
198 | else:
199 | loc_FN += 1
200 | self._FN += 1
201 | elif class_cnt in gt_deg[block_cnt] and class_cnt not in pred_deg[block_cnt]:
202 | # False negative
203 | loc_FN += 1
204 | self._FN += 1
205 | elif class_cnt not in gt_deg[block_cnt] and class_cnt in pred_deg[block_cnt]:
206 | # False positive
207 | loc_FP += 1
208 | self._FP += 1
209 | elif class_cnt not in gt_deg[block_cnt] and class_cnt not in pred_deg[block_cnt]:
210 | # True negative
211 | self._TN += 1
212 |
213 | self._S += np.minimum(loc_FP, loc_FN)
214 | self._D += np.maximum(0, loc_FN - loc_FP)
215 | self._I += np.maximum(0, loc_FP - loc_FN)
216 | return
217 |
218 |
219 | def distance_between_spherical_coordinates_rad(az1, ele1, az2, ele2):
220 | """
221 | Angular distance between two spherical coordinates given as (azimuth, elevation) in radians
222 | MORE: https://en.wikipedia.org/wiki/Great-circle_distance
223 |
224 | :return: angular distance in degrees
225 | """
226 | dist = np.sin(ele1) * np.sin(ele2) + np.cos(ele1) * np.cos(ele2) * np.cos(np.abs(az1 - az2))
227 | # Clip to [-1, 1]: rounding errors can push the value slightly outside this range, and np.arccos would then return NaN
228 | dist = np.clip(dist, -1, 1)
229 | dist = np.arccos(dist) * 180 / np.pi
230 | return dist
231 |
232 |
233 | def distance_between_cartesian_coordinates(x1, y1, z1, x2, y2, z2):
234 | """
235 | Angular distance between two Cartesian direction vectors (normalized internally)
236 | MORE: https://en.wikipedia.org/wiki/Great-circle_distance
237 | Check 'From chord length' section
238 |
239 | :return: angular distance in degrees
240 | """
241 | # Normalize the Cartesian vectors
242 | N1 = np.sqrt(x1**2 + y1**2 + z1**2 + 1e-10)
243 | N2 = np.sqrt(x2**2 + y2**2 + z2**2 + 1e-10)
244 | x1, y1, z1, x2, y2, z2 = x1/N1, y1/N1, z1/N1, x2/N2, y2/N2, z2/N2
245 |
246 | # Cosine of the angle between the unit vectors
247 | dist = x1*x2 + y1*y2 + z1*z2
248 | dist = np.clip(dist, -1, 1)
249 | dist = np.arccos(dist) * 180 / np.pi
250 | return dist
251 |
252 |
253 | def least_distance_between_gt_pred(gt_list, pred_list):
254 | """
255 | Shortest distance between two sets of DOA coordinates. Given a set of ground-truth coordinates
256 | and the corresponding predicted coordinates, we compute the distance between every
257 | ground-truth/prediction pair, giving a distance matrix whose rows index the ground-truth
258 | coordinates and whose columns index the predicted coordinates. The number of estimated peaks need not match the
259 | ground truth, so the distance matrix is not necessarily square. We use the Hungarian algorithm to find the
260 | least-cost assignment in this distance matrix.
261 | :param gt_list: array of ground-truth coordinates, either Cartesian (x, y, z) or polar (azimuth, elevation) in radians
262 | :param pred_list: array of predicted coordinates, in the same format as gt_list
263 | :return: cost - sum of the angular distances (in degrees) over the matched pairs
264 | Surplus ground-truth or predicted DOAs (when the two lists differ in length) stay
265 | unmatched and add nothing to the cost.
266 | """
267 | gt_len, pred_len = gt_list.shape[0], pred_list.shape[0]
268 | ind_pairs = np.array([[x, y] for y in range(pred_len) for x in range(gt_len)])
269 | cost_mat = np.zeros((gt_len, pred_len))
270 |
271 | if gt_len and pred_len:
272 | if len(gt_list[0]) == 3: #Cartesian
273 | x1, y1, z1, x2, y2, z2 = gt_list[ind_pairs[:, 0], 0], gt_list[ind_pairs[:, 0], 1], gt_list[ind_pairs[:, 0], 2], pred_list[ind_pairs[:, 1], 0], pred_list[ind_pairs[:, 1], 1], pred_list[ind_pairs[:, 1], 2]
274 | cost_mat[ind_pairs[:, 0], ind_pairs[:, 1]] = distance_between_cartesian_coordinates(x1, y1, z1, x2, y2, z2)
275 | else:
276 | az1, ele1, az2, ele2 = gt_list[ind_pairs[:, 0], 0], gt_list[ind_pairs[:, 0], 1], pred_list[ind_pairs[:, 1], 0], pred_list[ind_pairs[:, 1], 1]
277 | cost_mat[ind_pairs[:, 0], ind_pairs[:, 1]] = distance_between_spherical_coordinates_rad(az1, ele1, az2, ele2)
278 |
279 | row_ind, col_ind = linear_sum_assignment(cost_mat)
280 | cost = cost_mat[row_ind, col_ind].sum()
281 | return cost
282 |
283 |
284 | def early_stopping_metric(sed_error, doa_error):
285 | """
286 | Compute the early stopping metric (lower is better) as the mean of the SED and DOA error terms.
287 |
288 | :param sed_error: [error rate (0 to 1 range), f score (0 to 1 range)]
289 | :param doa_error: [doa error (in degrees), frame recall (0 to 1 range)]
290 | :return: early stopping metric result
291 | """
292 | seld_metric = np.mean([
293 | sed_error[0],
294 | 1 - sed_error[1],
295 | doa_error[0]/180,
296 | 1 - doa_error[1]]
297 | )
298 | return seld_metric
299 |
--------------------------------------------------------------------------------
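The metric helpers above can be exercised in isolation. The following is a minimal, self-contained sketch (not the repository's code) that re-implements the great-circle distance for (azimuth, elevation) pairs and for Cartesian vectors, the Hungarian least-cost matching via `scipy.optimize.linear_sum_assignment`, the per-block substitution/deletion/insertion counts derived from `loc_FP` and `loc_FN`, and the early-stopping average. The names `angular_distance_spherical_deg`, `angular_distance_cartesian_deg`, `least_cost_matching` and `block_error_counts` are hypothetical, and SciPy is assumed to be installed.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def angular_distance_spherical_deg(az1, ele1, az2, ele2):
    # Great-circle distance between (azimuth, elevation) pairs given in radians,
    # returned in degrees; equivalent to distance_between_spherical_coordinates_rad.
    cos_d = np.sin(ele1) * np.sin(ele2) + np.cos(ele1) * np.cos(ele2) * np.cos(az1 - az2)
    # Clip before arccos so rounding errors cannot produce NaN.
    return np.degrees(np.arccos(np.clip(cos_d, -1.0, 1.0)))


def angular_distance_cartesian_deg(v1, v2):
    # Angle in degrees between two direction vectors, normalised to unit length first.
    v1 = np.asarray(v1, dtype=float) / np.linalg.norm(v1)
    v2 = np.asarray(v2, dtype=float) / np.linalg.norm(v2)
    return np.degrees(np.arccos(np.clip(np.dot(v1, v2), -1.0, 1.0)))


def least_cost_matching(gt_xyz, pred_xyz):
    # Rows index ground-truth DOAs, columns index predictions; the matrix may be
    # rectangular, in which case surplus rows or columns remain unmatched.
    cost = np.array([[angular_distance_cartesian_deg(g, p) for p in pred_xyz]
                     for g in gt_xyz])
    rows, cols = linear_sum_assignment(cost)
    return cost[rows, cols].sum(), list(zip(rows.tolist(), cols.tolist()))


def block_error_counts(loc_fp, loc_fn):
    # Substitutions, deletions and insertions for one block, as in update_seld_scores.
    return min(loc_fp, loc_fn), max(0, loc_fn - loc_fp), max(0, loc_fp - loc_fn)


def early_stopping_metric(sed_error, doa_error):
    # Mean of four lower-is-better terms: ER, 1 - F, DOA error / 180, 1 - frame recall.
    return np.mean([sed_error[0], 1 - sed_error[1], doa_error[0] / 180, 1 - doa_error[1]])


# Azimuth 0 vs 90 degrees on the horizon, in both representations: 90 degrees apart.
print(angular_distance_spherical_deg(0.0, 0.0, np.pi / 2, 0.0))
print(angular_distance_cartesian_deg([1, 0, 0], [0, 1, 0]))

# Two reference DOAs, three predictions: the prediction [0, 0, 1] is left unmatched.
total, pairs = least_cost_matching([[1, 0, 0], [0, 1, 0]],
                                   [[0, 1, 0], [0, 0, 1], [1, 0, 0]])
print(total, pairs)  # 0.0 [(0, 2), (1, 0)]

print(block_error_counts(2, 1))                         # (1, 0, 1)
print(early_stopping_metric([0.5, 0.6], [18.0, 0.8]))   # approximately 0.3
```

The `1e-10` term in `distance_between_cartesian_coordinates` keeps the normalisation finite for an all-zero vector; this sketch divides by the plain norm and therefore does not guard against zero vectors.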
/parameter.py:
--------------------------------------------------------------------------------
1 | # Parameters used in feature extraction, in the neural network model, and in training SELDnet can be changed here.
2 | #
3 | # Ideally, do not change the default parameter values. Instead, create separate cases keyed by a unique argv value, as in
4 | # the (commented-out) if-else block below, and use them. This way you can easily reproduce a configuration at a later time.
5 |
6 |
7 | def get_params(argv='1'):
8 | print("SET: {}".format(argv))
9 | # ########### default parameters ##############
10 | params = dict(
11 | quick_test=False, # Quick test: trains/tests on a small subset of the dataset for a few epochs
12 |
13 | # INPUT PATH
14 | dataset_dir='C:\\JAVIER\\code\\DCASE2020-Task3\\base_folder', # Base folder containing the foa/mic and metadata folders
15 | #dataset_dir='/content/gdrive/My Drive/DCASE2020-Task3/base_folder',
16 |
17 | # OUTPUT PATH
18 | feat_label_dir='C:\\JAVIER\\code\\DCASE2020-Task3\\input_feature\\baseline_log_mel', # Directory to dump extracted features and labels
19 | #feat_label_dir='/content/gdrive/My Drive/DCASE2020-Task3/input_feature/gammatone_nomax_gcclogmel',
20 | model_dir='C:\\JAVIER\\code\\DCASE2020-Task3\\outputs\\ratio-1\\models', # Dumps the trained models and training curves in this folder
21 | dcase_output=True, # If True, dumps the recording-wise results to the 'dcase_dir' path.
22 | # Set this to True once you have finalized your model, then save the output and submit it
23 | dcase_dir='C:\\JAVIER\\code\\DCASE2020-Task3\\outputs\\ratio-1\\results', # Dumps the recording-wise network output in this folder
24 |
25 | # DATASET LOADING PARAMETERS
26 | mode='eval', # 'dev' - development or 'eval' - evaluation dataset
27 | dataset='mic', # 'foa' - ambisonic or 'mic' - microphone signals
28 |
29 | #FEATURE PARAMS
30 | fs=24000,
31 | hop_len_s=0.02,
32 | label_hop_len_s=0.1,
33 | max_audio_len_s=60,
34 | nb_mel_bins=64,
35 |
36 | #AUDIO REPRESENTATION TYPE (+)
37 | is_gammatone=False, # if set to True, extracts gammatone representation instead of Log-Mel
38 | fmin=.0,
39 |
40 | # DNN MODEL PARAMETERS
41 | label_sequence_length=60, # Label sequence length; feature_sequence_length is derived from it below
42 | batch_size=64, # Batch size
43 | dropout_rate=0, # Dropout rate, constant for all layers
44 | nb_cnn2d_filt=64, # Number of CNN filters, constant for each layer
45 | f_pool_size=[4, 4, 2], # CNN frequency pooling, length of list = number of CNN layers, list value = pooling per layer
46 |
47 | # CNN squeeze-excitation parameter (+)
48 | do_baseline=False,
49 | ratio=16,
50 |
51 | # Get dataset
52 | folder='normalized',
53 |
54 | rnn_size=[128, 128], # RNN contents, length of list = number of layers, list value = number of nodes
55 | fnn_size=[128], # FNN contents, length of list = number of layers, list value = number of nodes
56 | loss_weights=[1., 1000.], # [sed, doa] weight for scaling the DNN outputs
57 | nb_epochs=50, # Maximum number of training epochs
58 | epochs_per_fit=5, # Number of epochs per fit
59 | doa_objective='masked_mse', # supports 'mse' (original SELDnet approach) and 'masked_mse' (DCASE 2020 approach)
60 |
61 | #METRIC PARAMETERS
62 | lad_doa_thresh=20 # DOA threshold in degrees for the location-aware detection metrics
63 |
64 | )
65 | feature_label_resolution = int(params['label_hop_len_s'] // params['hop_len_s'])
66 | params['feature_sequence_length'] = params['label_sequence_length'] * feature_label_resolution
67 | params['t_pool_size'] = [feature_label_resolution, 1, 1] # CNN time pooling
68 | params['patience'] = int(params['nb_epochs']) # Stop training if patience is reached
69 |
70 | params['unique_classes'] = {
71 | 'alarm': 0,
72 | 'baby': 1,
73 | 'crash': 2,
74 | 'dog': 3,
75 | 'engine': 4,
76 | 'female_scream': 5,
77 | 'female_speech': 6,
78 | 'fire': 7,
79 | 'footsteps': 8,
80 | 'knock': 9,
81 | 'male_scream': 10,
82 | 'male_speech': 11,
83 | 'phone': 12,
84 | 'piano': 13
85 | }
86 |
87 |
88 | # ########### User defined parameters ##############
89 | # if argv == '1':
90 | # print("USING DEFAULT PARAMETERS\n")
91 |
92 | # elif argv == '2':
93 | # params['mode'] = 'dev'
94 | # params['dataset'] = 'mic'
95 |
96 | # elif argv == '3':
97 | # params['mode'] = 'eval'
98 | # params['dataset'] = 'mic'
99 |
100 | # elif argv == '4':
101 | # params['mode'] = 'dev'
102 | # params['dataset'] = 'foa'
103 |
104 | # elif argv == '5':
105 | # params['mode'] = 'eval'
106 | # params['dataset'] = 'foa'
107 |
108 | # elif argv == '999':
109 | # print("QUICK TEST MODE\n")
110 | # params['quick_test'] = True
111 | # params['epochs_per_fit'] = 1
112 |
113 | # else:
114 | # print('ERROR: unknown argument {}'.format(argv))
115 | # exit()
116 |
117 | for key, value in params.items():
118 | print("\t{}: {}".format(key, value))
119 | return params
120 |
--------------------------------------------------------------------------------
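The header of `parameter.py` recommends adding argv-keyed cases instead of editing the defaults, and the lines after the `dict` derive the feature sequence length and the time pooling from the two hop sizes. A minimal sketch of that pattern, assuming a reduced set of defaults; `get_params_sketch` and its override table are hypothetical and only mirror some of the commented-out cases:

```python
def get_params_sketch(argv='1'):
    # Hypothetical, reduced version of get_params(): defaults, then per-argv
    # overrides, then values derived from the final settings.
    params = dict(
        mode='eval', dataset='mic',
        hop_len_s=0.02, label_hop_len_s=0.1,
        label_sequence_length=60, nb_epochs=50,
        quick_test=False, epochs_per_fit=5,
    )
    # Mirrors some of the commented-out if-else cases in parameter.py.
    overrides = {
        '1': {},
        '2': {'mode': 'dev', 'dataset': 'mic'},
        '4': {'mode': 'dev', 'dataset': 'foa'},
        '999': {'quick_test': True, 'epochs_per_fit': 1},
    }
    if argv not in overrides:
        raise ValueError('unknown argument {}'.format(argv))
    params.update(overrides[argv])

    # Feature frames per label frame: 0.1 s / 0.02 s = 5. round() avoids the
    # off-by-one results that float floor division can give for some hop sizes.
    resolution = int(round(params['label_hop_len_s'] / params['hop_len_s']))
    params['feature_sequence_length'] = params['label_sequence_length'] * resolution
    # Time pooling of 5 in the first CNN layer maps 300 feature frames to 60 label frames.
    params['t_pool_size'] = [resolution, 1, 1]
    params['patience'] = int(params['nb_epochs'])
    return params


p = get_params_sketch('2')
print(p['mode'], p['dataset'], p['feature_sequence_length'], p['t_pool_size'])
# dev mic 300 [5, 1, 1]
```

Raising `ValueError` for an unknown argument, rather than calling `exit()` as the commented-out block does, lets a caller such as a test or notebook handle the error.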
/visualize_SELD_output.py:
--------------------------------------------------------------------------------
1 | # Script for visualising the SELD output.
2 | #
3 | # NOTE: Make sure to use the appropriate matplotlib backend for your OS
4 |
5 | import os
6 | import numpy as np
7 | import librosa.display
8 | import cls_feature_class
9 | import parameter
10 | import matplotlib.gridspec as gridspec
11 | import matplotlib.pyplot as plot
12 | plot.switch_backend('agg')
13 | plot.rcParams.update({'font.size': 22})
14 |
15 |
16 | def collect_classwise_data(_in_dict):
17 | _out_dict = {}
18 | for _key in _in_dict.keys():
19 | for _seld in _in_dict[_key]:
20 | if _seld[0] not in _out_dict:
21 | _out_dict[_seld[0]] = []
22 | _out_dict[_seld[0]].append([_key, _seld[0], _seld[1], _seld[2]])
23 | return _out_dict
24 |
25 |
26 | def plot_func(plot_data, hop_len_s, ind, plot_x_ax=False, plot_y_ax=False):
27 | cmap = ['b', 'r', 'g', 'y', 'k', 'c', 'm', 'b', 'r', 'g', 'y', 'k', 'c', 'm']
28 | for class_ind in plot_data.keys():
29 | time_ax = np.array(plot_data[class_ind])[:, 0] *hop_len_s
30 | y_ax = np.array(plot_data[class_ind])[:, ind]
31 | plot.plot(time_ax, y_ax, marker='.', color=cmap[class_ind], linestyle='None', markersize=4)
32 | plot.grid()
33 | plot.xlim([0, 60])
34 | if not plot_x_ax:
35 | plot.gca().axes.set_xticklabels([])
36 |
37 | if not plot_y_ax:
38 | plot.gca().axes.set_yticklabels([])
39 | # --------------------------------- MAIN SCRIPT STARTS HERE -----------------------------------------
40 | params = parameter.get_params()
41 |
42 | # output format file to visualize
43 | pred = os.path.join(params['dcase_dir'], '2_mic_dev/fold1_room1_mix006_ov1.csv')
44 |
45 | # Paths of the reference audio directory (for the spectrogram) and of the metadata directory
46 | # (for the reference labels)
47 | # Note: The code derives the audio filename from the predicted filename automatically
48 | ref_dir = os.path.join(params['dataset_dir'], 'metadata_dev')
49 | aud_dir = os.path.join(params['dataset_dir'], 'mic_dev')
50 |
51 | # load the predicted output format
52 | feat_cls = cls_feature_class.FeatureClass(params)
53 | pred_dict = feat_cls.load_output_format_file(pred)
54 | pred_dict_polar = feat_cls.convert_output_format_cartesian_to_polar(pred_dict)
55 |
56 | # load the reference output format
57 | ref_filename = os.path.basename(pred)
58 | ref_dict_polar = feat_cls.load_output_format_file(os.path.join(ref_dir, ref_filename))
59 |
60 | pred_data = collect_classwise_data(pred_dict_polar)
61 | ref_data = collect_classwise_data(ref_dict_polar)
62 |
63 | nb_classes = len(feat_cls.get_classes())
64 |
65 | # load the audio and extract spectrogram
66 | ref_filename = os.path.basename(pred).replace('.csv', '.wav')
67 | audio, fs = feat_cls._load_audio(os.path.join(aud_dir, ref_filename))
68 | stft = np.abs(np.squeeze(feat_cls._spectrogram(audio[:, :1])))
69 | stft = librosa.amplitude_to_db(stft, ref=np.max)
70 |
71 | plot.figure(figsize=(20, 15))
72 | gs = gridspec.GridSpec(4, 4)
73 | ax0 = plot.subplot(gs[0, 1:3]), librosa.display.specshow(stft.T, sr=fs, x_axis='s', y_axis='linear'), plot.xlim([0, 60]), plot.xticks([]), plot.xlabel(''), plot.title('Spectrogram')
74 | ax1 = plot.subplot(gs[1, :2]), plot_func(ref_data, params['label_hop_len_s'], ind=1, plot_y_ax=True), plot.ylim([-1, nb_classes + 1]), plot.title('SED reference')
75 | ax2 = plot.subplot(gs[1, 2:]), plot_func(pred_data, params['label_hop_len_s'], ind=1), plot.ylim([-1, nb_classes + 1]), plot.title('SED predicted')
76 | ax3 = plot.subplot(gs[2, :2]), plot_func(ref_data, params['label_hop_len_s'], ind=2, plot_y_ax=True), plot.ylim([-180, 180]), plot.title('Azimuth reference')
77 | ax4 = plot.subplot(gs[2, 2:]), plot_func(pred_data, params['label_hop_len_s'], ind=2), plot.ylim([-180, 180]), plot.title('Azimuth predicted')
78 | ax5 = plot.subplot(gs[3, :2]), plot_func(ref_data, params['label_hop_len_s'], ind=3, plot_y_ax=True), plot.ylim([-90, 90]), plot.title('Elevation reference')
79 | ax6 = plot.subplot(gs[3, 2:]), plot_func(pred_data, params['label_hop_len_s'], ind=3), plot.ylim([-90, 90]), plot.title('Elevation predicted')
80 | ax_lst = [ax0, ax1, ax2, ax3, ax4, ax5, ax6]
81 | plot.savefig(os.path.join(params['dcase_dir'] , ref_filename.replace('.wav', '.jpg')), dpi=300, bbox_inches = "tight")
82 |
83 |
84 |
--------------------------------------------------------------------------------
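In the script above, `collect_classwise_data` regroups the frame-indexed output dictionaries by class so that `plot_func` can draw each class in its own colour, and `plot_func` converts frame indices to seconds by multiplying by `label_hop_len_s`. A minimal sketch of that regrouping, assuming entries of the form `[class, azimuth, elevation]` as implied by the plotted y-ranges; the sample data is made up for illustration:

```python
import numpy as np


def collect_classwise_data(in_dict):
    # Regroup {frame: [[class, azi, ele], ...]} into
    # {class: [[frame, class, azi, ele], ...]}, as the script above does.
    out_dict = {}
    for frame, events in in_dict.items():
        for event in events:
            class_idx, azi, ele = event[0], event[1], event[2]
            out_dict.setdefault(class_idx, []).append([frame, class_idx, azi, ele])
    return out_dict


# Hypothetical polar output for two label frames.
polar = {0: [[3, 30, 10]], 1: [[3, 32, 10], [11, -90, 0]]}
by_class = collect_classwise_data(polar)
print(by_class)
# {3: [[0, 3, 30, 10], [1, 3, 32, 10]], 11: [[1, 11, -90, 0]]}

# Column 0 times the label hop (0.1 s here) gives the time axis used by plot_func.
times = np.array(by_class[3])[:, 0] * 0.1
```

Because the 14-entry `cmap` list in `plot_func` repeats the same 7 colours twice, classes 0 and 7, 1 and 8, and so on are drawn in the same colour.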