├── .gitignore ├── LICENSE.md ├── README.md ├── batch_feature_extraction.py ├── calculate_dev_results_from_dcase_output.py ├── cls_data_generator.py ├── cls_feature_class.py ├── display_specs.py ├── fold1_room1_mix050_ov2.png ├── gammatone ├── COPYING ├── README.md ├── auditory_toolkit │ ├── COPYING │ ├── ERBFilterBank.m │ ├── ERBSpace.m │ ├── MakeERBFilters.m │ ├── README │ ├── demo_gammatone.m │ ├── fft2gammatonemx.m │ ├── gammatone_demo.m │ ├── gammatonegram.m │ └── specgram.m ├── doc │ ├── FurElise.png │ ├── Makefile │ ├── conf.py │ ├── details.rst │ ├── fftweight.rst │ ├── filters.rst │ ├── gtgram.rst │ ├── index.rst │ ├── make.bat │ └── plot.rst ├── gammatone │ ├── __init__.py │ ├── __main__.py │ ├── fftweight.py │ ├── filters.py │ ├── gtgram.py │ └── plot.py ├── setup.py ├── test_generation │ ├── README │ ├── test_ERBFilterBank.m │ ├── test_ERBSpace.m │ ├── test_MakeERBFilters.m │ ├── test_fft2gammatonemx.m │ ├── test_fft_gammatonegram.m │ ├── test_gammatonegram.m │ └── test_specgram.m └── tests │ ├── __init__.py │ ├── data │ ├── test_erb_filter_data.mat │ ├── test_erbspace_data.mat │ ├── test_fft2gtmx_data.mat │ ├── test_fft_gammatonegram_data.mat │ ├── test_filterbank_data.mat │ ├── test_gammatonegram_data.mat │ └── test_specgram_data.mat │ ├── test_cfs.py │ ├── test_erb_space.py │ ├── test_fft_gtgram.py │ ├── test_fft_weights.py │ ├── test_filterbank.py │ ├── test_gammatone_filters.py │ ├── test_gammatonegram.py │ └── test_specgram.py ├── images ├── CRNN_SELDT_DCASE2020.png ├── SELDnet_output.jpg ├── scse_cropped.pdf ├── seld-squeeze-structure.pdf └── seld_squeeze_structure_image.jpg ├── keras_model.py ├── metrics ├── LICENSE.md ├── SELD_evaluation_metrics.py └── evaluation_metrics.py ├── parameter.py ├── seld.py └── visualize_SELD_output.py /.gitignore: -------------------------------------------------------------------------------- 1 | /home/jose/DCASE2020_Task3/base_folder/* 2 | /home/jose/DCASE2020_Task3/input_feature/* 3 | base_folder/* 4 | 
input_feature/* 5 | .vscode/* 6 | __pycache__/* 7 | metrics/__pycache__/* 8 | gammatone/gammatone/__pycache__/* 9 | -------------------------------------------------------------------------------- /LICENSE.md: -------------------------------------------------------------------------------- 1 | -----------COPYRIGHT NOTICE STARTS WITH THIS LINE------------ 2 | Copyright (c) 2020 Tampere University and its licensors 3 | All rights reserved. 4 | 5 | Permission is hereby granted, without written agreement and without 6 | license or royalty fees, to use and copy the code for the Sound Event 7 | Localization and Detection using Convolutional Recurrent Neural Network 8 | method/architecture, present in the GitHub repository with the handle 9 | seld-dcase2020, (“Work”) described in the paper with title "Sound event 10 | localization and detection of overlapping sources using 11 | convolutional recurrent neural network" and composed of files with 12 | code in the Python programming language. This grant is only for experimental and 13 | non-commercial purposes, provided that the copyright notice in its entirety 14 | appear in all copies of this Work, and the original source of this Work, 15 | Audio Research Group at Tampere University, is acknowledged in any publication 16 | that reports research using this Work. 17 | 18 | Any commercial use of the Work or any part thereof is strictly prohibited. 19 | Commercial use include, but is not limited to: 20 | - selling or reproducing the Work 21 | - selling or distributing the results or content achieved by use of the Work 22 | - providing services by using the Work. 23 | 24 | IN NO EVENT SHALL TAMPERE UNIVERSITY OR ITS LICENSORS BE LIABLE TO 25 | ANY PARTY FOR DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES 26 | ARISING OUT OF THE USE OF THIS WORK AND ITS DOCUMENTATION, EVEN IF TAMPERE 27 | UNIVERSITY OR ITS LICENSORS HAS BEEN ADVISED OF THE POSSIBILITY 28 | OF SUCH DAMAGE. 
29 | 30 | TAMPERE UNIVERSITY AND ALL ITS LICENSORS SPECIFICALLY DISCLAIMS 31 | ANY WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF 32 | MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE WORK PROVIDED HEREUNDER 33 | IS ON AN "AS IS" BASIS, AND THE TAMPERE UNIVERSITY HAS NO OBLIGATION 34 | TO PROVIDE MAINTENANCE, SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS. 35 | 36 | -----------COPYRIGHT NOTICE ENDS WITH THIS LINE------------ 37 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | 2 | # DCASE 2020: SELD using squeeze-excitation residual networks 3 | [Please visit the official webpage of the DCASE 2020 Challenge for comparison with other submissions](http://dcase.community/challenge2020/task-sound-event-localization-and-detection-results). 4 | 5 | The main objective of this submission was to study how squeeze-excitation techniques can improve the behavior of sound event detection and localization (SELD) systems. To do so, we start from the network presented as a baseline consisting of a CRNN and replace the convolutional layers by Conv-StandardPOST blocks. This block was presented in: 6 | 7 | > Naranjo-Alcazar, J., Perez-Castanos, S., Zuccarello, P., & Cobos, M. (2020). Acoustic Scene Classification with Squeeze-Excitation Residual Networks. IEEE Access. 8 | 9 | This repo implementation is presented in: 10 | 11 | > Naranjo-Alcazar, Javier, et al. "Sound Event Localization and Detection using Squeeze-Excitation Residual CNNs." arXiv preprint arXiv:2006.14436 (2020). 12 | 13 | Please consider citing these works if the code or something presented in them has been used. 14 | 15 | ## BASELINE METHOD 16 | 17 | In comparison to the SELDnet studied in the papers above, we have changed the following to improve its performance and evaluate the performance better. 
18 | * **Features**: The original SELDnet employed naive phase and magnitude components of the spectrogram as the input feature for all input formats of audio. In this baseline method, we use separate features for first-order Ambisonic (FOA) and microphone array (MIC) datasets. As the interaural level difference feature, we employ the 64-band mel energies extracted from each channel of the input audio for both FOA and MIC. To encode the interaural time difference features, we employ intensity vector features for FOA, and generalized cross correlation features for MIC. 19 | * **Loss/Objective**: The original SELDnet employed mean square error (MSE) for the DOA loss estimation, and this was computed irrespective of the presence or absence of the sound event. In the current baseline, we use a masked MSE, which computes the MSE only when the sound event is active in the reference. 20 | * **Evaluation metrics**: The performance of the original SELDnet was evaluated with stand-alone metrics for detection and localization, mainly because there was no suitable metric that could jointly evaluate the performance of localization and detection. Since then, we have proposed a new metric that can jointly evaluate the performance (more about it is described in the metrics section below), and we employ this new metric for evaluation here. 21 | 22 | The final SELDnet architecture is as shown below. The input is the multichannel audio, from which the different acoustic features are extracted based on the input format of the audio. Based on the chosen dataset (FOA or MIC), the baseline method takes a sequence of consecutive feature-frames and predicts all the active sound event classes for each of the input frames along with their respective spatial location, producing the temporal activity and DOA trajectory for each sound event class. In particular, a convolutional recurrent neural network (CRNN) is used to map the frame sequence to the two outputs in parallel.
At the first output, SED is performed as a multi-label multi-class classification task, allowing the network to simultaneously estimate the presence of multiple sound events for each frame. At the second output, DOA estimates in the continuous 3D space are obtained as a multi-output regression task, where each sound event class is associated with three regressors that estimate the Cartesian coordinates x, y and z axes of the DOA on a unit sphere around the microphone. 23 | 24 |
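The Cartesian DOA representation described above can be illustrated with a small sketch. It is not taken from this repository: the function names `polar_to_cartesian` and `angular_error_deg` are hypothetical, and the angle convention (azimuth measured in the horizontal plane, elevation measured upwards from it, both in degrees) is an assumption made for illustration.

```python
import numpy as np


def polar_to_cartesian(azimuth_deg, elevation_deg):
    """Map an (azimuth, elevation) pair in degrees to a unit vector (x, y, z).

    Assumed convention: azimuth in the horizontal plane, elevation upwards.
    """
    azi = np.deg2rad(azimuth_deg)
    ele = np.deg2rad(elevation_deg)
    x = np.cos(ele) * np.cos(azi)
    y = np.cos(ele) * np.sin(azi)
    z = np.sin(ele)
    return np.array([x, y, z])


def angular_error_deg(u, v):
    """Angle in degrees between two DOA vectors (normalized before comparison)."""
    u = np.asarray(u, dtype=float) / np.linalg.norm(u)
    v = np.asarray(v, dtype=float) / np.linalg.norm(v)
    # Clip guards against rounding slightly outside [-1, 1] before arccos.
    cos_angle = np.clip(np.dot(u, v), -1.0, 1.0)
    return np.rad2deg(np.arccos(cos_angle))


reference = polar_to_cartesian(30.0, 10.0)
estimate = polar_to_cartesian(40.0, 10.0)
print(angular_error_deg(reference, estimate))
```

Regressing three Cartesian values per class avoids the wrap-around discontinuity of azimuth at ±180°, and the angle between two such vectors gives a natural distance for comparing an estimate with a reference.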

25 | 26 |

27 | 28 | The SED output of the network is in the continuous range [0, 1] for each sound event in the dataset, and this value is thresholded to obtain a binary decision for the respective sound event activity. Finally, the respective DOA estimates for these active sound event classes provide their spatial locations. 29 | 30 | ## SUBMISSION MODIFICATION 31 | 32 | This image shows the submission architecture: 33 | 34 |

35 | 36 |
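For readers unfamiliar with squeeze-excitation, the following is a minimal NumPy sketch of the generic channel-recalibration idea from the literature. It is not the Conv-StandardPOST block implemented in `keras_model.py`: the function name `squeeze_excite`, the weight names, and the omission of biases are all assumptions, and the interpretation of `ratio` as the bottleneck reduction factor is likewise assumed.

```python
import numpy as np


def squeeze_excite(feature_maps, w_reduce, w_expand):
    """Generic squeeze-excitation on feature maps shaped (channels, time, freq).

    w_reduce: (channels // ratio, channels) bottleneck weights (hypothetical).
    w_expand: (channels, channels // ratio) expansion weights (hypothetical).
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    squeezed = feature_maps.mean(axis=(1, 2))
    # Excitation: bottleneck layer with ReLU, then sigmoid gates per channel.
    hidden = np.maximum(0.0, w_reduce @ squeezed)
    gates = 1.0 / (1.0 + np.exp(-(w_expand @ hidden)))
    # Recalibrate: scale every channel by its gate in (0, 1).
    return feature_maps * gates[:, None, None]


rng = np.random.default_rng(0)
channels, ratio = 64, 4
x = rng.standard_normal((channels, 10, 8))
w_reduce = 0.1 * rng.standard_normal((channels // ratio, channels))
w_expand = 0.1 * rng.standard_normal((channels, channels // ratio))
y = squeeze_excite(x, w_reduce, w_expand)
print(y.shape)
```

In this sketch a larger `ratio` means a narrower bottleneck and fewer extra parameters per block.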

37 | 38 | 39 | 40 | ## DATASET 41 | 42 | The dataset used is: 43 | 44 | * **TAU-NIGENS Spatial Sound Events 2020 - Microphone Array** 45 | 46 | **TAU-NIGENS Spatial Sound Events 2020 - Microphone Array** provides four-channel directional microphone recordings from a tetrahedral array configuration. This format is extracted from the same microphone array, and additional information on the spatial characteristics of each format can be found below. This dataset consists of a development and an evaluation set. The development set consists of 600 one-minute recordings sampled at 24000 Hz. We use 400 recordings for the training split (folds 3 to 6), 100 for validation (fold 2) and 100 for testing (fold 1). The evaluation set consists of 200 one-minute recordings, and will be released at a later point. 47 | 48 | More details on the recording procedure and dataset can be read on the [DCASE 2020 task webpage](http://dcase.community/challenge2020/task-sound-event-localization-and-detection). 49 | 50 | The two development datasets (Ambisonic and Microphone Array) can be downloaded from: [**TAU-NIGENS Spatial Sound Events 2020 - Ambisonic and Microphone Array**, Development dataset](https://doi.org/10.5281/zenodo.3740236) [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.3740236.svg)](https://doi.org/10.5281/zenodo.3740236) 51 | 52 | 53 | ## Getting Started 54 | 55 | This repository consists of multiple Python scripts that together form the pipeline used to train the SELDnet. 56 | * The `batch_feature_extraction.py` is a standalone wrapper script that extracts the features and labels, and normalizes the training and test split features for a given dataset. Make sure you update the location of the downloaded datasets before running it. 57 | * The `parameter.py` script consists of all the training, model, and feature parameters. If a user has to change some parameters, they have to create a sub-task with a unique id here. Check the code for examples.
58 | * The `cls_feature_class.py` script has routines for label creation, feature extraction and normalization. 59 | * The `cls_data_generator.py` script provides feature + label data in generator mode for training. 60 | * The `keras_model.py` script implements the SELDnet architecture. 61 | * The `evaluation_metrics.py` script implements the core metrics from the sound event detection evaluation module http://tut-arg.github.io/sed_eval/ and the DOA metrics explained in the paper. These were used in the DCASE 2019 SELD task. We use them here only for legacy comparison. 62 | * The `SELD_evaluation_metrics.py` script implements the metrics for joint evaluation of detection and localization. 63 | * The `seld.py` is a wrapper script that trains the SELDnet. The training stops when the SELD error (check paper) stops improving. 64 | 65 | Additionally, we also provide supporting scripts that help analyse the results. 66 | * `visualize_SELD_output.py` script to visualize the SELDnet output 67 | 68 | 69 | ### Prerequisites 70 | 71 | The provided codebase has been tested on Python 3.6.9/3.7.3 and Keras 2.2.4/2.3.1. 72 | 73 | 74 | ### Training the SELDnet 75 | 76 | In order to quickly train SELDnet, follow the steps below. 77 | 78 | * For the chosen dataset (Ambisonic or Microphone), download the respective zip file. This contains both the audio files and the respective metadata. Unzip the files under the same 'base_folder/', i.e., if you are using the Ambisonic dataset, then the 'base_folder/' should have two folders - 'foa_dev/' and 'metadata_dev/' after unzipping. 79 | 80 | * Now update the respective dataset name and its path in the `parameter.py` script. For the above example, you will change `dataset='foa'` and `dataset_dir='base_folder/'`. Also provide a directory path `feat_label_dir` in the same `parameter.py` script where all the features and labels will be dumped. 81 | 82 | * Extract features from the downloaded dataset by running the `batch_feature_extraction.py` script.
Run the script as shown below. This will dump the normalized features and labels in the `feat_label_dir` folder. 83 | 84 | ``` 85 | python3 batch_feature_extraction.py 86 | ``` 87 | 88 | You can now train the SELDnet using this submission's modifications. The parameters --baseline and --ratio MUST be provided: 89 | ```shell 90 | python3 seld.py --baseline False --ratio 4 91 | ``` 92 | 93 | This command runs the ConvStandard modules with ratio = 4. If you want to run the baseline code, set --baseline to True. If you want to run residual learning without squeeze-excitation: 94 | 95 | ```shell 96 | python3 seld.py --baseline False --ratio 0 97 | ``` 98 | 99 | 100 | * By default, the code runs in `quick_test = False` mode. Setting `quick_test = True` in `parameter.py` trains the network for 2 epochs on only 2 mini-batches. 101 | 102 | * The code also plots training curves and intermediate results, and saves models in the `model_dir` path provided by the user in the `parameter.py` file. 103 | 104 | * In order to visualize the output of SELDnet and for submission of results, set `dcase_output=True` and provide the `dcase_dir` directory. This will dump file-wise results in the directory, which can be individually visualized using the `visualize_SELD_output.py` script. 105 | 106 | ## Results on development dataset (baseline) 107 | 108 | As evaluation metrics we use two different approaches, as discussed in our recent paper below 109 | 110 | > Annamaria Mesaros, Sharath Adavanne, Archontis Politis, Toni Heittola, and Tuomas Virtanen. Joint measurement of localization and detection of sound events. In IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA). New Paltz, NY, Oct 2019. 111 | 112 | The first metric is more focused on the detection part, also referred to as the location-aware detection, which gives us the error rate (ER) and F-score (F) in one-second non-overlapping segments.
We consider the prediction to be correct if the prediction and reference class are the same, and the distance between them is below 20°. 113 | The second metric is more focused on the localization part, also referred to as the class-aware localization, which gives us the DOA error (DE) and F-score (DE_F) in one-second non-overlapping segments. Unlike the location-aware detection, we do not use any distance threshold, but estimate the distance between the correct prediction and reference. 114 | 115 | The evaluation metric scores for the test split of the development dataset are given below 116 | 117 | ### Baseline results 118 | 119 | | Dataset | ER | F | DE | DE_F | 120 | | ----| --- | --- | --- | --- | 121 | | Microphone Array (MIC) | 0.78 | 31.4 % | 27.3° | 59.0 % | 122 | 123 | 124 | **Note:** The reported baseline system performance is not exactly reproducible due to varying setups. However, you should be able to obtain very similar results. 125 | 126 | ## Submission results 127 | 128 | ### Development stage (results on testing folder) 129 | 130 | Set mode to dev. 131 | 132 | | ratio | ER | F (%) | DE (°) | DE_F (%) | 133 | | ----| --- | --- | --- | --- | 134 | | 0 | 0.68 | 42.3 | 22.5 | 65.1 | 135 | | 1 | 0.70 | 39.2 | 23.5 | 63.6 | 136 | | 2 | 0.69 | 40.4 | 23.2 | 62.1 | 137 | | 4 | 0.68 | 40.9 | 23.3 | 65.0 | 138 | | 8 | 0.69 | 40.8 | 23.5 | 63.8 | 139 | | 16 | 0.69 | 40.7 | 23.3 | 62.8 | 140 | 141 | ### Challenge results 142 | 143 | * The team submission ranked 11/15 144 | 145 | * The best system (ratio = 1) ranked 30/43 146 | 147 | | ratio | ER | F (%) | DE (°) | DE_F (%) | 148 | | :----:| --- | --- | --- | --- | 149 | | organization baseline | 0.70 | 39.5 | 23.2 | 62.1 | 150 | | 0 | 0.61 | 48.3 | 19.2 | 65.9 | 151 | | 1 | 0.61 | 49.1 | 19.5 | 67.1 | 152 | | 8 | 0.64 | 46.7 | 20.0 | 64.5 | 153 | | 16 | 0.63 | 47.3 | 19.5 | 65.5 | 154 | 155 | 156 | 157 | -------------------------------------------------------------------------------- /batch_feature_extraction.py:
-------------------------------------------------------------------------------- 1 | # Extracts the features, labels, and normalizes the development and evaluation split features. 2 | 3 | import cls_feature_class 4 | import parameter 5 | 6 | process_str = 'eval' #, eval' # 'dev' or 'eval' will extract features for the respective set accordingly 7 | # 'dev, eval' will extract features of both sets together 8 | 9 | params = parameter.get_params() 10 | 11 | 12 | if 'dev' in process_str: 13 | # -------------- Extract features and labels for development set ----------------------------- 14 | dev_feat_cls = cls_feature_class.FeatureClass(params, is_eval=False) 15 | 16 | # Extract features and normalize them 17 | dev_feat_cls.extract_all_feature() 18 | dev_feat_cls.preprocess_features() 19 | 20 | # # Extract labels in regression mode 21 | dev_feat_cls.extract_all_labels() 22 | 23 | 24 | if 'eval' in process_str: 25 | # -----------------------------Extract ONLY features for evaluation set----------------------------- 26 | eval_feat_cls = cls_feature_class.FeatureClass(params, is_eval=True) 27 | 28 | # Extract features and normalize them 29 | eval_feat_cls.extract_all_feature() 30 | eval_feat_cls.preprocess_features() 31 | 32 | -------------------------------------------------------------------------------- /calculate_dev_results_from_dcase_output.py: -------------------------------------------------------------------------------- 1 | import os 2 | from metrics import SELD_evaluation_metrics 3 | import cls_feature_class 4 | import parameter 5 | import numpy as np 6 | 7 | 8 | def get_nb_files(_pred_file_list, _group='split'): 9 | _group_ind = {'ir': 4, 'ov': 21} 10 | _cnt_dict = {} 11 | for _filename in _pred_file_list: 12 | 13 | if _group == 'all': 14 | _ind = 0 15 | else: 16 | _ind = int(_filename[_group_ind[_group]]) 17 | 18 | if _ind not in _cnt_dict: 19 | _cnt_dict[_ind] = [] 20 | _cnt_dict[_ind].append(_filename) 21 | 22 | return _cnt_dict 23 | 24 | 25 | # 
--------------------------- MAIN SCRIPT STARTS HERE ------------------------------------------- 26 | 27 | 28 | # INPUT DIRECTORY 29 | ref_desc_files = '/scratch/asignal/sharath/DCASE2020_SELD_dataset/metadata_dev' # reference description directory location 30 | pred_output_format_files = 'results/2_mic_dev' # predicted output format directory location 31 | use_polar_format = True # Compute SELD metrics using polar or Cartesian coordinates 32 | 33 | # Load feature class 34 | params = parameter.get_params() 35 | feat_cls = cls_feature_class.FeatureClass(params) 36 | 37 | # collect reference files info 38 | ref_files = os.listdir(ref_desc_files) 39 | nb_ref_files = len(ref_files) 40 | 41 | # collect predicted files info 42 | pred_files = os.listdir(pred_output_format_files) 43 | nb_pred_files = len(pred_files) 44 | 45 | # Calculate scores for different splits, overlapping sound events, and impulse responses (reverberant scenes) 46 | score_type_list = ['all', 'ov', 'ir'] 47 | print('Number of predicted files: {}\nNumber of reference files: {}'.format(nb_pred_files, nb_ref_files)) 48 | print('\nCalculating {} scores for {}'.format(score_type_list, os.path.basename(pred_output_format_files))) 49 | 50 | for score_type in score_type_list: 51 | print('\n\n---------------------------------------------------------------------------------------------------') 52 | print('------------------------------------ {} ---------------------------------------------'.format('Total score' if score_type=='all' else 'score per {}'.format(score_type))) 53 | print('---------------------------------------------------------------------------------------------------') 54 | 55 | split_cnt_dict = get_nb_files(pred_files, _group=score_type) # collect files corresponding to score_type 56 | # Calculate scores across files for a given score_type 57 | for split_key in np.sort(list(split_cnt_dict)): 58 | # Load evaluation metric class 59 | eval = 
SELD_evaluation_metrics.SELDMetrics(nb_classes=feat_cls.get_nb_classes(), doa_threshold=params['lad_doa_thresh']) 60 | for pred_cnt, pred_file in enumerate(split_cnt_dict[split_key]): 61 | # Load predicted output format file 62 | pred_dict = feat_cls.load_output_format_file(os.path.join(pred_output_format_files, pred_file)) 63 | if use_polar_format: 64 | pred_dict_polar = feat_cls.convert_output_format_cartesian_to_polar(pred_dict) 65 | pred_labels = feat_cls.segment_labels(pred_dict_polar, feat_cls.get_nb_frames()) 66 | else: 67 | pred_labels = feat_cls.segment_labels(pred_dict, feat_cls.get_nb_frames()) 68 | 69 | # Load reference description file 70 | gt_dict_polar = feat_cls.load_output_format_file(os.path.join(ref_desc_files, pred_file.replace('.npy', '.csv'))) 71 | if use_polar_format: 72 | gt_labels = feat_cls.segment_labels(gt_dict_polar, feat_cls.get_nb_frames()) 73 | else: 74 | gt_dict = feat_cls.convert_output_format_polar_to_cartesian(gt_dict_polar) 75 | gt_labels = feat_cls.segment_labels(gt_dict, feat_cls.get_nb_frames()) 76 | 77 | # Calculated scores 78 | if use_polar_format: 79 | eval.update_seld_scores(pred_labels, gt_labels) 80 | else: 81 | eval.update_seld_scores_xyz(pred_labels, gt_labels) 82 | 83 | 84 | # Overall SED and DOA scores 85 | er, f, de, de_f = eval.compute_seld_scores() 86 | seld_scr = SELD_evaluation_metrics.early_stopping_metric([er, f], [de, de_f]) 87 | 88 | print('\nAverage score for {} {} data using {} coordinates'.format(score_type, 'fold' if score_type=='all' else split_key, 'Polar' if use_polar_format else 'Cartesian' )) 89 | print('SELD score (early stopping metric): {:0.2f}'.format(seld_scr)) 90 | print('SED metrics: Error rate: {:0.2f}, F-score:{:0.1f}'.format(er, 100*f)) 91 | print('DOA metrics: DOA error: {:0.1f}, F-score:{:0.1f}'.format(de, 100*de_f)) 92 | -------------------------------------------------------------------------------- /cls_data_generator.py: 
-------------------------------------------------------------------------------- 1 | # 2 | # Data generator for training the SELDnet 3 | # 4 | 5 | import os 6 | import numpy as np 7 | import cls_feature_class 8 | from IPython import embed 9 | from collections import deque 10 | import random 11 | 12 | 13 | class DataGenerator(object): 14 | def __init__( 15 | self, params, split=1, shuffle=True, per_file=False, is_eval=False 16 | ): 17 | self._per_file = per_file 18 | self._is_eval = is_eval 19 | self._splits = np.array(split) 20 | self._batch_size = params['batch_size'] 21 | self._feature_seq_len = params['feature_sequence_length'] 22 | self._label_seq_len = params['label_sequence_length'] 23 | self._shuffle = shuffle 24 | self._feat_cls = cls_feature_class.FeatureClass(params=params, is_eval=self._is_eval) 25 | self._label_dir = self._feat_cls.get_label_dir() 26 | self._feat_dir = self._feat_cls.get_normalized_feat_dir() 27 | 28 | self._filenames_list = list() 29 | self._nb_frames_file = 0 # Using a fixed number of frames in feat files. 
Updated in _get_label_filenames_sizes() 30 | self._nb_mel_bins = self._feat_cls.get_nb_mel_bins() 31 | self._nb_ch = None 32 | self._label_len = None # total length of label - DOA + SED 33 | self._doa_len = None # DOA label length 34 | self._class_dict = self._feat_cls.get_classes() 35 | self._nb_classes = self._feat_cls.get_nb_classes() 36 | self._get_filenames_list_and_feat_label_sizes() 37 | 38 | self._feature_batch_seq_len = self._batch_size*self._feature_seq_len 39 | self._label_batch_seq_len = self._batch_size*self._label_seq_len 40 | self._circ_buf_feat = None 41 | self._circ_buf_label = None 42 | 43 | if self._per_file: 44 | self._nb_total_batches = len(self._filenames_list) 45 | else: 46 | self._nb_total_batches = int(np.floor((len(self._filenames_list) * self._nb_frames_file / 47 | float(self._feature_batch_seq_len)))) 48 | 49 | # self._dummy_feat_vec = np.ones(self._feat_len.shape) * 50 | 51 | print( 52 | '\tDatagen_mode: {}, nb_files: {}, nb_classes:{}\n' 53 | '\tnb_frames_file: {}, feat_len: {}, nb_ch: {}, label_len:{}\n'.format( 54 | 'eval' if self._is_eval else 'dev', len(self._filenames_list), self._nb_classes, 55 | self._nb_frames_file, self._nb_mel_bins, self._nb_ch, self._label_len 56 | ) 57 | ) 58 | 59 | print( 60 | '\tDataset: {}, split: {}\n' 61 | '\tbatch_size: {}, feat_seq_len: {}, label_seq_len: {}, shuffle: {}\n' 62 | '\tTotal batches in dataset: {}\n' 63 | '\tlabel_dir: {}\n ' 64 | '\tfeat_dir: {}\n'.format( 65 | params['dataset'], split, 66 | self._batch_size, self._feature_seq_len, self._label_seq_len, self._shuffle, 67 | self._nb_total_batches, 68 | self._label_dir, self._feat_dir 69 | ) 70 | ) 71 | 72 | def get_data_sizes(self): 73 | feat_shape = (self._batch_size, self._nb_ch, self._feature_seq_len, self._nb_mel_bins) 74 | if self._is_eval: 75 | label_shape = None 76 | else: 77 | label_shape = [ 78 | (self._batch_size, self._label_seq_len, self._nb_classes), 79 | (self._batch_size, self._label_seq_len, self._nb_classes*3) 80 | ] 81 | 
return feat_shape, label_shape 82 | 83 | def get_total_batches_in_data(self): 84 | return self._nb_total_batches 85 | 86 | def _get_filenames_list_and_feat_label_sizes(self): 87 | 88 | for filename in os.listdir(self._feat_dir): 89 | if self._is_eval: 90 | self._filenames_list.append(filename) 91 | else: 92 | if int(filename[4]) in self._splits: # check which split the file belongs to 93 | self._filenames_list.append(filename) 94 | 95 | temp_feat = np.load(os.path.join(self._feat_dir, self._filenames_list[0])) 96 | self._nb_frames_file = temp_feat.shape[0] 97 | self._nb_ch = temp_feat.shape[1] // self._nb_mel_bins 98 | 99 | if not self._is_eval: 100 | temp_label = np.load(os.path.join(self._label_dir, self._filenames_list[0])) 101 | self._label_len = temp_label.shape[-1] 102 | self._doa_len = (self._label_len - self._nb_classes)//self._nb_classes 103 | 104 | if self._per_file: 105 | self._batch_size = int(np.ceil(temp_feat.shape[0]/float(self._feature_seq_len))) 106 | 107 | return 108 | 109 | def generate(self): 110 | """ 111 | Generates batches of samples 112 | :return: 113 | """ 114 | 115 | while 1: 116 | if self._shuffle: 117 | random.shuffle(self._filenames_list) 118 | 119 | # Ideally this should have been outside the while loop. But while generating the test data we want the data 120 | # to be the same exactly for all epoch's hence we keep it here. 121 | self._circ_buf_feat = deque() 122 | self._circ_buf_label = deque() 123 | 124 | file_cnt = 0 125 | if self._is_eval: 126 | for i in range(self._nb_total_batches): 127 | # load feat and label to circular buffer. Always maintain atleast one batch worth feat and label in the 128 | # circular buffer. If not keep refilling it. 
129 | while len(self._circ_buf_feat) < self._feature_batch_seq_len: 130 | temp_feat = np.load(os.path.join(self._feat_dir, self._filenames_list[file_cnt])) 131 | 132 | for row_cnt, row in enumerate(temp_feat): 133 | self._circ_buf_feat.append(row) 134 | 135 | # If self._per_file is True, this returns the sequences belonging to a single audio recording 136 | if self._per_file: 137 | extra_frames = self._feature_batch_seq_len - temp_feat.shape[0] 138 | extra_feat = np.ones((extra_frames, temp_feat.shape[1])) * 1e-6 139 | 140 | for row_cnt, row in enumerate(extra_feat): 141 | self._circ_buf_feat.append(row) 142 | 143 | file_cnt = file_cnt + 1 144 | 145 | # Read one batch size from the circular buffer 146 | feat = np.zeros((self._feature_batch_seq_len, self._nb_mel_bins * self._nb_ch)) 147 | for j in range(self._feature_batch_seq_len): 148 | feat[j, :] = self._circ_buf_feat.popleft() 149 | feat = np.reshape(feat, (self._feature_batch_seq_len, self._nb_mel_bins, self._nb_ch)) 150 | 151 | # Split to sequences 152 | feat = self._split_in_seqs(feat, self._feature_seq_len) 153 | feat = np.transpose(feat, (0, 3, 1, 2)) 154 | 155 | yield feat 156 | 157 | else: 158 | for i in range(self._nb_total_batches): 159 | 160 | # load feat and label to circular buffer. Always maintain atleast one batch worth feat and label in the 161 | # circular buffer. If not keep refilling it. 
162 | while len(self._circ_buf_feat) < self._feature_batch_seq_len: 163 | temp_feat = np.load(os.path.join(self._feat_dir, self._filenames_list[file_cnt])) 164 | temp_label = np.load(os.path.join(self._label_dir, self._filenames_list[file_cnt])) 165 | 166 | for f_row in temp_feat: 167 | self._circ_buf_feat.append(f_row) 168 | for l_row in temp_label: 169 | self._circ_buf_label.append(l_row) 170 | 171 | # If self._per_file is True, this returns the sequences belonging to a single audio recording 172 | if self._per_file: 173 | feat_extra_frames = self._feature_batch_seq_len - temp_feat.shape[0] 174 | extra_feat = np.ones((feat_extra_frames, temp_feat.shape[1])) * 1e-6 175 | 176 | label_extra_frames = self._label_batch_seq_len - temp_label.shape[0] 177 | extra_labels = np.zeros((label_extra_frames, temp_label.shape[1])) 178 | 179 | for f_row in extra_feat: 180 | self._circ_buf_feat.append(f_row) 181 | for l_row in extra_labels: 182 | self._circ_buf_label.append(l_row) 183 | 184 | file_cnt = file_cnt + 1 185 | 186 | # Read one batch size from the circular buffer 187 | feat = np.zeros((self._feature_batch_seq_len, self._nb_mel_bins * self._nb_ch)) 188 | label = np.zeros((self._label_batch_seq_len, self._label_len)) 189 | for j in range(self._feature_batch_seq_len): 190 | feat[j, :] = self._circ_buf_feat.popleft() 191 | for j in range(self._label_batch_seq_len): 192 | label[j, :] = self._circ_buf_label.popleft() 193 | feat = np.reshape(feat, (self._feature_batch_seq_len, self._nb_mel_bins, self._nb_ch)) 194 | 195 | # Split to sequences 196 | feat = self._split_in_seqs(feat, self._feature_seq_len) 197 | feat = np.transpose(feat, (0, 3, 1, 2)) 198 | label = self._split_in_seqs(label, self._label_seq_len) 199 | 200 | label = [ 201 | label[:, :, :self._nb_classes], # SED labels 202 | label # SED + DOA labels 203 | ] 204 | yield feat, label 205 | 206 | def _split_in_seqs(self, data, _seq_len): 207 | if len(data.shape) == 1: 208 | if data.shape[0] % _seq_len: 209 | data = 
data[:-(data.shape[0] % _seq_len)] 210 | data = data.reshape((data.shape[0] // _seq_len, _seq_len, 1)) 211 | elif len(data.shape) == 2: 212 | if data.shape[0] % _seq_len: 213 | data = data[:-(data.shape[0] % _seq_len), :] 214 | data = data.reshape((data.shape[0] // _seq_len, _seq_len, data.shape[1])) 215 | elif len(data.shape) == 3: 216 | if data.shape[0] % _seq_len: 217 | data = data[:-(data.shape[0] % _seq_len), :, :] 218 | data = data.reshape((data.shape[0] // _seq_len, _seq_len, data.shape[1], data.shape[2])) 219 | else: 220 | print('ERROR: Unknown data dimensions: {}'.format(data.shape)) 221 | exit() 222 | return data 223 | 224 | @staticmethod 225 | def split_multi_channels(data, num_channels): 226 | tmp = None 227 | in_shape = data.shape 228 | if len(in_shape) == 3: 229 | hop = in_shape[2] // num_channels # integer division: hop is used as a shape and a slice index 230 | tmp = np.zeros((in_shape[0], num_channels, in_shape[1], hop)) 231 | for i in range(num_channels): 232 | tmp[:, i, :, :] = data[:, :, i * hop:(i + 1) * hop] 233 | elif len(in_shape) == 4 and num_channels == 1: 234 | tmp = np.zeros((in_shape[0], 1, in_shape[1], in_shape[2], in_shape[3])) 235 | tmp[:, 0, :, :, :] = data 236 | else: 237 | print('ERROR: The input should be a 3D matrix but it seems to have dimensions: {}'.format(in_shape)) 238 | exit() 239 | return tmp 240 | 241 | def get_default_elevation(self): 242 | return self._default_ele 243 | 244 | def get_azi_ele_list(self): 245 | return self._feat_cls.get_azi_ele_list() 246 | 247 | def get_nb_classes(self): 248 | return self._nb_classes 249 | 250 | def nb_frames_1s(self): 251 | return self._feat_cls.nb_frames_1s() 252 | 253 | def get_hop_len_sec(self): 254 | return self._feat_cls.get_hop_len_sec() 255 | 256 | def get_classes(self): 257 | return self._feat_cls.get_classes() 258 | 259 | def get_filelist(self): 260 | return self._filenames_list 261 | 262 | def get_frame_per_file(self): 263 | return self._label_batch_seq_len 264 | 265 | def get_nb_frames(self): 266 | return
self._feat_cls.get_nb_frames() 267 | 268 | def get_data_gen_mode(self): 269 | return self._is_eval 270 | 271 | def write_output_format_file(self, _out_file, _out_dict): 272 | return self._feat_cls.write_output_format_file(_out_file, _out_dict) -------------------------------------------------------------------------------- /display_specs.py: -------------------------------------------------------------------------------- 1 | import librosa.display 2 | import numpy as np 3 | import matplotlib.pyplot as plt 4 | 5 | 6 | 7 | gamma = np.load('/home/javier/repos/DCASE2020-Task3/input_feature/gammatone_gcclogmel/mic_dev/fold1_room1_mix001_ov1.npy') 8 | 9 | gamma_ch1 = gamma[:,0:64] 10 | 11 | plt.subplot(2, 2, 1) 12 | gamma_ch1 = gamma_ch1.T 13 | librosa.display.specshow(np.flip(gamma_ch1,1)) 14 | plt.colorbar() 15 | plt.title('fold1_room1_mix001_ov1 gammatone scale to max') 16 | 17 | gamma_norm = np.load('/home/javier/repos/DCASE2020-Task3/input_feature/gammatone_nomax_gcclogmel/mic_dev/fold1_room1_mix001_ov1.npy') 18 | 19 | gamma_norm_ch1 = gamma_norm[:,0:64] 20 | #gamma_norm_ch1 = gamma_norm_ch1.T 21 | plt.subplot(2, 2, 2) 22 | librosa.display.specshow(gamma_norm_ch1.T) 23 | plt.colorbar() 24 | plt.title('fold1_room1_mix001_ov1 gammatone no scale to max') 25 | 26 | spec = np.load('/home/javier/repos/DCASE2020-Task3/input_feature/baseline_log_mel/mic_dev/fold1_room1_mix001_ov1.npy') 27 | 28 | spec_ch1 = spec[:,0:64] 29 | 30 | plt.subplot(2, 2, 3) 31 | #spec_ch1 = spec_ch1.T 32 | librosa.display.specshow(spec_ch1.T) 33 | plt.colorbar() 34 | plt.title('fold1_room1_mix001_ov1 mel spectrogram') 35 | 36 | spec_norm = np.load('/home/javier/repos/DCASE2020-Task3/input_feature/baseline_log_mel/mic_dev_norm/fold1_room1_mix001_ov1.npy') 37 | 38 | spec_norm_ch1 = spec_norm[:,0:64] 39 | 40 | plt.subplot(2, 2, 4) 41 | librosa.display.specshow(spec_norm_ch1.T) 42 | plt.colorbar() 43 | plt.title('fold1_room1_mix001_ov1 mel norm spectrogram') 44 | 45 | plt.show() 
-------------------------------------------------------------------------------- /fold1_room1_mix050_ov2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/janaal1/DCASE2020-Task3/bc7d5e87faa2fbe014dc47cccb48d9927b4ed3ec/fold1_room1_mix050_ov2.png -------------------------------------------------------------------------------- /gammatone/COPYING: -------------------------------------------------------------------------------- 1 | Copyright (c) 1998, Malcolm Slaney 2 | Copyright (c) 2009, Dan Ellis 3 | Copyright (c) 2014, Jason Heeris 4 | All rights reserved. 5 | 6 | Redistribution and use in source and binary forms, with or without 7 | modification, are permitted provided that the following conditions are met: 8 | * Redistributions of source code must retain the above copyright 9 | notice, this list of conditions and the following disclaimer. 10 | * Redistributions in binary form must reproduce the above copyright 11 | notice, this list of conditions and the following disclaimer in the 12 | documentation and/or other materials provided with the distribution. 13 | * Neither the name of the copyright holder nor the names of its contributors 14 | may be used to endorse or promote products derived from this software 15 | without specific prior written permission. 16 | 17 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 18 | ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 19 | WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 20 | DISCLAIMED. 
IN NO EVENT SHALL BE LIABLE FOR ANY 21 | DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 22 | (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 23 | LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND 24 | ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 25 | (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 26 | SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 27 | -------------------------------------------------------------------------------- /gammatone/README.md: -------------------------------------------------------------------------------- 1 | Gammatone Filterbank Toolkit 2 | ============================ 3 | 4 | *Utilities for analysing sound using perceptual models of human hearing.* 5 | 6 | Jason Heeris, 2013 7 | 8 | Summary 9 | ------- 10 | 11 | This is a port of Malcolm Slaney's and Dan Ellis' gammatone filterbank MATLAB 12 | code, detailed below, to Python 2 and 3 using Numpy and Scipy. It analyses signals by 13 | running them through banks of gammatone filters, similar to Fourier-based 14 | spectrogram analysis. 15 | 16 | ![Gammatone-based spectrogram of Für Elise](doc/FurElise.png) 17 | 18 | Installation 19 | ------------ 20 | 21 | You can install directly from this git repository using: 22 | 23 | ```text 24 | pip install git+https://github.com/detly/gammatone.git 25 | ``` 26 | 27 | ...or you can clone the git repository however you prefer, and do: 28 | 29 | ```text 30 | pip install . 31 | ``` 32 | 33 | ...or: 34 | 35 | ``` 36 | python setup.py install 37 | ``` 38 | 39 | ...from the cloned tree. 40 | 41 | ### Dependencies 42 | 43 | - numpy 44 | - scipy 45 | - nose 46 | - mock 47 | - matplotlib 48 | 49 | Using the Code 50 | -------------- 51 | 52 | See the [API documentation](http://detly.github.io/gammatone/). 
For a 53 | demonstration, find a `.wav` file (for example, 54 | [Für Elise](http://heeris.id.au/samples/FurElise.wav)) and run: 55 | 56 | ```text 57 | python -m gammatone FurElise.wav -d 10 58 | ``` 59 | 60 | ...to see a gammatone-gram of the first ten seconds of the track. If you've 61 | installed via `pip` or `setup.py install`, you should also be able to just run: 62 | 63 | ```text 64 | gammatone FurElise.wav -d 10 65 | ``` 66 | 67 | Basis 68 | ----- 69 | 70 | This project is based on research into how humans perceive audio, originally 71 | published by Malcolm Slaney: 72 | 73 | [Malcolm Slaney (1998) "Auditory Toolbox Version 2", Technical Report #1998-010, 74 | Interval Research Corporation, 1998.]( 75 | http://cobweb.ecn.purdue.edu/~malcolm/interval/1998-010/ 76 | ) 77 | 78 | Slaney's report describes a way of modelling how the human ear perceives, 79 | emphasises and separates different frequencies of sound. A series of gammatone 80 | filters are constructed whose width increases with increasing centre frequency, 81 | and this bank of filters is applied to a time-domain signal. The result of this 82 | is a spectrum that should represent the human experience of sound better than, 83 | say, a Fourier-domain spectrum would. 84 | 85 | A gammatone filter has an impulse response that is a sine wave multiplied by a 86 | gamma distribution function. It is a common approach to modelling the auditory 87 | system. 88 | 89 | The gammatone filterbank approach can be considered analogous (but not 90 | equivalent) to a discrete Fourier transform where the frequency axis is 91 | logarithmic. For example, a series of notes spaced an octave apart would appear 92 | to be roughly linearly spaced; or a sound that was distributed across the same 93 | linear frequency range would appear to have more spread at lower frequencies. 
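As a rough illustration of the two ideas above (centre frequencies spaced on an ERB scale, and an impulse response formed by a gamma-shaped envelope multiplied by a tone), here is a minimal NumPy sketch. It is not this package's API: the function names `erb_centre_freqs` and `gammatone_impulse_response` are hypothetical and exist only for this example. The constants are the Glasberg and Moore values that appear in `ERBSpace.m` and `MakeERBFilters.m`, and the default order of 4 follows `GTord` in `fft2gammatonemx.m`.

```python
import numpy as np

# Glasberg and Moore parameters, as they appear in ERBSpace.m
EAR_Q = 9.26449
MIN_BW = 24.7


def erb_centre_freqs(low_freq, high_freq, num):
    """Hypothetical helper: `num` centre frequencies (Hz) on an ERB scale.

    Frequencies are returned from highest to lowest, and the last one
    equals `low_freq`, mirroring the expression used in ERBSpace.m.
    """
    offset = EAR_Q * MIN_BW
    steps = np.arange(1, num + 1) / num
    log_ratio = np.log(low_freq + offset) - np.log(high_freq + offset)
    return -offset + np.exp(steps * log_ratio) * (high_freq + offset)


def gammatone_impulse_response(cf, fs, duration=0.05, order=4):
    """Hypothetical helper: sampled gammatone impulse response.

    The envelope t^(order-1) * exp(-b*t) is multiplied by a cosine at the
    centre frequency `cf`; `b` grows with `cf`, so higher channels are wider.
    """
    t = np.arange(int(duration * fs)) / fs
    erb = cf / EAR_Q + MIN_BW        # ERB with order = 1, as in MakeERBFilters.m
    b = 1.019 * 2 * np.pi * erb      # decay rate (bandwidth) in rad/s
    envelope = t ** (order - 1) * np.exp(-b * t)
    return envelope * np.cos(2 * np.pi * cf * t)


cfs = erb_centre_freqs(100, 8000, 64)
ir = gammatone_impulse_response(cfs[-1], 16000)
```

The gaps between neighbouring values in `cfs` shrink towards the low end of the range, which is the "width increases with increasing centre frequency" behaviour described above. The impulse response is unnormalised; the MATLAB code applies a separate gain term per channel.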
94 | 95 | The real goal of this toolkit is to allow easy computation of the gammatone 96 | equivalent of a spectrogram: a time-varying spectrum of energy over audible 97 | frequencies based on a gammatone filterbank. 98 | 99 | Slaney demonstrated his research with an initial implementation in MATLAB. This 100 | implementation was later extended by Dan Ellis, who found a way to approximate a 101 | "gammatone-gram" by using the fast Fourier transform. Ellis' code calculates a 102 | matrix of weights that can be applied to the output of an FFT so that a 103 | Fourier-based spectrogram can easily be transformed into such an approximation. 104 | 105 | Ellis' code and documentation are here: [Gammatone-like spectrograms]( 106 | http://labrosa.ee.columbia.edu/matlab/gammatonegram/ 107 | ) 108 | 109 | Interest 110 | -------- 111 | 112 | I became interested in this because of my background in science communication 113 | and my general interest in the teaching of signal processing. I find that the 114 | spectrogram approach to visualising signals is adequate for illustrating 115 | abstract systems or the mathematical properties of transforms, but bears little 116 | correspondence to a person's own experience of sound. If someone wants to see 117 | what their favourite piece of music "looks like," a normal Fourier transform 118 | based spectrogram is actually quite a poor way to visualise it. Features of the 119 | audio seem to be oddly spaced or unnaturally emphasised or de-emphasised 120 | depending on where they are in the frequency domain. 121 | 122 | The gammatone filterbank approach seems to be closer to what someone might 123 | intuitively expect a visualisation of sound to look like, and can help develop 124 | an intuition about alternative representations of signals. 125 | 126 | Verifying the port 127 | ------------------ 128 | 129 | Since this is a port of existing MATLAB code, I've written tests to verify the 130 | Python implementation against the original code. 
These tests aren't unit tests, 131 | but they do generally test single functions. Running the tests has the same 132 | workflow: 133 | 134 | 1. Run the scripts in the `test_generation` directory. This will create a 135 | `.mat` file containing test data in `tests/data`. 136 | 137 | 2. Run `nosetests3` in the top level directory. This will find and run all the 138 | tests in the `tests` directory. 139 | 140 | Although I'm usually loath to check in generated files to version control, I'm 141 | willing to make an exception for the `.mat` files containing the test data. My 142 | reasoning is that they represent the decoupling of my code from the MATLAB code, 143 | and if the two projects were separated, they would be considered a part of the 144 | Python code, not the original MATLAB code. 145 | 146 | -------------------------------------------------------------------------------- /gammatone/auditory_toolkit/COPYING: -------------------------------------------------------------------------------- 1 | Copyright (c) 1998, Malcolm Slaney 2 | Copyright (c) 2009, Dan Ellis 3 | All rights reserved. 4 | 5 | Redistribution and use in source and binary forms, with or without 6 | modification, are permitted provided that the following conditions are met: 7 | * Redistributions of source code must retain the above copyright 8 | notice, this list of conditions and the following disclaimer. 9 | * Redistributions in binary form must reproduce the above copyright 10 | notice, this list of conditions and the following disclaimer in the 11 | documentation and/or other materials provided with the distribution. 12 | * Neither the name of the copyright holder nor the names of its contributors 13 | may be used to endorse or promote products derived from this software 14 | without specific prior written permission. 
15 | 16 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 17 | ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 18 | WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 19 | DISCLAIMED. IN NO EVENT SHALL BE LIABLE FOR ANY 20 | DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 21 | (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 22 | LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND 23 | ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 24 | (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 25 | SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 26 | -------------------------------------------------------------------------------- /gammatone/auditory_toolkit/ERBFilterBank.m: -------------------------------------------------------------------------------- 1 | function output = ERBFilterBank(x, fcoefs) 2 | % function output = ERBFilterBank(x, fcoefs) 3 | % Process an input waveform with a gammatone filter bank. This function 4 | % takes a single sound vector, and returns an array of filter outputs, one 5 | % channel per row. 6 | % 7 | % The fcoefs parameter, which completely specifies the Gammatone filterbank, 8 | % should be designed with the MakeERBFilters function. If it is omitted, 9 | % the filter coefficients are computed for you assuming a 22050Hz sampling 10 | % rate and 64 filters regularly spaced on an ERB scale from fs/2 down to 100Hz. 11 | % 12 | 13 | % Malcolm Slaney @ Interval, June 11, 1998. 14 | % (c) 1998 Interval Research Corporation 15 | % Thanks to Alain de Cheveigne' for his suggestions and improvements. 
16 | 17 | if nargin < 1 18 | error('Syntax: output_array = ERBFilterBank(input_vector[, fcoefs]);'); 19 | end 20 | 21 | if nargin < 2 22 | fcoefs = MakeERBFilters(22050,64,100); 23 | end 24 | 25 | if size(fcoefs,2) ~= 10 26 | error('fcoefs parameter passed to ERBFilterBank is the wrong size.'); 27 | end 28 | 29 | if size(x,2) < size(x,1) 30 | x = x'; 31 | end 32 | 33 | A0 = fcoefs(:,1); 34 | A11 = fcoefs(:,2); 35 | A12 = fcoefs(:,3); 36 | A13 = fcoefs(:,4); 37 | A14 = fcoefs(:,5); 38 | A2 = fcoefs(:,6); 39 | B0 = fcoefs(:,7); 40 | B1 = fcoefs(:,8); 41 | B2 = fcoefs(:,9); 42 | gain= fcoefs(:,10); 43 | 44 | output = zeros(size(gain,1), length(x)); 45 | for chan = 1: size(gain,1) 46 | y1=filter([A0(chan)/gain(chan) A11(chan)/gain(chan) ... 47 | A2(chan)/gain(chan)], ... 48 | [B0(chan) B1(chan) B2(chan)], x); 49 | y2=filter([A0(chan) A12(chan) A2(chan)], ... 50 | [B0(chan) B1(chan) B2(chan)], y1); 51 | y3=filter([A0(chan) A13(chan) A2(chan)], ... 52 | [B0(chan) B1(chan) B2(chan)], y2); 53 | y4=filter([A0(chan) A14(chan) A2(chan)], ... 54 | [B0(chan) B1(chan) B2(chan)], y3); 55 | output(chan, :) = y4; 56 | end 57 | 58 | if 0 59 | semilogx((0:(length(x)-1))*(fs/length(x)),20*log10(abs(fft(output)))); 60 | end 61 | -------------------------------------------------------------------------------- /gammatone/auditory_toolkit/ERBSpace.m: -------------------------------------------------------------------------------- 1 | function cfArray = ERBSpace(lowFreq, highFreq, N) 2 | % function cfArray = ERBSpace(lowFreq, highFreq, N) 3 | % This function computes an array of N frequencies uniformly spaced between 4 | % highFreq and lowFreq on an ERB scale. N is set to 100 if not specified. 5 | % 6 | % See also linspace, logspace, MakeERBCoeffs, MakeERBFilters. 7 | % 8 | % For a definition of ERB, see Moore, B. C. J., and Glasberg, B. R. (1983). 9 | % "Suggested formulae for calculating auditory-filter bandwidths and 10 | % excitation patterns," J. Acoust. Soc. Am. 74, 750-753. 
11 | 12 | if nargin < 1 13 | lowFreq = 100; 14 | end 15 | 16 | if nargin < 2 17 | highFreq = 44100/4; 18 | end 19 | 20 | if nargin < 3 21 | N = 100; 22 | end 23 | 24 | % Change the following three parameters if you wish to use a different 25 | % ERB scale. Must change in MakeERBCoeffs too. 26 | EarQ = 9.26449; % Glasberg and Moore Parameters 27 | minBW = 24.7; 28 | order = 1; 29 | 30 | % All of the following expressions are derived in Apple TR #35, "An 31 | % Efficient Implementation of the Patterson-Holdsworth Cochlear 32 | % Filter Bank." See pages 33-34. 33 | cfArray = -(EarQ*minBW) + exp((1:N)'*(-log(highFreq + EarQ*minBW) + ... 34 | log(lowFreq + EarQ*minBW))/N) * (highFreq + EarQ*minBW); 35 | 36 | -------------------------------------------------------------------------------- /gammatone/auditory_toolkit/MakeERBFilters.m: -------------------------------------------------------------------------------- 1 | function [fcoefs,cf]=MakeERBFilters(fs,numChannels,lowFreq) 2 | % function [fcoefs,cf]=MakeERBFilters(fs,numChannels,lowFreq) 3 | % This function computes the filter coefficients for a bank of 4 | % Gammatone filters. These filters were defined by Patterson and 5 | % Holdsworth for simulating the cochlea. 6 | % 7 | % The result is returned as an array of filter coefficients. Each row 8 | % of the filter arrays contains the coefficients for four second order 9 | % filters. The transfer functions for these four filters share the same 10 | % denominator (poles) but have different numerators (zeros). All of these 11 | % coefficients are assembled into one vector that the ERBFilterBank 12 | % can take apart to implement the filter. 13 | % 14 | % The filter bank contains "numChannels" channels that extend from 15 | % half the sampling rate (fs) to "lowFreq". Alternatively, if the numChannels 16 | % input argument is a vector, then the values of this vector are taken to 17 | % be the center frequency of each desired filter. 
(The lowFreq argument is 18 | % ignored in this case.) 19 | 20 | % Note this implementation fixes a problem in the original code by 21 | % computing four separate second order filters. This avoids a big 22 | % problem with round off errors in cases of very small cfs (100Hz) and 23 | % large sample rates (44kHz). The problem is caused by roundoff error 24 | % when a number of poles are combined, all very close to the unit 25 | % circle. Small errors in the eighth order coefficient are multiplied 26 | % when the eighth root is taken to give the pole location. These small 27 | % errors lead to poles outside the unit circle and instability. Thanks 28 | % to Julius Smith for leading me to the proper explanation. 29 | 30 | % Execute the following code to evaluate the frequency 31 | % response of a 10 channel filterbank. 32 | % fcoefs = MakeERBFilters(16000,10,100); 33 | % y = ERBFilterBank([1 zeros(1,511)], fcoefs); 34 | % resp = 20*log10(abs(fft(y'))); 35 | % freqScale = (0:511)/512*16000; 36 | % semilogx(freqScale(1:255),resp(1:255,:)); 37 | % axis([100 16000 -60 0]) 38 | % xlabel('Frequency (Hz)'); ylabel('Filter Response (dB)'); 39 | 40 | % Rewritten by Malcolm Slaney@Interval. June 11, 1998. 41 | % (c) 1998 Interval Research Corporation 42 | 43 | T = 1/fs; 44 | if length(numChannels) == 1 45 | cf = ERBSpace(lowFreq, fs/2, numChannels); 46 | else 47 | cf = numChannels(1:end); 48 | if size(cf,2) > size(cf,1) 49 | cf = cf'; 50 | end 51 | end 52 | 53 | % Change the following three parameters if you wish to use a different 54 | % ERB scale. Must change in ERBSpace too. 55 | EarQ = 9.26449; % Glasberg and Moore Parameters 56 | minBW = 24.7; 57 | order = 1; 58 | 59 | ERB = ((cf/EarQ).^order + minBW^order).^(1/order); 60 | B=1.019*2*pi*ERB; 61 | 62 | A0 = T; 63 | A2 = 0; 64 | B0 = 1; 65 | B1 = -2*cos(2*cf*pi*T)./exp(B*T); 66 | B2 = exp(-2*B*T); 67 | 68 | A11 = -(2*T*cos(2*cf*pi*T)./exp(B*T) + 2*sqrt(3+2^1.5)*T*sin(2*cf*pi*T)./ ... 
69 | exp(B*T))/2; 70 | A12 = -(2*T*cos(2*cf*pi*T)./exp(B*T) - 2*sqrt(3+2^1.5)*T*sin(2*cf*pi*T)./ ... 71 | exp(B*T))/2; 72 | A13 = -(2*T*cos(2*cf*pi*T)./exp(B*T) + 2*sqrt(3-2^1.5)*T*sin(2*cf*pi*T)./ ... 73 | exp(B*T))/2; 74 | A14 = -(2*T*cos(2*cf*pi*T)./exp(B*T) - 2*sqrt(3-2^1.5)*T*sin(2*cf*pi*T)./ ... 75 | exp(B*T))/2; 76 | 77 | gain = abs((-2*exp(4*i*cf*pi*T)*T + ... 78 | 2*exp(-(B*T) + 2*i*cf*pi*T).*T.* ... 79 | (cos(2*cf*pi*T) - sqrt(3 - 2^(3/2))* ... 80 | sin(2*cf*pi*T))) .* ... 81 | (-2*exp(4*i*cf*pi*T)*T + ... 82 | 2*exp(-(B*T) + 2*i*cf*pi*T).*T.* ... 83 | (cos(2*cf*pi*T) + sqrt(3 - 2^(3/2)) * ... 84 | sin(2*cf*pi*T))).* ... 85 | (-2*exp(4*i*cf*pi*T)*T + ... 86 | 2*exp(-(B*T) + 2*i*cf*pi*T).*T.* ... 87 | (cos(2*cf*pi*T) - ... 88 | sqrt(3 + 2^(3/2))*sin(2*cf*pi*T))) .* ... 89 | (-2*exp(4*i*cf*pi*T)*T + 2*exp(-(B*T) + 2*i*cf*pi*T).*T.* ... 90 | (cos(2*cf*pi*T) + sqrt(3 + 2^(3/2))*sin(2*cf*pi*T))) ./ ... 91 | (-2 ./ exp(2*B*T) - 2*exp(4*i*cf*pi*T) + ... 92 | 2*(1 + exp(4*i*cf*pi*T))./exp(B*T)).^4); 93 | 94 | allfilts = ones(length(cf),1); 95 | fcoefs = [A0*allfilts A11 A12 A13 A14 A2*allfilts B0*allfilts B1 B2 gain]; 96 | 97 | if (0) % Test Code 98 | A0 = fcoefs(:,1); 99 | A11 = fcoefs(:,2); 100 | A12 = fcoefs(:,3); 101 | A13 = fcoefs(:,4); 102 | A14 = fcoefs(:,5); 103 | A2 = fcoefs(:,6); 104 | B0 = fcoefs(:,7); 105 | B1 = fcoefs(:,8); 106 | B2 = fcoefs(:,9); 107 | gain= fcoefs(:,10); 108 | chan=1; 109 | x = [1 zeros(1, 511)]; 110 | y1=filter([A0(chan)/gain(chan) A11(chan)/gain(chan) ... 111 | A2(chan)/gain(chan)],[B0(chan) B1(chan) B2(chan)], x); 112 | y2=filter([A0(chan) A12(chan) A2(chan)], ... 113 | [B0(chan) B1(chan) B2(chan)], y1); 114 | y3=filter([A0(chan) A13(chan) A2(chan)], ... 115 | [B0(chan) B1(chan) B2(chan)], y2); 116 | y4=filter([A0(chan) A14(chan) A2(chan)], ... 
117 | [B0(chan) B1(chan) B2(chan)], y3); 118 | semilogx((0:(length(x)-1))*(fs/length(x)),20*log10(abs(fft(y4)))); 119 | end 120 | -------------------------------------------------------------------------------- /gammatone/auditory_toolkit/README: -------------------------------------------------------------------------------- 1 | These files are the original auditory toolkit/gammatone filterbank code created 2 | by Malcolm Slaney and Dan Ellis, published at: 3 | 4 | http://labrosa.ee.columbia.edu/matlab/gammatonegram/ 5 | https://engineering.purdue.edu/~malcolm/interval/1998-010/ 6 | 7 | Any non-code assets (ie. the sample WAV file and associated graphs) have been 8 | removed. 9 | -------------------------------------------------------------------------------- /gammatone/auditory_toolkit/demo_gammatone.m: -------------------------------------------------------------------------------- 1 | %% Gammatone-like spectrograms 2 | % Gammatone filters are a popular linear approximation to the 3 | % filtering performed by the ear. This routine provides a simple 4 | % wrapper for generating time-frequency surfaces based on a 5 | % gammatone analysis, which can be used as a replacement for a 6 | % conventional spectrogram. It also provides a fast approximation 7 | % to this surface based on weighting the output of a conventional 8 | % FFT. 9 | 10 | %% Introduction 11 | % It is very natural to visualize sound as a time-varying 12 | % distribution of energy in frequency - not least because this is 13 | % one way of describing the information our brains get from our 14 | % ears via the auditory nerve. The spectrogram is the traditional 15 | % time-frequency visualization, but it actually has some important 16 | % differences from how sound is analyzed by the ear, most 17 | % significantly that the ear's frequency subbands get wider for 18 | % higher frequencies, whereas the spectrogram has a constant 19 | % bandwidth across all frequency channels. 
20 | % 21 | % There have been many signal-processing approximations proposed 22 | % for the frequency analysis performed by the ear; one of the most 23 | % popular is the Gammatone filterbank originally proposed by 24 | % Roy Patterson and colleagues in 1992. Gammatone filters were 25 | % conceived as a simple fit to experimental observations of 26 | % the mammalian cochlea, and have a repeated pole structure leading 27 | % to an impulse response that is the product of a Gamma envelope 28 | % g(t) = t^n e^{-t} and a sinusoid (tone). 29 | % 30 | % One reason for the popularity of this approach is the 31 | % availability of an implementation by Malcolm Slaney, as 32 | % described in: 33 | % 34 | % Malcolm Slaney (1998) "Auditory Toolbox Version 2", 35 | % Technical Report #1998-010, Interval Research Corporation, 1998. 36 | % http://cobweb.ecn.purdue.edu/~malcolm/interval/1998-010/ 37 | % 38 | % Malcolm's toolbox includes routines to design a Gammatone 39 | % filterbank and to process a signal by every filter in a bank, 40 | % but in order to convert this into a time-frequency visualization 41 | % it is necessary to sum up the energy within regular time bins. 42 | % While this is not complicated, the function here provides a 43 | % convenient wrapper to achieve this final step, for applications 44 | % that are content to work with time-frequency magnitude 45 | % distributions instead of going down to the waveform levels. In 46 | % this mode of operation, the routine uses Malcolm's MakeERBFilters 47 | % and ERBFilterBank routines. 48 | % 49 | % This is, however, quite a computationally expensive approach, so 50 | % we also provide an alternative algorithm that gives very similar 51 | % results. 
In this mode, the Gammatone-based spectrogram is 52 | % constructed by first calculating a conventional, fixed-bandwidth 53 | % spectrogram, then combining the fine frequency resolution of the 54 | % FFT-based spectra into the coarser, smoother Gammatone responses 55 | % via a weighting function. This calculates the time-frequency 56 | % distribution some 30-40x faster than the full approach. 57 | 58 | %% Routines 59 | % The code consists of a main routine, gammatonegram, 60 | % which takes a waveform and other parameters and returns a 61 | % spectrogram-like time-frequency matrix, and a helper function 62 | % fft2gammatonemx, which constructs the 63 | % weighting matrix to convert FFT output spectra into gammatone 64 | % approximations. 65 | 66 | %% Example usage 67 | % First, we calculate a Gammatone-based spectrogram-like image of 68 | % a speech waveform using the fast approximation. Then we do the 69 | % same thing using the full filtering approach, for comparison. 70 | 71 | % Load a waveform, calculate its gammatone spectrogram, then display: 72 | [d,sr] = wavread('sa2.wav'); 73 | tic; [D,F] = gammatonegram(d,sr); toc 74 | %Elapsed time is 0.140742 seconds. 75 | subplot(211) 76 | imagesc(20*log10(D)); axis xy 77 | caxis([-90 -30]) 78 | colorbar 79 | % F returns the center frequencies of each band; 80 | % display whichever elements were shown by the autoscaling 81 | set(gca,'YTickLabel',round(F(get(gca,'YTick')))); 82 | ylabel('freq / Hz'); 83 | xlabel('time / 10 ms steps'); 84 | title('Gammatonegram - fast method') 85 | 86 | % Now repeat with flag to use actual subband filters. 87 | % Since it's the last argument, we have to include all the other 88 | % arguments. These are the default values for: summation window 89 | % (0.025 sec), hop between successive windows (0.010 sec), 90 | % number of gammatone channels (64), lowest frequency (50 Hz), 91 | % and highest frequency (sr/2). The last argument as zero 92 | % means not to use the FFT approach. 
93 | tic; [D2,F2] = gammatonegram(d,sr,0.025,0.010,64,50,sr/2,0); toc 94 | %Elapsed time is 3.165083 seconds. 95 | subplot(212) 96 | imagesc(20*log10(D2)); axis xy 97 | caxis([-90 -30]) 98 | colorbar 99 | set(gca,'YTickLabel',round(F(get(gca,'YTick')))); 100 | ylabel('freq / Hz'); 101 | xlabel('time / 10 ms steps'); 102 | title('Gammatonegram - accurate method') 103 | % Actual gammatone filters appear somewhat narrower. The fast 104 | % version assumes coherence of addition of amplitude from 105 | % different channels, whereas the actual subband energies will 106 | % depend on how the energy in different frequencies combines. 107 | % Also notice the visible time smearing in the low frequency 108 | % channels that does not occur in the fast version. 109 | 110 | %% Validation 111 | % We can check the frequency responses of the filterbank 112 | % simulated with the fast method against the actual filters 113 | % from Malcolm's toolbox. They match very closely, but of 114 | % course this still doesn't mean the two approaches will give 115 | % identical results - because the fast method ignores the phase 116 | % of each frequency channel when summing up. 117 | 118 | % Check the frequency responses to see that they match: 119 | % Put an impulse through the Slaney ERB filters, then take the 120 | % frequency response of each impulse response. 121 | fcfs = flipud(MakeERBFilters(16000,64,50)); 122 | gtir = ERBFilterBank([1, zeros(1,1000)],fcfs); 123 | H = zeros(64,512); 124 | for i = 1:64; H(i,:) = abs(freqz(gtir(i,:),1,512)); end 125 | % The weighting matrix for the FFT is the frequency response 126 | % of each output filter 127 | gtm = fft2gammatonemx(1024,16000,64,1,50,8000,512); 128 | % Plot every 5th channel from both. 
Offset by 3 dB just so we can 129 | % see both 130 | fs = [0:511]/512*8000; 131 | figure 132 | plot(fs,20*log10(H(5:5:64,:))','b',fs, -3 + 20*log10(gtm(5:5:64,:))','r') 133 | axis([0 8000 -150 0]) 134 | grid 135 | % Line up pretty well, apart from wiggles below -100 dB 136 | % (from truncating the impulse response at 1000 samples?) 137 | 138 | %% Download 139 | % You can download all the code and data for these examples here: 140 | % . 141 | 142 | %% Referencing 143 | % If you use this work in a publication, I would be grateful 144 | % if you referenced this page as follows: 145 | % 146 | % D. P. W. Ellis (2009). "Gammatone-like spectrograms", web resource. 147 | % http://www.ee.columbia.edu/~dpwe/resources/matlab/gammatonegram/ 148 | 149 | %% Acknowledgment 150 | % This project was supported in part by the NSF under 151 | % grant IIS-0535168. Any opinions, findings and conclusions 152 | % or recommendations expressed in this material are those of the 153 | % authors and do not necessarily reflect the views of the Sponsors. 154 | 155 | % Last updated: $Date: 2009/07/07 14:14:11 $ 156 | % Dan Ellis 157 | -------------------------------------------------------------------------------- /gammatone/auditory_toolkit/fft2gammatonemx.m: -------------------------------------------------------------------------------- 1 | function [wts,gain] = fft2gammatonemx(nfft, sr, nfilts, width, minfreq, maxfreq, maxlen) 2 | % wts = fft2gammatonemx(nfft, sr, nfilts, width, minfreq, maxfreq, maxlen) 3 | % Generate a matrix of weights to combine FFT bins into 4 | % Gammatone bins. nfft defines the source FFT size at 5 | % sampling rate sr. Optional nfilts specifies the number of 6 | % output bands required (default 64), and width is the 7 | % constant width of each band in Bark (default 1). 8 | % minfreq, maxfreq specify range covered in Hz (100, sr/2). 9 | % While wts has nfft columns, the second half are all zero. 
10 | % Hence, aud spectrum is 11 | % fft2gammatonemx(nfft,sr)*abs(fft(xincols,nfft)); 12 | % maxlen truncates the rows to this many bins 13 | % 14 | % 2004-09-05 Dan Ellis dpwe@ee.columbia.edu based on rastamat/audspec.m 15 | % Last updated: $Date: 2009/02/22 02:29:25 $ 16 | 17 | if nargin < 2; sr = 16000; end 18 | if nargin < 3; nfilts = 64; end 19 | if nargin < 4; width = 1.0; end 20 | if nargin < 5; minfreq = 100; end 21 | if nargin < 6; maxfreq = sr/2; end 22 | if nargin < 7; maxlen = nfft; end 23 | 24 | wts = zeros(nfilts, nfft); 25 | 26 | % after Slaney's MakeERBFilters 27 | EarQ = 9.26449; 28 | minBW = 24.7; 29 | order = 1; 30 | 31 | cfreqs = -(EarQ*minBW) + exp((1:nfilts)'*(-log(maxfreq + EarQ*minBW) + ... 32 | log(minfreq + EarQ*minBW))/nfilts) * (maxfreq + EarQ*minBW); 33 | cfreqs = flipud(cfreqs); 34 | 35 | GTord = 4; 36 | 37 | ucirc = exp(j*2*pi*[0:(nfft/2)]/nfft); 38 | 39 | justpoles = 0; 40 | 41 | for i = 1:nfilts 42 | cf = cfreqs(i); 43 | ERB = width*((cf/EarQ).^order + minBW^order).^(1/order); 44 | B = 1.019*2*pi*ERB; 45 | r = exp(-B/sr); 46 | theta = 2*pi*cf/sr; 47 | pole = r*exp(j*theta); 48 | 49 | if justpoles == 1 50 | % point on unit circle of maximum gain, from differentiating magnitude 51 | cosomegamax = (1+r*r)/(2*r)*cos(theta); 52 | if abs(cosomegamax) > 1 53 | if theta < pi/2; omegamax = 0; 54 | else omegamax = pi; end 55 | else 56 | omegamax = acos(cosomegamax); 57 | end 58 | center = exp(j*omegamax); 59 | gain = abs((pole-center).*(pole'-center)).^GTord; 60 | wts(i,1:(nfft/2+1)) = gain * (abs((pole-ucirc).*(pole'- ... 61 | ucirc)).^-GTord); 62 | else 63 | % poles and zeros, following Malcolm's MakeERBFilter 64 | T = 1/sr; 65 | A11 = -(2*T*cos(2*cf*pi*T)./exp(B*T) + 2*sqrt(3+2^1.5)*T*sin(2* ... 66 | cf*pi*T)./exp(B*T))/2; 67 | A12 = -(2*T*cos(2*cf*pi*T)./exp(B*T) - 2*sqrt(3+2^1.5)*T*sin(2* ... 68 | cf*pi*T)./exp(B*T))/2; 69 | A13 = -(2*T*cos(2*cf*pi*T)./exp(B*T) + 2*sqrt(3-2^1.5)*T*sin(2* ... 
70 | cf*pi*T)./exp(B*T))/2; 71 | A14 = -(2*T*cos(2*cf*pi*T)./exp(B*T) - 2*sqrt(3-2^1.5)*T*sin(2* ... 72 | cf*pi*T)./exp(B*T))/2; 73 | zros = -[A11 A12 A13 A14]/T; 74 | 75 | gain(i) = abs((-2*exp(4*j*cf*pi*T)*T + ... 76 | 2*exp(-(B*T) + 2*j*cf*pi*T).*T.* ... 77 | (cos(2*cf*pi*T) - sqrt(3 - 2^(3/2))* ... 78 | sin(2*cf*pi*T))) .* ... 79 | (-2*exp(4*j*cf*pi*T)*T + ... 80 | 2*exp(-(B*T) + 2*j*cf*pi*T).*T.* ... 81 | (cos(2*cf*pi*T) + sqrt(3 - 2^(3/2)) * ... 82 | sin(2*cf*pi*T))).* ... 83 | (-2*exp(4*j*cf*pi*T)*T + ... 84 | 2*exp(-(B*T) + 2*j*cf*pi*T).*T.* ... 85 | (cos(2*cf*pi*T) - ... 86 | sqrt(3 + 2^(3/2))*sin(2*cf*pi*T))) .* ... 87 | (-2*exp(4*j*cf*pi*T)*T + 2*exp(-(B*T) + 2*j*cf*pi*T).*T.* ... 88 | (cos(2*cf*pi*T) + sqrt(3 + 2^(3/2))*sin(2*cf*pi*T))) ./ ... 89 | (-2 ./ exp(2*B*T) - 2*exp(4*j*cf*pi*T) + ... 90 | 2*(1 + exp(4*j*cf*pi*T))./exp(B*T)).^4); 91 | wts(i,1:(nfft/2+1)) = ((T^4)/gain(i)) ... 92 | * abs(ucirc-zros(1)).*abs(ucirc-zros(2))... 93 | .*abs(ucirc-zros(3)).*abs(ucirc-zros(4))... 94 | .*(abs((pole-ucirc).*(pole'-ucirc)).^-GTord); 95 | end 96 | end 97 | 98 | wts = wts(:,1:maxlen); 99 | 100 | -------------------------------------------------------------------------------- /gammatone/auditory_toolkit/gammatone_demo.m: -------------------------------------------------------------------------------- 1 | %% Gammatone-like spectrograms 2 | % Gammatone filters are a popular linear approximation to the 3 | % filtering performed by the ear. This routine provides a simple 4 | % wrapper for generating time-frequency surfaces based on a 5 | % gammatone analysis, which can be used as a replacement for a 6 | % conventional spectrogram. It also provides a fast approximation 7 | % to this surface based on weighting the output of a conventional 8 | % FFT. 
9 | 10 | %% Introduction 11 | % It is very natural to visualize sound as a time-varying 12 | % distribution of energy in frequency - not least because this is 13 | % one way of describing the information our brains get from our 14 | % ears via the auditory nerve. The spectrogram is the traditional 15 | % time-frequency visualization, but it actually has some important 16 | % differences from how sound is analyzed by the ear, most 17 | % significantly that the ear's frequency subbands get wider for 18 | % higher frequencies, whereas the spectrogram has a constant 19 | % bandwidth across all frequency channels. 20 | % 21 | % There have been many signal-processing approximations proposed 22 | % for the frequency analysis performed by the ear; one of the most 23 | % popular is the Gammatone filterbank originally proposed by 24 | % Roy Patterson and colleagues in 1992. Gammatone filters were 25 | % conceived as a simple fit to experimental observations of 26 | % the mammalian cochlea, and have a repeated pole structure leading 27 | % to an impulse response that is the product of a Gamma envelope 28 | % g(t) = t^n e^{-t} and a sinusoid (tone). 29 | % 30 | % One reason for the popularity of this approach is the 31 | % availability of an implementation by Malcolm Slaney, as 32 | % described in: 33 | % 34 | % Malcolm Slaney (1998) "Auditory Toolbox Version 2", 35 | % Technical Report #1998-010, Interval Research Corporation, 1998. 36 | % http://cobweb.ecn.purdue.edu/~malcolm/interval/1998-010/ 37 | % 38 | % Malcolm's toolbox includes routines to design a Gammatone 39 | % filterbank and to process a signal by every filter in a bank, 40 | % but in order to convert this into a time-frequency visualization 41 | % it is necessary to sum up the energy within regular time bins. 
42 | % While this is not complicated, the function here provides a 43 | % convenient wrapper to achieve this final step, for applications 44 | % that are content to work with time-frequency magnitude 45 | % distributions instead of going down to the waveform levels. In 46 | % this mode of operation, the routine uses Malcolm's MakeERBFilters 47 | % and ERBFilterBank routines. 48 | % 49 | % This is, however, quite a computationally expensive approach, so 50 | % we also provide an alternative algorithm that gives very similar 51 | % results. In this mode, the Gammatone-based spectrogram is 52 | % constructed by first calculating a conventional, fixed-bandwidth 53 | % spectrogram, then combining the fine frequency resolution of the 54 | % FFT-based spectra into the coarser, smoother Gammatone responses 55 | % via a weighting function. This calculates the time-frequency 56 | % distribution some 30-40x faster than the full approach. 57 | 58 | %% Routines 59 | % The code consists of a main routine, gammatonegram, 60 | % which takes a waveform and other parameters and returns a 61 | % spectrogram-like time-frequency matrix, and a helper function 62 | % fft2gammatonemx, which constructs the 63 | % weighting matrix to convert FFT output spectra into gammatone 64 | % approximations. 65 | 66 | %% Example usage 67 | % First, we calculate a Gammatone-based spectrogram-like image of 68 | % a speech waveform using the fast approximation. Then we do the 69 | % same thing using the full filtering approach, for comparison. 70 | 71 | % Load a waveform, calculate its gammatone spectrogram, then display: 72 | [d,sr] = wavread('sa2.wav'); 73 | tic; D = gammatonegram(d,sr); toc 74 | %Elapsed time is 0.140742 seconds. 75 | subplot(211) 76 | imagesc(20*log10(D)); axis xy 77 | caxis([-90 -30]) 78 | colorbar 79 | title('Gammatonegram - fast method') 80 | 81 | % Now repeat with flag to use actual subband filters. 82 | % Since it's the last argument, we have to include all the other 83 | % arguments. 
These are the default values for: summation window 84 | % (0.025 sec), hop between successive windows (0.010 sec), 85 | % number of gammatone channels (64), lowest frequency (50 Hz), 86 | % and highest frequency (sr/2). The last argument as zero 87 | % means not to use the FFT approach. 88 | tic; D2 = gammatonegram(d,sr,0.025,0.010,64,50,sr/2,0); toc 89 | %Elapsed time is 3.165083 seconds. 90 | subplot(212) 91 | imagesc(20*log10(D2)); axis xy 92 | caxis([-90 -30]) 93 | colorbar 94 | title('Gammatonegram - accurate method') 95 | % Actual gammatone filters appear somewhat narrower. The fast 96 | % version assumes coherence of addition of amplitude from 97 | % different channels, whereas the actual subband energies will 98 | % depend on how the energy in different frequencies combines. 99 | % Also notice the visible time smearing in the low frequency 100 | % channels that does not occur in the fast version. 101 | 102 | %% Validation 103 | % We can check the frequency responses of the filterbank 104 | % simulated with the fast method against the actual filters 105 | % from Malcolm's toolbox. They match very closely, but of 106 | % course this still doesn't mean the two approaches will give 107 | % identical results - because the fast method ignores the phase 108 | % of each frequency channel when summing up. 109 | 110 | % Check the frequency responses to see that they match: 111 | % Put an impulse through the Slaney ERB filters, then take the 112 | % frequency response of each impulse response. 113 | fcfs = flipud(MakeERBFilters(16000,64,50)); 114 | gtir = ERBFilterBank([1, zeros(1,1000)],fcfs); 115 | H = zeros(64,512); 116 | for i = 1:64; H(i,:) = abs(freqz(gtir(i,:),1,512)); end 117 | % The weighting matrix for the FFT is the frequency response 118 | % of each output filter 119 | gtm = fft2gammatonemx(1024,16000,64,1,50,8000,512); 120 | % Plot every 5th channel from both. 
Offset by 3 dB just so we can 121 | % see both 122 | fs = [0:511]/512*8000; 123 | figure 124 | plot(fs,20*log10(H(5:5:64,:))','b',fs, -3 + 20*log10(gtm(5:5:64,:))','r') 125 | axis([0 8000 -150 0]) 126 | grid 127 | % Line up pretty well, apart from wiggles below -100 dB 128 | % (from truncating the impulse response at 1000 samples?) 129 | 130 | %% Download 131 | % You can download all the code and data for these examples here: 132 | % . 133 | 134 | %% Referencing 135 | % If you use this work in a publication, I would be grateful 136 | % if you referenced this page as follows: 137 | % 138 | % D. P. W. Ellis (2009). "Gammatone-like spectrograms", web resource, http://www.ee.columbia.edu/~dpwe/resources/matlab/gammatonegram/ . 139 | 140 | %% Acknowledgment 141 | % This project was supported in part by the NSF under 142 | % grant IIS-0535168. Any opinions, findings and conclusions 143 | % or recommendations expressed in this material are those of the 144 | % authors and do not necessarily reflect the views of the Sponsors. 145 | 146 | % Last updated: $Date: 2009/02/22 01:46:42 $ 147 | % Dan Ellis 148 | -------------------------------------------------------------------------------- /gammatone/auditory_toolkit/gammatonegram.m: -------------------------------------------------------------------------------- 1 | function [Y,F] = gammatonegram(X,SR,TWIN,THOP,N,FMIN,FMAX,USEFFT,WIDTH) 2 | % [Y,F] = gammatonegram(X,SR,TWIN,THOP,N,FMIN,FMAX,USEFFT,WIDTH) 3 | % Calculate a spectrogram-like time frequency magnitude array 4 | % based on Gammatone subband filters. Waveform X (at sample 5 | % rate SR) is passed through an N (default 64) channel gammatone 6 | % auditory model filterbank, with lowest frequency FMIN (50) 7 | % and highest frequency FMAX (SR/2). The outputs of each band 8 | % then have their energy integrated over windows of TWIN secs 9 | % (0.025), advancing by THOP secs (0.010) for successive 10 | % columns. 
These magnitudes are returned as an N-row 11 | % nonnegative real matrix, Y. 12 | % If USEFFT is present and zero, revert to actual filtering and 13 | % summing energy within windows. 14 | % WIDTH (default 1.0) is how to scale bandwidth of filters 15 | % relative to ERB default (for fast method only). 16 | % F returns the center frequencies in Hz of each row of Y 17 | % (uniformly spaced on an ERB scale). 18 | % 19 | % 2009-02-18 Dan Ellis dpwe@ee.columbia.edu 20 | % Last updated: $Date: 2009/02/23 21:07:09 $ 21 | 22 | if nargin < 2; SR = 16000; end 23 | if nargin < 3; TWIN = 0.025; end 24 | if nargin < 4; THOP = 0.010; end 25 | if nargin < 5; N = 64; end 26 | if nargin < 6; FMIN = 50; end 27 | if nargin < 7; FMAX = SR/2; end 28 | if nargin < 8; USEFFT = 1; end 29 | if nargin < 9; WIDTH = 1.0; end 30 | 31 | 32 | if USEFFT == 0 33 | 34 | % Use malcolm's function to filter into subbands 35 | %%%% IGNORES FMAX! ***** 36 | [fcoefs,F] = MakeERBFilters(SR, N, FMIN); 37 | fcoefs = flipud(fcoefs); 38 | 39 | XF = ERBFilterBank(X,fcoefs); 40 | 41 | nwin = round(TWIN*SR); 42 | % Always use rectangular window for now 43 | % if USEHANN == 1 44 | window = hann(nwin)'; 45 | % else 46 | % window = ones(1,nwin); 47 | % end 48 | % window = window/sum(window); 49 | % XE = [zeros(N,round(nwin/2)),XF.^2,zeros(N,round(nwin/2))]; 50 | XE = [XF.^2]; 51 | 52 | hopsamps = round(THOP*SR); 53 | 54 | ncols = 1 + floor((size(XE,2)-nwin)/hopsamps); 55 | 56 | Y = zeros(N,ncols); 57 | 58 | % winmx = repmat(window,N,1); 59 | 60 | for i = 1:ncols 61 | % Y(:,i) = sqrt(sum(winmx.*XE(:,(i-1)*hopsamps + [1:nwin]),2)); 62 | Y(:,i) = sqrt(mean(XE(:,(i-1)*hopsamps + [1:nwin]),2)); 63 | end 64 | 65 | else 66 | % USEFFT version 67 | % How long a window to use relative to the integration window requested 68 | winext = 1; 69 | twinmod = winext * TWIN; 70 | % first spectrogram 71 | nfft = 2^(ceil(log(2*twinmod*SR)/log(2))); 72 | nhop = round(THOP*SR); 73 | nwin = round(twinmod*SR); 74 | [gtm,F] = 
fft2gammatonemx(nfft, SR, N, WIDTH, FMIN, FMAX, nfft/2+1); 75 | % perform FFT and weighting in amplitude domain 76 | Y = 1/nfft*gtm*abs(specgram(X,nfft,SR,nwin,nwin-nhop)); 77 | % or the power domain? doesn't match nearly as well 78 | %Y = 1/nfft*sqrt(gtm*abs(specgram(X,nfft,SR,nwin,nwin-nhop).^2)); 79 | end 80 | 81 | 82 | 83 | -------------------------------------------------------------------------------- /gammatone/auditory_toolkit/specgram.m: -------------------------------------------------------------------------------- 1 | function y = specgram(x,n,sr,w,ov) 2 | % Y = myspecgram(X,NFFT,SR,W,OV) 3 | % Substitute for Matlab's specgram, calculates & displays spectrogram 4 | % $Header: /homes/dpwe/tmp/e6820/RCS/myspecgram.m,v 1.1 2002/08/04 19:20:27 dpwe Exp $ 5 | 6 | if (size(x,1) > size(x,2)) 7 | x = x'; 8 | end 9 | 10 | s = length(x); 11 | 12 | if nargin < 2 13 | n = 256; 14 | end 15 | if nargin < 3 16 | sr = 1; 17 | end 18 | if nargin < 4 19 | w = n; 20 | end 21 | if nargin < 5 22 | ov = w/2; 23 | end 24 | h = w - ov; 25 | 26 | halflen = w/2; 27 | halff = n/2; % midpoint of win 28 | acthalflen = min(halff, halflen); 29 | 30 | halfwin = 0.5 * ( 1 + cos( pi * (0:halflen)/halflen)); 31 | win = zeros(1, n); 32 | win((halff+1):(halff+acthalflen)) = halfwin(1:acthalflen); 33 | win((halff+1):-1:(halff-acthalflen+2)) = halfwin(1:acthalflen); 34 | 35 | c = 1; 36 | 37 | % pre-allocate output array 38 | ncols = 1+fix((s-n)/h); 39 | d = zeros((1+n/2), ncols); 40 | 41 | for b = 0:h:(s-n) 42 | u = win.*x((b+1):(b+n)); 43 | t = fft(u); 44 | d(:,c) = t([1:(1+n/2)]'); 45 | c = c+1; 46 | end; 47 | 48 | tt = [0:h:(s-n)]/sr; 49 | ff = [0:(n/2)]*sr/n; 50 | 51 | if nargout < 1 52 | imagesc(tt,ff,20*log10(abs(d))); 53 | axis xy 54 | xlabel('Time / s'); 55 | ylabel('Frequency / Hz'); 56 | else 57 | y = d; 58 | end 59 | -------------------------------------------------------------------------------- /gammatone/doc/FurElise.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/janaal1/DCASE2020-Task3/bc7d5e87faa2fbe014dc47cccb48d9927b4ed3ec/gammatone/doc/FurElise.png -------------------------------------------------------------------------------- /gammatone/doc/Makefile: -------------------------------------------------------------------------------- 1 | # Makefile for Sphinx documentation 2 | # 3 | 4 | # You can set these variables from the command line. 5 | SPHINXOPTS = 6 | SPHINXBUILD = sphinx-build 7 | PAPER = 8 | BUILDDIR = _build 9 | 10 | # Internal variables. 11 | PAPEROPT_a4 = -D latex_paper_size=a4 12 | PAPEROPT_letter = -D latex_paper_size=letter 13 | ALLSPHINXOPTS = -d $(BUILDDIR)/doctrees $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) . 14 | # the i18n builder cannot share the environment and doctrees with the others 15 | I18NSPHINXOPTS = $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) . 16 | 17 | .PHONY: help clean html dirhtml singlehtml pickle json htmlhelp qthelp devhelp epub latex latexpdf text man changes linkcheck doctest gettext 18 | 19 | help: 20 | @echo "Please use \`make <target>' where <target> is one of" 21 | @echo " html to make standalone HTML files" 22 | @echo " dirhtml to make HTML files named index.html in directories" 23 | @echo " singlehtml to make a single large HTML file" 24 | @echo " pickle to make pickle files" 25 | @echo " json to make JSON files" 26 | @echo " htmlhelp to make HTML files and a HTML help project" 27 | @echo " qthelp to make HTML files and a qthelp project" 28 | @echo " devhelp to make HTML files and a Devhelp project" 29 | @echo " epub to make an epub" 30 | @echo " latex to make LaTeX files, you can set PAPER=a4 or PAPER=letter" 31 | @echo " latexpdf to make LaTeX files and run them through pdflatex" 32 | @echo " text to make text files" 33 | @echo " man to make manual pages" 34 | @echo " texinfo to make Texinfo files" 35 | @echo " info to make Texinfo files and run them through makeinfo" 36 | @echo " gettext to 
make PO message catalogs" 37 | @echo " changes to make an overview of all changed/added/deprecated items" 38 | @echo " linkcheck to check all external links for integrity" 39 | @echo " doctest to run all doctests embedded in the documentation (if enabled)" 40 | 41 | clean: 42 | -rm -rf $(BUILDDIR)/* 43 | 44 | html: 45 | $(SPHINXBUILD) -b html $(ALLSPHINXOPTS) $(BUILDDIR)/html 46 | @echo 47 | @echo "Build finished. The HTML pages are in $(BUILDDIR)/html." 48 | 49 | dirhtml: 50 | $(SPHINXBUILD) -b dirhtml $(ALLSPHINXOPTS) $(BUILDDIR)/dirhtml 51 | @echo 52 | @echo "Build finished. The HTML pages are in $(BUILDDIR)/dirhtml." 53 | 54 | singlehtml: 55 | $(SPHINXBUILD) -b singlehtml $(ALLSPHINXOPTS) $(BUILDDIR)/singlehtml 56 | @echo 57 | @echo "Build finished. The HTML page is in $(BUILDDIR)/singlehtml." 58 | 59 | pickle: 60 | $(SPHINXBUILD) -b pickle $(ALLSPHINXOPTS) $(BUILDDIR)/pickle 61 | @echo 62 | @echo "Build finished; now you can process the pickle files." 63 | 64 | json: 65 | $(SPHINXBUILD) -b json $(ALLSPHINXOPTS) $(BUILDDIR)/json 66 | @echo 67 | @echo "Build finished; now you can process the JSON files." 68 | 69 | htmlhelp: 70 | $(SPHINXBUILD) -b htmlhelp $(ALLSPHINXOPTS) $(BUILDDIR)/htmlhelp 71 | @echo 72 | @echo "Build finished; now you can run HTML Help Workshop with the" \ 73 | ".hhp project file in $(BUILDDIR)/htmlhelp." 74 | 75 | qthelp: 76 | $(SPHINXBUILD) -b qthelp $(ALLSPHINXOPTS) $(BUILDDIR)/qthelp 77 | @echo 78 | @echo "Build finished; now you can run "qcollectiongenerator" with the" \ 79 | ".qhcp project file in $(BUILDDIR)/qthelp, like this:" 80 | @echo "# qcollectiongenerator $(BUILDDIR)/qthelp/gammatone.qhcp" 81 | @echo "To view the help file:" 82 | @echo "# assistant -collectionFile $(BUILDDIR)/qthelp/gammatone.qhc" 83 | 84 | devhelp: 85 | $(SPHINXBUILD) -b devhelp $(ALLSPHINXOPTS) $(BUILDDIR)/devhelp 86 | @echo 87 | @echo "Build finished." 
88 | @echo "To view the help file:" 89 | @echo "# mkdir -p $$HOME/.local/share/devhelp/gammatone" 90 | @echo "# ln -s $(BUILDDIR)/devhelp $$HOME/.local/share/devhelp/gammatone" 91 | @echo "# devhelp" 92 | 93 | epub: 94 | $(SPHINXBUILD) -b epub $(ALLSPHINXOPTS) $(BUILDDIR)/epub 95 | @echo 96 | @echo "Build finished. The epub file is in $(BUILDDIR)/epub." 97 | 98 | latex: 99 | $(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex 100 | @echo 101 | @echo "Build finished; the LaTeX files are in $(BUILDDIR)/latex." 102 | @echo "Run \`make' in that directory to run these through (pdf)latex" \ 103 | "(use \`make latexpdf' here to do that automatically)." 104 | 105 | latexpdf: 106 | $(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex 107 | @echo "Running LaTeX files through pdflatex..." 108 | $(MAKE) -C $(BUILDDIR)/latex all-pdf 109 | @echo "pdflatex finished; the PDF files are in $(BUILDDIR)/latex." 110 | 111 | text: 112 | $(SPHINXBUILD) -b text $(ALLSPHINXOPTS) $(BUILDDIR)/text 113 | @echo 114 | @echo "Build finished. The text files are in $(BUILDDIR)/text." 115 | 116 | man: 117 | $(SPHINXBUILD) -b man $(ALLSPHINXOPTS) $(BUILDDIR)/man 118 | @echo 119 | @echo "Build finished. The manual pages are in $(BUILDDIR)/man." 120 | 121 | texinfo: 122 | $(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo 123 | @echo 124 | @echo "Build finished. The Texinfo files are in $(BUILDDIR)/texinfo." 125 | @echo "Run \`make' in that directory to run these through makeinfo" \ 126 | "(use \`make info' here to do that automatically)." 127 | 128 | info: 129 | $(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo 130 | @echo "Running Texinfo files through makeinfo..." 131 | make -C $(BUILDDIR)/texinfo info 132 | @echo "makeinfo finished; the Info files are in $(BUILDDIR)/texinfo." 133 | 134 | gettext: 135 | $(SPHINXBUILD) -b gettext $(I18NSPHINXOPTS) $(BUILDDIR)/locale 136 | @echo 137 | @echo "Build finished. The message catalogs are in $(BUILDDIR)/locale." 
138 | 139 | changes: 140 | $(SPHINXBUILD) -b changes $(ALLSPHINXOPTS) $(BUILDDIR)/changes 141 | @echo 142 | @echo "The overview file is in $(BUILDDIR)/changes." 143 | 144 | linkcheck: 145 | $(SPHINXBUILD) -b linkcheck $(ALLSPHINXOPTS) $(BUILDDIR)/linkcheck 146 | @echo 147 | @echo "Link check complete; look for any errors in the above output " \ 148 | "or in $(BUILDDIR)/linkcheck/output.txt." 149 | 150 | doctest: 151 | $(SPHINXBUILD) -b doctest $(ALLSPHINXOPTS) $(BUILDDIR)/doctest 152 | @echo "Testing of doctests in the sources finished, look at the " \ 153 | "results in $(BUILDDIR)/doctest/output.txt." 154 | -------------------------------------------------------------------------------- /gammatone/doc/conf.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | # 3 | # gammatone documentation build configuration file, created by 4 | # sphinx-quickstart on Sat Dec 8 23:21:49 2012. 5 | # 6 | # This file is execfile()d with the current directory set to its containing dir. 7 | # 8 | # Note that not all possible configuration values are present in this 9 | # autogenerated file. 10 | # 11 | # All configuration values have a default; values that are commented out 12 | # serve to show the default. 13 | 14 | import sys, os 15 | 16 | # If extensions (or modules to document with autodoc) are in another directory, 17 | # add these directories to sys.path here. If the directory is relative to the 18 | # documentation root, use os.path.abspath to make it absolute, like shown here. 19 | #sys.path.insert(0, os.path.abspath('.')) 20 | 21 | # -- General configuration ----------------------------------------------------- 22 | 23 | # If your documentation needs a minimal Sphinx version, state it here. 24 | #needs_sphinx = '1.0' 25 | 26 | # Add any Sphinx extension module names here, as strings. They can be extensions 27 | # coming with Sphinx (named 'sphinx.ext.*') or your custom ones. 
28 | extensions = ['sphinx.ext.autodoc'] 29 | 30 | # Add any paths that contain templates here, relative to this directory. 31 | templates_path = ['_templates'] 32 | 33 | # The suffix of source filenames. 34 | source_suffix = '.rst' 35 | 36 | # The encoding of source files. 37 | #source_encoding = 'utf-8-sig' 38 | 39 | # The master toctree document. 40 | master_doc = 'index' 41 | 42 | # General information about the project. 43 | project = u'Gammatone Filterbank Toolkit' 44 | copyright = u'2014, Jason Heeris' 45 | 46 | # The version info for the project you're documenting, acts as replacement for 47 | # |version| and |release|, also used in various other places throughout the 48 | # built documents. 49 | # 50 | # The short X.Y version. 51 | version = '1.0' 52 | # The full version, including alpha/beta/rc tags. 53 | release = '1.0' 54 | 55 | # The language for content autogenerated by Sphinx. Refer to documentation 56 | # for a list of supported languages. 57 | #language = None 58 | 59 | # There are two options for replacing |today|: either, you set today to some 60 | # non-false value, then it is used: 61 | #today = '' 62 | # Else, today_fmt is used as the format for a strftime call. 63 | #today_fmt = '%B %d, %Y' 64 | 65 | # List of patterns, relative to source directory, that match files and 66 | # directories to ignore when looking for source files. 67 | exclude_patterns = ['_build'] 68 | 69 | # The reST default role (used for this markup: `text`) to use for all documents. 70 | #default_role = None 71 | 72 | # If true, '()' will be appended to :func: etc. cross-reference text. 73 | #add_function_parentheses = True 74 | 75 | # If true, the current module name will be prepended to all description 76 | # unit titles (such as .. function::). 77 | #add_module_names = True 78 | 79 | # If true, sectionauthor and moduleauthor directives will be shown in the 80 | # output. They are ignored by default. 
81 | #show_authors = False 82 | 83 | # The name of the Pygments (syntax highlighting) style to use. 84 | pygments_style = 'sphinx' 85 | 86 | # A list of ignored prefixes for module index sorting. 87 | #modindex_common_prefix = [] 88 | 89 | 90 | # -- Options for HTML output --------------------------------------------------- 91 | 92 | # The theme to use for HTML and HTML Help pages. See the documentation for 93 | # a list of builtin themes. 94 | html_theme = 'haiku' 95 | 96 | # Theme options are theme-specific and customize the look and feel of a theme 97 | # further. For a list of options available for each theme, see the 98 | # documentation. 99 | #html_theme_options = {} 100 | 101 | # Add any paths that contain custom themes here, relative to this directory. 102 | #html_theme_path = [] 103 | 104 | # The name for this set of Sphinx documents. If None, it defaults to 105 | # " v documentation". 106 | html_title = u"%s %s" % (project, release) 107 | 108 | # A shorter title for the navigation bar. Default is the same as html_title. 109 | #html_short_title = None 110 | 111 | # The name of an image file (relative to this directory) to place at the top 112 | # of the sidebar. 113 | #html_logo = None 114 | 115 | # The name of an image file (within the static path) to use as favicon of the 116 | # docs. This file should be a Windows icon file (.ico) being 16x16 or 32x32 117 | # pixels large. 118 | #html_favicon = None 119 | 120 | # Add any paths that contain custom static files (such as style sheets) here, 121 | # relative to this directory. They are copied after the builtin static files, 122 | # so a file named "default.css" will overwrite the builtin "default.css". 123 | html_static_path = ['_static'] 124 | 125 | # If not '', a 'Last updated on:' timestamp is inserted at every page bottom, 126 | # using the given strftime format. 
127 | #html_last_updated_fmt = '%b %d, %Y' 128 | 129 | # If true, SmartyPants will be used to convert quotes and dashes to 130 | # typographically correct entities. 131 | html_use_smartypants = True 132 | 133 | # Custom sidebar templates, maps document names to template names. 134 | html_sidebars = { 135 | '**' : [ 136 | 'localtoc.html', 137 | 'globaltoc.html', 138 | 'relations.html', 139 | 'searchbox.html' 140 | ], 141 | } 142 | 143 | # Additional templates that should be rendered to pages, maps page names to 144 | # template names. 145 | #html_additional_pages = {} 146 | 147 | # If false, no module index is generated. 148 | #html_domain_indices = True 149 | 150 | # If false, no index is generated. 151 | #html_use_index = True 152 | 153 | # If true, the index is split into individual pages for each letter. 154 | #html_split_index = False 155 | 156 | # If true, links to the reST sources are added to the pages. 157 | html_show_sourcelink = False 158 | 159 | # If true, "Created using Sphinx" is shown in the HTML footer. Default is True. 160 | #html_show_sphinx = True 161 | 162 | # If true, "(C) Copyright ..." is shown in the HTML footer. Default is True. 163 | #html_show_copyright = True 164 | 165 | # If true, an OpenSearch description file will be output, and all pages will 166 | # contain a tag referring to it. The value of this option must be the 167 | # base URL from which the finished HTML is served. 168 | #html_use_opensearch = '' 169 | 170 | # This is the file name suffix for HTML files (e.g. ".xhtml"). 171 | #html_file_suffix = None 172 | 173 | # Output file base name for HTML help builder. 174 | htmlhelp_basename = 'gammatonedoc' 175 | 176 | 177 | # -- Options for LaTeX output -------------------------------------------------- 178 | 179 | latex_elements = { 180 | # The paper size ('letterpaper' or 'a4paper'). 181 | #'papersize': 'letterpaper', 182 | 183 | # The font size ('10pt', '11pt' or '12pt'). 
184 | #'pointsize': '10pt', 185 | 186 | # Additional stuff for the LaTeX preamble. 187 | #'preamble': '', 188 | } 189 | 190 | # Grouping the document tree into LaTeX files. List of tuples 191 | # (source start file, target name, title, author, documentclass [howto/manual]). 192 | latex_documents = [ 193 | ('index', 'gammatone.tex', u'Gammatone Documentation', 194 | u'Jason Heeris', 'manual'), 195 | ] 196 | 197 | # The name of an image file (relative to this directory) to place at the top of 198 | # the title page. 199 | #latex_logo = None 200 | 201 | # For "manual" documents, if this is true, then toplevel headings are parts, 202 | # not chapters. 203 | #latex_use_parts = False 204 | 205 | # If true, show page references after internal links. 206 | #latex_show_pagerefs = False 207 | 208 | # If true, show URL addresses after external links. 209 | #latex_show_urls = False 210 | 211 | # Documents to append as an appendix to all manuals. 212 | #latex_appendices = [] 213 | 214 | # If false, no module index is generated. 215 | #latex_domain_indices = True 216 | 217 | 218 | # -- Options for manual page output -------------------------------------------- 219 | 220 | # One entry per manual page. List of tuples 221 | # (source start file, name, description, authors, manual section). 222 | man_pages = [ 223 | ('index', 'gammatone', u'Gammatone Documentation', 224 | [u'Jason Heeris'], 1) 225 | ] 226 | 227 | # If true, show URL addresses after external links. 228 | #man_show_urls = False 229 | 230 | 231 | # -- Options for Texinfo output ------------------------------------------------ 232 | 233 | # Grouping the document tree into Texinfo files. 
List of tuples 234 | # (source start file, target name, title, author, 235 | # dir menu entry, description, category) 236 | texinfo_documents = [ 237 | ('index', 'gammatone', u'Gammatone Documentation', 238 | u'Jason Heeris', 'gammatone', 'Gammatone filterbank construction tools.', 239 | 'Miscellaneous'), 240 | ] 241 | 242 | # Documents to append as an appendix to all manuals. 243 | #texinfo_appendices = [] 244 | 245 | # If false, no module index is generated. 246 | #texinfo_domain_indices = True 247 | 248 | # How to display URL addresses: 'footnote', 'no', or 'inline'. 249 | #texinfo_show_urls = 'footnote' 250 | 251 | # -- Autodoc configuration ----------------------------------------------------- 252 | 253 | # autodoc_default_flags = ['members'] 254 | -------------------------------------------------------------------------------- /gammatone/doc/details.rst: -------------------------------------------------------------------------------- 1 | About the Gammatone Filterbank Toolkit 2 | -------------------------------------- 3 | 4 | Summary 5 | ~~~~~~~ 6 | 7 | This is a port of Malcolm Slaney's and Dan Ellis' gammatone filterbank 8 | MATLAB code, detailed below, to Python 2 and 3 using Numpy and Scipy. It 9 | analyses signals by running them through banks of gammatone filters, 10 | similar to Fourier-based spectrogram analysis. 11 | 12 | .. figure:: FurElise.png 13 | :align: center 14 | :alt: Gammatone-based spectrogram of Für Elise 15 | 16 | Gammatone-based spectrogram of Für Elise 17 | 18 | Dependencies 19 | ~~~~~~~~~~~~ 20 | 21 | - numpy 22 | - scipy 23 | - nose 24 | - mock 25 | - matplotlib 26 | 27 | Using the Code 28 | ~~~~~~~~~~~~~~ 29 | 30 | For a demonstration, find a `.wav` file (for example, 31 | `Für Elise `_) and run:: 32 | 33 | python -m gammatone FurElise.wav -d 10 34 | 35 | ...to see a gammatone-gram of the first ten seconds of Beethoven's "Für 36 | Elise." 
If you've installed via 37 | ``pip`` or ``setup.py install``, you should also be able to just run:: 38 | 39 | gammatone FurElise.wav -d 10 40 | 41 | Basis 42 | ~~~~~ 43 | 44 | This project is based on research into how humans perceive audio, 45 | originally published by Malcolm Slaney: 46 | 47 | `Malcolm Slaney (1998) "Auditory Toolbox Version 2", Technical Report 48 | #1998-010, Interval Research Corporation, 49 | 1998. `_ 50 | 51 | Slaney's report describes a way of modelling how the human ear 52 | perceives, emphasises and separates different frequencies of sound. A 53 | series of gammatone filters are constructed whose width increases with 54 | increasing centre frequency, and this bank of filters is applied to a 55 | time-domain signal. The result of this is a spectrum that should 56 | represent the human experience of sound better than, say, a 57 | Fourier-domain spectrum would. 58 | 59 | A gammatone filter has an impulse response that is a sine wave 60 | multiplied by a gamma distribution function. It is a common approach to 61 | modelling the auditory system. 62 | 63 | The gammatone filterbank approach can be considered analogous (but not 64 | equivalent) to a discrete Fourier transform where the frequency axis is 65 | logarithmic. For example, a series of notes spaced an octave apart would 66 | appear to be roughly linearly spaced; or a sound that was distributed 67 | across the same linear frequency range would appear to have more spread 68 | at lower frequencies. 69 | 70 | The real goal of this toolkit is to allow easy computation of the 71 | gammatone equivalent of a spectrogram — a time-varying spectrum of 72 | energy over audible frequencies based on a gammatone filterbank. 73 | 74 | Slaney demonstrated his research with an initial implementation in 75 | MATLAB. This implementation was later extended by Dan Ellis, who found a 76 | way to approximate a "gammatone-gram" by using the fast Fourier 77 | transform. 
Ellis' code calculates a matrix of weights that can be 78 | applied to the output of a FFT so that a Fourier-based spectrogram can 79 | easily be transformed into such an approximation. 80 | 81 | Ellis' code and documentation is here: `Gammatone-like 82 | spectrograms `_ 83 | 84 | Interest 85 | ~~~~~~~~ 86 | 87 | I became interested in this because of my background in science 88 | communication and my general interest in the teaching of signal 89 | processing. I find that the spectrogram approach to visualising signals 90 | is adequate for illustrating abstract systems or the mathematical 91 | properties of transforms, but bears little correspondence to a person's 92 | own experience of sound. If someone wants to see what their favourite 93 | piece of music "looks like," a normal Fourier transform based 94 | spectrogram is actually quite a poor way to visualise it. Features of 95 | the audio seem to be oddly spaced or unnaturally emphasised or 96 | de-emphasised depending on where they are in the frequency domain. 97 | 98 | The gammatone filterbank approach seems to be closer to what someone 99 | might intuitively expect a visualisation of sound to look like, and can 100 | help develop an intuition about alternative representations of signals. 101 | 102 | Verifying the port 103 | ~~~~~~~~~~~~~~~~~~ 104 | 105 | Since this is a port of existing MATLAB code, I've written tests to 106 | verify the Python implementation against the original code. These tests 107 | aren't unit tests, but they do generally test single functions. Running 108 | the tests has the same workflow: 109 | 110 | 1. Run the scripts in the ``test_generation`` directory. This will 111 | create a ``.mat`` file containing test data in ``tests/data``. 112 | 113 | 2. Run ``nosetests3`` in the top level directory. This will find and run 114 | all the tests in the ``tests`` directory. 
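The comparison at the heart of such a test can be sketched in plain Python. This is only an illustration, not the project's test code: ``erb_centre_freqs`` and ``assert_all_close`` are hypothetical names, the formula is transcribed from the centre-frequency calculation in ``fft2gammatonemx.m``, and the expected value comes from a property of the formula itself (the lowest centre frequency equals the requested minimum) rather than from a MATLAB-generated ``.mat`` file.

```python
import math

# Constants as they appear in fft2gammatonemx.m
EAR_Q = 9.26449
MIN_BW = 24.7


def erb_centre_freqs(min_freq, max_freq, num_filters):
    """Centre frequencies in ascending order (hypothetical port of the MATLAB formula)."""
    c = EAR_Q * MIN_BW
    step = (math.log(min_freq + c) - math.log(max_freq + c)) / num_filters
    descending = [
        -c + math.exp(i * step) * (max_freq + c)
        for i in range(1, num_filters + 1)
    ]
    return descending[::-1]  # mirrors the flipud in the MATLAB code


def assert_all_close(actual, expected, rel_tol=1e-9):
    """Hypothetical helper: element-wise comparison within a relative tolerance."""
    assert len(actual) == len(expected)
    for a, e in zip(actual, expected):
        assert math.isclose(a, e, rel_tol=rel_tol), (a, e)


cfs = erb_centre_freqs(100.0, 8000.0, 64)
# By construction, the lowest centre frequency equals min_freq.
assert_all_close(cfs[:1], [100.0])
```

In a test against the original code, the expected list would instead be loaded from one of the files in ``tests/data``; the tolerance-based comparison is the part that carries over.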
115 | 116 | Although I'm usually loath to check in generated files to version 117 | control, I'm willing to make an exception for the ``.mat`` files 118 | containing the test data. My reasoning is that they represent the 119 | decoupling of my code from the MATLAB code, and if the two projects were 120 | separated, they would be considered a part of the Python code, not the 121 | original MATLAB code. 122 | -------------------------------------------------------------------------------- /gammatone/doc/fftweight.rst: -------------------------------------------------------------------------------- 1 | :mod:`gammatone.fftweight` -- FFT weightings for spectrogram-like gammatone analysis 2 | ==================================================================================== 3 | 4 | .. automodule:: gammatone.fftweight 5 | :members: 6 | -------------------------------------------------------------------------------- /gammatone/doc/filters.rst: -------------------------------------------------------------------------------- 1 | :mod:`gammatone.filters` -- gammatone filterbank construction 2 | ============================================================= 3 | 4 | .. automodule:: gammatone.filters 5 | :members: 6 | -------------------------------------------------------------------------------- /gammatone/doc/gtgram.rst: -------------------------------------------------------------------------------- 1 | :mod:`gammatone.gtgram` -- spectrogram-like gammatone analysis 2 | ============================================================== 3 | 4 | .. automodule:: gammatone.gtgram 5 | :members: 6 | -------------------------------------------------------------------------------- /gammatone/doc/index.rst: -------------------------------------------------------------------------------- 1 | .. gammatone documentation master file, created by 2 | sphinx-quickstart on Sat Dec 8 23:21:49 2012. 3 | 4 | Index 5 | ===== 6 | 7 | Modules 8 | ------- 9 | 10 | ..
toctree:: 11 | :maxdepth: 2 12 | 13 | filters 14 | gtgram 15 | fftweight 16 | plot 17 | 18 | .. include:: details.rst 19 | 20 | Indices and tables 21 | ------------------ 22 | 23 | * :ref:`genindex` 24 | * :ref:`modindex` 25 | * :ref:`search` 26 | 27 | -------------------------------------------------------------------------------- /gammatone/doc/make.bat: -------------------------------------------------------------------------------- 1 | @ECHO OFF 2 | 3 | REM Command file for Sphinx documentation 4 | 5 | if "%SPHINXBUILD%" == "" ( 6 | set SPHINXBUILD=sphinx-build 7 | ) 8 | set BUILDDIR=_build 9 | set ALLSPHINXOPTS=-d %BUILDDIR%/doctrees %SPHINXOPTS% . 10 | set I18NSPHINXOPTS=%SPHINXOPTS% . 11 | if NOT "%PAPER%" == "" ( 12 | set ALLSPHINXOPTS=-D latex_paper_size=%PAPER% %ALLSPHINXOPTS% 13 | set I18NSPHINXOPTS=-D latex_paper_size=%PAPER% %I18NSPHINXOPTS% 14 | ) 15 | 16 | if "%1" == "" goto help 17 | 18 | if "%1" == "help" ( 19 | :help 20 | echo.Please use `make ^` where ^ is one of 21 | echo. html to make standalone HTML files 22 | echo. dirhtml to make HTML files named index.html in directories 23 | echo. singlehtml to make a single large HTML file 24 | echo. pickle to make pickle files 25 | echo. json to make JSON files 26 | echo. htmlhelp to make HTML files and a HTML help project 27 | echo. qthelp to make HTML files and a qthelp project 28 | echo. devhelp to make HTML files and a Devhelp project 29 | echo. epub to make an epub 30 | echo. latex to make LaTeX files, you can set PAPER=a4 or PAPER=letter 31 | echo. text to make text files 32 | echo. man to make manual pages 33 | echo. texinfo to make Texinfo files 34 | echo. gettext to make PO message catalogs 35 | echo. changes to make an overview over all changed/added/deprecated items 36 | echo. linkcheck to check all external links for integrity 37 | echo. 
doctest to run all doctests embedded in the documentation if enabled 38 | goto end 39 | ) 40 | 41 | if "%1" == "clean" ( 42 | for /d %%i in (%BUILDDIR%\*) do rmdir /q /s %%i 43 | del /q /s %BUILDDIR%\* 44 | goto end 45 | ) 46 | 47 | if "%1" == "html" ( 48 | %SPHINXBUILD% -b html %ALLSPHINXOPTS% %BUILDDIR%/html 49 | if errorlevel 1 exit /b 1 50 | echo. 51 | echo.Build finished. The HTML pages are in %BUILDDIR%/html. 52 | goto end 53 | ) 54 | 55 | if "%1" == "dirhtml" ( 56 | %SPHINXBUILD% -b dirhtml %ALLSPHINXOPTS% %BUILDDIR%/dirhtml 57 | if errorlevel 1 exit /b 1 58 | echo. 59 | echo.Build finished. The HTML pages are in %BUILDDIR%/dirhtml. 60 | goto end 61 | ) 62 | 63 | if "%1" == "singlehtml" ( 64 | %SPHINXBUILD% -b singlehtml %ALLSPHINXOPTS% %BUILDDIR%/singlehtml 65 | if errorlevel 1 exit /b 1 66 | echo. 67 | echo.Build finished. The HTML pages are in %BUILDDIR%/singlehtml. 68 | goto end 69 | ) 70 | 71 | if "%1" == "pickle" ( 72 | %SPHINXBUILD% -b pickle %ALLSPHINXOPTS% %BUILDDIR%/pickle 73 | if errorlevel 1 exit /b 1 74 | echo. 75 | echo.Build finished; now you can process the pickle files. 76 | goto end 77 | ) 78 | 79 | if "%1" == "json" ( 80 | %SPHINXBUILD% -b json %ALLSPHINXOPTS% %BUILDDIR%/json 81 | if errorlevel 1 exit /b 1 82 | echo. 83 | echo.Build finished; now you can process the JSON files. 84 | goto end 85 | ) 86 | 87 | if "%1" == "htmlhelp" ( 88 | %SPHINXBUILD% -b htmlhelp %ALLSPHINXOPTS% %BUILDDIR%/htmlhelp 89 | if errorlevel 1 exit /b 1 90 | echo. 91 | echo.Build finished; now you can run HTML Help Workshop with the ^ 92 | .hhp project file in %BUILDDIR%/htmlhelp. 93 | goto end 94 | ) 95 | 96 | if "%1" == "qthelp" ( 97 | %SPHINXBUILD% -b qthelp %ALLSPHINXOPTS% %BUILDDIR%/qthelp 98 | if errorlevel 1 exit /b 1 99 | echo. 
100 | echo.Build finished; now you can run "qcollectiongenerator" with the ^ 101 | .qhcp project file in %BUILDDIR%/qthelp, like this: 102 | echo.^> qcollectiongenerator %BUILDDIR%\qthelp\gammatone.qhcp 103 | echo.To view the help file: 104 | echo.^> assistant -collectionFile %BUILDDIR%\qthelp\gammatone.ghc 105 | goto end 106 | ) 107 | 108 | if "%1" == "devhelp" ( 109 | %SPHINXBUILD% -b devhelp %ALLSPHINXOPTS% %BUILDDIR%/devhelp 110 | if errorlevel 1 exit /b 1 111 | echo. 112 | echo.Build finished. 113 | goto end 114 | ) 115 | 116 | if "%1" == "epub" ( 117 | %SPHINXBUILD% -b epub %ALLSPHINXOPTS% %BUILDDIR%/epub 118 | if errorlevel 1 exit /b 1 119 | echo. 120 | echo.Build finished. The epub file is in %BUILDDIR%/epub. 121 | goto end 122 | ) 123 | 124 | if "%1" == "latex" ( 125 | %SPHINXBUILD% -b latex %ALLSPHINXOPTS% %BUILDDIR%/latex 126 | if errorlevel 1 exit /b 1 127 | echo. 128 | echo.Build finished; the LaTeX files are in %BUILDDIR%/latex. 129 | goto end 130 | ) 131 | 132 | if "%1" == "text" ( 133 | %SPHINXBUILD% -b text %ALLSPHINXOPTS% %BUILDDIR%/text 134 | if errorlevel 1 exit /b 1 135 | echo. 136 | echo.Build finished. The text files are in %BUILDDIR%/text. 137 | goto end 138 | ) 139 | 140 | if "%1" == "man" ( 141 | %SPHINXBUILD% -b man %ALLSPHINXOPTS% %BUILDDIR%/man 142 | if errorlevel 1 exit /b 1 143 | echo. 144 | echo.Build finished. The manual pages are in %BUILDDIR%/man. 145 | goto end 146 | ) 147 | 148 | if "%1" == "texinfo" ( 149 | %SPHINXBUILD% -b texinfo %ALLSPHINXOPTS% %BUILDDIR%/texinfo 150 | if errorlevel 1 exit /b 1 151 | echo. 152 | echo.Build finished. The Texinfo files are in %BUILDDIR%/texinfo. 153 | goto end 154 | ) 155 | 156 | if "%1" == "gettext" ( 157 | %SPHINXBUILD% -b gettext %I18NSPHINXOPTS% %BUILDDIR%/locale 158 | if errorlevel 1 exit /b 1 159 | echo. 160 | echo.Build finished. The message catalogs are in %BUILDDIR%/locale. 
161 | goto end 162 | ) 163 | 164 | if "%1" == "changes" ( 165 | %SPHINXBUILD% -b changes %ALLSPHINXOPTS% %BUILDDIR%/changes 166 | if errorlevel 1 exit /b 1 167 | echo. 168 | echo.The overview file is in %BUILDDIR%/changes. 169 | goto end 170 | ) 171 | 172 | if "%1" == "linkcheck" ( 173 | %SPHINXBUILD% -b linkcheck %ALLSPHINXOPTS% %BUILDDIR%/linkcheck 174 | if errorlevel 1 exit /b 1 175 | echo. 176 | echo.Link check complete; look for any errors in the above output ^ 177 | or in %BUILDDIR%/linkcheck/output.txt. 178 | goto end 179 | ) 180 | 181 | if "%1" == "doctest" ( 182 | %SPHINXBUILD% -b doctest %ALLSPHINXOPTS% %BUILDDIR%/doctest 183 | if errorlevel 1 exit /b 1 184 | echo. 185 | echo.Testing of doctests in the sources finished, look at the ^ 186 | results in %BUILDDIR%/doctest/output.txt. 187 | goto end 188 | ) 189 | 190 | :end 191 | -------------------------------------------------------------------------------- /gammatone/doc/plot.rst: -------------------------------------------------------------------------------- 1 | :mod:`gammatone.plot` -- Plotting utilities for gammatone analysis 2 | ================================================================== 3 | 4 | .. 
automodule:: gammatone.plot 5 | :members: 6 | -------------------------------------------------------------------------------- /gammatone/gammatone/__init__.py: -------------------------------------------------------------------------------- 1 | # Copyright 2014 Jason Heeris, jason.heeris@gmail.com 2 | # 3 | # This file is part of the gammatone toolkit, and is licensed under the 3-clause 4 | # BSD license: https://github.com/detly/gammatone/blob/master/COPYING 5 | 6 | # Designate gammatone module 7 | """ 8 | Gammatone filterbank toolkit 9 | """ 10 | -------------------------------------------------------------------------------- /gammatone/gammatone/__main__.py: -------------------------------------------------------------------------------- 1 | # Copyright 2014 Jason Heeris, jason.heeris@gmail.com 2 | # 3 | # This file is part of the gammatone toolkit, and is licensed under the 3-clause 4 | # BSD license: https://github.com/detly/gammatone/blob/master/COPYING 5 | from gammatone.plot import main 6 | main() 7 | -------------------------------------------------------------------------------- /gammatone/gammatone/fftweight.py: -------------------------------------------------------------------------------- 1 | # Copyright 2014 Jason Heeris, jason.heeris@gmail.com 2 | # 3 | # This file is part of the gammatone toolkit, and is licensed under the 3-clause 4 | # BSD license: https://github.com/detly/gammatone/blob/master/COPYING 5 | """ 6 | This module contains functions for calculating weights to approximate a 7 | gammatone filterbank-like "spectrogram" from a Fourier transform. 8 | """ 9 | from __future__ import division 10 | import numpy as np 11 | 12 | import gammatone.filters as filters 13 | import gammatone.gtgram as gtgram 14 | 15 | def specgram_window( 16 | nfft, 17 | nwin, 18 | ): 19 | """ 20 | Window calculation used in specgram replacement function. Hann window of 21 | width `nwin` centred in an array of width `nfft`. 
22 | """ 23 | halflen = nwin // 2 24 | halff = nfft // 2 # midpoint of win 25 | acthalflen = int(np.floor(min(halff, halflen))) 26 | halfwin = 0.5 * ( 1 + np.cos(np.pi * np.arange(0, halflen+1)/halflen)) 27 | win = np.zeros((nfft,)) 28 | win[halff:halff+acthalflen] = halfwin[0:acthalflen] 29 | win[halff:halff-acthalflen:-1] = halfwin[0:acthalflen] 30 | return win 31 | 32 | 33 | def specgram(x, n, sr, w, h): 34 | """ Substitute for MATLAB's specgram, calculates a simple spectrogram. 35 | 36 | :param x: The signal to analyse 37 | :param n: The FFT length 38 | :param sr: The sampling rate 39 | :param w: The window length (see :func:`specgram_window`) 40 | :param h: The hop size (must be greater than zero) 41 | """ 42 | # Based on Dan Ellis' myspecgram.m,v 1.1 2002/08/04 43 | assert h > 0, "Must have a hop size greater than 0" 44 | 45 | s = x.shape[0] 46 | win = specgram_window(n, w) 47 | 48 | c = 0 49 | 50 | # pre-allocate output array 51 | ncols = 1 + int(np.floor((s - n)/h)) 52 | d = np.zeros(((1 + n // 2), ncols), np.dtype(complex)) 53 | 54 | for b in range(0, s - n, h): 55 | u = win * x[b : b + n] 56 | t = np.fft.fft(u) 57 | d[:, c] = t[0 : (1 + n // 2)].T 58 | c = c + 1 59 | 60 | return d 61 | 62 | 63 | def fft_weights( 64 | nfft, 65 | fs, 66 | nfilts, 67 | width, 68 | fmin, 69 | fmax, 70 | maxlen): 71 | """ 72 | :param nfft: the source FFT size 73 | :param fs: sampling rate (Hz) 74 | :param nfilts: the number of output bands required (default 64) 75 | :param width: the constant width of each band in Bark (default 1) 76 | :param fmin: lower limit of frequencies (Hz) 77 | :param fmax: upper limit of frequencies (Hz) 78 | :param maxlen: number of bins to truncate the rows to 79 | 80 | :return: a tuple `weights`, `gain` with the calculated weight matrices and 81 | gain vectors 82 | 83 | Generate a matrix of weights to combine FFT bins into Gammatone bins. 84 | 85 | Note about `maxlen` parameter: While wts has nfft columns, the second half 86 | are all zero.
Hence, the auditory spectrum is:: 87 | 88 | fft2gammatonemx(nfft,sr)*abs(fft(xincols,nfft)) 89 | 90 | `maxlen` truncates the rows to this many bins. 91 | 92 | | (c) 2004-2009 Dan Ellis dpwe@ee.columbia.edu based on rastamat/audspec.m 93 | | (c) 2012 Jason Heeris (Python implementation) 94 | """ 95 | ucirc = np.exp(1j * 2 * np.pi * np.arange(0, nfft / 2 + 1) / nfft)[None, ...] 96 | 97 | # Common ERB filter code factored out 98 | cf_array = filters.erb_space(fmin, fmax, nfilts)[::-1] 99 | 100 | _, A11, A12, A13, A14, _, _, _, B2, gain = ( 101 | filters.make_erb_filters(fs, cf_array, width).T 102 | ) 103 | 104 | A11, A12, A13, A14 = A11[..., None], A12[..., None], A13[..., None], A14[..., None] 105 | 106 | r = np.sqrt(B2) 107 | theta = 2 * np.pi * cf_array / fs 108 | pole = (r * np.exp(1j * theta))[..., None] 109 | 110 | GTord = 4 111 | 112 | weights = np.zeros((nfilts, nfft)) 113 | 114 | weights[:, 0:ucirc.shape[1]] = ( 115 | np.abs(ucirc + A11 * fs) * np.abs(ucirc + A12 * fs) 116 | * np.abs(ucirc + A13 * fs) * np.abs(ucirc + A14 * fs) 117 | * np.abs(fs * (pole - ucirc) * (pole.conj() - ucirc)) ** (-GTord) 118 | / gain[..., None] 119 | ) 120 | 121 | weights = weights[:, 0:int(maxlen)] 122 | 123 | return weights, gain 124 | 125 | 126 | def fft_gtgram( 127 | wave, 128 | fs, 129 | window_time, hop_time, 130 | channels, 131 | f_min): 132 | """ 133 | Calculate a spectrogram-like time frequency magnitude array based on 134 | an FFT-based approximation to gammatone subband filters. 135 | 136 | A matrix of weightings is calculated (using :func:`fft_weights`), and 137 | applied to the FFT of the input signal (``wave``, using sample rate ``fs``). 138 | The result is an approximation of full filtering using an ERB gammatone 139 | filterbank (as per :func:`gtgram.gtgram`). 140 | 141 | ``f_min`` determines the frequency cutoff for the corresponding gammatone 142 | filterbank.
``window_time`` and ``hop_time`` (both in seconds) are the size 143 | and overlap of the spectrogram columns. 144 | 145 | | 2009-02-23 Dan Ellis dpwe@ee.columbia.edu 146 | | 147 | | (c) 2013 Jason Heeris (Python implementation) 148 | """ 149 | width = 1 # Was a parameter in the MATLAB code 150 | 151 | nfft = int(2 ** (np.ceil(np.log2(2 * window_time * fs)))) 152 | nwin, nhop, _ = gtgram.gtgram_strides(fs, window_time, hop_time, 0); 153 | 154 | gt_weights, _ = fft_weights( 155 | nfft, 156 | fs, 157 | channels, 158 | width, 159 | f_min, 160 | fs / 2, 161 | nfft / 2 + 1 162 | ) 163 | 164 | sgram = specgram(wave, nfft, fs, nwin, nhop) 165 | 166 | result = gt_weights.dot(np.abs(sgram)) / nfft 167 | 168 | return result 169 | -------------------------------------------------------------------------------- /gammatone/gammatone/filters.py: -------------------------------------------------------------------------------- 1 | # Copyright 2014 Jason Heeris, jason.heeris@gmail.com 2 | # 3 | # This file is part of the gammatone toolkit, and is licensed under the 3-clause 4 | # BSD license: https://github.com/detly/gammatone/blob/master/COPYING 5 | """ 6 | This module contains functions for constructing sets of equivalent rectangular 7 | bandwidth gammatone filters. 8 | """ 9 | from __future__ import division 10 | from collections import namedtuple 11 | 12 | import numpy as np 13 | import scipy as sp 14 | from scipy import signal as sgn 15 | 16 | DEFAULT_FILTER_NUM = 100 17 | DEFAULT_LOW_FREQ = 100 18 | DEFAULT_HIGH_FREQ = 44100 / 4 19 | 20 | 21 | def erb_point(low_freq, high_freq, fraction): 22 | """ 23 | Calculates a single point on an ERB scale between ``low_freq`` and 24 | ``high_freq``, determined by ``fraction``. When ``fraction`` is ``1``, 25 | ``low_freq`` will be returned. When ``fraction`` is ``0``, ``high_freq`` 26 | will be returned. 
27 | 28 | ``fraction`` can actually be outside the range ``[0, 1]``, which in general 29 | isn't very meaningful, but might be useful when ``fraction`` is rounded a 30 | little above or below ``[0, 1]`` (eg. for plot axis labels). 31 | """ 32 | # Change the following three parameters if you wish to use a different ERB 33 | # scale. Must change in MakeERBCoeffs too. 34 | # TODO: Factor these parameters out 35 | ear_q = 9.26449 # Glasberg and Moore Parameters 36 | min_bw = 24.7 37 | order = 1 38 | 39 | # All of the following expressions are derived in Apple TR #35, "An 40 | # Efficient Implementation of the Patterson-Holdsworth Cochlear Filter 41 | # Bank." See pages 33-34. 42 | erb_point = ( 43 | -ear_q * min_bw 44 | + np.exp( 45 | fraction * ( 46 | -np.log(high_freq + ear_q * min_bw) 47 | + np.log(low_freq + ear_q * min_bw) 48 | ) 49 | ) * 50 | (high_freq + ear_q * min_bw) 51 | ) 52 | 53 | return erb_point 54 | 55 | 56 | def erb_space( 57 | low_freq=DEFAULT_LOW_FREQ, 58 | high_freq=DEFAULT_HIGH_FREQ, 59 | num=DEFAULT_FILTER_NUM): 60 | """ 61 | This function computes an array of ``num`` frequencies uniformly spaced 62 | between ``high_freq`` and ``low_freq`` on an ERB scale. 63 | 64 | For a definition of ERB, see Moore, B. C. J., and Glasberg, B. R. (1983). 65 | "Suggested formulae for calculating auditory-filter bandwidths and 66 | excitation patterns," J. Acoust. Soc. Am. 74, 750-753. 67 | """ 68 | return erb_point( 69 | low_freq, 70 | high_freq, 71 | np.arange(1, num + 1) / num 72 | ) 73 | 74 | 75 | def centre_freqs(fs, num_freqs, cutoff): 76 | """ 77 | Calculates an array of centre frequencies (for :func:`make_erb_filters`) 78 | from a sampling frequency, lower cutoff frequency and the desired number of 79 | filters. 
80 | 81 | :param fs: sampling rate 82 | :param num_freqs: number of centre frequencies to calculate 83 | :type num_freqs: int 84 | :param cutoff: lower cutoff frequency 85 | :return: same as :func:`erb_space` 86 | """ 87 | return erb_space(cutoff, fs / 2, num_freqs) 88 | 89 | 90 | def make_erb_filters(fs, centre_freqs, width=1.0): 91 | """ 92 | This function computes the filter coefficients for a bank of 93 | Gammatone filters. These filters were defined by Patterson and Holdsworth for 94 | simulating the cochlea. 95 | 96 | The result is returned as a two-dimensional array. Each row of the 97 | filter array contains the coefficients for four second order filters. The 98 | transfer functions for these four filters share the same denominator (poles) 99 | but have different numerators (zeros). All of these coefficients are 100 | assembled into one vector that :func:`erb_filterbank` can take apart to implement 101 | the filter. 102 | 103 | The filter bank contains "numChannels" channels that extend from 104 | half the sampling rate (fs) to "lowFreq". Alternatively, if the numChannels 105 | input argument is a vector, then the values of this vector are taken to be 106 | the center frequency of each desired filter. (The lowFreq argument is 107 | ignored in this case.) 108 | 109 | Note this implementation fixes a problem in the original code by 110 | computing four separate second order filters. This avoids a big problem with 111 | round off errors in cases of very small cfs (100Hz) and large sample rates 112 | (44kHz). The problem is caused by roundoff error when a number of poles are 113 | combined, all very close to the unit circle. Small errors in the eighth order 114 | coefficient are multiplied when the eighth root is taken to give the pole 115 | location. These small errors lead to poles outside the unit circle and 116 | instability. Thanks to Julius Smith for leading me to the proper 117 | explanation.
118 | 119 | Execute the following MATLAB code to evaluate the frequency response of a 10 120 | channel filterbank:: 121 | 122 | fcoefs = MakeERBFilters(16000,10,100); 123 | y = ERBFilterBank([1 zeros(1,511)], fcoefs); 124 | resp = 20*log10(abs(fft(y'))); 125 | freqScale = (0:511)/512*16000; 126 | semilogx(freqScale(1:255),resp(1:255,:)); 127 | axis([100 16000 -60 0]) 128 | xlabel('Frequency (Hz)'); ylabel('Filter Response (dB)'); 129 | 130 | | Rewritten by Malcolm Slaney@Interval. June 11, 1998. 131 | | (c) 1998 Interval Research Corporation 132 | | 133 | | (c) 2012 Jason Heeris (Python implementation) 134 | """ 135 | T = 1 / fs 136 | # Change the following three parameters if you wish to use a different 137 | # ERB scale. Must change in ERBSpace too. 138 | # TODO: factor these out 139 | ear_q = 9.26449 # Glasberg and Moore Parameters 140 | min_bw = 24.7 141 | order = 1 142 | 143 | erb = width*((centre_freqs / ear_q) ** order + min_bw ** order) ** ( 1 /order) 144 | B = 1.019 * 2 * np.pi * erb 145 | 146 | arg = 2 * centre_freqs * np.pi * T 147 | vec = np.exp(2j * arg) 148 | 149 | A0 = T 150 | A2 = 0 151 | B0 = 1 152 | B1 = -2 * np.cos(arg) / np.exp(B * T) 153 | B2 = np.exp(-2 * B * T) 154 | 155 | rt_pos = np.sqrt(3 + 2 ** 1.5) 156 | rt_neg = np.sqrt(3 - 2 ** 1.5) 157 | 158 | common = -T * np.exp(-(B * T)) 159 | 160 | # TODO: This could be simplified to a matrix calculation involving the 161 | # constant first term and the alternating rt_pos/rt_neg and +/-1 second 162 | # terms 163 | k11 = np.cos(arg) + rt_pos * np.sin(arg) 164 | k12 = np.cos(arg) - rt_pos * np.sin(arg) 165 | k13 = np.cos(arg) + rt_neg * np.sin(arg) 166 | k14 = np.cos(arg) - rt_neg * np.sin(arg) 167 | 168 | A11 = common * k11 169 | A12 = common * k12 170 | A13 = common * k13 171 | A14 = common * k14 172 | 173 | gain_arg = np.exp(1j * arg - B * T) 174 | 175 | gain = np.abs( 176 | (vec - gain_arg * k11) 177 | * (vec - gain_arg * k12) 178 | * (vec - gain_arg * k13) 179 | * (vec - gain_arg * k14) 180 | * (
T * np.exp(B * T) 181 | / (-1 / np.exp(B * T) + 1 + vec * (1 - np.exp(B * T))) 182 | )**4 183 | ) 184 | 185 | allfilts = np.ones_like(centre_freqs) 186 | 187 | fcoefs = np.column_stack([ 188 | A0 * allfilts, A11, A12, A13, A14, A2*allfilts, 189 | B0 * allfilts, B1, B2, 190 | gain 191 | ]) 192 | 193 | return fcoefs 194 | 195 | 196 | def erb_filterbank(wave, coefs): 197 | """ 198 | :param wave: input data (one dimensional sequence) 199 | :param coefs: gammatone filter coefficients 200 | 201 | Process an input waveform with a gammatone filter bank. This function takes 202 | a single sound vector, and returns an array of filter outputs, one channel 203 | per row. 204 | 205 | The fcoefs parameter, which completely specifies the Gammatone filterbank, 206 | should be designed with the :func:`make_erb_filters` function. 207 | 208 | | Malcolm Slaney @ Interval, June 11, 1998. 209 | | (c) 1998 Interval Research Corporation 210 | | Thanks to Alain de Cheveigne' for his suggestions and improvements. 211 | | 212 | | (c) 2013 Jason Heeris (Python implementation) 213 | """ 214 | output = np.zeros((coefs[:,9].shape[0], wave.shape[0])) 215 | 216 | gain = coefs[:, 9] 217 | # A0, A11, A2 218 | As1 = coefs[:, (0, 1, 5)] 219 | # A0, A12, A2 220 | As2 = coefs[:, (0, 2, 5)] 221 | # A0, A13, A2 222 | As3 = coefs[:, (0, 3, 5)] 223 | # A0, A14, A2 224 | As4 = coefs[:, (0, 4, 5)] 225 | # B0, B1, B2 226 | Bs = coefs[:, 6:9] 227 | 228 | # Loop over channels 229 | for idx in range(0, coefs.shape[0]): 230 | # These seem to be reversed (in the sense of A/B order), but that's what 231 | # the original code did... 232 | # Replacing these with polynomial multiplications reduces both accuracy 233 | # and speed. 
234 | y1 = sgn.lfilter(As1[idx], Bs[idx], wave) 235 | y2 = sgn.lfilter(As2[idx], Bs[idx], y1) 236 | y3 = sgn.lfilter(As3[idx], Bs[idx], y2) 237 | y4 = sgn.lfilter(As4[idx], Bs[idx], y3) 238 | output[idx, :] = y4 / gain[idx] 239 | 240 | return output 241 | -------------------------------------------------------------------------------- /gammatone/gammatone/gtgram.py: -------------------------------------------------------------------------------- 1 | # Copyright 2014 Jason Heeris, jason.heeris@gmail.com 2 | # 3 | # This file is part of the gammatone toolkit, and is licensed under the 3-clause 4 | # BSD license: https://github.com/detly/gammatone/blob/master/COPYING 5 | from __future__ import division 6 | import numpy as np 7 | 8 | from .filters import make_erb_filters, centre_freqs, erb_filterbank 9 | 10 | """ 11 | This module contains functions for rendering "spectrograms" which use gammatone 12 | filterbanks instead of Fourier transforms. 13 | """ 14 | 15 | def round_half_away_from_zero(num): 16 | """ Implement the round-half-away-from-zero rule, where fractional parts of 17 | 0.5 result in rounding up to the nearest positive integer for positive 18 | numbers, and down to the nearest negative integer for negative numbers. 19 | """ 20 | return np.sign(num) * np.floor(np.abs(num) + 0.5) 21 | 22 | 23 | def gtgram_strides(fs, window_time, hop_time, filterbank_cols): 24 | """ 25 | Calculates the window size, hop size and column count for a gammatonegram.
26 | 27 | :return: a tuple of (window_size, hop_samples, output_columns) 28 | """ 29 | nwin = int(round_half_away_from_zero(window_time * fs)) 30 | hop_samples = int(round_half_away_from_zero(hop_time * fs)) 31 | columns = (1 32 | + int( 33 | np.floor( 34 | (filterbank_cols - nwin) 35 | / hop_samples 36 | ) 37 | ) 38 | ) 39 | 40 | return (nwin, hop_samples, columns) 41 | 42 | 43 | def gtgram_xe(wave, fs, channels, f_min): 44 | """ Calculate the intermediate ERB filterbank processed matrix """ 45 | cfs = centre_freqs(fs, channels, f_min) 46 | fcoefs = np.flipud(make_erb_filters(fs, cfs)) 47 | xf = erb_filterbank(wave, fcoefs) 48 | xe = np.power(xf, 2) 49 | return xe 50 | 51 | 52 | def gtgram( 53 | wave, 54 | fs, 55 | window_time, hop_time, 56 | channels, 57 | f_min): 58 | """ 59 | Calculate a spectrogram-like time frequency magnitude array based on 60 | gammatone subband filters. The waveform ``wave`` (at sample rate ``fs``) is 61 | passed through a multi-channel gammatone auditory model filterbank, with 62 | lowest frequency ``f_min`` and highest frequency ``fs / 2``. The outputs of 63 | each band then have their energy integrated over windows of ``window_time`` 64 | seconds, advancing by ``hop_time`` seconds for successive columns. These 65 | magnitudes are returned as a nonnegative real matrix with ``channels`` rows.
66 | 67 | | 2009-02-23 Dan Ellis dpwe@ee.columbia.edu 68 | | 69 | | (c) 2013 Jason Heeris (Python implementation) 70 | """ 71 | xe = gtgram_xe(wave, fs, channels, f_min) 72 | 73 | nwin, hop_samples, ncols = gtgram_strides( 74 | fs, 75 | window_time, 76 | hop_time, 77 | xe.shape[1] 78 | ) 79 | 80 | y = np.zeros((channels, ncols)) 81 | 82 | for cnum in range(ncols): 83 | segment = xe[:, cnum * hop_samples + np.arange(nwin)] 84 | y[:, cnum] = np.sqrt(segment.mean(1)) 85 | 86 | return y 87 | -------------------------------------------------------------------------------- /gammatone/gammatone/plot.py: -------------------------------------------------------------------------------- 1 | # Copyright 2014 Jason Heeris, jason.heeris@gmail.com 2 | # 3 | # This file is part of the gammatone toolkit, and is licensed under the 3-clause 4 | # BSD license: https://github.com/detly/gammatone/blob/master/COPYING 5 | """ 6 | Plotting utilities related to gammatone analysis, primarily for use with 7 | ``matplotlib``. 8 | """ 9 | from __future__ import division 10 | import argparse 11 | import os.path 12 | 13 | import matplotlib.pyplot 14 | import matplotlib.ticker 15 | import numpy as np 16 | import scipy.constants 17 | import scipy.io.wavfile 18 | 19 | from .filters import erb_point 20 | import gammatone.gtgram 21 | import gammatone.fftweight 22 | 23 | 24 | class ERBFormatter(matplotlib.ticker.EngFormatter): 25 | """ 26 | Axis formatter for gammatone filterbank analysis. This formatter calculates 27 | the ERB spaced frequencies used for analysis, and renders them similarly to 28 | the engineering axis formatter. 29 | 30 | The scale is changed so that `[0, 1]` corresponds to ERB spaced frequencies 31 | from ``high_freq`` to ``low_freq`` (note the reversal). It should be used 32 | with ``imshow`` where the ``extent`` argument is ``[a, b, 1, 0]`` (again, 33 | note the inversion). 
34 | """ 35 | 36 | def __init__(self, low_freq, high_freq, *args, **kwargs): 37 | """ 38 | Creates a new :class:`ERBFormatter` for use with ``matplotlib`` plots. 39 | Note that this class does not supply the ``unit`` or ``places`` 40 | arguments; typically these would be ``'Hz'`` and ``0``. 41 | 42 | :param low_freq: the low end of the gammatone filterbank frequency range 43 | :param high_freq: the high end of the gammatone filterbank frequency 44 | range 45 | """ 46 | self.low_freq = low_freq 47 | self.high_freq = high_freq 48 | super().__init__(*args, **kwargs) 49 | 50 | def _erb_axis_scale(self, fraction): 51 | return erb_point(self.low_freq, self.high_freq, fraction) 52 | 53 | def __call__(self, val, pos=None): 54 | newval = self._erb_axis_scale(val) 55 | return super().__call__(newval, pos) 56 | 57 | 58 | def gtgram_plot( 59 | gtgram_function, 60 | axes, x, fs, 61 | window_time, hop_time, channels, f_min, 62 | imshow_args=None 63 | ): 64 | """ 65 | Plots a spectrogram-like time frequency magnitude array based on gammatone 66 | subband filters. 67 | 68 | :param gtgram_function: A function with signature:: 69 | 70 | fft_gtgram( 71 | wave, 72 | fs, 73 | window_time, hop_time, 74 | channels, 75 | f_min) 76 | 77 | See :func:`gammatone.gtgram.gtgram` for details of the parameters. 78 | """ 79 | # Set a nice formatter for the y-axis 80 | formatter = ERBFormatter(f_min, fs/2, unit='Hz', places=0) 81 | axes.yaxis.set_major_formatter(formatter) 82 | 83 | # Figure out time axis scaling 84 | duration = len(x) / fs 85 | 86 | # Choose an aspect that gives a golden-ratio (width:height) plot 87 | aspect_ratio = duration/scipy.constants.golden 88 | 89 | gtg = gtgram_function(x, fs, window_time, hop_time, channels, f_min) 90 | Z = np.flipud(20 * np.log10(gtg)) 91 | 92 | img = axes.imshow(Z, extent=[0, duration, 1, 0], aspect=aspect_ratio) 93 | 94 | 95 | # Entry point for CLI script 96 | 97 | HELP_TEXT = """\ 98 | Plots the gammatone filterbank analysis of a WAV file.
99 | 100 | If the file contains more than one channel, all channels are averaged before 101 | performing analysis. 102 | """ 103 | 104 | 105 | def render_audio_from_file(path, duration, function): 106 | """ 107 | Renders the given ``duration`` of audio from the audio file at ``path`` 108 | using the gammatone spectrogram function ``function``. 109 | """ 110 | samplerate, data = scipy.io.wavfile.read(path) 111 | 112 | # Truncate to the requested duration, if one was given 113 | if duration: 114 | nframes = duration * samplerate 115 | data = data[0 : nframes, :] 116 | # Average the channels into a single signal 117 | signal = data.mean(1) 118 | 119 | # Default gammatone-based spectrogram parameters 120 | twin = 0.08 121 | thop = twin / 2 122 | channels = 1024 123 | fmin = 20 124 | 125 | # Set up the plot 126 | fig = matplotlib.pyplot.figure() 127 | axes = fig.add_axes([0.1, 0.1, 0.8, 0.8]) 128 | 129 | gtgram_plot( 130 | function, 131 | axes, 132 | signal, 133 | samplerate, 134 | twin, thop, channels, fmin) 135 | 136 | axes.set_title(os.path.basename(path)) 137 | axes.set_xlabel("Time (s)") 138 | axes.set_ylabel("Frequency") 139 | 140 | matplotlib.pyplot.show() 141 | 142 | 143 | def main(): 144 | """ 145 | Entry point for CLI application to plot gammatonegrams of sound files. 146 | """ 147 | parser = argparse.ArgumentParser(description=HELP_TEXT) 148 | 149 | parser.add_argument( 150 | 'sound_file', 151 | help="The sound file to graph. See the help text for supported formats.") 152 | 153 | parser.add_argument( 154 | '-d', '--duration', type=int, 155 | help="The time in seconds from the start of the audio to use for the " 156 | "graph (default is to use the whole file)." 157 | ) 158 | 159 | parser.add_argument( 160 | '-a', '--accurate', action='store_const', dest='function', 161 | const=gammatone.gtgram.gtgram, default=gammatone.fftweight.fft_gtgram, 162 | help="Use the full filterbank approach instead of the weighted FFT " 163 | "approximation. This is much slower, and uses a lot of memory, but" 164 | " is more accurate."
165 | ) 166 | 167 | args = parser.parse_args() 168 | 169 | return render_audio_from_file(args.sound_file, args.duration, args.function) 170 | -------------------------------------------------------------------------------- /gammatone/setup.py: -------------------------------------------------------------------------------- 1 | # Copyright 2014 Jason Heeris, jason.heeris@gmail.com 2 | # 3 | # This file is part of the gammatone toolkit, and is licensed under the 3-clause 4 | # BSD license: https://github.com/detly/gammatone/blob/master/COPYING 5 | from setuptools import setup, find_packages 6 | 7 | setup( 8 | name = "Gammatone", 9 | version = "1.0", 10 | packages = find_packages(), 11 | 12 | install_requires = [ 13 | 'numpy', 14 | 'scipy', 15 | 'nose', 16 | 'mock', 17 | 'matplotlib', 18 | ], 19 | 20 | entry_points = { 21 | 'console_scripts': [ 22 | 'gammatone = gammatone.plot:main', 23 | ] 24 | } 25 | ) 26 | -------------------------------------------------------------------------------- /gammatone/test_generation/README: -------------------------------------------------------------------------------- 1 | These are Octave/MATLAB scripts that create test data for the Python 2 | implementation of the gammatone library. 3 | 4 | You must add both this directory and the top-level 'auditory_toolkit' directory 5 | to your search path. 6 | 7 | The scripts are designed to run under MATLAB and Octave (using '--traditional'). 8 | -------------------------------------------------------------------------------- /gammatone/test_generation/test_ERBFilterBank.m: -------------------------------------------------------------------------------- 1 | % Copyright 2014 Jason Heeris, jason.heeris@gmail.com 2 | % 3 | % This file is part of the gammatone toolkit, and is licensed under the 3-clause 4 | % BSD license: https://github.com/detly/gammatone/blob/master/COPYING 5 | function test_ERBFilterBank() 6 | 7 | erb_space_inputs = { ...
8 | 100, 11025, 10, sin(2*pi*220*[0:22050/100]'/22050); ... 9 | 20, 22050, 10, square(2*pi*150*[0:44100/200]'/44100); ... 10 | 20, 44100, 40, square(2*pi*12000*[0:88200/400]'/88200); ... 11 | 100, 11025, 1000, sawtooth(2*pi*10100*[0:22050/100]'/22050, 0.5); ... 12 | 500, 80000, 200, sawtooth(2*pi*3333*[0:160000/400]'/160000, 0.5); ... 13 | }; 14 | 15 | erb_filter_inputs = { ... 16 | 44100, [22050; 2205; 220], square(2*pi*220*[0:44100/200]'/44100); ... 17 | 16000, [8000; 7000; 6000; 5000; 4000; 3000; 2000; 1000], square(2*pi*2000*[0:16000/50]'/16000); ... 18 | 16000, [16000; 8000; 1], square(2*pi*880*[0:16000/50]'/16000); ... 19 | }; 20 | 21 | num_tests = size(erb_space_inputs)(1) ... 22 | + size(erb_filter_inputs)(1); 23 | 24 | erb_filterbank_inputs = {}; 25 | 26 | erb_filterbank_results = {}; 27 | 28 | % This will ONLY generate tests that use the centre frequency inputs 29 | 30 | % ERBSpace generated inputs 31 | for tnum=1:size(erb_space_inputs)(1) 32 | [f_low, f_high, num_f, wave] = deal(erb_space_inputs{tnum,:}); 33 | fs = f_high*2; 34 | f_arr = ERBSpace(f_low, f_high, num_f); 35 | fcoefs = MakeERBFilters(fs, f_arr, 0); 36 | erb_filterbank_inputs(tnum, :) = {fcoefs, wave}; 37 | end 38 | 39 | % MakeERBFilters generated inputs 40 | for tnum=1:size(erb_filter_inputs) 41 | [fs, f_arr, wave] = deal(erb_filter_inputs{tnum,:}); 42 | fcoefs = MakeERBFilters(fs, f_arr, 0); 43 | offset = size(erb_space_inputs)(1); 44 | erb_filterbank_inputs(offset+tnum, :) = {fcoefs, wave}; 45 | end 46 | 47 | for tnum=1:num_tests 48 | fcoefs = erb_filterbank_inputs{tnum, 1}; 49 | wave = erb_filterbank_inputs{tnum, 2}; 50 | erb_filterbank_results(tnum, :) = ERBFilterBank(wave, fcoefs); 51 | end 52 | 53 | results_file = fullfile('..', 'tests', 'data', 'test_filterbank_data.mat'); 54 | save(results_file, 'erb_filterbank_inputs', 'erb_filterbank_results'); 55 | end 56 | -------------------------------------------------------------------------------- /gammatone/test_generation/test_ERBSpace.m: 
-------------------------------------------------------------------------------- 1 | % Copyright 2014 Jason Heeris, jason.heeris@gmail.com 2 | % 3 | % This file is part of the gammatone toolkit, and is licensed under the 3-clause 4 | % BSD license: https://github.com/detly/gammatone/blob/master/COPYING 5 | function test_ERBSpace() 6 | 7 | % Low freq, high freq, N 8 | erbspace_inputs = { ... 9 | 100, 11025, 100; ... 10 | 100, 22050, 100; ... 11 | 20, 22050, 100; ... 12 | 20, 44100, 100; ... 13 | 100, 11025, 10; ... 14 | 100, 11025, 1000; ... 15 | 500, 80000, 200; ... 16 | }; 17 | 18 | erbspace_results = {}; 19 | 20 | num_tests = size(erbspace_inputs)(1); 21 | 22 | for tnum=1:num_tests 23 | [f_low, f_high, num_f] = deal(erbspace_inputs{tnum,:}); 24 | erbspace_results(tnum, :) = ERBSpace(f_low, f_high, num_f); 25 | end 26 | 27 | results_file = fullfile('..', 'tests', 'data', 'test_erbspace_data.mat'); 28 | save(results_file, 'erbspace_inputs', 'erbspace_results'); 29 | end 30 | -------------------------------------------------------------------------------- /gammatone/test_generation/test_MakeERBFilters.m: -------------------------------------------------------------------------------- 1 | % Copyright 2014 Jason Heeris, jason.heeris@gmail.com 2 | % 3 | % This file is part of the gammatone toolkit, and is licensed under the 3-clause 4 | % BSD license: https://github.com/detly/gammatone/blob/master/COPYING 5 | function test_MakeERBFilters() 6 | 7 | erb_space_inputs = { ... 8 | 100, 11025, 100; ... 9 | 100, 22050, 100; ... 10 | 20, 22050, 100; ... 11 | 20, 44100, 100; ... 12 | 100, 11025, 10; ... 13 | 100, 11025, 1000; ... 14 | 500, 80000, 200; ... 15 | }; 16 | 17 | extra_inputs = { ... 18 | 44100, [22050; 2205; 220]; ... 19 | 16000, [8000; 7000; 6000; 5000; 4000; 3000; 2000; 1000]; ... 20 | 16000, [16000; 8000; 1]; ... 
21 | }; 22 | 23 | num_tests = size(erb_space_inputs)(1) + size(extra_inputs)(1); 24 | 25 | erb_filter_inputs = {}; 26 | 27 | erb_filter_results = {}; 28 | 29 | % This will ONLY generate tests that use the centre frequency inputs 30 | 31 | % ERBSpace generated inputs 32 | for tnum=1:size(erb_space_inputs)(1) 33 | [f_low, f_high, num_f] = deal(erb_space_inputs{tnum,:}); 34 | fs = f_high*2; 35 | cfs = ERBSpace(f_low, f_high, num_f); 36 | erb_filter_inputs(tnum, :) = {fs, cfs}; 37 | end 38 | 39 | erb_filter_inputs = cat(1, erb_filter_inputs, extra_inputs); 40 | 41 | for tnum=1:num_tests 42 | fs = erb_filter_inputs{tnum, 1}; 43 | cfs = erb_filter_inputs{tnum, 2}; 44 | fcoefs = MakeERBFilters(fs, cfs, 0); 45 | erb_filter_results(tnum, :) = fcoefs; 46 | end 47 | 48 | results_file = fullfile('..', 'tests', 'data', 'test_erb_filter_data.mat'); 49 | save(results_file, 'erb_filter_inputs', 'erb_filter_results'); 50 | end 51 | -------------------------------------------------------------------------------- /gammatone/test_generation/test_fft2gammatonemx.m: -------------------------------------------------------------------------------- 1 | % Copyright 2014 Jason Heeris, jason.heeris@gmail.com 2 | % 3 | % This file is part of the gammatone toolkit, and is licensed under the 3-clause 4 | % BSD license: https://github.com/detly/gammatone/blob/master/COPYING 5 | function test_fft2gtmx() 6 | % Arguments: 7 | % nfft, sr, nfilts, width, minfreq, maxfreq, maxlen 8 | 9 | fft2gtmx_inputs = { ... 10 | 256 , 48000, 64 , 1 , 100, 48000/2 , 256; ... 11 | % Vary the width parameter 12 | 256 , 48000, 64 , 2 , 100, 48000/2 , 256; ... 13 | 256 , 48000, 64 , 4 , 100, 48000/2 , 256; ... 14 | 256 , 48000, 64 , 0.25, 100, 48000/2 , 256; ... 15 | % Vary sampling rate 16 | 256 , 96000, 64 , 1 , 100, 96000/2 , 256; ... 17 | % Vary upper frequency 18 | 256 , 48000, 64 , 1 , 100, 48000/2 , 256; ... 19 | 256 , 48000, 64 , 1 , 100, 48000/4 , 256; ... 20 | 256 , 48000, 64 , 1 , 100, 48000/10, 256; ... 
21 | % Vary maxlen 22 | 256 , 48000, 64 , 1 , 100, 48000/2 , 128; ... 23 | 256 , 48000, 64 , 1 , 100, 48000/2 , 16; ... 24 | 256 , 48000, 64 , 1 , 100, 48000/2 , 99; ... 25 | % Vary nfft, nfilts and sampling rate 26 | 1024, 48000, 128, 1 , 100, 48000/2 , 512; ... 27 | 1024, 48000, 128, 1 , 100, 48000/2 , 128; ... 28 | 64 , 44100, 32 , 1 , 20 , 44100/2 , 64; ... 29 | }; 30 | 31 | fft2gtmx_results = {}; 32 | 33 | for tnum=1:size(fft2gtmx_inputs)(1) 34 | [nfft, sr, nfilts, width, minfreq, maxfreq, maxlen] = deal(fft2gtmx_inputs{tnum,:}); 35 | [wts, gain] = fft2gammatonemx(nfft, sr, nfilts, width, minfreq, maxfreq, maxlen); 36 | fft2gtmx_results(tnum, :) = {wts, gain}; 37 | end 38 | 39 | results_file = fullfile('..', 'tests', 'data', 'test_fft2gtmx_data.mat'); 40 | save(results_file, 'fft2gtmx_inputs', 'fft2gtmx_results'); 41 | end 42 | -------------------------------------------------------------------------------- /gammatone/test_generation/test_fft_gammatonegram.m: -------------------------------------------------------------------------------- 1 | % Copyright 2014 Jason Heeris, jason.heeris@gmail.com 2 | % 3 | % This file is part of the gammatone toolkit, and is licensed under the 3-clause 4 | % BSD license: https://github.com/detly/gammatone/blob/master/COPYING 5 | function test_fft_gammatonegram() 6 | % Need: 7 | % wave 8 | % fs 9 | % window_time 10 | % hop_time 11 | % channels 12 | % f_min 13 | % f_max 14 | 15 | % Need to mock out: 16 | % make_erb_filters output (elide) 17 | % centre_freqs (elide) 18 | % erb_filterbank (depends on X, SR, N, FMIN) 19 | 20 | % Ensure reproducible tests 21 | rand('state', [3 1 4 1 5 9 2 7]); 22 | 23 | fft_gammatonegram_inputs = { 24 | 'sawtooth_01', sawtooth(2*pi*10100*[0:22050 - 1]'/22050, 0.5), 22050, 0.025, 0.010, 64, 50; ... 25 | 'sin220_01' , sin(2*pi*220*[0:4800 - 1]'/48000), 48000, 0.01, 0.01, 64, 50; ... 26 | 'sin220_02' , sin(2*pi*220*[0:4800 - 1]'/48000), 48000, 0.025, 0.01, 32, 50; ...
27 | 'rand_01' , rand([1, 4410 - 1]), 44100, 0.02, 0.015, 128, 500; ... 28 | 'rand_02' , rand([1, 9600 - 1]), 96000, 0.01, 0.005, 256, 20; ... 29 | 'rand_03' , rand([1, 4800 - 1]), 48000, 0.01, 0.010, 256, 20; ... 30 | }; 31 | 32 | % Mocked intermediate results for unit testing 33 | fft_gammatonegram_mocks = {}; 34 | 35 | % Actual results 36 | fft_gammatonegram_results = {}; 37 | 38 | for tnum=1:size(fft_gammatonegram_inputs)(1) 39 | [name, wave, fs, twin, thop, chs, fmin] = deal(fft_gammatonegram_inputs{tnum,:}); 40 | 41 | % This is for mocking the output of the equivalent Python functions 42 | nfft = 2^(ceil(log(2*twin*fs)/log(2))); 43 | nwin = round(twin * fs); 44 | nhop = round(thop * fs); 45 | 46 | % Mock out the FFT weights as well 47 | wts = fft2gammatonemx( ... 48 | nfft, ... 49 | fs, ... 50 | chs, ... 51 | 1, ... % width is always 1 in the Python implementation 52 | fmin, ... 53 | fs/2, ... 54 | nfft/2+1 ... 55 | ); 56 | 57 | % Mock out windowing function 58 | window = gtgram_window(nfft, nwin); 59 | 60 | res = gammatonegram( ... 61 | wave, ... 62 | fs, ... 63 | twin, ... 64 | thop, ... 65 | chs, ... 66 | fmin, ... 67 | fs/2, % fmax is always fs/2 in the Python version 68 | 1 % Use FFT method 69 | ); 70 | 71 | fft_gammatonegram_mocks(tnum, :) = { ... 72 | wts ... 73 | }; 74 | 75 | fft_gammatonegram_results(tnum, :) = { ... 76 | res, ... 77 | window, ... 78 | nfft, ... 79 | nwin, ... 80 | nhop ... 
81 | }; 82 | 83 | end; 84 | 85 | results_file = fullfile('..', 'tests', 'data', 'test_fft_gammatonegram_data.mat'); 86 | save(results_file, 'fft_gammatonegram_inputs', 'fft_gammatonegram_mocks', 'fft_gammatonegram_results'); 87 | end; 88 | 89 | 90 | function win = gtgram_window(n, w) 91 | % Reproduction of Dan Ellis' windowing function built in to specgram.m 92 | halflen = w/2; 93 | halff = n/2; % midpoint of win 94 | acthalflen = min(halff, halflen); 95 | 96 | halfwin = 0.5 * ( 1 + cos( pi * (0:halflen)/halflen)); 97 | win = zeros(1, n); 98 | win((halff+1):(halff+acthalflen)) = halfwin(1:acthalflen); 99 | win((halff+1):-1:(halff-acthalflen+2)) = halfwin(1:acthalflen); 100 | end; -------------------------------------------------------------------------------- /gammatone/test_generation/test_gammatonegram.m: -------------------------------------------------------------------------------- 1 | % Copyright 2014 Jason Heeris, jason.heeris@gmail.com 2 | % 3 | % This file is part of the gammatone toolkit, and is licensed under the 3-clause 4 | % BSD license: https://github.com/detly/gammatone/blob/master/COPYING 5 | function test_gammatonegram() 6 | % Need: 7 | % wave 8 | % fs 9 | % window_time 10 | % hop_time 11 | % channels 12 | % f_min 13 | % f_max 14 | 15 | % Need to mock out: 16 | % make_erb_filters output (elide) 17 | % centre_freqs (elide) 18 | % erb_filterbank (depends on X, SR, N, FMIN) 19 | 20 | % Ensure reproducible tests 21 | rand('state', [3 1 4 1 5 9 2 7]); 22 | 23 | gammatonegram_inputs = { 24 | 'sawtooth_01', sawtooth(2*pi*10100*[0:22050 - 1]'/22050, 0.5), 22050, 0.025, 0.010, 64, 50; ... 25 | 'sin220_01' , sin(2*pi*220*[0:4800 - 1]'/48000), 48000, 0.01, 0.01, 64, 50; ... 26 | 'sin220_02' , sin(2*pi*220*[0:4800 - 1]'/48000), 48000, 0.025, 0.01, 32, 50; ... 27 | 'rand_01' , rand([1, 4410 - 1]), 44100, 0.02, 0.015, 128, 500; ... 28 | 'rand_02' , rand([1, 9600 - 1]), 96000, 0.01, 0.005, 256, 20; ... 
29 | 'rand_03' , rand([1, 4800 - 1]), 48000, 0.01, 0.010, 256, 20; ... 30 | }; 31 | 32 | % Mocked intermediate results for unit testing 33 | gammatonegram_mocks = {}; 34 | 35 | % Actual results 36 | gammatonegram_results = {}; 37 | 38 | for tnum=1:size(gammatonegram_inputs)(1) 39 | [name, wave, fs, twin, thop, chs, fmin] = deal(gammatonegram_inputs{tnum,:}); 40 | res = gammatonegram( ... 41 | wave, ... 42 | fs, ... 43 | twin, ... 44 | thop, ... 45 | chs, ... 46 | fmin, ... 47 | 0, % fmax is ignored 48 | 0 % Don't use FFT method 49 | ); 50 | 51 | % This is for mocking the output of the equivalent Python functions 52 | nwin = round(twin * fs); 53 | hopsamps = round(thop * fs); 54 | f_coefs = flipud(MakeERBFilters(fs, chs, fmin)); 55 | x_f = ERBFilterBank(wave, f_coefs); 56 | x_e = [x_f .^ 2]; 57 | x_e_cols = size(x_e, 2); 58 | ncols = 1 + floor((x_e_cols - nwin) / hopsamps); 59 | 60 | % Mock out the ERB filter functions too 61 | fcoefs = flipud(MakeERBFilters(fs, chs, fmin)); 62 | erb_fb_output = ERBFilterBank(wave, fcoefs); 63 | 64 | gammatonegram_mocks(tnum, :) = { ... 65 | erb_fb_output, ... 66 | x_e_cols ... 67 | }; 68 | 69 | gammatonegram_results(tnum, :) = { ... 70 | res, ... 71 | nwin, ... 72 | hopsamps, ... 73 | ncols ... 
74 | }; 75 | 76 | end; 77 | 78 | results_file = fullfile('..', 'tests', 'data', 'test_gammatonegram_data.mat'); 79 | save(results_file, 'gammatonegram_inputs', 'gammatonegram_mocks', 'gammatonegram_results'); 80 | end; 81 | -------------------------------------------------------------------------------- /gammatone/test_generation/test_specgram.m: -------------------------------------------------------------------------------- 1 | % Copyright 2014 Jason Heeris, jason.heeris@gmail.com 2 | % 3 | % This file is part of the gammatone toolkit, and is licensed under the 3-clause 4 | % BSD license: https://github.com/detly/gammatone/blob/master/COPYING 5 | function test_specgram() 6 | % Need: 7 | % wave 8 | % nfft 9 | % fs 10 | % window_size 11 | % hop (technically the function takes the overlap, but only to recalculate this) 12 | 13 | % Ensure reproducible tests 14 | rand('state', [3 1 4 1 5 9 2 7]); 15 | 16 | specgram_inputs = { 17 | 'sawtooth_01', sawtooth(2*pi*10100*[0:22050 - 1]'/22050, 0.5), 2048, 22050, 551, 221; ... 18 | 'sin220_01' , sin(2*pi*220*[0:4800 - 1]'/48000), 1024, 48000, 480, 480; ... 19 | 'sin220_02' , sin(2*pi*220*[0:4800 - 1]'/48000), 4096, 48000, 1200, 480; ... 20 | 'rand_01' , rand([1, 4410 - 1]), 2048, 44100, 882, 662; ... 21 | 'rand_02' , rand([1, 9600 - 1]), 2048, 96000, 960, 480; ... 22 | 'rand_03' , rand([1, 4800 - 1]), 1024, 48000, 480, 480; ... 23 | }; 24 | 25 | % Mocked intermediate results for unit testing 26 | specgram_mocks = {}; 27 | 28 | % Actual results 29 | specgram_results = {}; 30 | 31 | for tnum=1:size(specgram_inputs)(1) 32 | [name, wave, nfft, fs, nwin, nhop] = deal(specgram_inputs{tnum,:}); 33 | 34 | % Mock out windowing function 35 | window = gtgram_window(nfft, nwin); 36 | 37 | res = specgram( ... 38 | wave, ... 39 | nfft, ... 40 | fs, ... 41 | nwin, ... 42 | nwin - nhop ... 43 | ); 44 | 45 | specgram_mocks(tnum, :) = { ... 46 | window, ... 47 | }; 48 | 49 | specgram_results(tnum, :) = { ... 50 | res, ... 
51 | }; 52 | 53 | end; 54 | 55 | results_file = fullfile('..', 'tests', 'data', 'test_specgram_data.mat'); 56 | save(results_file, 'specgram_inputs', 'specgram_mocks', 'specgram_results'); 57 | end; 58 | 59 | 60 | function win = gtgram_window(n, w) 61 | % Reproduction of Dan Ellis' windowing function built in to specgram.m 62 | halflen = w/2; 63 | halff = n/2; % midpoint of win 64 | acthalflen = min(halff, halflen); 65 | 66 | halfwin = 0.5 * ( 1 + cos( pi * (0:halflen)/halflen)); 67 | win = zeros(1, n); 68 | win((halff+1):(halff+acthalflen)) = halfwin(1:acthalflen); 69 | win((halff+1):-1:(halff-acthalflen+2)) = halfwin(1:acthalflen); 70 | end; -------------------------------------------------------------------------------- /gammatone/tests/__init__.py: -------------------------------------------------------------------------------- 1 | # Copyright 2014 Jason Heeris, jason.heeris@gmail.com 2 | # 3 | # This file is part of the gammatone toolkit, and is licensed under the 3-clause 4 | # BSD license: https://github.com/detly/gammatone/blob/master/COPYING 5 | 6 | # Designate as module 7 | -------------------------------------------------------------------------------- /gammatone/tests/data/test_erb_filter_data.mat: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/janaal1/DCASE2020-Task3/bc7d5e87faa2fbe014dc47cccb48d9927b4ed3ec/gammatone/tests/data/test_erb_filter_data.mat -------------------------------------------------------------------------------- /gammatone/tests/data/test_erbspace_data.mat: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/janaal1/DCASE2020-Task3/bc7d5e87faa2fbe014dc47cccb48d9927b4ed3ec/gammatone/tests/data/test_erbspace_data.mat -------------------------------------------------------------------------------- /gammatone/tests/data/test_fft2gtmx_data.mat: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/janaal1/DCASE2020-Task3/bc7d5e87faa2fbe014dc47cccb48d9927b4ed3ec/gammatone/tests/data/test_fft2gtmx_data.mat -------------------------------------------------------------------------------- /gammatone/tests/data/test_fft_gammatonegram_data.mat: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/janaal1/DCASE2020-Task3/bc7d5e87faa2fbe014dc47cccb48d9927b4ed3ec/gammatone/tests/data/test_fft_gammatonegram_data.mat -------------------------------------------------------------------------------- /gammatone/tests/data/test_filterbank_data.mat: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/janaal1/DCASE2020-Task3/bc7d5e87faa2fbe014dc47cccb48d9927b4ed3ec/gammatone/tests/data/test_filterbank_data.mat -------------------------------------------------------------------------------- /gammatone/tests/data/test_gammatonegram_data.mat: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/janaal1/DCASE2020-Task3/bc7d5e87faa2fbe014dc47cccb48d9927b4ed3ec/gammatone/tests/data/test_gammatonegram_data.mat -------------------------------------------------------------------------------- /gammatone/tests/data/test_specgram_data.mat: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/janaal1/DCASE2020-Task3/bc7d5e87faa2fbe014dc47cccb48d9927b4ed3ec/gammatone/tests/data/test_specgram_data.mat -------------------------------------------------------------------------------- /gammatone/tests/test_cfs.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | # Copyright 2014 Jason Heeris, jason.heeris@gmail.com 3 | # 4 | # This file is part of the gammatone toolkit, and is 
licensed under the 3-clause 5 | # BSD license: https://github.com/detly/gammatone/blob/master/COPYING 6 | import nose 7 | from mock import patch 8 | 9 | import gammatone.filters 10 | 11 | EXPECTED_PARAMS = ( 12 | ((0, 0, 0), (0, 0, 0)), 13 | ((22050, 100, 100), (100, 11025, 100)), 14 | ((44100, 100, 100), (100, 22050, 100)), 15 | ((44100, 100, 20), (20, 22050, 100)), 16 | ((88200, 100, 20), (20, 44100, 100)), 17 | ((22050, 100, 10), (10, 11025, 100)), 18 | ((22050, 1000, 100), (100, 11025, 1000)), 19 | ((160000, 500, 200), (200, 80000, 500)), 20 | ) 21 | 22 | 23 | def test_centre_freqs(): 24 | for args, params in EXPECTED_PARAMS: 25 | yield CentreFreqsTester(args, params) 26 | 27 | 28 | class CentreFreqsTester: 29 | 30 | def __init__(self, args, params): 31 | self.args = args 32 | self.params = params 33 | self.description = "Centre freqs for {:g} {:d} {:g}".format(*args) 34 | 35 | 36 | @patch('gammatone.filters.erb_space') 37 | def __call__(self, erb_space_mock): 38 | gammatone.filters.centre_freqs(*self.args) 39 | erb_space_mock.assert_called_with(*self.params) 40 | 41 | 42 | if __name__ == '__main__': 43 | nose.main() 44 | -------------------------------------------------------------------------------- /gammatone/tests/test_erb_space.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | # Copyright 2014 Jason Heeris, jason.heeris@gmail.com 3 | # 4 | # This file is part of the gammatone toolkit, and is licensed under the 3-clause 5 | # BSD license: https://github.com/detly/gammatone/blob/master/COPYING 6 | import nose 7 | import numpy as np 8 | import scipy.io 9 | from pkg_resources import resource_stream 10 | 11 | import gammatone.filters 12 | 13 | REF_DATA_FILENAME = 'data/test_erbspace_data.mat' 14 | 15 | INPUT_KEY = 'erbspace_inputs' 16 | RESULT_KEY = 'erbspace_results' 17 | 18 | INPUT_COLS = ('f_low', 'f_high', 'num_f') 19 | RESULT_COLS = ('cfs',) 20 | 21 | 22 | def load_reference_data(): 23 | """ 
Load test data generated from the reference code """ 24 | # Load test data 25 | with resource_stream(__name__, REF_DATA_FILENAME) as test_data: 26 | data = scipy.io.loadmat(test_data, squeeze_me=False) 27 | 28 | zipped_data = zip(data[INPUT_KEY], data[RESULT_KEY]) 29 | 30 | for inputs, refs in zipped_data: 31 | input_dict = dict(zip(INPUT_COLS, map(np.squeeze, inputs))) 32 | ref_dict = dict(zip(RESULT_COLS, map(np.squeeze, refs))) 33 | yield (input_dict, ref_dict) 34 | 35 | 36 | def test_ERB_space_known_values(): 37 | for inputs, refs in load_reference_data(): 38 | args = ( 39 | inputs['f_low'], 40 | inputs['f_high'], 41 | inputs['num_f'], 42 | ) 43 | 44 | expected = (refs['cfs'],) 45 | 46 | yield ERBSpaceTester(args, expected) 47 | 48 | 49 | class ERBSpaceTester: 50 | 51 | def __init__(self, args, expected): 52 | self.args = args 53 | self.expected = expected[0] 54 | self.description = ( 55 | "ERB space for {:.1f} {:.1f} {:d}".format( 56 | float(self.args[0]), 57 | float(self.args[1]), 58 | int(self.args[2]), 59 | ) 60 | ) 61 | 62 | def __call__(self): 63 | result = gammatone.filters.erb_space(*self.args) 64 | assert np.allclose(result, self.expected, rtol=1e-6, atol=1e-10) 65 | 66 | if __name__ == '__main__': 67 | nose.main() 68 | -------------------------------------------------------------------------------- /gammatone/tests/test_fft_gtgram.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | # Copyright 2014 Jason Heeris, jason.heeris@gmail.com 3 | # 4 | # This file is part of the gammatone toolkit, and is licensed under the 3-clause 5 | # BSD license: https://github.com/detly/gammatone/blob/master/COPYING 6 | from mock import patch 7 | import nose 8 | import numpy as np 9 | import scipy.io 10 | from pkg_resources import resource_stream 11 | 12 | import gammatone.fftweight 13 | 14 | REF_DATA_FILENAME = 'data/test_fft_gammatonegram_data.mat' 15 | 16 | INPUT_KEY = 'fft_gammatonegram_inputs' 17 | 
MOCK_KEY = 'fft_gammatonegram_mocks' 18 | RESULT_KEY = 'fft_gammatonegram_results' 19 | 20 | INPUT_COLS = ('name', 'wave', 'fs', 'twin', 'thop', 'channels', 'fmin') 21 | MOCK_COLS = ('wts',) 22 | RESULT_COLS = ('res', 'window', 'nfft', 'nwin', 'nhop') 23 | 24 | 25 | def load_reference_data(): 26 | """ Load test data generated from the reference code """ 27 | # Load test data 28 | with resource_stream(__name__, REF_DATA_FILENAME) as test_data: 29 | data = scipy.io.loadmat(test_data, squeeze_me=False) 30 | 31 | zipped_data = zip(data[INPUT_KEY], data[MOCK_KEY], data[RESULT_KEY]) 32 | for inputs, mocks, refs in zipped_data: 33 | input_dict = dict(zip(INPUT_COLS, inputs)) 34 | mock_dict = dict(zip(MOCK_COLS, mocks)) 35 | ref_dict = dict(zip(RESULT_COLS, refs)) 36 | 37 | yield (input_dict, mock_dict, ref_dict) 38 | 39 | 40 | def test_fft_specgram_window(): 41 | for inputs, mocks, refs in load_reference_data(): 42 | args = ( 43 | refs['nfft'], 44 | refs['nwin'], 45 | ) 46 | 47 | expected = ( 48 | refs['window'], 49 | ) 50 | 51 | yield FFTGtgramWindowTester(inputs['name'], args, expected) 52 | 53 | class FFTGtgramWindowTester: 54 | 55 | def __init__(self, name, args, expected): 56 | self.nfft = args[0].squeeze() 57 | self.nwin = args[1].squeeze() 58 | self.expected = expected[0].squeeze() 59 | 60 | self.description = ( 61 | "FFT gammatonegram window for nfft = {:f}, nwin = {:f}".format( 62 | float(self.nfft), float(self.nwin) 63 | )) 64 | 65 | def __call__(self): 66 | result = gammatone.fftweight.specgram_window(self.nfft, self.nwin) 67 | max_diff = np.max(np.abs(result - self.expected)) 68 | diagnostic = "Maximum difference: {:6e}".format(max_diff) 69 | assert np.allclose(result, self.expected, rtol=1e-6, atol=1e-12), diagnostic 70 | 71 | 72 | def test_fft_gtgram(): 73 | for inputs, mocks, refs in load_reference_data(): 74 | args = ( 75 | inputs['fs'], 76 | inputs['twin'], 77 | inputs['thop'], 78 | inputs['channels'], 79 | inputs['fmin'] 80 | ) 81 | 82 | yield 
FFTGammatonegramTester( 83 | inputs['name'][0], 84 | args, 85 | inputs['wave'], 86 | mocks['wts'], 87 | refs['window'], 88 | refs['res'] 89 | ) 90 | 91 | class FFTGammatonegramTester: 92 | """ Testing class for gammatonegram calculation """ 93 | 94 | def __init__(self, name, args, sig, fft_weights, window, expected): 95 | self.signal = np.asarray(sig).squeeze() 96 | self.expected = np.asarray(expected).squeeze() 97 | self.fft_weights = np.asarray(fft_weights) 98 | self.args = args 99 | self.window = window.squeeze() 100 | 101 | self.description = "FFT gammatonegram for {:s}".format(name) 102 | 103 | def __call__(self): 104 | # Note that the second return value from fft_weights isn't actually used 105 | with patch( 106 | 'gammatone.fftweight.fft_weights', 107 | return_value=(self.fft_weights, None)), \ 108 | patch( 109 | 'gammatone.fftweight.specgram_window', 110 | return_value=self.window): 111 | 112 | result = gammatone.fftweight.fft_gtgram(self.signal, *self.args) 113 | 114 | max_diff = np.max(np.abs(result - self.expected)) 115 | diagnostic = "Maximum difference: {:6e}".format(max_diff) 116 | 117 | assert np.allclose(result, self.expected, rtol=1e-6, atol=1e-12), diagnostic 118 | 119 | if __name__ == '__main__': 120 | nose.main() 121 | -------------------------------------------------------------------------------- /gammatone/tests/test_fft_weights.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | # Copyright 2014 Jason Heeris, jason.heeris@gmail.com 3 | # 4 | # This file is part of the gammatone toolkit, and is licensed under the 3-clause 5 | # BSD license: https://github.com/detly/gammatone/blob/master/COPYING 6 | from __future__ import division 7 | import nose 8 | import numpy as np 9 | import scipy.io 10 | from pkg_resources import resource_stream 11 | 12 | import gammatone.fftweight 13 | 14 | REF_DATA_FILENAME = 'data/test_fft2gtmx_data.mat' 15 | 16 | INPUT_KEY = 'fft2gtmx_inputs' 17 | 
RESULT_KEY = 'fft2gtmx_results' 18 | 19 | INPUT_COLS = ('nfft', 'sr', 'nfilts', 'width', 'fmin', 'fmax', 'maxlen') 20 | RESULT_COLS = ('weights', 'gain',) 21 | 22 | def load_reference_data(): 23 | """ Load test data generated from the reference code """ 24 | # Load test data 25 | with resource_stream(__name__, REF_DATA_FILENAME) as test_data: 26 | data = scipy.io.loadmat(test_data, squeeze_me=False) 27 | 28 | zipped_data = zip(data[INPUT_KEY], data[RESULT_KEY]) 29 | 30 | for inputs, refs in zipped_data: 31 | input_dict = dict(zip(INPUT_COLS, map(np.squeeze, inputs))) 32 | ref_dict = dict(zip(RESULT_COLS, map(np.squeeze, refs))) 33 | yield (input_dict, ref_dict) 34 | 35 | 36 | def fft_weights_funcs(args, expected): 37 | """ 38 | Construct a pair of unit tests for the gains and weights of the FFT to 39 | gammatonegram calculation. Returns two functions: test_gains, test_weights. 40 | """ 41 | args = list(args) 42 | expected_weights = expected[0] 43 | expected_gains = expected[1] 44 | 45 | # Convert nfft, nfilts, maxlen to ints 46 | args[0] = int(args[0]) 47 | args[2] = int(args[2]) 48 | args[6] = int(args[6]) 49 | 50 | weights, gains = gammatone.fftweight.fft_weights(*args) 51 | 52 | (test_weights_desc, test_gains_desc) = ( 53 | "FFT weights {:s} for nfft = {:d}, fs = {:d}, nfilts = {:d}".format( 54 | label, 55 | int(args[0]), 56 | int(args[1]), 57 | int(args[2]), 58 | ) for label in ("weights", "gains")) 59 | 60 | def test_gains(): 61 | assert gains.shape == expected_gains.shape 62 | assert np.allclose(gains, expected_gains, rtol=1e-6, atol=1e-12) 63 | 64 | def test_weights(): 65 | assert weights.shape == expected_weights.shape 66 | assert np.allclose(weights, expected_weights, rtol=1e-6, atol=1e-12) 67 | 68 | test_gains.description = test_gains_desc 69 | test_weights.description = test_weights_desc 70 | 71 | return test_gains, test_weights 72 | 73 | 74 | def test_fft_weights(): 75 | for inputs, refs in load_reference_data(): 76 | args = tuple(inputs[col] for col in 
INPUT_COLS) 77 | expected = (refs['weights'], refs['gain']) 78 | test_gains, test_weights = fft_weights_funcs(args, expected) 79 | yield test_gains 80 | yield test_weights 81 | 82 | 83 | if __name__ == '__main__': 84 | nose.main() 85 | -------------------------------------------------------------------------------- /gammatone/tests/test_filterbank.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | # Copyright 2014 Jason Heeris, jason.heeris@gmail.com 3 | # 4 | # This file is part of the gammatone toolkit, and is licensed under the 3-clause 5 | # BSD license: https://github.com/detly/gammatone/blob/master/COPYING 6 | import nose 7 | import numpy as np 8 | import scipy.io 9 | from pkg_resources import resource_stream 10 | 11 | import gammatone.filters 12 | 13 | REF_DATA_FILENAME = 'data/test_filterbank_data.mat' 14 | 15 | INPUT_KEY = 'erb_filterbank_inputs' 16 | RESULT_KEY = 'erb_filterbank_results' 17 | 18 | INPUT_COLS = ('fcoefs', 'wave') 19 | RESULT_COLS = ('filterbank',) 20 | 21 | def load_reference_data(): 22 | """ Load test data generated from the reference code """ 23 | # Load test data 24 | with resource_stream(__name__, REF_DATA_FILENAME) as test_data: 25 | data = scipy.io.loadmat(test_data, squeeze_me=False) 26 | 27 | zipped_data = zip(data[INPUT_KEY], data[RESULT_KEY]) 28 | 29 | for inputs, refs in zipped_data: 30 | input_dict = dict(zip(INPUT_COLS, map(np.squeeze, inputs))) 31 | ref_dict = dict(zip(RESULT_COLS, map(np.squeeze, refs))) 32 | yield (input_dict, ref_dict) 33 | 34 | 35 | def test_ERB_filterbank_known_values(): 36 | for inputs, refs in load_reference_data(): 37 | args = ( 38 | inputs['wave'], 39 | inputs['fcoefs'], 40 | ) 41 | 42 | expected = (refs['filterbank'],) 43 | 44 | yield ERBFilterBankTester(args, expected) 45 | 46 | 47 | class ERBFilterBankTester: 48 | 49 | def __init__(self, args, expected): 50 | self.signal = args[0] 51 | self.fcoefs = args[1] 52 | self.expected = 
expected[0] 53 | 54 | self.description = ( 55 | "Gammatone filterbank result for {:.1f} ... {:.1f}".format( 56 | self.fcoefs[0][0], 57 | self.fcoefs[0][1] 58 | )) 59 | 60 | def __call__(self): 61 | result = gammatone.filters.erb_filterbank(self.signal, self.fcoefs) 62 | assert np.allclose(result, self.expected, rtol=1e-5, atol=1e-12) 63 | 64 | 65 | if __name__ == '__main__': 66 | nose.main() 67 | -------------------------------------------------------------------------------- /gammatone/tests/test_gammatone_filters.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | # Copyright 2014 Jason Heeris, jason.heeris@gmail.com 3 | # 4 | # This file is part of the gammatone toolkit, and is licensed under the 3-clause 5 | # BSD license: https://github.com/detly/gammatone/blob/master/COPYING 6 | import nose 7 | import numpy as np 8 | import scipy.io 9 | from pkg_resources import resource_stream 10 | 11 | import gammatone.filters 12 | 13 | REF_DATA_FILENAME = 'data/test_erb_filter_data.mat' 14 | 15 | INPUT_KEY = 'erb_filter_inputs' 16 | RESULT_KEY = 'erb_filter_results' 17 | 18 | INPUT_COLS = ('fs', 'cfs') 19 | RESULT_COLS = ('fcoefs',) 20 | 21 | def load_reference_data(): 22 | """ Load test data generated from the reference code """ 23 | # Load test data 24 | with resource_stream(__name__, REF_DATA_FILENAME) as test_data: 25 | data = scipy.io.loadmat(test_data, squeeze_me=False) 26 | 27 | zipped_data = zip(data[INPUT_KEY], data[RESULT_KEY]) 28 | 29 | for inputs, refs in zipped_data: 30 | input_dict = dict(zip(INPUT_COLS, map(np.squeeze, inputs))) 31 | ref_dict = dict(zip(RESULT_COLS, map(np.squeeze, refs))) 32 | yield (input_dict, ref_dict) 33 | 34 | 35 | def test_make_ERB_filters_known_values(): 36 | for inputs, refs in load_reference_data(): 37 | args = ( 38 | inputs['fs'], 39 | inputs['cfs'], 40 | ) 41 | 42 | expected = (refs['fcoefs'],) 43 | 44 | yield MakeERBFiltersTester(args, expected) 45 | 46 | 47 | class 
MakeERBFiltersTester: 48 | 49 | def __init__(self, args, expected): 50 | self.fs = args[0] 51 | self.cfs = args[1] 52 | self.expected = expected[0] 53 | self.description = ( 54 | "Gammatone filters for {:f}, {:.1f} ... {:.1f}".format( 55 | float(self.fs), 56 | float(self.cfs[0]), 57 | float(self.cfs[-1]) 58 | )) 59 | 60 | def __call__(self): 61 | result = gammatone.filters.make_erb_filters(self.fs, self.cfs) 62 | assert np.allclose(result, self.expected, rtol=1e-6, atol=1e-12) 63 | 64 | if __name__ == '__main__': 65 | nose.main() 66 | -------------------------------------------------------------------------------- /gammatone/tests/test_gammatonegram.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | # Copyright 2014 Jason Heeris, jason.heeris@gmail.com 3 | # 4 | # This file is part of the gammatone toolkit, and is licensed under the 3-clause 5 | # BSD license: https://github.com/detly/gammatone/blob/master/COPYING 6 | from mock import patch 7 | import nose 8 | import numpy as np 9 | import scipy.io 10 | from pkg_resources import resource_stream 11 | 12 | import gammatone.gtgram 13 | 14 | REF_DATA_FILENAME = 'data/test_gammatonegram_data.mat' 15 | 16 | INPUT_KEY = 'gammatonegram_inputs' 17 | MOCK_KEY = 'gammatonegram_mocks' 18 | RESULT_KEY = 'gammatonegram_results' 19 | 20 | INPUT_COLS = ('name', 'wave', 'fs', 'twin', 'thop', 'channels', 'fmin') 21 | MOCK_COLS = ('erb_fb', 'erb_fb_cols') 22 | RESULT_COLS = ('gtgram', 'nwin', 'hopsamps', 'ncols') 23 | 24 | 25 | def load_reference_data(): 26 | """ Load test data generated from the reference code """ 27 | # Load test data 28 | with resource_stream(__name__, REF_DATA_FILENAME) as test_data: 29 | data = scipy.io.loadmat(test_data, squeeze_me=True) 30 | 31 | zipped_data = zip(data[INPUT_KEY], data[MOCK_KEY], data[RESULT_KEY]) 32 | for inputs, mocks, refs in zipped_data: 33 | input_dict = dict(zip(INPUT_COLS, inputs)) 34 | mock_dict = dict(zip(MOCK_COLS, 
mocks)) 35 | ref_dict = dict(zip(RESULT_COLS, refs)) 36 | yield (input_dict, mock_dict, ref_dict) 37 | 38 | 39 | def test_nstrides(): 40 | """ Test gammatonegram stride calculations """ 41 | for inputs, mocks, refs in load_reference_data(): 42 | args = ( 43 | inputs['fs'], 44 | inputs['twin'], 45 | inputs['thop'], 46 | mocks['erb_fb_cols'] 47 | ) 48 | 49 | expected = ( 50 | refs['nwin'], 51 | refs['hopsamps'], 52 | refs['ncols'] 53 | ) 54 | 55 | yield GTGramStrideTester(inputs['name'], args, expected) 56 | 57 | 58 | class GTGramStrideTester: 59 | """ Testing class for gammatonegram stride calculation """ 60 | 61 | def __init__(self, name, inputs, expected): 62 | self.inputs = inputs 63 | self.expected = expected 64 | self.description = "Gammatonegram strides for {:s}".format(name) 65 | 66 | def __call__(self): 67 | results = gammatone.gtgram.gtgram_strides(*self.inputs) 68 | 69 | diagnostic = ( 70 | "result: {:s}, expected: {:s}".format( 71 | str(results), 72 | str(self.expected) 73 | ) 74 | ) 75 | 76 | # These are integer values, so use direct equality 77 | assert results == self.expected, diagnostic 78 | 79 | 80 | # TODO: possibly mock out gtgram_strides 81 | 82 | def test_gtgram(): 83 | for inputs, mocks, refs in load_reference_data(): 84 | args = ( 85 | inputs['fs'], 86 | inputs['twin'], 87 | inputs['thop'], 88 | inputs['channels'], 89 | inputs['fmin'] 90 | ) 91 | 92 | yield GammatonegramTester( 93 | inputs['name'], 94 | args, 95 | inputs['wave'], 96 | mocks['erb_fb'], 97 | refs['gtgram'] 98 | ) 99 | 100 | class GammatonegramTester: 101 | """ Testing class for gammatonegram calculation """ 102 | 103 | def __init__(self, name, args, sig, erb_fb_out, expected): 104 | self.signal = np.asarray(sig) 105 | self.expected = np.asarray(expected) 106 | self.erb_fb_out = np.asarray(erb_fb_out) 107 | self.args = args 108 | 109 | self.description = "Gammatonegram for {:s}".format(name) 110 | 111 | def __call__(self): 112 | with patch( 113 | 'gammatone.gtgram.erb_filterbank', 114 | 
return_value=self.erb_fb_out): 115 | 116 | result = gammatone.gtgram.gtgram(self.signal, *self.args) 117 | 118 | max_diff = np.max(np.abs(result - self.expected)) 119 | diagnostic = "Maximum difference: {:6e}".format(max_diff) 120 | 121 | assert np.allclose(result, self.expected, rtol=1e-6, atol=1e-12), diagnostic 122 | 123 | if __name__ == '__main__': 124 | nose.main() 125 | -------------------------------------------------------------------------------- /gammatone/tests/test_specgram.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | # Copyright 2014 Jason Heeris, jason.heeris@gmail.com 3 | # 4 | # This file is part of the gammatone toolkit, and is licensed under the 3-clause 5 | # BSD license: https://github.com/detly/gammatone/blob/master/COPYING 6 | from mock import patch 7 | import nose 8 | import numpy as np 9 | import scipy.io 10 | from pkg_resources import resource_stream 11 | 12 | import gammatone.fftweight 13 | 14 | REF_DATA_FILENAME = 'data/test_specgram_data.mat' 15 | 16 | INPUT_KEY = 'specgram_inputs' 17 | MOCK_KEY = 'specgram_mocks' 18 | RESULT_KEY = 'specgram_results' 19 | 20 | INPUT_COLS = ('name', 'wave', 'nfft', 'fs', 'nwin', 'nhop') 21 | MOCK_COLS = ('window',) 22 | RESULT_COLS = ('res',) 23 | 24 | 25 | def load_reference_data(): 26 | """ Load test data generated from the reference code """ 27 | # Load test data 28 | with resource_stream(__name__, REF_DATA_FILENAME) as test_data: 29 | data = scipy.io.loadmat(test_data, squeeze_me=False) 30 | 31 | zipped_data = zip(data[INPUT_KEY], data[MOCK_KEY], data[RESULT_KEY]) 32 | for inputs, mocks, refs in zipped_data: 33 | input_dict = dict(zip(INPUT_COLS, inputs)) 34 | mock_dict = dict(zip(MOCK_COLS, mocks)) 35 | ref_dict = dict(zip(RESULT_COLS, refs)) 36 | 37 | yield (input_dict, mock_dict, ref_dict) 38 | 39 | 40 | def test_specgram(): 41 | for inputs, mocks, refs in load_reference_data(): 42 | args = ( 43 | inputs['nfft'], 44 | 
inputs['fs'], 45 | inputs['nwin'], 46 | inputs['nhop'], 47 | ) 48 | 49 | yield SpecgramTester( 50 | inputs['name'][0], 51 | args, 52 | inputs['wave'], 53 | mocks['window'], 54 | refs['res'] 55 | ) 56 | 57 | class SpecgramTester: 58 | """ Testing class for specgram replacement calculation """ 59 | 60 | def __init__(self, name, args, sig, window, expected): 61 | self.signal = np.asarray(sig).squeeze() 62 | self.expected = np.asarray(expected).squeeze() 63 | self.args = [int(a.squeeze()) for a in args] 64 | self.window = window.squeeze() 65 | self.description = "Specgram for {:s}".format(name) 66 | 67 | 68 | def __call__(self): 69 | with patch( 70 | 'gammatone.fftweight.specgram_window', 71 | return_value=self.window): 72 | result = gammatone.fftweight.specgram(self.signal, *self.args) 73 | 74 | max_diff = np.max(np.abs(result - self.expected)) 75 | diagnostic = "Maximum difference: {:6e}".format(max_diff) 76 | 77 | assert np.allclose(result, self.expected, rtol=1e-6, atol=1e-12), diagnostic 78 | 79 | if __name__ == '__main__': 80 | nose.main() 81 | -------------------------------------------------------------------------------- /images/CRNN_SELDT_DCASE2020.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/janaal1/DCASE2020-Task3/bc7d5e87faa2fbe014dc47cccb48d9927b4ed3ec/images/CRNN_SELDT_DCASE2020.png -------------------------------------------------------------------------------- /images/SELDnet_output.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/janaal1/DCASE2020-Task3/bc7d5e87faa2fbe014dc47cccb48d9927b4ed3ec/images/SELDnet_output.jpg -------------------------------------------------------------------------------- /images/scse_cropped.pdf: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/janaal1/DCASE2020-Task3/bc7d5e87faa2fbe014dc47cccb48d9927b4ed3ec/images/scse_cropped.pdf -------------------------------------------------------------------------------- /images/seld-squeeze-structure.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/janaal1/DCASE2020-Task3/bc7d5e87faa2fbe014dc47cccb48d9927b4ed3ec/images/seld-squeeze-structure.pdf -------------------------------------------------------------------------------- /images/seld_squeeze_structure_image.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/janaal1/DCASE2020-Task3/bc7d5e87faa2fbe014dc47cccb48d9927b4ed3ec/images/seld_squeeze_structure_image.jpg -------------------------------------------------------------------------------- /keras_model.py: -------------------------------------------------------------------------------- 1 | # 2 | # The SELDnet architecture 3 | # 4 | 5 | from keras.layers import (Bidirectional, Conv2D, MaxPooling2D, Input, Concatenate, 6 | Dense, Activation, Dropout, Reshape, Permute, 7 | GlobalAveragePooling2D, add, Activation, Input, Flatten, Lambda, 8 | GlobalAveragePooling1D, Reshape, ELU, multiply) 9 | #from keras.layers.core import Dense, Activation, Dropout, Reshape, Permute 10 | from keras.layers.recurrent import GRU 11 | from keras.layers.normalization import BatchNormalization 12 | from keras.models import Model 13 | from keras.layers.wrappers import TimeDistributed 14 | from keras.optimizers import Adam 15 | from keras.models import load_model 16 | import keras 17 | keras.backend.set_image_data_format('channels_first') 18 | from IPython import embed 19 | import numpy as np 20 | 21 | import keras.backend as K 22 | import warnings # added 23 | 24 | # From https://github.com/keras-team/keras-applications/blob/e52c477/keras_applications/imagenet_utils.py#L235-L331 25 | def 
_obtain_input_shape(input_shape, 26 | default_size, 27 | min_size, 28 | data_format, 29 | require_flatten, 30 | weights=None): 31 | """Internal utility to compute/validate a model's tensor shape. 32 | # Arguments 33 | input_shape: Either None (will return the default network input shape), 34 | or a user-provided shape to be validated. 35 | default_size: Default input width/height for the model. 36 | min_size: Minimum input width/height accepted by the model. 37 | data_format: Image data format to use. 38 | require_flatten: Whether the model is expected to 39 | be linked to a classifier via a Flatten layer. 40 | weights: One of `None` (random initialization) 41 | or 'imagenet' (pre-training on ImageNet). 42 | If weights='imagenet' input channels must be equal to 3. 43 | # Returns 44 | An integer shape tuple (may include None entries). 45 | # Raises 46 | ValueError: In case of invalid argument values. 47 | """ 48 | if weights != 'imagenet' and input_shape and len(input_shape) == 3: 49 | if data_format == 'channels_first': 50 | if input_shape[0] not in {1, 3}: 51 | warnings.warn( 52 | 'This model usually expects 1 or 3 input channels. ' 53 | 'However, it was passed an input_shape with {input_shape}' 54 | ' input channels.'.format(input_shape=input_shape[0])) 55 | default_shape = (input_shape[0], default_size, default_size) 56 | else: 57 | if input_shape[-1] not in {1, 3}: 58 | warnings.warn( 59 | 'This model usually expects 1 or 3 input channels. 
' 60 | 'However, it was passed an input_shape with {n_input_channels}' 61 | ' input channels.'.format(n_input_channels=input_shape[-1])) 62 | default_shape = (default_size, default_size, input_shape[-1]) 63 | else: 64 | if data_format == 'channels_first': 65 | default_shape = (3, default_size, default_size) 66 | else: 67 | default_shape = (default_size, default_size, 3) 68 | if weights == 'imagenet' and require_flatten: 69 | if input_shape is not None: 70 | if input_shape != default_shape: 71 | raise ValueError('When setting `include_top=True` ' 72 | 'and loading `imagenet` weights, ' 73 | '`input_shape` should be {default_shape}.'.format(default_shape=default_shape)) 74 | return default_shape 75 | if input_shape: 76 | if data_format == 'channels_first': 77 | if input_shape is not None: 78 | if len(input_shape) != 3: 79 | raise ValueError( 80 | '`input_shape` must be a tuple of three integers.') 81 | if input_shape[0] != 3 and weights == 'imagenet': 82 | raise ValueError('The input must have 3 channels; got ' 83 | '`input_shape={input_shape}`'.format(input_shape=input_shape)) 84 | if ((input_shape[1] is not None and input_shape[1] < min_size) or 85 | (input_shape[2] is not None and input_shape[2] < min_size)): 86 | raise ValueError('Input size must be at least {min_size}x{min_size};' 87 | ' got `input_shape={input_shape}`'.format(min_size=min_size, 88 | input_shape=input_shape)) 89 | else: 90 | if input_shape is not None: 91 | if len(input_shape) != 3: 92 | raise ValueError( 93 | '`input_shape` must be a tuple of three integers.') 94 | if input_shape[-1] != 3 and weights == 'imagenet': 95 | raise ValueError('The input must have 3 channels; got ' 96 | '`input_shape={input_shape}`'.format(input_shape=input_shape)) 97 | if ((input_shape[0] is not None and input_shape[0] < min_size) or 98 | (input_shape[1] is not None and input_shape[1] < min_size)): 99 | raise ValueError('Input size must be at least {min_size}x{min_size};' 100 | ' got 
`input_shape={input_shape}`'.format(min_size=min_size, 101 | input_shape=input_shape)) 102 | else: 103 | if require_flatten: 104 | input_shape = default_shape 105 | else: 106 | if data_format == 'channels_first': 107 | input_shape = (3, None, None) 108 | else: 109 | input_shape = (None, None, 3) 110 | if require_flatten: 111 | if None in input_shape: 112 | raise ValueError('If `include_top` is True, ' 113 | 'you should specify a static `input_shape`. ' 114 | 'Got `input_shape={input_shape}`'.format(input_shape=input_shape)) 115 | return input_shape 116 | 117 | 118 | def squeeze_excite_block(input_tensor, ratio=16): 119 | """ Create a channel-wise squeeze-excite block 120 | Args: 121 | input_tensor: input Keras tensor 122 | ratio: number of output filters 123 | Returns: a Keras tensor 124 | References 125 | - [Squeeze and Excitation Networks](https://arxiv.org/abs/1709.01507) 126 | """ 127 | init = input_tensor 128 | channel_axis = 1 if K.image_data_format() == "channels_first" else -1 129 | filters = _tensor_shape(init)[channel_axis] 130 | se_shape = (1, 1, filters) 131 | 132 | se = GlobalAveragePooling2D()(init) 133 | se = Reshape(se_shape)(se) 134 | se = Dense(filters // ratio, activation='relu', kernel_initializer='he_normal', use_bias=False)(se) 135 | se = Dense(filters, activation='sigmoid', kernel_initializer='he_normal', use_bias=False)(se) 136 | 137 | if K.image_data_format() == 'channels_first': 138 | se = Permute((3, 1, 2))(se) 139 | 140 | x = multiply([init, se]) 141 | return x 142 | 143 | 144 | def spatial_squeeze_excite_block(input_tensor): 145 | """ Create a spatial squeeze-excite block 146 | Args: 147 | input_tensor: input Keras tensor 148 | Returns: a Keras tensor 149 | References 150 | - [Concurrent Spatial and Channel Squeeze & Excitation in Fully Convolutional Networks](https://arxiv.org/abs/1803.02579) 151 | """ 152 | 153 | se = Conv2D(1, (1, 1), activation='sigmoid', use_bias=False, 154 | kernel_initializer='he_normal')(input_tensor) 155 | 156 
| x = multiply([input_tensor, se]) 157 | return x 158 | 159 | 160 | def channel_spatial_squeeze_excite(input_tensor, ratio=16): 161 | """ Create a spatial squeeze-excite block 162 | Args: 163 | input_tensor: input Keras tensor 164 | ratio: number of output filters 165 | Returns: a Keras tensor 166 | References 167 | - [Squeeze and Excitation Networks](https://arxiv.org/abs/1709.01507) 168 | - [Concurrent Spatial and Channel Squeeze & Excitation in Fully Convolutional Networks](https://arxiv.org/abs/1803.02579) 169 | """ 170 | 171 | cse = squeeze_excite_block(input_tensor, ratio) 172 | sse = spatial_squeeze_excite_block(input_tensor) 173 | 174 | x = add([cse, sse]) 175 | return x 176 | 177 | def _tensor_shape(tensor): 178 | return getattr(tensor, '_keras_shape') 179 | 180 | def get_model(data_in, data_out, dropout_rate, nb_cnn2d_filt, f_pool_size, t_pool_size, 181 | rnn_size, fnn_size, weights, doa_objective, baseline, ratio): 182 | # model definition 183 | spec_start = Input(shape=(data_in[-3], data_in[-2], data_in[-1])) 184 | 185 | # CNN 186 | spec_cnn = spec_start 187 | for i, convCnt in enumerate(f_pool_size): 188 | 189 | if baseline is False: 190 | 191 | spec_aux = spec_cnn 192 | spec_cnn = Conv2D(nb_cnn2d_filt, 3, padding='same')(spec_cnn) 193 | spec_cnn = BatchNormalization()(spec_cnn) 194 | spec_cnn = ELU()(spec_cnn) 195 | spec_cnn = Conv2D(nb_cnn2d_filt, 3, padding='same')(spec_cnn) 196 | spec_cnn = BatchNormalization()(spec_cnn) 197 | 198 | spec_aux = Conv2D(nb_cnn2d_filt, 1, padding='same')(spec_aux) 199 | spec_aux = BatchNormalization()(spec_aux) 200 | 201 | spec_cnn = add([spec_cnn,spec_aux]) 202 | spec_cnn = ELU()(spec_cnn) 203 | 204 | if ratio != 0: 205 | 206 | spec_cnn = channel_spatial_squeeze_excite(spec_cnn,ratio=ratio) 207 | 208 | spec_cnn = add([spec_cnn, spec_aux]) 209 | 210 | else: 211 | 212 | spec_cnn = Conv2D(filters=nb_cnn2d_filt, kernel_size=(3, 3), padding='same')(spec_cnn) 213 | spec_cnn = BatchNormalization()(spec_cnn) 214 | spec_cnn = 
Activation('relu')(spec_cnn) 215 | spec_cnn = MaxPooling2D(pool_size=(t_pool_size[i], f_pool_size[i]))(spec_cnn) 216 | spec_cnn = Dropout(dropout_rate)(spec_cnn) 217 | spec_cnn = Permute((2, 1, 3))(spec_cnn) 218 | 219 | # RNN 220 | spec_rnn = Reshape((data_out[0][-2], -1))(spec_cnn) 221 | for nb_rnn_filt in rnn_size: 222 | spec_rnn = Bidirectional( 223 | GRU(nb_rnn_filt, activation='tanh', dropout=dropout_rate, recurrent_dropout=dropout_rate, 224 | return_sequences=True), 225 | merge_mode='mul' 226 | )(spec_rnn) 227 | 228 | # FC - DOA 229 | doa = spec_rnn 230 | for nb_fnn_filt in fnn_size: 231 | doa = TimeDistributed(Dense(nb_fnn_filt))(doa) 232 | doa = Dropout(dropout_rate)(doa) 233 | 234 | doa = TimeDistributed(Dense(data_out[1][-1]))(doa) 235 | doa = Activation('tanh', name='doa_out')(doa) 236 | 237 | # FC - SED 238 | sed = spec_rnn 239 | for nb_fnn_filt in fnn_size: 240 | sed = TimeDistributed(Dense(nb_fnn_filt))(sed) 241 | sed = Dropout(dropout_rate)(sed) 242 | sed = TimeDistributed(Dense(data_out[0][-1]))(sed) 243 | sed = Activation('sigmoid', name='sed_out')(sed) 244 | 245 | model = None 246 | if doa_objective == 'mse': 247 | model = Model(inputs=spec_start, outputs=[sed, doa]) 248 | model.compile(optimizer=Adam(), loss=['binary_crossentropy', 'mse'], loss_weights=weights) 249 | elif doa_objective == 'masked_mse': 250 | doa_concat = Concatenate(axis=-1, name='doa_concat')([sed, doa]) 251 | model = Model(inputs=spec_start, outputs=[sed, doa_concat]) 252 | model.compile(optimizer=Adam(), loss=['binary_crossentropy', masked_mse], loss_weights=weights) 253 | else: 254 | print('ERROR: Unknown doa_objective: {}'.format(doa_objective)) 255 | exit() 256 | model.summary() 257 | return model 258 | 259 | 260 | def masked_mse(y_gt, model_out): 261 | # SED mask: Use only the predicted DOAs when gt SED >= 0.5 262 | sed_out = y_gt[:, :, :14] >= 0.5 #TODO fix this hardcoded value of number of classes 263 | sed_out = keras.backend.repeat_elements(sed_out, 3, -1) 264 | sed_out 
= keras.backend.cast(sed_out, 'float32') 265 | 266 | # Use the mask to compute the MSE now. Normalize with the mask weights #TODO fix this hardcoded value of number of classes 267 | return keras.backend.sqrt(keras.backend.sum(keras.backend.square(y_gt[:, :, 14:] - model_out[:, :, 14:]) * sed_out))/keras.backend.sum(sed_out) 268 | 269 | 270 | def load_seld_model(model_file, doa_objective): 271 | if doa_objective == 'mse': 272 | return load_model(model_file) 273 | elif doa_objective == 'masked_mse': 274 | return load_model(model_file, custom_objects={'masked_mse': masked_mse}) 275 | else: 276 | print('ERROR: Unknown doa objective: {}'.format(doa_objective)) 277 | exit() 278 | 279 | 280 | 281 | -------------------------------------------------------------------------------- /metrics/LICENSE.md: -------------------------------------------------------------------------------- 1 | -----------COPYRIGHT NOTICE STARTS WITH THIS LINE------------ 2 | Copyright (c) 2020 Tampere University and its licensors 3 | 4 | Permission is hereby granted, free of charge, to any person obtaining a copy 5 | of this script, SELD_evaluation_metrics.py (the "Software"), to deal 6 | in the Software without restriction, including without limitation the rights 7 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 8 | copies of the Software, and to permit persons to whom the Software is 9 | furnished to do so, subject to the following conditions: 10 | 11 | The above copyright notice and this permission notice shall be included in all 12 | copies or substantial portions of the Software. 13 | 14 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 15 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 16 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE 17 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 18 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 19 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 20 | SOFTWARE. 21 | 22 | -----------COPYRIGHT NOTICE ENDS WITH THIS LINE------------ 23 | -------------------------------------------------------------------------------- /metrics/SELD_evaluation_metrics.py: -------------------------------------------------------------------------------- 1 | # 2 | # Implements the localization and detection metrics proposed in the paper 3 | # 4 | # Joint Measurement of Localization and Detection of Sound Events 5 | # Annamaria Mesaros, Sharath Adavanne, Archontis Politis, Toni Heittola, Tuomas Virtanen 6 | # WASPAA 2019 7 | # 8 | # 9 | # This script is released under the MIT license 10 | # 11 | 12 | import numpy as np 13 | from IPython import embed 14 | eps = np.finfo(float).eps 15 | from scipy.optimize import linear_sum_assignment 16 | 17 | 18 | class SELDMetrics(object): 19 | def __init__(self, doa_threshold=20, nb_classes=11): 20 | ''' 21 | This class implements both the class-sensitive localization and location-sensitive detection metrics. 22 | Additionally, based on the user input, the corresponding averaging is performed within the segment. 23 | 24 | :param nb_classes: Number of sound classes. In the paper, nb_classes = 11 25 | :param doa_threshold: DOA threshold (in degrees) for location-sensitive detection. 
26 | ''' 27 | 28 | self._TP = 0 29 | self._FP = 0 30 | self._TN = 0 31 | self._FN = 0 32 | 33 | self._S = 0 34 | self._D = 0 35 | self._I = 0 36 | 37 | self._Nref = 0 38 | self._Nsys = 0 39 | 40 | self._total_DE = 0 41 | self._DE_TP = 0 42 | 43 | self._spatial_T = doa_threshold 44 | self._nb_classes = nb_classes 45 | 46 | def compute_seld_scores(self): 47 | ''' 48 | Collect the final SELD scores 49 | 50 | :return: location-sensitive detection scores (ER, F) and class-sensitive localization scores (DE, DE_F) 51 | ''' 52 | 53 | # Location-sensitive detection performance 54 | ER = (self._S + self._D + self._I) / float(self._Nref + eps) 55 | 56 | prec = float(self._TP) / float(self._Nsys + eps) 57 | recall = float(self._TP) / float(self._Nref + eps) 58 | F = 2 * prec * recall / (prec + recall + eps) 59 | 60 | # Class-sensitive localization performance 61 | if self._DE_TP: 62 | DE = self._total_DE / float(self._DE_TP + eps) 63 | else: 64 | # When the total number of predictions is zero 65 | DE = 180 66 | 67 | DE_prec = float(self._DE_TP) / float(self._Nsys + eps) 68 | DE_recall = float(self._DE_TP) / float(self._Nref + eps) 69 | DE_F = 2 * DE_prec * DE_recall / (DE_prec + DE_recall + eps) 70 | 71 | return ER, F, DE, DE_F 72 | 73 | def update_seld_scores_xyz(self, pred, gt): 74 | ''' 75 | Implements the spatial error averaging according to equation [5] in the paper, using Cartesian distance 76 | 77 | :param pred: dictionary containing class-wise prediction results for each N-seconds segment block 78 | :param gt: dictionary containing class-wise groundtruth for each N-seconds segment block 79 | ''' 80 | for block_cnt in range(len(gt.keys())): 81 | # print('\nblock_cnt', block_cnt, end='') 82 | loc_FN, loc_FP = 0, 0 83 | for class_cnt in range(self._nb_classes): 84 | # print('\tclass:', class_cnt, end='') 85 | # Counting the number of ref and sys outputs should include the number of tracks for each class in the segment 86 | if class_cnt in gt[block_cnt]: 87 | self._Nref += 1 88 | 
if class_cnt in pred[block_cnt]: 89 | self._Nsys += 1 90 | 91 | if class_cnt in gt[block_cnt] and class_cnt in pred[block_cnt]: 92 | # True positives or False negative case 93 | 94 | # NOTE: For multiple tracks per class, identify multiple tracks using hungarian algorithm and then 95 | # calculate the spatial distance using the following code. In the current code, if there are multiple 96 | # tracks of the same class in a frame we are calculating the least cost between the groundtruth and predicted and using it. 97 | 98 | total_spatial_dist = 0 99 | total_framewise_matching_doa = 0 100 | gt_ind_list = gt[block_cnt][class_cnt][0][0] 101 | pred_ind_list = pred[block_cnt][class_cnt][0][0] 102 | for gt_ind, gt_val in enumerate(gt_ind_list): 103 | if gt_val in pred_ind_list: 104 | total_framewise_matching_doa += 1 105 | pred_ind = pred_ind_list.index(gt_val) 106 | 107 | gt_arr = np.array(gt[block_cnt][class_cnt][0][1][gt_ind]) 108 | pred_arr = np.array(pred[block_cnt][class_cnt][0][1][pred_ind]) 109 | 110 | if gt_arr.shape[0]==1 and pred_arr.shape[0]==1: 111 | total_spatial_dist += distance_between_cartesian_coordinates(gt_arr[0][0], gt_arr[0][1], gt_arr[0][2], pred_arr[0][0], pred_arr[0][1], pred_arr[0][2]) 112 | else: 113 | total_spatial_dist += least_distance_between_gt_pred(gt_arr, pred_arr) 114 | 115 | if total_spatial_dist == 0 and total_framewise_matching_doa == 0: 116 | loc_FN += 1 117 | self._FN += 1 118 | else: 119 | avg_spatial_dist = (total_spatial_dist / total_framewise_matching_doa) 120 | 121 | self._total_DE += avg_spatial_dist 122 | self._DE_TP += 1 123 | 124 | if avg_spatial_dist <= self._spatial_T: 125 | self._TP += 1 126 | else: 127 | loc_FN += 1 128 | self._FN += 1 129 | elif class_cnt in gt[block_cnt] and class_cnt not in pred[block_cnt]: 130 | # False negative 131 | loc_FN += 1 132 | self._FN += 1 133 | elif class_cnt not in gt[block_cnt] and class_cnt in pred[block_cnt]: 134 | # False positive 135 | loc_FP += 1 136 | self._FP += 1 137 | elif 
class_cnt not in gt[block_cnt] and class_cnt not in pred[block_cnt]: 138 | # True negative 139 | self._TN += 1 140 | 141 | self._S += np.minimum(loc_FP, loc_FN) 142 | self._D += np.maximum(0, loc_FN - loc_FP) 143 | self._I += np.maximum(0, loc_FP - loc_FN) 144 | return 145 | 146 | def update_seld_scores(self, pred_deg, gt_deg): 147 | ''' 148 | Implements the spatial error averaging according to equation [5] in the paper, using Polar distance 149 | Expects the angles in degrees 150 | 151 | :param pred_deg: dictionary containing class-wise prediction results for each N-seconds segment block 152 | :param gt_deg: dictionary containing class-wise groundtruth for each N-seconds segment block 153 | ''' 154 | for block_cnt in range(len(gt_deg.keys())): 155 | # print('\nblock_cnt', block_cnt, end='') 156 | loc_FN, loc_FP = 0, 0 157 | for class_cnt in range(self._nb_classes): 158 | # print('\tclass:', class_cnt, end='') 159 | # Counting the number of ref and sys outputs should include the number of tracks for each class in the segment 160 | if class_cnt in gt_deg[block_cnt]: 161 | self._Nref += 1 162 | if class_cnt in pred_deg[block_cnt]: 163 | self._Nsys += 1 164 | 165 | if class_cnt in gt_deg[block_cnt] and class_cnt in pred_deg[block_cnt]: 166 | # True positives or False negative case 167 | 168 | # NOTE: For multiple tracks per class, identify multiple tracks using hungarian algorithm and then 169 | # calculate the spatial distance using the following code. In the current code, if there are multiple 170 | # tracks of the same class in a frame we are calculating the least cost between the groundtruth and predicted and using it. 
171 | total_spatial_dist = 0 172 | total_framewise_matching_doa = 0 173 | gt_ind_list = gt_deg[block_cnt][class_cnt][0][0] 174 | pred_ind_list = pred_deg[block_cnt][class_cnt][0][0] 175 | for gt_ind, gt_val in enumerate(gt_ind_list): 176 | if gt_val in pred_ind_list: 177 | total_framewise_matching_doa += 1 178 | pred_ind = pred_ind_list.index(gt_val) 179 | 180 | gt_arr = np.array(gt_deg[block_cnt][class_cnt][0][1][gt_ind]) * np.pi / 180 181 | pred_arr = np.array(pred_deg[block_cnt][class_cnt][0][1][pred_ind]) * np.pi / 180 182 | if gt_arr.shape[0]==1 and pred_arr.shape[0]==1: 183 | total_spatial_dist += distance_between_spherical_coordinates_rad(gt_arr[0][0], gt_arr[0][1], pred_arr[0][0], pred_arr[0][1]) 184 | else: 185 | total_spatial_dist += least_distance_between_gt_pred(gt_arr, pred_arr) 186 | 187 | if total_spatial_dist == 0 and total_framewise_matching_doa == 0: 188 | loc_FN += 1 189 | self._FN += 1 190 | else: 191 | avg_spatial_dist = (total_spatial_dist / total_framewise_matching_doa) 192 | 193 | self._total_DE += avg_spatial_dist 194 | self._DE_TP += 1 195 | 196 | if avg_spatial_dist <= self._spatial_T: 197 | self._TP += 1 198 | else: 199 | loc_FN += 1 200 | self._FN += 1 201 | elif class_cnt in gt_deg[block_cnt] and class_cnt not in pred_deg[block_cnt]: 202 | # False negative 203 | loc_FN += 1 204 | self._FN += 1 205 | elif class_cnt not in gt_deg[block_cnt] and class_cnt in pred_deg[block_cnt]: 206 | # False positive 207 | loc_FP += 1 208 | self._FP += 1 209 | elif class_cnt not in gt_deg[block_cnt] and class_cnt not in pred_deg[block_cnt]: 210 | # True negative 211 | self._TN += 1 212 | 213 | self._S += np.minimum(loc_FP, loc_FN) 214 | self._D += np.maximum(0, loc_FN - loc_FP) 215 | self._I += np.maximum(0, loc_FP - loc_FN) 216 | return 217 | 218 | 219 | def distance_between_spherical_coordinates_rad(az1, ele1, az2, ele2): 220 | """ 221 | Angular distance between two spherical coordinates 222 | MORE: https://en.wikipedia.org/wiki/Great-circle_distance 
223 | 224 | :return: angular distance in degrees 225 | """ 226 | dist = np.sin(ele1) * np.sin(ele2) + np.cos(ele1) * np.cos(ele2) * np.cos(np.abs(az1 - az2)) 227 | # Making sure the dist values are in -1 to 1 range, else np.arccos kills the job 228 | dist = np.clip(dist, -1, 1) 229 | dist = np.arccos(dist) * 180 / np.pi 230 | return dist 231 | 232 | 233 | def distance_between_cartesian_coordinates(x1, y1, z1, x2, y2, z2): 234 | """ 235 | Angular distance between two cartesian coordinates 236 | MORE: https://en.wikipedia.org/wiki/Great-circle_distance 237 | Check 'From chord length' section 238 | 239 | :return: angular distance in degrees 240 | """ 241 | # Normalize the Cartesian vectors 242 | N1 = np.sqrt(x1**2 + y1**2 + z1**2 + 1e-10) 243 | N2 = np.sqrt(x2**2 + y2**2 + z2**2 + 1e-10) 244 | x1, y1, z1, x2, y2, z2 = x1/N1, y1/N1, z1/N1, x2/N2, y2/N2, z2/N2 245 | 246 | #Compute the distance 247 | dist = x1*x2 + y1*y2 + z1*z2 248 | dist = np.clip(dist, -1, 1) 249 | dist = np.arccos(dist) * 180 / np.pi 250 | return dist 251 | 252 | 253 | def least_distance_between_gt_pred(gt_list, pred_list): 254 | """ 255 |         Shortest distance between two sets of DOA coordinates. Given a set of groundtruth coordinates, 256 |         and its respective predicted coordinates, we calculate the distance between each of the 257 |         coordinate pairs resulting in a matrix of distances, where one axis represents the number of groundtruth 258 |         coordinates and the other the predicted coordinates. The number of estimated peaks need not be the same as in 259 |         groundtruth, thus the distance matrix is not always a square matrix. We use the hungarian algorithm to find the 260 |         least cost in this distance matrix. 
261 |         :param gt_list: array of ground-truth Cartesian (x, y, z) coordinates, or polar (azimuth, elevation) coordinates in radians 262 |         :param pred_list: array of predicted Cartesian (x, y, z) coordinates, or polar (azimuth, elevation) coordinates in radians 263 |         :return: cost - summed angular distance (in degrees) of the least-cost assignment 264 |         NOTE: the number of missed DOAs ('less') and over-estimated DOAs ('extra') 265 |         is not returned by this implementation 266 |     """ 267 | gt_len, pred_len = gt_list.shape[0], pred_list.shape[0] 268 | ind_pairs = np.array([[x, y] for y in range(pred_len) for x in range(gt_len)]) 269 | cost_mat = np.zeros((gt_len, pred_len)) 270 | 271 | if gt_len and pred_len: 272 | if len(gt_list[0]) == 3: # Cartesian 273 | x1, y1, z1, x2, y2, z2 = gt_list[ind_pairs[:, 0], 0], gt_list[ind_pairs[:, 0], 1], gt_list[ind_pairs[:, 0], 2], pred_list[ind_pairs[:, 1], 0], pred_list[ind_pairs[:, 1], 1], pred_list[ind_pairs[:, 1], 2] 274 | cost_mat[ind_pairs[:, 0], ind_pairs[:, 1]] = distance_between_cartesian_coordinates(x1, y1, z1, x2, y2, z2) 275 | else: 276 | az1, ele1, az2, ele2 = gt_list[ind_pairs[:, 0], 0], gt_list[ind_pairs[:, 0], 1], pred_list[ind_pairs[:, 1], 0], pred_list[ind_pairs[:, 1], 1] 277 | cost_mat[ind_pairs[:, 0], ind_pairs[:, 1]] = distance_between_spherical_coordinates_rad(az1, ele1, az2, ele2) 278 | 279 | row_ind, col_ind = linear_sum_assignment(cost_mat) 280 | cost = cost_mat[row_ind, col_ind].sum() 281 | return cost 282 | 283 | 284 | def early_stopping_metric(sed_error, doa_error): 285 | """ 286 | Compute early stopping metric from sed and doa errors.
287 | 288 | :param sed_error: [error rate (0 to 1 range), f score (0 to 1 range)] 289 | :param doa_error: [doa error (in degrees), frame recall (0 to 1 range)] 290 | :return: early stopping metric result 291 | """ 292 | seld_metric = np.mean([ 293 | sed_error[0], 294 | 1 - sed_error[1], 295 | doa_error[0]/180, 296 | 1 - doa_error[1]] 297 | ) 298 | return seld_metric 299 | -------------------------------------------------------------------------------- /parameter.py: -------------------------------------------------------------------------------- 1 | # Parameters used in the feature extraction, neural network model, and training the SELDnet can be changed here. 2 | # 3 | # Ideally, do not change the values of the default parameters. Create separate cases, each with a unique argv value, as seen in 4 | # the code below (if-elif chain), and use them. This way you can easily reproduce a configuration at a later time. 5 | 6 | 7 | def get_params(argv='1'): 8 | print("SET: {}".format(argv)) 9 | # ########### default parameters ############## 10 | params = dict( 11 | quick_test=False, # Quick test: trains/tests on a small subset of the dataset and a small number of epochs 12 | 13 | # INPUT PATH 14 | dataset_dir='C:\\JAVIER\\code\\DCASE2020-Task3\\base_folder', # Base folder containing the foa/mic and metadata folders 15 | #dataset_dir='/content/gdrive/My Drive/DCASE2020-Task3/base_folder', 16 | 17 | # OUTPUT PATH 18 | feat_label_dir='C:\\JAVIER\\code\\DCASE2020-Task3\\input_feature\\baseline_log_mel', # Directory to dump extracted features and labels 19 | #feat_label_dir='/content/gdrive/My Drive/DCASE2020-Task3/input_feature/gammatone_nomax_gcclogmel', 20 | model_dir='C:\\JAVIER\\code\\DCASE2020-Task3\\outputs\\ratio-1\\models', # Dumps the trained models and training curves in this folder 21 | dcase_output=True, # If true, dumps the results recording-wise in 'dcase_dir' path.
22 | # Set this true after you have finalized your model, save the output, and submit 23 | dcase_dir='C:\\JAVIER\\code\\DCASE2020-Task3\\outputs\\ratio-1\\results', # Dumps the recording-wise network output in this folder 24 | 25 | # DATASET LOADING PARAMETERS 26 | mode='eval', # 'dev' - development or 'eval' - evaluation dataset 27 | dataset='mic', # 'foa' - ambisonic or 'mic' - microphone signals 28 | 29 | #FEATURE PARAMS 30 | fs=24000, 31 | hop_len_s=0.02, 32 | label_hop_len_s=0.1, 33 | max_audio_len_s=60, 34 | nb_mel_bins=64, 35 | 36 | #AUDIO REPRESENTATION TYPE (+) 37 | is_gammatone=False, # if set to True, extracts gammatone representation instead of Log-Mel 38 | fmin=.0, 39 | 40 | # DNN MODEL PARAMETERS 41 | label_sequence_length=60, # Feature sequence length 42 | batch_size=64, # Batch size 43 | dropout_rate=0, # Dropout rate, constant for all layers 44 | nb_cnn2d_filt=64, # Number of CNN nodes, constant for each layer 45 | f_pool_size=[4, 4, 2], # CNN frequency pooling, length of list = number of CNN layers, list value = pooling per layer 46 | 47 | # CNN squeeze-excitation parameter (+) 48 | do_baseline=False, 49 | ratio=16, 50 | 51 | # Get dataset 52 | folder='normalized', 53 | 54 | rnn_size=[128, 128], # RNN contents, length of list = number of layers, list value = number of nodes 55 | fnn_size=[128], # FNN contents, length of list = number of layers, list value = number of nodes 56 | loss_weights=[1., 1000.], # [sed, doa] weight for scaling the DNN outputs 57 | nb_epochs=50, # Train for maximum epochs 58 | epochs_per_fit=5, # Number of epochs per fit 59 | doa_objective='masked_mse', # supports: mse, masked_mse. 
mse- original seld approach; masked_mse - dcase 2020 approach 60 | 61 | #METRIC PARAMETERS 62 | lad_doa_thresh=20 63 | 64 | ) 65 | feature_label_resolution = int(params['label_hop_len_s'] // params['hop_len_s']) 66 | params['feature_sequence_length'] = params['label_sequence_length'] * feature_label_resolution 67 | params['t_pool_size'] = [feature_label_resolution, 1, 1] # CNN time pooling 68 | params['patience'] = int(params['nb_epochs']) # Stop training if patience is reached 69 | 70 | params['unique_classes'] = { 71 | 'alarm': 0, 72 | 'baby': 1, 73 | 'crash': 2, 74 | 'dog': 3, 75 | 'engine': 4, 76 | 'female_scream': 5, 77 | 'female_speech': 6, 78 | 'fire': 7, 79 | 'footsteps': 8, 80 | 'knock': 9, 81 | 'male_scream': 10, 82 | 'male_speech': 11, 83 | 'phone': 12, 84 | 'piano': 13 85 | } 86 | 87 | 88 | # ########### User defined parameters ############## 89 | # if argv == '1': 90 | # print("USING DEFAULT PARAMETERS\n") 91 | 92 | # elif argv == '2': 93 | # params['mode'] = 'dev' 94 | # params['dataset'] = 'mic' 95 | 96 | # elif argv == '3': 97 | # params['mode'] = 'eval' 98 | # params['dataset'] = 'mic' 99 | 100 | # elif argv == '4': 101 | # params['mode'] = 'dev' 102 | # params['dataset'] = 'foa' 103 | 104 | # elif argv == '5': 105 | # params['mode'] = 'eval' 106 | # params['dataset'] = 'foa' 107 | 108 | # elif argv == '999': 109 | # print("QUICK TEST MODE\n") 110 | # params['quick_test'] = True 111 | # params['epochs_per_fit'] = 1 112 | 113 | # else: 114 | # print('ERROR: unknown argument {}'.format(argv)) 115 | # exit() 116 | 117 | for key, value in params.items(): 118 | print("\t{}: {}".format(key, value)) 119 | return params 120 | -------------------------------------------------------------------------------- /visualize_SELD_output.py: -------------------------------------------------------------------------------- 1 | # Script for visualising the SELD output. 
2 | # 3 | # NOTE: Make sure to use the appropriate backend for the matplotlib based on your OS 4 | 5 | import os 6 | import numpy as np 7 | import librosa.display 8 | import cls_feature_class 9 | import parameter 10 | import matplotlib.gridspec as gridspec 11 | import matplotlib.pyplot as plot 12 | plot.switch_backend('agg') 13 | plot.rcParams.update({'font.size': 22}) 14 | 15 | 16 | def collect_classwise_data(_in_dict): 17 | _out_dict = {} 18 | for _key in _in_dict.keys(): 19 | for _seld in _in_dict[_key]: 20 | if _seld[0] not in _out_dict: 21 | _out_dict[_seld[0]] = [] 22 | _out_dict[_seld[0]].append([_key, _seld[0], _seld[1], _seld[2]]) 23 | return _out_dict 24 | 25 | 26 | def plot_func(plot_data, hop_len_s, ind, plot_x_ax=False, plot_y_ax=False): 27 | cmap = ['b', 'r', 'g', 'y', 'k', 'c', 'm', 'b', 'r', 'g', 'y', 'k', 'c', 'm'] 28 | for class_ind in plot_data.keys(): 29 | time_ax = np.array(plot_data[class_ind])[:, 0] *hop_len_s 30 | y_ax = np.array(plot_data[class_ind])[:, ind] 31 | plot.plot(time_ax, y_ax, marker='.', color=cmap[class_ind], linestyle='None', markersize=4) 32 | plot.grid() 33 | plot.xlim([0, 60]) 34 | if not plot_x_ax: 35 | plot.gca().axes.set_xticklabels([]) 36 | 37 | if not plot_y_ax: 38 | plot.gca().axes.set_yticklabels([]) 39 | # --------------------------------- MAIN SCRIPT STARTS HERE ----------------------------------------- 40 | params = parameter.get_params() 41 | 42 | # output format file to visualize 43 | pred = os.path.join(params['dcase_dir'], '2_mic_dev/fold1_room1_mix006_ov1.csv') 44 | 45 | # path of reference audio directory for visualizing the spectrogram and description directory for 46 | # visualizing the reference 47 | # Note: The code finds out the audio filename from the predicted filename automatically 48 | ref_dir = os.path.join(params['dataset_dir'], 'metadata_dev') 49 | aud_dir = os.path.join(params['dataset_dir'], 'mic_dev') 50 | 51 | # load the predicted output format 52 | feat_cls = 
cls_feature_class.FeatureClass(params) 53 | pred_dict = feat_cls.load_output_format_file(pred) 54 | pred_dict_polar = feat_cls.convert_output_format_cartesian_to_polar(pred_dict) 55 | 56 | # load the reference output format 57 | ref_filename = os.path.basename(pred) 58 | ref_dict_polar = feat_cls.load_output_format_file(os.path.join(ref_dir, ref_filename)) 59 | 60 | pred_data = collect_classwise_data(pred_dict_polar) 61 | ref_data = collect_classwise_data(ref_dict_polar) 62 | 63 | nb_classes = len(feat_cls.get_classes()) 64 | 65 | # load the audio and extract spectrogram 66 | ref_filename = os.path.basename(pred).replace('.csv', '.wav') 67 | audio, fs = feat_cls._load_audio(os.path.join(aud_dir, ref_filename)) 68 | stft = np.abs(np.squeeze(feat_cls._spectrogram(audio[:, :1]))) 69 | stft = librosa.amplitude_to_db(stft, ref=np.max) 70 | 71 | plot.figure(figsize=(20, 15)) 72 | gs = gridspec.GridSpec(4, 4) 73 | ax0 = plot.subplot(gs[0, 1:3]), librosa.display.specshow(stft.T, sr=fs, x_axis='s', y_axis='linear'), plot.xlim([0, 60]), plot.xticks([]), plot.xlabel(''), plot.title('Spectrogram') 74 | ax1 = plot.subplot(gs[1, :2]), plot_func(ref_data, params['label_hop_len_s'], ind=1, plot_y_ax=True), plot.ylim([-1, nb_classes + 1]), plot.title('SED reference') 75 | ax2 = plot.subplot(gs[1, 2:]), plot_func(pred_data, params['label_hop_len_s'], ind=1), plot.ylim([-1, nb_classes + 1]), plot.title('SED predicted') 76 | ax3 = plot.subplot(gs[2, :2]), plot_func(ref_data, params['label_hop_len_s'], ind=2, plot_y_ax=True), plot.ylim([-180, 180]), plot.title('Azimuth reference') 77 | ax4 = plot.subplot(gs[2, 2:]), plot_func(pred_data, params['label_hop_len_s'], ind=2), plot.ylim([-180, 180]), plot.title('Azimuth predicted') 78 | ax5 = plot.subplot(gs[3, :2]), plot_func(ref_data, params['label_hop_len_s'], ind=3, plot_y_ax=True), plot.ylim([-90, 90]), plot.title('Elevation reference') 79 | ax6 = plot.subplot(gs[3, 2:]), plot_func(pred_data, params['label_hop_len_s'], ind=3), 
plot.ylim([-90, 90]), plot.title('Elevation predicted') 80 | ax_lst = [ax0, ax1, ax2, ax3, ax4, ax5, ax6] 81 | plot.savefig(os.path.join(params['dcase_dir'] , ref_filename.replace('.wav', '.jpg')), dpi=300, bbox_inches = "tight") 82 | 83 | 84 | --------------------------------------------------------------------------------
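
Worked example (not part of the repository): a minimal, standard-library-only sketch of two computations from `metrics/evaluation_metrics.py` above, the great-circle angular distance used in `distance_between_spherical_coordinates_rad` and the averaged score returned by `early_stopping_metric`. The function names `angular_distance_deg` and `seld_score` are hypothetical and chosen only for illustration. The sketch takes angles in degrees for readability, and it assumes that the 20-degree value of `lad_doa_thresh` in `parameter.py` is the spatial threshold a prediction has to meet to count as a true positive.

```python
import math


def angular_distance_deg(az1, ele1, az2, ele2):
    """Great-circle distance in degrees between two directions given in degrees.

    Hypothetical helper mirroring the formula in
    distance_between_spherical_coordinates_rad, but taking degrees as input.
    """
    az1, ele1, az2, ele2 = map(math.radians, (az1, ele1, az2, ele2))
    cos_dist = (math.sin(ele1) * math.sin(ele2)
                + math.cos(ele1) * math.cos(ele2) * math.cos(abs(az1 - az2)))
    # Clamp to [-1, 1] so rounding errors cannot push acos out of its domain
    cos_dist = max(-1.0, min(1.0, cos_dist))
    return math.degrees(math.acos(cos_dist))


def seld_score(error_rate, f_score, doa_error_deg, frame_recall):
    """Mean of the four normalised error terms (lower is better).

    Hypothetical helper following the same averaging as early_stopping_metric.
    """
    return (error_rate + (1 - f_score) + doa_error_deg / 180 + (1 - frame_recall)) / 4


if __name__ == "__main__":
    threshold_deg = 20  # assumed to correspond to lad_doa_thresh in parameter.py
    dist = angular_distance_deg(30, 0, 45, 0)
    print("distance (deg):", dist)
    print("within threshold:", dist <= threshold_deg)
    print("SELD score:", seld_score(0.5, 0.5, 18, 0.9))
```

Two directions on the horizontal plane that differ only by 15 degrees of azimuth are about 15 degrees apart, so under the assumed 20-degree threshold that prediction would count as correctly localized.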