├── LICENSE.md
├── README.md
├── audio_mixer.py
├── audio_synthesizer.py
├── db_config.py
├── db_config_fsd.obj
├── db_config_nigens.obj
├── example_script_DCASE2021.py
├── example_script_DCASE2022.py
├── generation_parameters.py
├── make_dataset.py
├── metadata_synthesizer.py
└── utils.py

/LICENSE.md:
--------------------------------------------------------------------------------
 1 | -----------COPYRIGHT NOTICE STARTS WITH THIS LINE------------
 2 | Copyright (c) 2021 Tampere University and its licensors
 3 | All rights reserved.
 4 | 
 5 | Permission is hereby granted, without written agreement and without
 6 | license or royalty fees, to use and copy the code for the Sound Event
 7 | Localization and Detection using Convolutional Recurrent Neural Network
 8 | method/architecture, present in the GitHub repository with the handle
 9 | seld-dcase2021, (“Work”) described in the paper with title "Sound event
10 | localization and detection of overlapping sources using
11 | convolutional recurrent neural network" and composed of files with
12 | code in the Python programming language. This grant is only for experimental and
13 | non-commercial purposes, provided that the copyright notice in its entirety
14 | appear in all copies of this Work, and the original source of this Work,
15 | Audio Research Group at Tampere University, is acknowledged in any publication
16 | that reports research using this Work.
17 | 
18 | Any commercial use of the Work or any part thereof is strictly prohibited.
19 | Commercial use includes, but is not limited to:
20 | - selling or reproducing the Work
21 | - selling or distributing the results or content achieved by use of the Work
22 | - providing services by using the Work.
23 | 
24 | IN NO EVENT SHALL TAMPERE UNIVERSITY OR ITS LICENSORS BE LIABLE TO
25 | ANY PARTY FOR DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES
26 | ARISING OUT OF THE USE OF THIS WORK AND ITS DOCUMENTATION, EVEN IF TAMPERE
27 | UNIVERSITY OR ITS LICENSORS HAS BEEN ADVISED OF THE POSSIBILITY
28 | OF SUCH DAMAGE.
29 | 
30 | TAMPERE UNIVERSITY AND ALL ITS LICENSORS SPECIFICALLY DISCLAIM
31 | ANY WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
32 | MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE WORK PROVIDED HEREUNDER
33 | IS ON AN "AS IS" BASIS, AND THE TAMPERE UNIVERSITY HAS NO OBLIGATION
34 | TO PROVIDE MAINTENANCE, SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS.
35 | 
36 | -----------COPYRIGHT NOTICE ENDS WITH THIS LINE------------
37 | 
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | # DCASE2022-data-generator
 2 | Data generator for creating synthetic audio mixtures suitable for DCASE Challenge 2022 Task 3
 3 | 
 4 | ### Prerequisites
 5 | 
 6 | The provided code was tested with Python 3.8 and the following libraries:
 7 | SoundFile 0.10.3, mat73 0.58, numpy 1.20.1, scipy 1.6.2, librosa 0.8.1.
 8 | 
 9 | ## Getting Started
10 | 
11 | This repository contains several Python files, which together form a complete data generation framework.
12 | * The `generation_parameters.py` is a separate script used for setting the parameters of the data generation process, such as the audio dataset, the number of folds, the mixture length, etc.
13 | * The `db_config.py` is a class containing audio filelists and data parameters from the different audio datasets used for the mixture generation.
14 | * The `metadata_synthesizer.py` is a class for generating the mixture target labels, along with the corresponding metadata and statistics. Information from this class can be further used for synthesizing the final audio files.
15 | * The `audio_synthesizer.py` is a class for synthesizing noiseless audio files containing the simulated mixtures.
16 | * The `audio_mixer.py` is a class for mixing the generated audio mixtures with background noise and/or interference mixtures.
17 | * The `make_dataset.py` is the main script in which the whole framework is used to perform the full data generation process.
18 | * The `utils.py` is an additional file containing complementary functions for the other scripts.
19 | 
20 | Moreover, two object files are included in case the database configuration via `db_config.py` takes too much time:
21 | * The `db_config_fsd.obj` is a DBConfig class containing information about the database and files for the FSD50K audio dataset.
22 | * The `db_config_nigens.obj` is a DBConfig class containing information about the database and files for the NIGENS audio dataset.
23 | 
24 | Two example scripts are included:
25 | * The `example_script_DCASE2021.py` is a script showing a pipeline to generate data similar to the DCASE2021 dataset.
26 | * The `example_script_DCASE2022.py` is a script showing a pipeline to generate data similar to the current DCASE2022 dataset.
27 | 
28 | The repository is licensed under the [TAU License](LICENSE.md).
--------------------------------------------------------------------------------
/audio_mixer.py:
--------------------------------------------------------------------------------
 1 | import numpy as np
 2 | import os
 3 | import soundfile
 4 | 
 5 | class AudioMixer(object):
 6 |     def __init__(
 7 |             self, params, db_config, mixtures, mixture_setup, audio_format, scenario_out, scenario_interf='target_interf'
 8 |             ):
 9 |         self._recpath2020 = params['noisepath']
10 |         self._rooms_paths2020 = ['01_bomb_center','02_gym','03_pb132_paatalo_classroom2','04_pc226_paatalo_office',
11 |                                  '05_sa203_sahkotalo_lecturehall','06_sc203_sahkotalo_classroom2','07_se201_sahkotalo_classroom',
12 |                                  '08_se203_sahkotalo_classroom','09_tb103_tietotalo_lecturehall',
13 |                                  '10_tc352_tietotalo_meetingroom']
14 |         self._nb_rooms2020 = len(self._rooms_paths2020)
15 |         self._recpath2019 = params['noisepath']
16 |         self._rooms_paths2019 = ['11_language_center','12_tietotalo','13_reaktori','14_sahkotalo','15_festia']
17 |         self._nb_rooms2019 = len(self._rooms_paths2019)
18 |         self._mixturepath = params['mixturepath']
19 |         self._mixtures = mixtures
20 |         self._targetpath = self._mixturepath + '/' + mixture_setup['scenario']
21 |         if scenario_out == 'target_noisy':
22 |             self._scenarios = [1, 0, 1]
23 |         elif scenario_out == 'target_interf_noiseless':
24 |             self._scenarios = [1, 1, 0]
25 |         elif scenario_out == 'target_interf_noisy':
26 |             self._scenarios = [1, 1, 1]
27 |         else:
28 |             raise ValueError('Incorrect scenario specified')
29 | 
30 |         self._scenariopath = self._mixturepath + '/' + scenario_out
31 |         self._audio_format = audio_format
32 |         if self._audio_format == 'mic':
33 |             self._mic_format = 'tetra'
34 |         elif self._audio_format == 'foa':
35 |             self._mic_format = 'foa_sn3d'
36 | 
37 |         if self._scenarios[1]:
38 |             self._interfpath = self._mixturepath + '/' + scenario_interf
39 |         self._fs_mix = mixture_setup['fs_mix']
40 |         self._tMix = mixture_setup['mixture_duration']
41 |         self._lMix = int(np.round(self._fs_mix*self._tMix))
42 |         # target signal-to-interference power ratio,
43 |         # set to 3 for all targets, so that the total interference power is
44 |         # approximately 0 dB with respect to a single layer of targets (for 3 target layers)
45 |         self._sir = 3.
46 |         self._nb_folds = mixture_setup['nb_folds']
47 |         self._rnd_generator = np.random.default_rng(2024)
48 | 
49 |     def mix_audio(self):
50 |         if not os.path.isdir(self._scenariopath + '/' + self._audio_format):
51 |             os.makedirs(self._scenariopath + '/' + self._audio_format)
52 |         # start creating the mixtures description structure
53 |         for nfold in range(self._nb_folds):
54 |             print('Adding noise for fold {}'.format(nfold+1))
55 |             rooms = self._mixtures[nfold][0]['roomidx']
56 |             nb_rooms = len(rooms)
57 |             for nr in range(nb_rooms):
58 |                 nroom = rooms[nr]
59 | 
60 |                 if self._scenarios[2]:
61 |                     print('Loading ambience')
62 |                     recpath = self._recpath2020 if nroom <= 10 else self._recpath2019
63 |                     roompath = self._rooms_paths2020 if nroom <= 10 else self._rooms_paths2019
64 |                     roomidx = nroom if nroom <= 10 else nroom-10
65 |                     ambience, _ = soundfile.read(recpath + '/' + roompath[roomidx-1] + '/ambience_' + self._mic_format + '_24k_edited.wav')
66 |                     lSig = np.shape(ambience)[0]
67 |                     nSegs = int(np.floor(lSig/self._lMix))  # number of whole mixture-length segments in the ambience recording
68 | 
69 |                 nb_mixtures = len(self._mixtures[nfold][nr]['mixture'])
70 |                 for nmix in range(nb_mixtures):
71 |                     print('Loading target mixture {}/{} \n'.format(nmix+1,nb_mixtures))
72 |                     mixture_filename = 'fold{}_room{}_mix{:03}.wav'.format(nfold+1,nr+1,nmix+1)
73 |                     snr = self._mixtures[nfold][nr]['mixture'][nmix]['snr']
74 | 
75 |                     target_sig, _ = soundfile.read(self._targetpath + '/' + self._audio_format + '/' + mixture_filename)
76 |                     target_omni_energy = np.sum(np.mean(target_sig,axis=1)**2) if self._audio_format == 'mic' else np.sum(target_sig[:,0]**2)
77 | 
78 |                     if self._scenarios[1]:
79 |                         print('Loading interferer mixture {}/{} \n'.format(nmix+1, nb_mixtures))
80 |                         interf_sig, _ = soundfile.read(self._interfpath + '/' + self._audio_format + '/' + mixture_filename)
81 |                         inter_omni_energy = np.sum(np.mean(interf_sig,axis=1)**2) if self._audio_format == 'mic' else np.sum(interf_sig[:,0]**2)
82 |                         interf_norm = np.sqrt(target_omni_energy/(self._sir * inter_omni_energy))  # scale the interference so the target-to-interference energy ratio equals _sir
83 |                         ## ADD INTERFERENCE
84 |                         target_sig += interf_norm * interf_sig
85 | 
86 |                     if self._scenarios[2]:
87 |                         # check whether the mixture index still fits within the available noise segments
88 |                         idx_range = np.arange(0,self._lMix,dtype=int) # computed here for convenience
89 |                         if nmix < nSegs:
90 |                             ambient_sig = ambience[nmix*self._lMix+idx_range, :]
91 |                         else:
92 |                             # else just mix randomly two segments
93 |                             rand_idx = self._rnd_generator.integers(0,nSegs,2)
94 |                             ambient_sig = (ambience[rand_idx[0]*self._lMix + idx_range, :]/np.sqrt(2)
95 |                                            + ambience[rand_idx[1]*self._lMix + idx_range, :]/np.sqrt(2))
96 | 
97 |                         ambi_energy = np.sum(np.mean(ambient_sig,axis=1)**2) if self._audio_format == 'mic' else np.sum(ambient_sig[:,0]**2)
98 |                         ambi_norm = np.sqrt(target_omni_energy * 10.**(-snr/10.) / ambi_energy)
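                        # Note (annotation added for clarity): ambi_norm is the gain g that solves
                        #     10*log10(target_omni_energy / (g**2 * ambi_energy)) = snr,
                        # i.e. g = sqrt(target_omni_energy * 10**(-snr/10) / ambi_energy),
                        # which places the scaled ambience snr dB below the target's omni-channel energy.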
99 | 
100 |                         ## ADD NOISE
101 |                         target_sig += ambi_norm * ambient_sig
102 | 
103 | 
104 |                     soundfile.write(self._scenariopath+'/'+self._audio_format+'/'+mixture_filename, target_sig, self._fs_mix)
105 | 
106 | 
107 | 
--------------------------------------------------------------------------------
/audio_synthesizer.py:
--------------------------------------------------------------------------------
 1 | import numpy as np
 2 | import scipy.io
 3 | import utils
 4 | import os
 5 | import mat73
 6 | import scipy.signal as signal
 7 | import soundfile
 8 | 
 9 | class AudioSynthesizer(object):
10 |     def __init__(
11 |             self, params, mixtures, mixture_setup, db_config, audio_format
12 |             ):
13 |         self._mixtures = mixtures
14 |         self._rirpath = params['rirpath']
15 |         self._db_path = params['db_path']
16 |         self._audio_format = audio_format
17 |         self._outpath = params['mixturepath'] + '/' + mixture_setup['scenario'] + '/' + self._audio_format
18 |         self._rirdata = db_config._rirdata
19 |         self._nb_rooms = len(self._rirdata)
20 |         self._room_names = []
21 |         for nr in range(self._nb_rooms):
22 |             self._room_names.append(self._rirdata[nr][0][0][0])
23 |         self._classnames = mixture_setup['classnames']
24 |         self._fs_mix = mixture_setup['fs_mix']
25 |         self._t_mix = mixture_setup['mixture_duration']
26 |         self._l_mix = int(np.round(self._fs_mix * self._t_mix))
27 |         self._time_idx100 = np.arange(0., self._t_mix, 0.1)
28 |         self._stft_winsize_moving = 0.1*self._fs_mix//2
29 |         self._nb_folds = len(mixtures)
30 |         self._apply_event_gains = db_config._apply_class_gains
31 |         if self._apply_event_gains:
32 |             self._class_gains = db_config._class_gains
33 | 
34 | 
35 |     def synthesize_mixtures(self):
36 |         rirdata2room_idx = {1: 0, 2: 1, 3: 2, 4: 3, 5: 4, 6: 5, 8: 6, 9: 7, 10: 8} # room numbers in the rirdata array
37 |         # create the output path if it doesn't exist
38 |         if not os.path.isdir(self._outpath):
39 |             os.makedirs(self._outpath)
40 | 
41 |         for nfold in range(self._nb_folds):
42 |             print('Generating scene audio for fold {}'.format(nfold+1))
43 | 
44 |             rooms = self._mixtures[nfold][0]['roomidx']
45 |             nb_rooms_in_fold = len(rooms)
46 |             for nr in range(nb_rooms_in_fold):
47 | 
48 |                 nroom = rooms[nr]
49 |                 nb_mixtures = len(self._mixtures[nfold][nr]['mixture'])
50 |                 print('Loading RIRs for room {}'.format(nroom))
51 | 
52 |                 room_idx = rirdata2room_idx[nroom]
53 |                 if nroom > 9:
54 |                     struct_name = 'rirs_{}_{}'.format(nroom,self._room_names[room_idx])
55 |                 else:
56 |                     struct_name = 'rirs_0{}_{}'.format(nroom,self._room_names[room_idx])
57 |                 path = self._rirpath + '/' + struct_name + '.mat'
58 |                 rirs = mat73.loadmat(path)
59 |                 rirs = rirs['rirs'][self._audio_format]
60 |                 # stack all the RIRs for all heights to make one large trajectory
61 |                 print('Stacking same trajectory RIRs')
62 |                 lRir = len(rirs[0][0])
63 |                 nCh = len(rirs[0][0][0])
64 | 
65 |                 n_traj = np.shape(self._rirdata[room_idx][0][2])[0]
66 |                 n_rirs_max = np.max(np.sum(self._rirdata[room_idx][0][3],axis=1))
67 | 
68 |                 channel_rirs = np.zeros((lRir, nCh, n_rirs_max, n_traj))
69 |                 for ntraj in range(n_traj):
70 |                     nHeights = np.sum(self._rirdata[room_idx][0][3][ntraj,:]>0)
71 | 
72 |                     nRirs_accum = 0
73 | 
74 |                     # flip the direction of every second height, so that a
75 |                     # movement can jump smoothly from the lower height to the higher one and
76 |                     # continue moving in the opposite direction
77 |                     flip = False
78 |                     for nheight in range(nHeights):
79 |                         nRirs_nh = self._rirdata[room_idx][0][3][ntraj,nheight]
80 |                         rir_l = len(rirs[ntraj][nheight][0,0,:])
81 |                         if flip:
82 |                             channel_rirs[:, :, nRirs_accum +
np.arange(0,nRirs_nh),ntraj] = rirs[ntraj][nheight][:,:,np.arange(rir_l-1,-1,-1)] 83 | else: 84 | channel_rirs[:, :, nRirs_accum + np.arange(0,nRirs_nh),ntraj] = rirs[ntraj][nheight] 85 | 86 | nRirs_accum += nRirs_nh 87 | flip = not flip 88 | 89 | del rirs #clear some memory 90 | 91 | for nmix in range(nb_mixtures): 92 | print('Writing mixture {}/{}'.format(nmix+1,nb_mixtures)) 93 | 94 | ### WRITE TARGETS EVENTS 95 | mixture_nm = self._mixtures[nfold][nr]['mixture'][nmix] 96 | try: 97 | nb_events = len(mixture_nm['class']) 98 | except TypeError: 99 | nb_events = 1 100 | 101 | mixsig = np.zeros((self._l_mix, 4)) 102 | for nev in range(nb_events): 103 | if not nb_events == 1: 104 | classidx = int(mixture_nm['class'][nev]) 105 | onoffset = mixture_nm['event_onoffsets'][nev,:] 106 | filename = mixture_nm['files'][nev] 107 | ntraj = int(mixture_nm['trajectory'][nev]) 108 | 109 | else: 110 | classidx = int(mixture_nm['class']) 111 | onoffset = mixture_nm['event_onoffsets'] 112 | filename = mixture_nm['files'] 113 | ntraj = int(mixture_nm['trajectory']) 114 | 115 | # load event audio and resample to match RIR sampling 116 | eventsig, fs_db = soundfile.read(self._db_path + '/' + filename) 117 | if len(np.shape(eventsig)) > 1: 118 | eventsig = eventsig[:,0] 119 | eventsig = signal.resample_poly(eventsig, self._fs_mix, fs_db) 120 | 121 | #spatialize audio 122 | riridx = mixture_nm['rirs'][nev] if nb_events > 1 else mixture_nm['rirs'] 123 | 124 | 125 | moving_condition = mixture_nm['isMoving'][nev] if nb_events > 1 else mixture_nm['isMoving'] 126 | if nb_events > 1 and not moving_condition: 127 | riridx = int(riridx[0]) if len(riridx)==1 else riridx.astype('int') 128 | if nb_events == 1 and type(riridx) != int: 129 | riridx = riridx[0] 130 | 131 | if moving_condition: 132 | nRirs_moving = len(riridx) if np.shape(riridx) else 1 133 | ir_times = self._time_idx100[np.arange(0,nRirs_moving)] 134 | mixeventsig = 481.6989*utils.ctf_ltv_direct(eventsig, channel_rirs[:, :, riridx, ntraj], ir_times, self._fs_mix, self._stft_winsize_moving) / float(len(eventsig)) 135 | else: 136 | mixeventsig0 = scipy.signal.convolve(eventsig, np.squeeze(channel_rirs[:, 0, riridx, ntraj]), mode='full', method='fft') 137 | mixeventsig1 = scipy.signal.convolve(eventsig, np.squeeze(channel_rirs[:, 1, riridx, ntraj]), mode='full', method='fft') 138 | mixeventsig2 = scipy.signal.convolve(eventsig, np.squeeze(channel_rirs[:, 2, riridx, ntraj]), mode='full', method='fft') 139 | mixeventsig3 = scipy.signal.convolve(eventsig, np.squeeze(channel_rirs[:, 3, riridx, ntraj]), mode='full', method='fft') 140 | 141 | mixeventsig = np.stack((mixeventsig0,mixeventsig1,mixeventsig2,mixeventsig3),axis=1) 142 | if self._apply_event_gains: 143 | # apply random gain to each event based on class gain, distribution given externally 144 | K=1000 145 | rand_energies_per_spec = utils.sample_from_quartiles(K, self._class_gains[classidx]) 146 | intr_quart_energies_per_sec = rand_energies_per_spec[K + np.arange(3*(K+1))] 147 | rand_energy_per_spec = intr_quart_energies_per_sec[np.random.randint(len(intr_quart_energies_per_sec))] 148 | sample_onoffsets = mixture_nm['sample_onoffsets'][nev] 149 | sample_active_time = sample_onoffsets[1] - sample_onoffsets[0] 150 | target_energy = rand_energy_per_spec*sample_active_time 151 | if self._audio_format == 'mic': 152 | event_omni_energy = np.sum(np.sum(mixeventsig,axis=1)**2) 153 | elif self._audio_format == 'foa': 154 | event_omni_energy = np.sum(mixeventsig[:,0]**2) 155 | 156 | norm_gain = np.sqrt(target_energy / 
event_omni_energy) 157 | mixeventsig = norm_gain * mixeventsig 158 | 159 | lMixeventsig = np.shape(mixeventsig)[0] 160 | if np.round(onoffset[0]*self._fs_mix) + lMixeventsig <= self._t_mix * self._fs_mix: 161 | mixsig[int(np.round(onoffset[0]*self._fs_mix)) + np.arange(0,lMixeventsig,dtype=int), :] += mixeventsig 162 | else: 163 | lMixeventsig_trunc = int(self._t_mix * self._fs_mix - int(np.round(onoffset[0]*self._fs_mix))) 164 | mixsig[int(np.round(onoffset[0]*self._fs_mix)) + np.arange(0,lMixeventsig_trunc,dtype=int), :] += mixeventsig[np.arange(0,lMixeventsig_trunc,dtype=int), :] 165 | 166 | # normalize 167 | gnorm = 0.5/np.max(np.max(np.abs(mixsig))) 168 | 169 | mixsig = gnorm*mixsig 170 | mixture_filename = 'fold{}_room{}_mix{:03}.wav'.format(nfold+1, nr+1, nmix+1) 171 | soundfile.write(self._outpath + '/' + mixture_filename, mixsig, self._fs_mix) 172 | 173 | 174 | 175 | 176 | 177 | 178 | 179 | -------------------------------------------------------------------------------- /db_config.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import scipy.io 3 | import csv 4 | import librosa 5 | import os 6 | 7 | class DBConfig(object): 8 | def __init__( 9 | self, params 10 | ): 11 | self._rirpath = params['rirpath'] 12 | self._mixturepath = params['mixturepath'] 13 | self._rirdata = self._load_rirdata() 14 | self._nb_folds = params['nb_folds'] 15 | self._rooms2fold = params['rooms2fold'] 16 | self._db_path = params['db_path'] 17 | self._db_name = params['db_name'] 18 | if self._db_name == 'fsd50k': 19 | self._fs = 44100 20 | self._classes = ['femaleSpeech', 'maleSpeech', 'clapping', 'telephone', 'laughter', 'domesticSounds', 'footsteps', 21 | 'doorCupboard', 'music', 'musicInstrument', 'waterTap', 'bell', 'knock'] 22 | self._nb_classes = len(self._classes) 23 | self._class_mobility = [2, 2, 2, 2, 2, 2, 1, 0, 0, 0, 0, 0, 0] 24 | self._apply_class_gains = True 25 | # self._class_gains = [[0, 0.2004, 0.8008, 6.8766, 357.8846], # femaleSpeech 26 | # [0.0060, 0.4901, 2.5097, 14.3011, 372.2183], # maleSpeech 27 | # [0.3607, 1.1029, 2.6719, 3.9629, 26.6442], #clapping 28 | # [0.0072, 0.8222, 2.3849, 34.1233, 168.5152], #telephone 29 | # [0.0273, 0.8911, 1.9856, 5.6164, 79.1070], #laughter 30 | # [0.0268, 0.1009, 1.8363, 13.9294, 83.2484], #domesticSounds 31 | # [0.0099, 0.3764, 1.2759, 5.4426, 318.8329], #footsteps 32 | # [0.0697, 0.4919, 2.7159, 28.0537, 313.8807], #doorCupboard 33 | # [0.0219, 0.3189, 0.7787, 2.3823, 355.9656], #music 34 | # [0.0160, 0.9563, 2.3413, 5.6720, 168.6679], #musicInstrument 35 | # [0.0972, 0.1828, 0.6304, 0.9522, 125.1975], #waterTap 36 | # [0.0160, 0.9563, 2.3413, 5.6720, 168.6679], #bell 37 | # [0.0697, 0.4919, 2.7159, 28.0537, 313.8807]] #knock 38 | self._class_gains = [[0.0791, 0.5330, 1.3132, 2.2365, 541.3376], # femaleSpeech 39 | [0.0116, 0.6913, 1.2199, 3.0048, 235.0189], # maleSpeech 40 | [0.5083, 2.2579, 3.0934, 7.6387, 100.1174], #clapping 41 | [0.0126, 0.3373, 0.7526, 2.1165, 18.5226], #telephone 42 | [0.1909, 1.4950, 3.2206, 8.2153, 221.2892], #laughter 43 | [0.0004, 1.8347, 3.4778, 5.9276, 555.4895], #domesticSounds 44 | [0.0099, 0.3969, 0.8870, 2.0800, 15.7529], #footsteps 45 | [0.0146, 0.9141, 7.8186, 109.0767, 3979.700], #doorCupboard 46 | [0.1153, 0.4313, 1.2903, 3.3541, 52.6977], #music 47 | [0.0596, 1.4146, 5.3529, 20.6286, 362.0704], #musicInstrument 48 | [0.0117, 0.5505, 1.4926, 2.1936, 44.9466], #waterTap 49 | [0.0596, 1.4146, 5.3529, 20.6286, 362.0704], #bell 50 | [2.4502, 2.4502, 
41.3609, 80.2716, 80.2716]] #knock
 51 |             self._samplelist = self._load_db_fileinfo_fsd()
 52 | 
 53 |         elif self._db_name == 'nigens':
 54 |             self._fs = 44100
 55 |             self._class_dict = {'alarm': 0,'baby': 1, 'crash': 2, 'dog': 3, 'engine': 4, 'femaleScream': 5, 'femaleSpeech': 6,
 56 |                                 'fire': 7, 'footsteps': 8, 'knock': 9, 'maleScream': 10, 'maleSpeech': 11,
 57 |                                 'phone': 12, 'piano': 13, 'general': 14}
 58 |             self._class_mobility = [0, 2, 0, 2, 2, 0, 2, 0, 1, 0, 0, 2, 2, 0, 0]
 59 |             self._classes = list(self._class_dict.keys())
 60 |             self._nb_classes = len(self._classes)
 61 |             self._samplelist = self._load_db_fileinfo_nigens()
 62 |             self._apply_class_gains = False
 63 |             self._class_gains = []
 64 | 
 65 | 
 66 |     def _load_rirdata(self):
 67 |         matdata = scipy.io.loadmat(self._rirpath + '/rirdata.mat')
 68 |         rirdata = matdata['rirdata']['room'][0][0]
 69 |         return rirdata
 70 | 
 71 |     def _load_db_fileinfo_fsd(self):
 72 |         samplelist_per_fold = []
 73 |         folds = self._make_selected_filelist()
 74 | 
 75 |         for nfold in range(self._nb_folds):
 76 |             print('Preparing sample list for fold {}'.format(str(nfold+1)))
 77 |             counter = 1
 78 |             samplelist = {'class': np.array([]), 'audiofile': np.array([]), 'duration': np.array([]), 'onoffset': [], 'nSamples': [],
 79 |                           'nSamplesPerClass': np.array([]), 'meanStdDurationPerClass': np.array([]), 'minMaxDurationPerClass': np.array([])}
 80 |             for ncl in range(self._nb_classes):
 81 |                 nb_samples_per_class = len(folds[ncl][nfold])
 82 | 
 83 |                 for ns in range(nb_samples_per_class):
 84 |                     samplelist['class'] = np.append(samplelist['class'], ncl)
 85 |                     samplelist['audiofile'] = np.append(samplelist['audiofile'], folds[ncl][nfold][ns])
 86 |                     audiopath = self._db_path + '/' + folds[ncl][nfold][ns]
 87 |                     audio, sr = librosa.load(audiopath)
 88 |                     duration = len(audio)/float(sr)
 89 |                     samplelist['duration'] = np.append(samplelist['duration'], duration)
 90 |                     samplelist['onoffset'].append(np.array([[0., duration],]))
 91 |                     samplelist['nSamples'].append(counter)
 92 |                     counter += 1
 93 |             samplelist['onoffset'] = np.squeeze(np.array(samplelist['onoffset'],dtype=object))
 94 |             for n_class in range(self._nb_classes):
 95 |                 class_idx = (samplelist['class'] == n_class)
 96 |                 samplelist['nSamplesPerClass'] = np.append(samplelist['nSamplesPerClass'], np.sum(class_idx))
 97 |                 if n_class == 0:
 98 |                     samplelist['meanStdDurationPerClass'] = np.array([[np.mean(samplelist['duration'][class_idx]), np.std(samplelist['duration'][class_idx])]])
 99 |                     samplelist['minMaxDurationPerClass'] = np.array([[np.min(samplelist['duration'][class_idx]), np.max(samplelist['duration'][class_idx])]])
100 |                 else:
101 |                     samplelist['meanStdDurationPerClass'] = np.vstack((samplelist['meanStdDurationPerClass'], np.array([np.mean(samplelist['duration'][class_idx]), np.std(samplelist['duration'][class_idx])])))
102 |                     samplelist['minMaxDurationPerClass'] = np.vstack((samplelist['minMaxDurationPerClass'], np.array([np.min(samplelist['duration'][class_idx]), np.max(samplelist['duration'][class_idx])])))
103 |             samplelist_per_fold.append(samplelist)
104 | 
105 | 
106 |         return samplelist_per_fold
107 | 
108 | 
109 |     def _load_db_fileinfo_nigens(self):
110 |         samplelist_per_fold = []
111 | 
112 |         for nfold in range(self._nb_folds):
113 |             print('Preparing sample list for fold {}'.format(str(nfold+1)))
114 |             foldlist_file = self._db_path + '/NIGENS_8-foldSplit_fold' + str(nfold+1) + '_wo_timit.flist'
115 |             filelist = []
116 |             with open(foldlist_file, newline = '') as flist:
117 |                 flist_reader = csv.reader(flist, delimiter='\t')
118 |                 for fline in flist_reader:
filelist.append(fline) 120 | flist_len = len(filelist) 121 | 122 | samplelist = {'class': np.array([]), 'audiofile': np.array([]), 'duration': np.array([]), 'onoffset': [], 'nSamples': flist_len, 123 | 'nSamplesPerClass': np.array([]), 'meanStdDurationPerClass': np.array([]), 'minMaxDurationPerClass': np.array([])} 124 | for file in range(flist_len): 125 | clsfilename = filelist[file][0].split('/') 126 | clsname = clsfilename[0] 127 | filename = clsfilename[1] 128 | 129 | samplelist['class'] = np.append(samplelist['class'], int(self._class_dict[clsname])) 130 | samplelist['audiofile'] = np.append(samplelist['audiofile'], clsname + '/' + filename) 131 | audiopath = self._db_path + '/' + clsname + '/' + filename 132 | #print(audiopath) 133 | #with contextlib.closing(wave.open(audiopath,'r')) as f: 134 | audio, sr = librosa.load(audiopath) 135 | samplelist['duration'] = np.append(samplelist['duration'], len(audio)/float(sr)) 136 | 137 | if clsname == 'general': 138 | onoffsets = [] 139 | onoffsets.append([0., samplelist['duration'][file]]) 140 | samplelist['onoffset'].append(np.array(onoffsets)) 141 | else: 142 | meta_file = self._db_path + '/' + clsname + '/' + filename + '.txt' 143 | onoffsets = [] 144 | with open(meta_file, newline = '') as meta: 145 | meta_reader = csv.reader(meta, delimiter='\t') 146 | for onoff in meta_reader: 147 | onoffsets.append([float(onoff[0]), float(onoff[1])]) 148 | 149 | samplelist['onoffset'].append(np.array(onoffsets)) 150 | samplelist['onoffset'] = np.squeeze(np.array(samplelist['onoffset'],dtype=object)) 151 | 152 | for n_class in range(self._nb_classes): 153 | class_idx = (samplelist['class'] == n_class) 154 | samplelist['nSamplesPerClass'] = np.append(samplelist['nSamplesPerClass'], np.sum(class_idx)) 155 | if n_class == 0: 156 | samplelist['meanStdDurationPerClass'] = np.array([[np.mean(samplelist['duration'][class_idx]), np.std(samplelist['duration'][class_idx])]]) 157 | samplelist['minMaxDurationPerClass'] = np.array([[np.min(samplelist['duration'][class_idx]), np.max(samplelist['duration'][class_idx])]]) 158 | else: 159 | samplelist['meanStdDurationPerClass'] = np.vstack((samplelist['meanStdDurationPerClass'], np.array([np.mean(samplelist['duration'][class_idx]), np.std(samplelist['duration'][class_idx])]))) 160 | samplelist['minMaxDurationPerClass'] = np.vstack((samplelist['minMaxDurationPerClass'], np.array([np.min(samplelist['duration'][class_idx]), np.max(samplelist['duration'][class_idx])]))) 161 | samplelist_per_fold.append(samplelist) 162 | 163 | return samplelist_per_fold 164 | 165 | def _make_selected_filelist(self): 166 | folds = [] 167 | folds_names = ['train', 'test'] #TODO: make it more generic 168 | nb_folds = len(folds_names) 169 | class_list = self._classes #list(self._classes.keys()) 170 | 171 | for ntc in range(self._nb_classes): 172 | classpath = self._db_path + '/' + class_list[ntc] 173 | 174 | per_fold = [] 175 | for nf in range(nb_folds): 176 | foldpath = classpath + '/' + folds_names[nf] 177 | foldcont = os.listdir(foldpath) 178 | nb_subdirs = len(foldcont) 179 | filelist = [] 180 | for ns in range(nb_subdirs): 181 | subfoldcont = os.listdir(foldpath + '/' + foldcont[ns]) 182 | for nfl in range(len(subfoldcont)): 183 | if subfoldcont[nfl][0] != '.' 
and subfoldcont[nfl].endswith('.wav'):
184 |                         filelist.append(class_list[ntc] + '/' + folds_names[nf] + '/' + foldcont[ns] + '/' + subfoldcont[nfl])
185 |                 per_fold.append(filelist)
186 |             folds.append(per_fold)
187 | 
188 |         return folds
189 | 
--------------------------------------------------------------------------------
/db_config_fsd.obj:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/danielkrause/DCASE2022-data-generator/01745f736be8c537076c42fa1642ecb5c3454714/db_config_fsd.obj
--------------------------------------------------------------------------------
/db_config_nigens.obj:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/danielkrause/DCASE2022-data-generator/01745f736be8c537076c42fa1642ecb5c3454714/db_config_nigens.obj
--------------------------------------------------------------------------------
/example_script_DCASE2021.py:
--------------------------------------------------------------------------------
 1 | import sys
 2 | import numpy as np
 3 | from db_config import DBConfig
 4 | from metadata_synthesizer import MetadataSynthesizer
 5 | from audio_synthesizer import AudioSynthesizer
 6 | from audio_mixer import AudioMixer
 7 | import pickle
 8 | from generation_parameters import get_params
 9 | 
10 | ##############
11 | ############## THIS IS AN EXAMPLE SCRIPT GENERATING DATA
12 | ############## SIMILAR TO THE DCASE2021 DATASET
13 | ##############
14 | 
15 | 
16 | # use parameter set defined by user
17 | task_id = '1' ### '1' - NIGENS, '2' - FSD50k
18 | 
19 | params = get_params(task_id)
20 | 
21 | ### Create database config based on params (e.g. filelist name etc.)
22 | #db_config = DBConfig(params)
23 | 
24 | # LOAD DB-config which is already done
25 | db_handler = open('db_config_nigens.obj','rb')
26 | db_config = pickle.load(db_handler)
27 | db_handler.close()
28 | 
29 | #create mixture synthesizer class
30 | noiselessSynth = MetadataSynthesizer(db_config, params, 'target_noiseless')
31 | 
32 | #create mixture targets
33 | mixtures_target, mixture_setup_target, foldlist_target = noiselessSynth.create_mixtures()
34 | 
35 | #calculate statistics and create metadata structure
36 | metadata, stats = noiselessSynth.prepare_metadata_and_stats()
37 | 
38 | #write metadata to text files
39 | noiselessSynth.write_metadata()
40 | 
41 | #create directional interference mixtures
42 | task_id_int = '3'
43 | params_interference = get_params(task_id_int)
44 | noiselessSynth_interference = MetadataSynthesizer(db_config, params_interference, 'target_interf')
45 | interference_target, interference_setup_target, foldlist_target_int = noiselessSynth_interference.create_mixtures()
46 | 
47 | if not params['audio_format'] == 'both': # create a dataset of only one data format (FOA or MIC)
48 |     #create audio synthesis class and synthesize audio files for given mixtures
49 |     noiselessAudioSynth = AudioSynthesizer(params, mixtures_target, mixture_setup_target, db_config, params['audio_format'])
50 |     noiselessAudioSynth.synthesize_mixtures()
51 | 
52 | 
53 |     #synthesize audio containing interference mixtures
54 |     noiselessAudioSynth_interference = AudioSynthesizer(params_interference, interference_target, interference_setup_target, db_config, params['audio_format'])
55 |     noiselessAudioSynth_interference.synthesize_mixtures()
56 | 
57 |     #mix the created audio mixtures with background noise and interference mixtures
58 |     audioMixer = AudioMixer(params, db_config, mixtures_target, mixture_setup_target,
params['audio_format'], 'target_interf_noisy') 59 | audioMixer.mix_audio() 60 | else: 61 | #create audio synthesis class and synthesize audio files for given mixtures 62 | noiselessAudioSynth = AudioSynthesizer(params, mixtures_target, mixture_setup_target, db_config, 'foa') 63 | noiselessAudioSynth.synthesize_mixtures() 64 | noiselessAudioSynth2 = AudioSynthesizer(params, mixtures_target, mixture_setup_target, db_config, 'mic') 65 | noiselessAudioSynth2.synthesize_mixtures() 66 | 67 | #synthesize audio containing interference mixtures 68 | noiselessAudioSynth_interference = AudioSynthesizer(params_interference, interference_target, interference_setup_target, db_config, 'foa') 69 | noiselessAudioSynth_interference.synthesize_mixtures() 70 | noiselessAudioSynth_interference2 = AudioSynthesizer(params_interference, interference_target, interference_setup_target, db_config, 'mic') 71 | noiselessAudioSynth_interference2.synthesize_mixtures() 72 | 73 | #mix the created audio mixtures with background noise and interference mixtures 74 | audioMixer = AudioMixer(params, db_config, mixtures_target, mixture_setup_target, 'foa', 'target_interf_noisy') 75 | audioMixer.mix_audio() 76 | audioMixer2 = AudioMixer(params, db_config, mixtures_target, mixture_setup_target, 'mic', 'target_interf_noisy') 77 | audioMixer2.mix_audio() -------------------------------------------------------------------------------- /example_script_DCASE2022.py: -------------------------------------------------------------------------------- 1 | import sys 2 | import numpy as np 3 | from db_config import DBConfig 4 | from metadata_synthesizer import MetadataSynthesizer 5 | from audio_synthesizer import AudioSynthesizer 6 | from audio_mixer import AudioMixer 7 | import pickle 8 | from generation_parameters import get_params 9 | 10 | 11 | ############## 12 | ############## THIS IS AN EXEMPLARY SCRIPT GENERATING DATA 13 | ############## SIMILAR TO THE DCASE2022 dataset 14 | ############## 15 | 16 | 17 | # use parameter set defined by user 18 | task_id = '2' 19 | 20 | params = get_params(task_id) 21 | 22 | ### Create database config based on params (e.g. filelist name etc.) 
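# Note: building DBConfig(params) from scratch scans the whole audio database and loads
# every file with librosa to measure durations, which can take a long time; the pickled
# db_config_fsd.obj loaded below caches the result of that step (see also the README).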
23 | #db_config = DBConfig(params) 24 | 25 | # LOAD DB-config which is already done 26 | db_handler = open('db_config_fsd.obj','rb') 27 | db_config = pickle.load(db_handler) 28 | db_handler.close() 29 | 30 | #create mixture synthesizer class 31 | noiselessSynth = MetadataSynthesizer(db_config, params, 'target_noiseless') 32 | 33 | #create mixture targets 34 | mixtures_target, mixture_setup_target, foldlist_target = noiselessSynth.create_mixtures() 35 | 36 | #calculate statistics and create metadata structure 37 | metadata, stats = noiselessSynth.prepare_metadata_and_stats() 38 | 39 | #write metadata to text files 40 | noiselessSynth.write_metadata() 41 | 42 | 43 | if not params['audio_format'] == 'both': # create a dataset of only one data format (FOA or MIC) 44 | #create audio synthesis class and synthesize audio files for given mixtures 45 | noiselessAudioSynth = AudioSynthesizer(params, mixtures_target, mixture_setup_target, db_config, params['audio_format']) 46 | noiselessAudioSynth.synthesize_mixtures() 47 | 48 | #mix the created audio mixtures with background noise 49 | audioMixer = AudioMixer(params, db_config, mixtures_target, mixture_setup_target, params['audio_format'], 'target_noisy') 50 | audioMixer.mix_audio() 51 | else: 52 | #create audio synthesis class and synthesize audio files for given mixtures 53 | noiselessAudioSynth = AudioSynthesizer(params, mixtures_target, mixture_setup_target, db_config, 'foa') 54 | noiselessAudioSynth.synthesize_mixtures() 55 | noiselessAudioSynth2 = AudioSynthesizer(params, mixtures_target, mixture_setup_target, db_config, 'mic') 56 | noiselessAudioSynth2.synthesize_mixtures() 57 | 58 | #mix the created audio mixtures with background noise 59 | audioMixer = AudioMixer(params, db_config, mixtures_target, mixture_setup_target, 'foa', 'target_noisy') 60 | audioMixer.mix_audio() 61 | audioMixer2 = AudioMixer(params, db_config, mixtures_target, mixture_setup_target, 'mic', 'target_noisy') 62 | audioMixer2.mix_audio() -------------------------------------------------------------------------------- /generation_parameters.py: -------------------------------------------------------------------------------- 1 | # Parameters used in the data generation process. 
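# Usage note: get_params('1') returns the NIGENS defaults defined below, get_params('2')
# switches to the FSD50K setup, and get_params('3') selects the NIGENS interference
# subset; any other id prints an error and exits.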
 2 | 
 3 | 
 4 | def get_params(argv='1'):
 5 |     print("SET: {}".format(argv))
 6 |     # ########### default parameters (NIGENS data) ##############
 7 |     params = dict(
 8 |         db_name = 'nigens',   # name of the audio dataset used for data generation
 9 |         rirpath = '/scratch/asignal/krauseda/DCASE_data_generator/RIR_DB',   # path containing Room Impulse Responses (RIRs)
10 |         mixturepath = 'E:/DCASE2022/TAU_Spatial_RIR_Database_2021/Dataset-NIGENS',   # output path for the generated dataset
11 |         noisepath = '/scratch/asignal/krauseda/DCASE_data_generator/Noise_DB',   # path containing background noise recordings
12 |         nb_folds = 2,   # number of folds (default 2 - training and testing)
13 |         rooms2fold = [[10, 6, 1, 4, 3, 8],   # FOLD 1, rooms assigned to each fold (0's are ignored)
14 |                       [9, 5, 2, 0, 0, 0]],   # FOLD 2
15 |         db_path = 'E:/DCASE2022/TAU_Spatial_RIR_Database_2021/Code/NIGENS',   # path containing audio events to be utilized during data generation
16 |         max_polyphony = 3,   # maximum number of overlapping sound events
17 |         active_classes = [0, 1, 2, 3, 5, 6, 8, 9, 10, 11, 12, 13],   # list of sound classes to be used for data generation
18 |         nb_mixtures_per_fold = [900, 300],   # if scalar, the same number of mixtures is used for each fold
19 |         mixture_duration = 60.,   # in seconds
20 |         event_time_per_layer = 40.,   # in seconds (should be less than mixture_duration)
21 |         audio_format = 'both',   # 'foa' (First Order Ambisonics) or 'mic' (four microphones) or 'both'
22 |         )
23 | 
24 | 
25 |     # ########### User defined parameters ##############
26 |     if argv == '1':
27 |         print("USING DEFAULT PARAMETERS FOR NIGENS DATA\n")
28 | 
29 |     elif argv == '2': ###### FSD50k DATA
30 |         params['db_name'] = 'fsd50k'
31 |         params['db_path'] = '/scratch/asignal/krauseda/DCASE_data_generator/Code/FSD50k'
32 |         params['mixturepath'] = '/scratch/asignal/krauseda/Data-FSD'
33 |         params['active_classes'] = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
34 |         params['max_polyphony'] = 2
35 | 
36 |     elif argv == '3': ###### NIGENS interference data
37 |         params['active_classes'] = [4, 7, 14]
38 |         params['max_polyphony'] = 1
39 | 
40 |     else:
41 |         print('ERROR: unknown argument {}'.format(argv))
42 |         exit()
43 | 
44 |     for key, value in params.items():
45 |         print("\t{}: {}".format(key, value))
46 |     return params
--------------------------------------------------------------------------------
/make_dataset.py:
--------------------------------------------------------------------------------
 1 | import sys
 2 | import numpy as np
 3 | from db_config import DBConfig
 4 | from metadata_synthesizer import MetadataSynthesizer
 5 | from audio_synthesizer import AudioSynthesizer
 6 | from audio_mixer import AudioMixer
 7 | import pickle
 8 | from generation_parameters import get_params
 9 | 
10 | 
11 | 
12 | 
13 | def main(argv):
14 |     """
15 |     Main wrapper for the whole data generation framework.
16 | 
17 |     :param argv: expects one optional input.
18 |         first input: task_id - (optional) chooses the generation parameter set from generation_parameters.py
19 |             (default: '2' - the FSD50K parameter set)
20 |     """
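    # Usage sketch (based on the parameter ids defined in generation_parameters.py):
    #   python make_dataset.py 1   -> NIGENS parameter set
    #   python make_dataset.py 2   -> FSD50K parameter set (the default when no id is given)
    #   python make_dataset.py 3   -> NIGENS interference parameter set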
21 |     print(argv)
22 |     if len(argv) != 2:
23 |         print('\n\n')
24 |         print('The code expects an optional input')
25 |         print('\t>> python make_dataset.py <task-id>')
26 |         print('\t\t<task-id> is used to choose the user-defined parameter set from generation_parameters.py')
27 |         print('Using default inputs for now')
28 |         print('\n\n')
29 | 
30 |     # use parameter set defined by user
31 |     task_id = '2' if len(argv) < 2 else argv[1]
32 | 
33 |     params = get_params(task_id)
34 | 
35 |     ### Create database config based on params (e.g. filelist name etc.)
36 |     db_config = DBConfig(params)
37 | 
38 |     # LOAD DB-config which is already done
39 |     # db_handler = open('db_config_fsd.obj','rb')
40 |     # db_config = pickle.load(db_handler)
41 |     # db_handler.close()
42 | 
43 |     #create mixture synthesizer class
44 |     noiselessSynth = MetadataSynthesizer(db_config, params, 'target_noiseless')
45 | 
46 |     #create mixture targets
47 |     mixtures_target, mixture_setup_target, foldlist_target = noiselessSynth.create_mixtures()
48 | 
49 |     #calculate statistics and create metadata structure
50 |     metadata, stats = noiselessSynth.prepare_metadata_and_stats()
51 | 
52 |     #write metadata to text files
53 |     noiselessSynth.write_metadata()
54 | 
55 |     #create audio synthesis class and synthesize audio files for given mixtures
56 |     if not params['audio_format'] == 'both': # create a dataset of only one data format (FOA or MIC)
57 |         noiselessAudioSynth = AudioSynthesizer(params, mixtures_target, mixture_setup_target, db_config, params['audio_format'])
58 |         noiselessAudioSynth.synthesize_mixtures()
59 | 
60 |         #mix the created audio mixtures with background noise
61 |         audioMixer = AudioMixer(params, db_config, mixtures_target, mixture_setup_target, params['audio_format'], 'target_noisy')
62 |         audioMixer.mix_audio()
63 |     else:
64 |         noiselessAudioSynth = AudioSynthesizer(params, mixtures_target, mixture_setup_target, db_config, 'foa')
65 |         noiselessAudioSynth.synthesize_mixtures()
66 |         noiselessAudioSynth2 = AudioSynthesizer(params, mixtures_target, mixture_setup_target, db_config, 'mic')
67 |         noiselessAudioSynth2.synthesize_mixtures()
68 | 
69 |         #mix the created audio mixtures with background noise
70 |         audioMixer = AudioMixer(params, db_config, mixtures_target, mixture_setup_target, 'foa', 'target_noisy')
71 |         audioMixer.mix_audio()
72 |         audioMixer2 = AudioMixer(params, db_config, mixtures_target, mixture_setup_target, 'mic', 'target_noisy')
73 |         audioMixer2.mix_audio()
74 | 
75 | 
76 | 
77 | if __name__ == "__main__":
78 |     try:
79 |         sys.exit(main(sys.argv))
80 |     except (ValueError, IOError) as e:
81 |         sys.exit(e)
82 | 
83 | 
--------------------------------------------------------------------------------
/metadata_synthesizer.py:
--------------------------------------------------------------------------------
 1 | import numpy as np
 2 | from utils import cart2sph
 3 | import os
 4 | import csv
 5 | 
 6 | class MetadataSynthesizer(object):
 7 |     def __init__(
 8 |             self, db_config, params, scenario_name
 9 |             ):
10 |         self._db_config = db_config
11 |         self._db = db_config._db_name
12 |         self._metadata_path = params['mixturepath'] + '/' + 'metadata'
13 |         self._classnames = db_config._classes
14 |         self._active_classes = np.sort(params['active_classes'])
15 |         self._nb_active_classes = len(self._active_classes)
16 |         self._class2activeClassmap = []
17 |         for cl in range(len(self._db_config._classes)):
18 |             if cl in self._active_classes:
19 |                 self._class2activeClassmap.append(cl)
20 |             else:
self._class2activeClassmap.append(0) 22 | 23 | self._class_mobility = db_config._class_mobility 24 | self._mixture_setup = {} 25 | self._mixture_setup['scenario'] = scenario_name 26 | self._mixture_setup['nb_folds'] = db_config._nb_folds 27 | self._mixture_setup['rooms2folds'] = db_config._rooms2fold 28 | self._mixture_setup['classnames'] = [] 29 | for cl in self._classnames: 30 | self._mixture_setup['classnames'].append(cl) 31 | self._mixture_setup['nb_classes'] = len(self._active_classes) 32 | self._mixture_setup['fs_mix'] = 24000 #fs of RIRs 33 | self._mixture_setup['mixture_duration'] = params['mixture_duration'] 34 | self._nb_mixtures_per_fold = params['nb_mixtures_per_fold'] 35 | self._nb_mixtures = self._mixture_setup['nb_folds'] * self._nb_mixtures_per_fold if np.isscalar(self._nb_mixtures_per_fold) else np.sum(self._nb_mixtures_per_fold) 36 | self._mixture_setup['total_duration'] = self._nb_mixtures * self._mixture_setup['mixture_duration'] 37 | self._mixture_setup['speed_set'] = [10., 20., 40.] 38 | self._mixture_setup['snr_set'] = np.arange(6.,31.) 39 | self._mixture_setup['time_idx_100ms'] = np.arange(0.,self._mixture_setup['mixture_duration'],0.1) 40 | self._mixture_setup['nOverlap'] = params['max_polyphony'] 41 | self._nb_frames = len(self._mixture_setup['time_idx_100ms']) 42 | self._rnd_generator = np.random.default_rng() 43 | 44 | self._rirdata = db_config._rirdata 45 | self._nb_classes = len(self._classnames) 46 | self._nb_speeds = len(self._mixture_setup['speed_set']) 47 | self._nb_snrs = len(self._mixture_setup['snr_set']) 48 | self._total_event_time_per_layer = params['event_time_per_layer'] 49 | self._total_silence_time_per_layer = self._mixture_setup['mixture_duration'] - self._total_event_time_per_layer 50 | self._min_gap_len = 1. # in seconds, minimum length of gaps between samples 51 | self._trim_threshold = 3. #in seconds, minimum length under which a trimmed event at end is discarded 52 | self._move_threshold = 3. 
#in seconds, minimum length over which events can be moving
 53 | 
 54 |     def create_mixtures(self):
 55 |         self._mixtures = []
 56 |         foldlist = []
 57 |         rirdata2room_idx = {1: 0, 2: 1, 3: 2, 4: 3, 5: 4, 6: 5, 8: 6, 9: 7, 10: 8} # room numbers in the rirdata array
 58 | 
 59 |         for nfold in range(self._mixture_setup['nb_folds']):
 60 |             print('Generating metadata for fold {}'.format(str(nfold+1)))
 61 | 
 62 |             foldlist_nff = {}
 63 |             rooms_nf = np.array(self._mixture_setup['rooms2folds'][nfold])
 64 |             rooms_nf = rooms_nf[rooms_nf>0]
 65 |             nb_rooms_nf = len(rooms_nf)
 66 | 
 67 | 
 68 |             idx_active = np.array([])
 69 |             for na in range(self._nb_active_classes):
 70 |                 idx_active = np.append(idx_active, np.nonzero(self._db_config._samplelist[nfold]['class'] == self._active_classes[na]))
 71 |             idx_active = idx_active.astype('int')
 72 | 
 73 |             foldlist_nff['class'] = self._db_config._samplelist[nfold]['class'][idx_active]
 74 |             foldlist_nff['audiofile'] = self._db_config._samplelist[nfold]['audiofile'][idx_active]
 75 |             foldlist_nff['duration'] = self._db_config._samplelist[nfold]['duration'][idx_active]
 76 |             foldlist_nff['onoffset'] = self._db_config._samplelist[nfold]['onoffset'][idx_active]
 77 |             nb_samples_nf = len(foldlist_nff['duration'])
 78 | 
 79 |             # randomly shuffle the samples in the target list to avoid samples of the same class appearing consecutively
 80 | 
 81 |             if len(np.shape(foldlist_nff['onoffset'])) == 1:
 82 |                 foldlist_nff['onoffset'] = np.expand_dims(foldlist_nff['onoffset'],axis=1)
 83 |             foldlist_nf = foldlist_nff
 84 |             foldlist.append(foldlist_nf)
 85 |             sampleperm = self._rnd_generator.permutation(nb_samples_nf)
 86 |             foldlist_nf['class'] = foldlist_nf['class'][sampleperm]
 87 |             foldlist_nf['audiofile'] = foldlist_nf['audiofile'][sampleperm]
 88 |             foldlist_nf['duration'] = foldlist_nf['duration'][sampleperm]
 89 |             foldlist_nf['onoffset'] = foldlist_nf['onoffset'][sampleperm]
 90 |             room_mixtures = []
 91 |             for nr in range(nb_rooms_nf):
 92 |                 fold_mixture = {'mixture': []}
 93 |                 fold_mixture['roomidx'] = rooms_nf
 94 |                 nroom = rooms_nf[nr]
 95 |                 print('Room {}\n'.format(nroom))
 96 |                 n_traj = np.shape(self._rirdata[rirdata2room_idx[nroom]][0][2])[0] #number of trajectories
 97 |                 traj_doas = []
 98 | 
 99 |                 for ntraj in range(n_traj):
100 |                     n_rirs = np.sum(self._rirdata[rirdata2room_idx[nroom]][0][3][ntraj,:])
101 |                     n_heights = np.sum(self._rirdata[rirdata2room_idx[nroom]][0][3][ntraj,:]>0)
102 |                     all_doas = np.zeros((n_rirs, 3))
103 |                     n_rirs_accum = 0
104 |                     flip = 0
105 | 
106 |                     for nheight in range(n_heights):
107 |                         n_rirs_nh = self._rirdata[rirdata2room_idx[nroom]][0][3][ntraj,nheight]
108 |                         doa_xyz = self._rirdata[rirdata2room_idx[nroom]][0][2][ntraj,nheight][0]
109 |                         # stack all doas of the trajectory together;
110 |                         # flip the direction of every second height, so that a
111 |                         # movement can jump smoothly from the lower height to the higher one and
112 |                         # continue moving in the opposite direction
113 |                         if flip:
114 |                             nb_doas = np.shape(doa_xyz)[0]
115 |                             all_doas[n_rirs_accum + np.arange(n_rirs_nh), :] = doa_xyz[np.flip(np.arange(nb_doas)), :]
116 |                         else:
117 |                             all_doas[n_rirs_accum + np.arange(n_rirs_nh), :] = doa_xyz
118 | 
119 |                         n_rirs_accum += n_rirs_nh
120 |                         flip = not flip
121 | 
122 |                     traj_doas.append(all_doas)
123 | 
124 |                 # start layering the mixtures for the specific room
125 |                 sample_counter = 0
126 |                 if np.isscalar(self._nb_mixtures_per_fold):
127 |                     nb_mixtures_per_fold_per_room = int(np.round(self._nb_mixtures_per_fold / float(nb_rooms_nf)))
128 |                 else:
129 |                     nb_mixtures_per_fold_per_room = int(np.round(self._nb_mixtures_per_fold[nfold] / float(nb_rooms_nf)))
130 | 
131 |                 for nmix in range(nb_mixtures_per_fold_per_room):
132 |                     print('Room {}, mixture {}'.format(nroom, nmix+1))
133 | 
134 |                     event_counter = 0
135 |                     nth_mixture = {'files': np.array([]), 'class': np.array([]), 'event_onoffsets': np.array([]),
136 |                                    'sample_onoffsets': np.array([]), 'trajectory': np.array([]), 'isMoving': np.array([]), 'isFlippedMoving': np.array([]),
137 |                                    'speed': np.array([]), 'rirs': [], 'doa_azel': np.array([],dtype=object)}
138 |                     nth_mixture['room'] = nroom
139 |                     nth_mixture['snr'] = self._mixture_setup['snr_set'][self._rnd_generator.integers(0,self._nb_snrs)]
140 | 
141 |                     for layer in range(self._mixture_setup['nOverlap']):
142 |                         print('Layer {}'.format(layer+1))
143 |                         #zero this flag (explained later)
144 |                         TRIMMED_SAMPLE_AT_END = 0
145 | 
146 |                         #fetch event samples till they add up to the target event time per layer
147 |                         event_time_in_layer = 0
148 |                         event_idx_in_layer = []
149 | 
150 |                         while event_time_in_layer < self._total_event_time_per_layer:
151 |                             #get event duration
152 |                             ev_duration = np.ceil(foldlist_nf['duration'][sample_counter]*10.)/10.
153 |                             event_time_in_layer += ev_duration
154 |                             event_idx_in_layer.append(sample_counter)
155 | 
156 |                             event_counter += 1
157 |                             sample_counter += 1
158 | 
159 |                             if sample_counter == nb_samples_nf:
160 |                                 sample_counter = 0
161 | 
162 |                         # the last sample is trimmed to fit the target event time per layer; it is kept
163 |                         # only if the trimmed length exceeds the trim threshold and still covers the sample onset, otherwise it is omitted
164 |                         trimmed_event_length = self._total_event_time_per_layer - (event_time_in_layer - ev_duration)
165 |                         #Temporary workaround - for some reason for interference classes the dict is packed with an additional dimension - check it
166 |                         if len(foldlist_nf['onoffset'][event_idx_in_layer[-1]]) == 1:
167 |                             ons = foldlist_nf['onoffset'][event_idx_in_layer[-1]][0][0,0] if self._db == 'nigens' else foldlist_nf['onoffset'][event_idx_in_layer[-1]][0][0]
168 |                         else:
169 |                             ons = foldlist_nf['onoffset'][event_idx_in_layer[-1]][0,0] if self._db == 'nigens' else foldlist_nf['onoffset'][event_idx_in_layer[-1]][0]
170 |                         if (trimmed_event_length > self._trim_threshold) and (trimmed_event_length > np.floor(ons*10.)/10.):
171 |                             TRIMMED_SAMPLE_AT_END = 1
172 |                         else:
173 |                             if len(event_idx_in_layer) == 1:
174 |                                 raise ValueError("STOP, we will get stuck here forever")
175 | 
176 |                             #remove from sample list
177 |                             event_idx_in_layer = event_idx_in_layer[:-1]
178 |                             # reduce sample count and events-in-recording by 1
179 |                             event_counter -= 1
180 |                             if sample_counter != 0:
181 |                                 sample_counter -= 1
182 |                             else:
183 |                                 # move sample counter to end of the list to re-use sample
184 |                                 sample_counter = nb_samples_nf-1
185 | 
186 |                         nb_samples_in_layer = len(event_idx_in_layer)
187 |                         # split the total silence time between events:
188 |                         # randomize N-1 split points uniformly for N events (in steps of 100 ms),
189 |                         # then turn them into N gaps via np.diff and enforce the minimum gap length below
190 |                         mult_silence = np.round(self._total_silence_time_per_layer*10.)
191 | 
192 |                         mult_min_gap_len = np.round(self._min_gap_len*10.)
193 |                         if nb_samples_in_layer > 1:
194 | 
195 |                             silence_splits = np.sort(self._rnd_generator.integers(1, mult_silence,nb_samples_in_layer-1))
196 |                             #force gaps smaller than _min_gap_len up to that minimum
197 |                             gaps = np.diff(np.concatenate(([0],silence_splits,[mult_silence])))
198 |                             smallgaps_idx = np.argwhere(gaps[:(nb_samples_in_layer-1)] < mult_min_gap_len)
199 |                             while np.any(smallgaps_idx):
200 |                                 temp = np.concatenate(([0], silence_splits))
201 |                                 silence_splits[smallgaps_idx] = temp[smallgaps_idx] + mult_min_gap_len
202 |                                 gaps = np.diff(np.concatenate(([0],silence_splits,[mult_silence])))
203 |                                 smallgaps_idx = np.argwhere(gaps[:(nb_samples_in_layer-1)] < mult_min_gap_len)
204 |                             if np.any(gaps < mult_min_gap_len):
205 |                                 min_idx = np.argwhere(gaps < mult_min_gap_len)
206 |                                 gaps[min_idx] = mult_min_gap_len
207 |                             # if gaps[nb_samples_in_layer-1] < mult_min_gap_len:
208 |                             #     gaps[nb_samples_in_layer-1] = mult_min_gap_len
209 | 
210 |                         else:
211 |                             gaps = np.array([mult_silence])
212 | 
213 |                         # shrink the larger gaps again so the gaps still sum to the total silence time of the layer
214 |                         while np.sum(gaps) > self._total_silence_time_per_layer*10.:
215 |                             silence_diff = np.sum(gaps) - self._total_silence_time_per_layer*10.
216 |                             picked_gaps = np.argwhere(gaps > np.mean(gaps))
217 |                             eq_subtract = silence_diff / len(picked_gaps)
218 |                             picked_gaps = np.argwhere((gaps - eq_subtract) > mult_min_gap_len)
219 |                             gaps[picked_gaps] -= eq_subtract
220 | 
221 |                         # distribute events in timeline
222 |                         time_idx = 0
223 |                         for nl in range(nb_samples_in_layer):
224 |                             #print('Sample {} in layer {}'.format(nl, layer))
225 |                             # advance by the gap to the event onset (quantized to 100 ms)
226 |                             gap_nl = gaps[nl]
227 |                             time_idx += gap_nl
228 |                             event_nl = event_idx_in_layer[nl]
229 |                             event_duration_nl = np.ceil(foldlist_nf['duration'][event_nl]*10.)
230 |                             event_class_nl = int(foldlist_nf['class'][event_nl])
231 |                             if len(foldlist_nf['onoffset'][event_nl]) == 1:
232 |                                 onoffsets = foldlist_nf['onoffset'][event_nl][0]
233 |                             else:
234 |                                 onoffsets = foldlist_nf['onoffset'][event_nl]
235 | 
236 |                             sample_onoffsets = np.zeros_like(onoffsets)
237 |                             if self._db == 'nigens':
238 |                                 sample_onoffsets[:, 0] = np.floor(onoffsets[:,0]*10.)/10.
239 |                                 sample_onoffsets[:, 1] = np.floor(onoffsets[:,1]*10.)/10.
240 |                                 #trim event duration if it's the trimmed sample
241 |                                 if (nl == nb_samples_in_layer-1) and TRIMMED_SAMPLE_AT_END:
242 |                                     event_duration_nl = len(self._mixture_setup['time_idx_100ms']) - time_idx - 1
243 |                                     # keep only onsets/offsets in the trimmed region
244 |                                     find_last_offset_mtx = (event_duration_nl/10.) > sample_onoffsets
245 |                                     sample_onoffsets = sample_onoffsets[:np.sum(find_last_offset_mtx[:,0]),:]
246 |                                     if sample_onoffsets[-1, 1] > event_duration_nl/10.:
247 |                                         sample_onoffsets[-1, 1] = event_duration_nl/10.
248 |                             else:
249 |                                 sample_onoffsets = np.floor(onoffsets*10.)/10.
250 |                                 #trim event duration if it's the trimmed sample
251 |                                 if (nl == nb_samples_in_layer-1) and TRIMMED_SAMPLE_AT_END:
252 |                                     event_duration_nl = len(self._mixture_setup['time_idx_100ms']) - time_idx - 1
253 |                                     # keep only onsets/offsets in the trimmed region
254 |                                     if sample_onoffsets[1] > event_duration_nl/10.:
255 |                                         sample_onoffsets[1] = event_duration_nl/10.
255 | 256 | # trajectory 257 | ev_traj = self._rnd_generator.integers(0, n_traj) 258 | nRirs = np.sum(self._rirdata[rirdata2room_idx[nroom]][0][3][ev_traj,:]) 259 | 260 | #if event is less than move_threshold long, make it static by default 261 | if event_duration_nl <= self._move_threshold*10: 262 | is_moving = 0 263 | else: 264 | if self._class_mobility[event_class_nl] == 2: 265 | # randomly moving or static 266 | is_moving = self._rnd_generator.integers(0,2) 267 | else: 268 | # only static or moving depending on class 269 | is_moving = self._class_mobility[event_class_nl] 270 | 271 | if is_moving: 272 | ev_nspeed = self._rnd_generator.integers(0,self._nb_speeds) 273 | ev_speed = self._mixture_setup['speed_set'][ev_nspeed] 274 | # check if with the current speed there are enough 275 | # RIRs in the trajectory to move through the full 276 | # duration of the event, otherwise, lower speed 277 | while len(np.arange(0,nRirs,ev_speed/10)) <= event_duration_nl: 278 | ev_nspeed = ev_nspeed-1 279 | if ev_nspeed == -1: 280 | break 281 | 282 | ev_speed = self._mixture_setup['speed_set'][ev_nspeed] 283 | 284 | is_flipped_moving = self._rnd_generator.integers(0,2) 285 | event_span_nl = event_duration_nl * ev_speed / 10. 286 | 287 | if is_flipped_moving: 288 | # sample length is shorter than all the RIRs 289 | # in the moving trajectory 290 | if ev_nspeed+1: 291 | end_idx = event_span_nl + self._rnd_generator.integers(0, nRirs-event_span_nl+1) 292 | start_idx = end_idx - event_span_nl 293 | riridx = start_idx + np.arange(0, event_span_nl, dtype=int) 294 | riridx = riridx[np.arange(0,len(riridx),ev_speed/10,dtype=int)] #pick every nth RIR based on speed 295 | riridx = np.flip(riridx) 296 | else: 297 | riridx = np.arange(event_span_nl,0,-1)-1 298 | riridx = riridx - (event_span_nl-nRirs) 299 | riridx = riridx[np.arange(0, len(riridx), ev_speed/10, dtype=int)] 300 | riridx[riridx<0] = 0 301 | else: 302 | if ev_nspeed+1: 303 | start_idx = self._rnd_generator.integers(0, nRirs-event_span_nl+1) 304 | riridx = start_idx + np.arange(0,event_span_nl,dtype=int) - 1 305 | riridx = riridx[np.arange(0,len(riridx),ev_speed/10,dtype=int)] 306 | else: 307 | riridx = np.arange(0,event_span_nl) 308 | riridx = riridx[np.arange(0,len(riridx),ev_speed/10,dtype=int)] 309 | riridx[riridx>nRirs-1] = nRirs-1 310 | else: 311 | is_flipped_moving = 0 312 | ev_speed = 0 313 | riridx = np.array([self._rnd_generator.integers(0,nRirs)]) 314 | riridx = riridx.astype('int') 315 | 316 | if nl == 0 and layer==0: 317 | nth_mixture['event_onoffsets'] = np.array([[time_idx/10., (time_idx+event_duration_nl)/10.]]) 318 | nth_mixture['doa_azel'] = [cart2sph(traj_doas[ev_traj][riridx,:])] 319 | nth_mixture['sample_onoffsets'] = [sample_onoffsets] 320 | else: 321 | nth_mixture['event_onoffsets'] = np.vstack((nth_mixture['event_onoffsets'], np.array([time_idx/10., (time_idx+event_duration_nl)/10.]))) 322 | nth_mixture['doa_azel'].append(cart2sph(traj_doas[ev_traj][riridx,:])) 323 | nth_mixture['sample_onoffsets'].append(sample_onoffsets) 324 | 325 | nth_mixture['files'] = np.append(nth_mixture['files'], foldlist_nf['audiofile'][event_nl]) 326 | nth_mixture['class'] = np.append(nth_mixture['class'], self._class2activeClassmap[int(foldlist_nf['class'][event_nl])]) 327 | nth_mixture['trajectory'] = np.append(nth_mixture['trajectory'], ev_traj) 328 | nth_mixture['isMoving'] = np.append(nth_mixture['isMoving'], is_moving) 329 | nth_mixture['isFlippedMoving'] = np.append(nth_mixture['isFlippedMoving'], is_flipped_moving) 330 | nth_mixture['speed'] = 
331 | nth_mixture['rirs'].append(riridx)
332 |
333 |
334 | time_idx += event_duration_nl
335 |
336 | # sort overlapped events by temporal appearance
337 | sort_idx = np.argsort(nth_mixture['event_onoffsets'][:,0])
338 | nth_mixture['files'] = nth_mixture['files'][sort_idx]
339 | nth_mixture['class'] = nth_mixture['class'][sort_idx]
340 | nth_mixture['event_onoffsets'] = nth_mixture['event_onoffsets'][sort_idx]
341 | #nth_mixture['sample_onoffsets'] = nth_mixture['sample_onoffsets'][sort_idx]
342 | nth_mixture['trajectory'] = nth_mixture['trajectory'][sort_idx]
343 | nth_mixture['isMoving'] = nth_mixture['isMoving'][sort_idx]
344 | nth_mixture['isFlippedMoving'] = nth_mixture['isFlippedMoving'][sort_idx]
345 | nth_mixture['speed'] = nth_mixture['speed'][sort_idx]
346 | nth_mixture['rirs'] = np.array(nth_mixture['rirs'],dtype=object)
347 | nth_mixture['rirs'] = nth_mixture['rirs'][sort_idx]
348 | new_doas = np.zeros(len(sort_idx),dtype=object)
349 | new_sample_onoffsets = np.zeros(len(sort_idx),dtype=object)
350 | upd_idx = 0
351 | for idx in sort_idx:
352 | new_doas[upd_idx] = nth_mixture['doa_azel'][idx].T
353 | new_sample_onoffsets[upd_idx] = nth_mixture['sample_onoffsets'][idx]
354 | upd_idx += 1
355 | nth_mixture['doa_azel'] = new_doas
356 | nth_mixture['sample_onoffsets'] = new_sample_onoffsets
357 |
358 | # accumulate mixtures for each room
359 | fold_mixture['mixture'].append(nth_mixture)
360 | # accumulate rooms
361 | room_mixtures.append(fold_mixture)
362 | # accumulate mixtures per fold
363 | self._mixtures.append(room_mixtures)
364 |
365 |
366 | return self._mixtures, self._mixture_setup, foldlist
367 |
368 | def prepare_metadata_and_stats(self):
369 | print('Calculate statistics and prepare metadata')
370 | stats = {}
371 | self._metadata = []
372 |
373 | stats['nFrames_total'] = self._mixture_setup['nb_folds'] * self._nb_mixtures_per_fold * self._nb_frames if np.isscalar(self._nb_mixtures_per_fold) else np.sum(self._nb_mixtures_per_fold) * self._nb_frames
374 | stats['class_multi_instance'] = np.zeros(self._nb_classes)
375 | stats['class_instances'] = np.zeros(self._nb_classes)
376 | stats['class_nEvents'] = np.zeros(self._nb_classes)
377 | stats['class_presence'] = np.zeros(self._nb_classes)
378 |
379 | stats['polyphony'] = np.zeros(self._mixture_setup['nOverlap']+1)
380 | stats['event_presence'] = 0
381 | stats['nEvents_total'] = 0
382 | stats['nEvents_static'] = 0
383 | stats['nEvents_moving'] = 0
384 |
385 | for nfold in range(self._mixture_setup['nb_folds']):
386 | print('Statistics and metadata for fold {}'.format(nfold+1))
387 | rooms = self._mixtures[nfold][0]['roomidx']
388 | nb_rooms = len(rooms)
389 | room_mixtures = []
390 | for nr in range(nb_rooms):
391 | nb_mixtures = len(self._mixtures[nfold][nr]['mixture'])
392 | per_room_mixtures = []
393 | for nmix in range(nb_mixtures):
394 | mixture = {'classid': np.array([]), 'trackid': np.array([]), 'eventtimetracks': np.array([]), 'eventdoatimetracks': np.array([])}
395 | mixture_nm = self._mixtures[nfold][nr]['mixture'][nmix]
396 | event_classes = mixture_nm['class']
397 | event_states = mixture_nm['isMoving']
398 |
399 | # idx of events and interferers
400 | nb_events = len(event_classes)
401 | nb_events_moving = np.sum(event_states)
402 | stats['nEvents_total'] += nb_events
403 | stats['nEvents_static'] += nb_events - nb_events_moving
404 | stats['nEvents_moving'] += nb_events_moving
405 |
406 | # number of events per class
407 | for nc in range(self._mixture_setup['nb_classes']):
408 | nb_class_events = np.sum(event_classes == nc)
409 | stats['class_nEvents'][nc] += nb_class_events
410 |
411 | # store a timeline for each event
412 | eventtimetracks = np.zeros((self._nb_frames, nb_events))
413 | eventdoatimetracks = np.nan*np.ones((self._nb_frames, 2, nb_events))
414 |
415 | # prepare metadata for synthesis
416 | for nev in range(nb_events):
417 | event_onoffset = mixture_nm['event_onoffsets'][nev,:]*10
418 | doa_azel = np.round(mixture_nm['doa_azel'][nev])
419 | # zero the activity according to perceptual onsets/offsets
420 | sample_onoffsets = mixture_nm['sample_onoffsets'][nev]
421 | ev_idx = np.arange(event_onoffset[0], event_onoffset[1]+0.1,dtype=int)
422 | activity_mask = np.zeros(len(ev_idx),dtype=int)
423 | sample_shape = np.shape(sample_onoffsets)
424 | if len(sample_shape) == 1:
425 | activity_mask[np.arange(int(np.round(sample_onoffsets[0]*10)),int(np.round(sample_onoffsets[1]*10)))] = 1
426 | else:
427 | for nseg in range(sample_shape[0]):
428 | ran = np.arange(int(np.round(sample_onoffsets[nseg,0]*10)),int(np.round((sample_onoffsets[nseg,1])*10)))
429 | activity_mask[ran] = 1
430 |
431 | if len(activity_mask) > len(ev_idx):
432 | activity_mask = activity_mask[0:len(ev_idx)]
433 |
434 | if np.shape(doa_azel)[0] == 1:
435 | # static event
436 | try:
437 | eventtimetracks[ev_idx, nev] = activity_mask
438 | eventdoatimetracks[ev_idx[activity_mask.astype(bool)],0,nev] = np.ones(np.sum(activity_mask==1))*doa_azel[0,0]
439 | eventdoatimetracks[ev_idx[activity_mask.astype(bool)],1,nev] = np.ones(np.sum(activity_mask==1))*doa_azel[0,1]
440 | except IndexError: # event runs past the end of the mixture; truncate
441 | excess_idx = len(np.argwhere(ev_idx >= self._nb_frames))
442 | ev_idx = ev_idx[:-excess_idx]
443 | if len(activity_mask) > len(ev_idx):
444 | activity_mask = activity_mask[0:len(ev_idx)]
445 | eventtimetracks[ev_idx, nev] = activity_mask
446 | eventdoatimetracks[ev_idx[activity_mask.astype(bool)],0,nev] = np.ones(np.sum(activity_mask==1))*doa_azel[0,0]
447 | eventdoatimetracks[ev_idx[activity_mask.astype(bool)],1,nev] = np.ones(np.sum(activity_mask==1))*doa_azel[0,1]
448 |
449 | else:
450 | # moving event
451 | nb_doas = np.shape(doa_azel)[0]
452 | ev_idx = ev_idx[:nb_doas]
453 | activity_mask = activity_mask[:nb_doas]
454 | try:
455 | eventtimetracks[ev_idx,nev] = activity_mask
456 | eventdoatimetracks[ev_idx[activity_mask.astype(bool)],:,nev] = doa_azel[activity_mask.astype(bool),:]
457 | except IndexError: # event runs past the end of the mixture; truncate
458 | excess_idx = len(np.argwhere(ev_idx >= self._nb_frames))
459 | ev_idx = ev_idx[:-excess_idx]
460 | if len(activity_mask) > len(ev_idx):
461 | activity_mask = activity_mask[0:len(ev_idx)]
462 | eventtimetracks[ev_idx,nev] = activity_mask
463 | eventdoatimetracks[ev_idx[activity_mask.astype(bool)],:,nev] = doa_azel[activity_mask.astype(bool),:]
464 |
465 | mixture['classid'] = event_classes
466 | mixture['trackid'] = np.arange(0,nb_events)
467 | mixture['eventtimetracks'] = eventtimetracks
468 | mixture['eventdoatimetracks'] = eventdoatimetracks
469 |
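# Editorial sketch (not original code): eventtimetracks is a (nb_frames x
# nb_events) 0/1 activity matrix, and eventdoatimetracks a (nb_frames x 2 x
# nb_events) array of [azimuth, elevation] in degrees, NaN where inactive.
# For a static event nev active over frames 12..17 at (azi, ele) = (30, -10):
#   eventtimetracks[12:18, nev] == 1
#   eventdoatimetracks[12:18, :, nev] == [30., -10.]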
470 | for nf in range(self._nb_frames):
471 | # find active events
472 | active_events = np.argwhere(eventtimetracks[nf,:] > 0)
473 | # find the classes of the active events
474 | active_classes = event_classes[active_events]
475 |
476 | if active_classes.size == 0: # no active events in this frame
477 | # add to zero polyphony
478 | stats['polyphony'][0] += 1
479 | else:
480 | # add to general event presence
481 | stats['event_presence'] += 1
482 | # number of simultaneous events
483 | nb_active = len(active_events)
484 |
485 | # add to respective polyphony
486 | try:
487 | stats['polyphony'][nb_active] += 1
488 | except IndexError:
489 | pass # TODO: workaround for <1% of border cases; needs a proper fix, though not very relevant
490 |
491 | # presence, instances and multi-instance for each class
492 |
493 | for nc in range(self._mixture_setup['nb_classes']):
494 | nb_instances = np.sum(active_classes == nc)
495 | if nb_instances > 0:
496 | stats['class_presence'][nc] += 1
497 | if nb_instances > 1:
498 | stats['class_multi_instance'][nc] += 1
499 | stats['class_instances'][nc] += nb_instances
500 | per_room_mixtures.append(mixture)
501 | room_mixtures.append(per_room_mixtures)
502 | self._metadata.append(room_mixtures)
503 |
504 | # compute average polyphony
505 | weighted_polyphony_sum = 0
506 | for nn in range(self._mixture_setup['nOverlap']):
507 | weighted_polyphony_sum += (nn+1) * stats['polyphony'][nn+1] # polyphony[k] counts frames with k simultaneous events
508 |
509 | stats['avg_polyphony'] = weighted_polyphony_sum / stats['event_presence']
510 |
511 | # event percentages
512 | stats['class_event_pc'] = np.round(stats['class_nEvents']*1000./stats['nEvents_total'])/10.
513 | stats['event_presence_pc'] = np.round(stats['event_presence']*1000./stats['nFrames_total'])/10.
514 | stats['class_presence_pc'] = np.round(stats['class_presence']*1000./stats['nFrames_total'])/10.
515 | # percentage of frames with same-class instances
516 | stats['multi_class_pc'] = np.round(np.sum(stats['class_multi_instance']*1000./stats['nFrames_total']))/10.
517 |
518 |
519 | return self._metadata, stats
520 |
521 | def write_metadata(self):
522 | if not os.path.isdir(self._metadata_path):
523 | os.makedirs(self._metadata_path)
524 |
525 | for nfold in range(self._mixture_setup['nb_folds']):
526 | print('Writing metadata files for fold {}'.format(nfold+1))
527 | nb_rooms = len(self._metadata[nfold])
528 | for nr in range(nb_rooms):
529 | nb_mixtures = len(self._metadata[nfold][nr])
530 | for nmix in range(nb_mixtures):
531 | print('Mixture {}'.format(nmix+1))
532 | metadata_nm = self._metadata[nfold][nr][nmix]
533 |
534 | # write to file, omitting non-active frames
535 | mixture_filename = 'fold{}_room{}_mix{:03}.csv'.format(nfold+1, nr+1, nmix+1)
536 | file_id = open(self._metadata_path + '/' + mixture_filename, 'w', newline="")
537 | metadata_writer = csv.writer(file_id,delimiter=',',quoting = csv.QUOTE_NONE)
538 | for nf in range(self._nb_frames):
539 | # find active events
540 | active_events = np.argwhere(metadata_nm['eventtimetracks'][nf, :]>0)
541 | nb_active = len(active_events)
542 |
543 | if nb_active > 0:
544 | # find the classes of active events
545 | active_classes = metadata_nm['classid'][active_events]
546 | active_tracks = metadata_nm['trackid'][active_events]
547 |
548 | # write to file
549 | for na in range(nb_active):
550 | classidx = int(active_classes[na][0]) # additional zero index since it's packed in an array
551 | trackidx = int(active_tracks[na][0])
552 |
553 | azim = int(metadata_nm['eventdoatimetracks'][nf,0,active_events][na][0])
554 | elev = int(metadata_nm['eventdoatimetracks'][nf,1,active_events][na][0])
555 | metadata_writer.writerow([nf,classidx,trackidx,azim,elev])
556 | file_id.close()
557 |
558 |
559 |
560 |
561 |
562 |
563 |
--------------------------------------------------------------------------------
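Editor's note: a usage sketch for the class above (hypothetical driver code,
not part of this file; get_params() from generation_parameters.py and the
pickled DBConfig object are assumed, following the example scripts):

    import pickle
    from metadata_synthesizer import MetadataSynthesizer
    from generation_parameters import get_params

    params = get_params('2')  # hypothetical preset id
    with open('db_config_fsd.obj', 'rb') as f:
        db_config = pickle.load(f)

    noiselessmeta = MetadataSynthesizer(db_config, params, 'target_noiseless')
    mixtures, mixture_setup, foldlist = noiselessmeta.create_mixtures()
    metadata, stats = noiselessmeta.prepare_metadata_and_stats()
    noiselessmeta.write_metadata()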
/utils.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | import scipy.fft
3 |
4 |
5 | def sample_from_quartiles(K, stats):
6 | minn = stats[0]
7 | maxx = stats[4]
8 | quart1 = stats[1]
9 | mediann = stats[2]
10 | quart3 = stats[3]
11 | samples = minn + (quart1 - minn)*np.random.rand(K, 1)
12 | samples = np.append(samples,quart1)
13 | samples = np.append(samples, quart1 + (mediann-quart1)*np.random.rand(K,1))
14 | samples = np.append(samples,mediann)
15 | samples = np.append(samples, mediann + (quart3-mediann)*np.random.rand(K,1))
16 | samples = np.append(samples, quart3)
17 | samples = np.append(samples, quart3 + (maxx-quart3)*np.random.rand(K,1))
18 |
19 | return samples
20 |
21 | def cart2sph(xyz):
22 | return_list = False
23 | if len(np.shape(xyz)) == 2:
24 | return_list = True
25 | x = xyz[:, 0]
26 | y = xyz[:, 1]
27 | z = xyz[:, 2]
28 | else:
29 | x = xyz[0]
30 | y = xyz[1]
31 | z = xyz[2]
32 |
33 | azimuth = np.arctan2(y, x) * 180. / np.pi
34 | elevation = np.arctan2(z, np.sqrt(x**2 + y**2)) * 180. / np.pi
35 | if return_list:
36 | return np.stack((azimuth,elevation),axis=0)
37 | else:
38 | return np.array([azimuth, elevation])
39 |
40 |
41 | def stft_ham(insig, winsize=256, fftsize=512, hopsize=128):
42 | nb_dim = len(np.shape(insig))
43 | lSig = int(np.shape(insig)[0])
44 | nCHin = int(np.shape(insig)[1]) if nb_dim > 1 else 1
45 | x = np.arange(0,winsize)
46 | nBins = int(fftsize/2 + 1)
47 | nWindows = int(np.ceil(lSig/(2.*hopsize)))
48 | nFrames = int(2*nWindows+1)
49 |
50 | winvec = np.zeros((len(x),nCHin))
51 | for i in range(nCHin):
52 | winvec[:,i] = np.sin(x*(np.pi/winsize))**2 # sin^2 (Hann) analysis window
53 |
54 | frontpad = winsize-hopsize
55 | backpad = nFrames*hopsize-lSig
56 |
57 | if nb_dim > 1:
58 | insig_pad = np.pad(insig,((frontpad,backpad),(0,0)),'constant')
59 | spectrum = np.zeros((nBins, nFrames, nCHin),dtype='complex')
60 | else:
61 | insig_pad = np.pad(insig,((frontpad,backpad)),'constant')
62 | spectrum = np.zeros((nBins, nFrames),dtype='complex')
63 |
64 | idx = 0
65 | nf = 0
66 | if nb_dim > 1:
67 | while nf <= nFrames-1:
68 | insig_win = np.multiply(winvec, insig_pad[idx+np.arange(0,winsize),:])
69 | inspec = scipy.fft.fft(insig_win,n=fftsize,norm='backward',axis=0)
70 | #inspec = scipy.fft.fft(insig_win,n=fftsize,axis=0)
71 | inspec = inspec[:nBins,:]
72 | spectrum[:,nf,:] = inspec
73 | idx += hopsize
74 | nf += 1
75 | else:
76 | while nf <= nFrames-1:
77 | insig_win = np.multiply(winvec[:,0], insig_pad[idx+np.arange(0,winsize)])
78 | inspec = scipy.fft.fft(insig_win,n=fftsize,norm='backward',axis=0)
79 | #inspec = scipy.fft.fft(insig_win,n=fftsize,axis=0)
80 | inspec = inspec[:nBins]
81 | spectrum[:,nf] = inspec
82 | idx += hopsize
83 | nf += 1
84 |
85 | return spectrum
86 |
87 |
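# Editorial usage sketch (not original code): for a mono signal, stft_ham
# returns a (bins, frames) complex spectrogram, e.g.
#
#   sig = np.random.randn(24000)
#   spec = stft_ham(sig, winsize=512, fftsize=1024, hopsize=256)
#   # spec.shape == (513, 95), since nFrames = 2*ceil(24000/(2*256)) + 1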
88 | def ctf_ltv_direct(sig, irs, ir_times, fs, win_size):
89 | convsig = []
90 | win_size = int(win_size)
91 | hop_size = int(win_size / 2)
92 | fft_size = win_size*2
93 | nBins = int(fft_size/2)+1
94 |
95 | # IRs
96 | ir_shape = np.shape(irs)
97 | sig_shape = np.shape(sig)
98 |
99 | lIr = ir_shape[0]
100 |
101 | if len(ir_shape) == 2:
102 | nIrs = ir_shape[1]
103 | nCHir = 1
104 | elif len(ir_shape) == 3:
105 | nIrs = ir_shape[2]
106 | nCHir = ir_shape[1]
107 |
108 | if nIrs != len(ir_times):
109 | raise ValueError('Bad ir times')
110 |
111 | # number of STFT frames for the IRs (half-window hopsize)
112 |
113 | nIrWindows = int(np.ceil(lIr/win_size))
114 | nIrFrames = 2*nIrWindows+1
115 | # number of STFT frames for the signal (half-window hopsize)
116 | lSig = sig_shape[0]
117 | nSigWindows = np.ceil(lSig/win_size)
118 | nSigFrames = 2*nSigWindows+1
119 |
120 | # quantize the timestamps of each IR to multiples of STFT frames (hopsizes)
121 | tStamps = np.round((ir_times*fs+hop_size)/hop_size)
122 |
123 | # create the two linear interpolator tracks, for the pairs of IRs between timestamps
124 | nIntFrames = int(tStamps[-1])
125 | Gint = np.zeros((nIntFrames, nIrs))
126 | for ni in range(nIrs-1):
127 | tpts = np.arange(tStamps[ni],tStamps[ni+1]+1,dtype=int)-1
128 | ntpts = len(tpts)
129 | ntpts_ratio = np.arange(0,ntpts)/(ntpts-1)
130 | Gint[tpts,ni] = 1-ntpts_ratio
131 | Gint[tpts,ni+1] = ntpts_ratio
132 |
133 | # compute spectra of irs
134 |
135 | if nCHir == 1:
136 | irspec = np.zeros((nBins, nIrFrames, nIrs),dtype=complex)
137 | else:
138 | temp_spec = stft_ham(irs[:, :, 0], winsize=win_size, fftsize=2*win_size,hopsize=win_size//2)
139 | irspec = np.zeros((nBins, np.shape(temp_spec)[1], nCHir, nIrs),dtype=complex)
140 |
141 | for ni in range(nIrs):
142 | if nCHir == 1:
143 | irspec[:, :, ni] = stft_ham(irs[:, ni], winsize=win_size, fftsize=2*win_size,hopsize=win_size//2)
144 | else:
145 | spec = stft_ham(irs[:, :, ni], winsize=win_size, fftsize=2*win_size,hopsize=win_size//2)
146 | irspec[:, :, :, ni] = spec # np.transpose(spec, (0, 2, 1))
147 |
148 | # compute input signal spectra
149 | sigspec = stft_ham(sig, winsize=win_size,fftsize=2*win_size,hopsize=win_size//2)
150 | # initialize interpolated time-variant ctf
151 | Gbuf = np.zeros((nIrFrames, nIrs))
152 | if nCHir == 1:
153 | ctf_ltv = np.zeros((nBins, nIrFrames),dtype=complex)
154 | else:
155 | ctf_ltv = np.zeros((nBins,nIrFrames,nCHir),dtype=complex)
156 |
157 | S = np.zeros((nBins, nIrFrames),dtype=complex)
158 |
159 | # processing loop
160 | idx = 0
161 | nf = 0
162 | inspec_pad = sigspec
163 | nFrames = int(np.min([np.shape(inspec_pad)[1], nIntFrames]))
164 |
165 | convsig = np.zeros((win_size//2 + nFrames*win_size//2 + fft_size-win_size, nCHir))
166 |
167 | while nf <= nFrames-1:
168 | # compute interpolated ctf
169 | Gbuf[1:, :] = Gbuf[:-1, :]
170 | Gbuf[0, :] = Gint[nf, :]
171 | if nCHir == 1:
172 | for nif in range(nIrFrames):
173 | ctf_ltv[:, nif] = np.matmul(irspec[:,nif,:], Gbuf[nif,:].astype(complex))
174 | else:
175 | for nch in range(nCHir):
176 | for nif in range(nIrFrames):
177 | ctf_ltv[:,nif,nch] = np.matmul(irspec[:,nif,nch,:],Gbuf[nif,:].astype(complex))
178 | inspec_nf = inspec_pad[:, nf]
179 | S[:,1:nIrFrames] = S[:, :nIrFrames-1]
180 | S[:, 0] = inspec_nf
181 |
182 | repS = np.tile(np.expand_dims(S,axis=2), [1, 1, nCHir])
183 | convspec_nf = np.squeeze(np.sum(repS * ctf_ltv,axis=1))
184 | first_dim = np.shape(convspec_nf)[0]
185 | convspec_nf = np.vstack((convspec_nf, np.conj(convspec_nf[np.arange(first_dim-1, 1, -1)-1,:])))
186 | convsig_nf = np.real(scipy.fft.ifft(convspec_nf, fft_size, norm='forward', axis=0)) # discard the residual imaginary part left by numerical error
187 | # convsig_nf = np.real(scipy.fft.ifft(convspec_nf, fft_size, axis=0))
188 | # overlap-add synthesis
189 | convsig[idx+np.arange(0,fft_size),:] += convsig_nf
190 | # advance sample pointer
191 | idx += hop_size
192 | nf += 1
193 |
194 | convsig = convsig[(win_size):(nFrames*win_size)//2,:]
195 |
196 | return convsig
--------------------------------------------------------------------------------
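Editor's note: a minimal usage sketch for ctf_ltv_direct (hypothetical values,
not part of the repository). The function renders a source through a
time-varying set of RIRs by linearly interpolating between consecutive IRs,
with ir_times giving the time in seconds at which each IR applies:

    import numpy as np
    fs = 24000
    sig = np.random.randn(fs)                 # 1 s mono source signal
    irs = 0.01 * np.random.randn(1024, 4, 10) # (ir_len, n_channels, n_irs)
    ir_times = np.linspace(0., 1., 10)        # IR timestamps in seconds
    out = ctf_ltv_direct(sig, irs, ir_times, fs, win_size=512)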