├── LICENSE.md
├── README.md
├── audio_mixer.py
├── audio_synthesizer.py
├── db_config.py
├── db_config_fsd.obj
├── db_config_nigens.obj
├── example_script_DCASE2021.py
├── example_script_DCASE2022.py
├── generation_parameters.py
├── make_dataset.py
├── metadata_synthesizer.py
└── utils.py

/LICENSE.md:
--------------------------------------------------------------------------------
 1 | -----------COPYRIGHT NOTICE STARTS WITH THIS LINE------------
 2 | Copyright (c) 2021 Tampere University and its licensors
 3 | All rights reserved.
 4 | 
 5 | Permission is hereby granted, without written agreement and without
 6 | license or royalty fees, to use and copy the code for the Sound Event
 7 | Localization and Detection using Convolutional Recurrent Neural Network
 8 | method/architecture, present in the GitHub repository with the handle
 9 | seld-dcase2021, (“Work”) described in the paper with title "Sound event
10 | localization and detection of overlapping sources using
11 | convolutional recurrent neural network" and composed of files with
12 | code in the Python programming language. This grant is only for experimental and
13 | non-commercial purposes, provided that the copyright notice in its entirety
14 | appear in all copies of this Work, and the original source of this Work,
15 | Audio Research Group at Tampere University, is acknowledged in any publication
16 | that reports research using this Work.
17 | 
18 | Any commercial use of the Work or any part thereof is strictly prohibited.
19 | Commercial use includes, but is not limited to:
20 | - selling or reproducing the Work
21 | - selling or distributing the results or content achieved by use of the Work
22 | - providing services by using the Work.
23 | 
24 | IN NO EVENT SHALL TAMPERE UNIVERSITY OR ITS LICENSORS BE LIABLE TO
25 | ANY PARTY FOR DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES
26 | ARISING OUT OF THE USE OF THIS WORK AND ITS DOCUMENTATION, EVEN IF TAMPERE
27 | UNIVERSITY OR ITS LICENSORS HAS BEEN ADVISED OF THE POSSIBILITY
28 | OF SUCH DAMAGE.
29 | 
30 | TAMPERE UNIVERSITY AND ALL ITS LICENSORS SPECIFICALLY DISCLAIM
31 | ANY WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
32 | MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE WORK PROVIDED HEREUNDER
33 | IS ON AN "AS IS" BASIS, AND THE TAMPERE UNIVERSITY HAS NO OBLIGATION
34 | TO PROVIDE MAINTENANCE, SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS.
35 | 
36 | -----------COPYRIGHT NOTICE ENDS WITH THIS LINE------------
37 | 
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | # DCASE2022-data-generator
 2 | Data generator for creating synthetic audio mixtures suitable for DCASE Challenge 2022 Task 3
 3 | 
 4 | ### Prerequisites
 5 | 
 6 | The provided code was tested with Python 3.8 and the following libraries:
 7 | SoundFile 0.10.3, mat73 0.58, numpy 1.20.1, scipy 1.6.2, librosa 0.8.1.
 8 | 
 9 | ## Getting Started
10 | 
11 | This repository contains several Python files, which together form a complete data generation framework.
12 | * The `generation_parameters.py` is a separate script used for setting the parameters of the data generation process, such as the audio dataset, the number of folds, the mixture length, etc.
13 | * The `db_config.py` is a class containing audio filelists and data parameters from the different audio datasets used for the mixture generation.
14 | * The `metadata_synthesizer.py` is a class for generating the mixture target labels, along with the corresponding metadata and statistics. Information from this class can be further used for synthesizing the final audio files.
15 | * The `audio_synthesizer.py` is a class for synthesizing noiseless audio files containing the simulated mixtures.
16 | * The `audio_mixer.py` is a class for mixing the generated audio mixtures with background noise and/or interference mixtures.
17 | * The `make_dataset.py` is the main script in which the whole framework is used to perform the full data generation process.
18 | * The `utils.py` is an additional file containing complementary functions for the other scripts.
19 | 
20 | Moreover, two object files are included in case the database configuration via `db_config.py` takes too much time:
21 | * The `db_config_fsd.obj` is a DBConfig class containing information about the database and files for the FSD50K audio dataset.
22 | * The `db_config_nigens.obj` is a DBConfig class containing information about the database and files for the NIGENS audio dataset.
23 | 
24 | Two example scripts are included:
25 | * The `example_script_DCASE2021.py` is a script showing a pipeline to generate data similar to the DCASE2021 dataset.
26 | * The `example_script_DCASE2022.py` is a script showing a pipeline to generate data similar to the current DCASE2022 dataset.
27 | 
28 | The repository is licensed under the [TAU License](LICENSE.md).
--------------------------------------------------------------------------------
/audio_mixer.py:
--------------------------------------------------------------------------------
 1 | import numpy as np
 2 | import os
 3 | import soundfile
 4 | 
 5 | class AudioMixer(object):
 6 |     def __init__(
 7 |             self, params, db_config, mixtures, mixture_setup, audio_format, scenario_out, scenario_interf='target_interf'
 8 |             ):
 9 |         self._recpath2020 = params['noisepath']
10 |         self._rooms_paths2020 = ['01_bomb_center','02_gym','03_pb132_paatalo_classroom2','04_pc226_paatalo_office',
11 |                                  '05_sa203_sahkotalo_lecturehall','06_sc203_sahkotalo_classroom2','07_se201_sahkotalo_classroom',
12 |                                  '08_se203_sahkotalo_classroom','09_tb103_tietotalo_lecturehall',
13 |                                  '10_tc352_tietotalo_meetingroom']
14 |         self._nb_rooms2020 = len(self._rooms_paths2020)
15 |         self._recpath2019 = params['noisepath']
16 |         self._rooms_paths2019 = ['11_language_center','12_tietotalo','13_reaktori','14_sahkotalo','15_festia']
17 |         self._nb_rooms2019 = len(self._rooms_paths2019)
18 |         self._mixturepath = params['mixturepath']
19 |         self._mixtures = mixtures
20 |         self._targetpath = self._mixturepath + '/' + mixture_setup['scenario']
21 |         if scenario_out == 'target_noisy':
22 |             self._scenarios = [1, 0, 1]
23 |         elif scenario_out == 'target_interf_noiseless':
24 |             self._scenarios = [1, 1, 0]
25 |         elif scenario_out == 'target_interf_noisy':
26 |             self._scenarios = [1, 1, 1]
27 |         else:
28 |             raise ValueError('Incorrect scenario specified')
29 | 
30 |         self._scenariopath = self._mixturepath + '/' + scenario_out
31 |         self._audio_format = audio_format
32 |         if self._audio_format == 'mic':
33 |             self._mic_format = 'tetra'
34 |         elif self._audio_format == 'foa':
35 |             self._mic_format = 'foa_sn3d'
36 | 
37 |         if self._scenarios[1]:
38 |             self._interfpath = self._mixturepath + '/' + scenario_interf
39 |         self._fs_mix = mixture_setup['fs_mix']
40 |         self._tMix = mixture_setup['mixture_duration']
41 |         self._lMix = int(np.round(self._fs_mix*self._tMix))
42 |         # target signal-to-interference power ratio,
43 |         # set to 3 for all targets, so that the total interference power is
44 |         # approximately 0 dB with respect to a single layer of targets (for 3 target layers)
45 |         self._sir = 3.
46 |         self._nb_folds = mixture_setup['nb_folds']
47 |         self._rnd_generator = np.random.default_rng(2024)
48 | 
49 |     def mix_audio(self):
50 |         if not os.path.isdir(self._scenariopath + '/' + self._audio_format):
51 |             os.makedirs(self._scenariopath + '/' + self._audio_format)
52 |         # start creating the mixtures description structure
53 |         for nfold in range(self._nb_folds):
54 |             print('Adding noise for fold {}'.format(nfold+1))
55 |             rooms = self._mixtures[nfold][0]['roomidx']
56 |             nb_rooms = len(rooms)
57 |             for nr in range(nb_rooms):
58 |                 nroom = rooms[nr]
59 | 
60 |                 if self._scenarios[2]:
61 |                     print('Loading ambience')
62 |                     recpath = self._recpath2020 if nroom <= 10 else self._recpath2019
63 |                     roompath = self._rooms_paths2020 if nroom <= 10 else self._rooms_paths2019
64 |                     roomidx = nroom if nroom <= 10 else nroom-10
65 |                     ambience, _ = soundfile.read(recpath + '/' + roompath[roomidx-1] + '/ambience_' + self._mic_format + '_24k_edited.wav')
66 |                     lSig = np.shape(ambience)[0]
67 |                     nSegs = int(np.floor(lSig/self._lMix))  # number of whole mixture-length segments in the ambience recording
68 | 
69 |                 nb_mixtures = len(self._mixtures[nfold][nr]['mixture'])
70 |                 for nmix in range(nb_mixtures):
71 |                     print('Loading target mixture {}/{} \n'.format(nmix+1,nb_mixtures))
72 |                     mixture_filename = 'fold{}_room{}_mix{:03}.wav'.format(nfold+1,nr+1,nmix+1)
73 |                     snr = self._mixtures[nfold][nr]['mixture'][nmix]['snr']
74 | 
75 |                     target_sig, _ = soundfile.read(self._targetpath + '/' + self._audio_format + '/' + mixture_filename)
76 |                     target_omni_energy = np.sum(np.mean(target_sig,axis=1)**2) if self._audio_format == 'mic' else np.sum(target_sig[:,0]**2)
77 | 
78 |                     if self._scenarios[1]:
79 |                         print('Loading interferer mixture {}/{} \n'.format(nmix+1, nb_mixtures))
80 |                         interf_sig, _ = soundfile.read(self._interfpath + '/' + self._audio_format + '/' + mixture_filename)
81 |                         inter_omni_energy = np.sum(np.mean(interf_sig,axis=1)**2) if self._audio_format == 'mic' else np.sum(interf_sig[:,0]**2)
82 |                         interf_norm = np.sqrt(target_omni_energy/(self._sir * inter_omni_energy))  # scale the interference so the target-to-interference energy ratio equals _sir
83 |                         ## ADD INTERFERENCE
84 |                         target_sig += interf_norm * interf_sig
85 | 
86 |                     if self._scenarios[2]:
87 |                         # check whether the mixture index still fits within the available noise segments
88 |                         idx_range = np.arange(0,self._lMix,dtype=int) # computed here for convenience
89 |                         if nmix < nSegs:
90 |                             ambient_sig = ambience[nmix*self._lMix+idx_range, :]
91 |                         else:
92 |                             # else just mix randomly two segments
93 |                             rand_idx = self._rnd_generator.integers(0,nSegs,2)
94 |                             ambient_sig = (ambience[rand_idx[0]*self._lMix + idx_range, :]/np.sqrt(2)
95 |                                            + ambience[rand_idx[1]*self._lMix + idx_range, :]/np.sqrt(2))
96 | 
97 |                         ambi_energy = np.sum(np.mean(ambient_sig,axis=1)**2) if self._audio_format == 'mic' else np.sum(ambient_sig[:,0]**2)
98 |                         ambi_norm = np.sqrt(target_omni_energy * 10.**(-snr/10.) / ambi_energy)
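                        # Note (annotation added for clarity): ambi_norm is the gain g that solves
                        #     10*log10(target_omni_energy / (g**2 * ambi_energy)) = snr,
                        # i.e. g = sqrt(target_omni_energy * 10**(-snr/10) / ambi_energy),
                        # which places the scaled ambience snr dB below the target's omni-channel energy.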
99 | 
100 |                         ## ADD NOISE
101 |                         target_sig += ambi_norm * ambient_sig
102 | 
103 | 
104 |                     soundfile.write(self._scenariopath+'/'+self._audio_format+'/'+mixture_filename, target_sig, self._fs_mix)
105 | 
106 | 
107 | 
--------------------------------------------------------------------------------
/audio_synthesizer.py:
--------------------------------------------------------------------------------
 1 | import numpy as np
 2 | import scipy.io
 3 | import utils
 4 | import os
 5 | import mat73
 6 | import scipy.signal as signal
 7 | import soundfile
 8 | 
 9 | class AudioSynthesizer(object):
10 |     def __init__(
11 |             self, params, mixtures, mixture_setup, db_config, audio_format
12 |             ):
13 |         self._mixtures = mixtures
14 |         self._rirpath = params['rirpath']
15 |         self._db_path = params['db_path']
16 |         self._audio_format = audio_format
17 |         self._outpath = params['mixturepath'] + '/' + mixture_setup['scenario'] + '/' + self._audio_format
18 |         self._rirdata = db_config._rirdata
19 |         self._nb_rooms = len(self._rirdata)
20 |         self._room_names = []
21 |         for nr in range(self._nb_rooms):
22 |             self._room_names.append(self._rirdata[nr][0][0][0])
23 |         self._classnames = mixture_setup['classnames']
24 |         self._fs_mix = mixture_setup['fs_mix']
25 |         self._t_mix = mixture_setup['mixture_duration']
26 |         self._l_mix = int(np.round(self._fs_mix * self._t_mix))
27 |         self._time_idx100 = np.arange(0., self._t_mix, 0.1)
28 |         self._stft_winsize_moving = 0.1*self._fs_mix//2
29 |         self._nb_folds = len(mixtures)
30 |         self._apply_event_gains = db_config._apply_class_gains
31 |         if self._apply_event_gains:
32 |             self._class_gains = db_config._class_gains
33 | 
34 | 
35 |     def synthesize_mixtures(self):
36 |         rirdata2room_idx = {1: 0, 2: 1, 3: 2, 4: 3, 5: 4, 6: 5, 8: 6, 9: 7, 10: 8} # room numbers in the rirdata array
37 |         # create the output path if it doesn't exist
38 |         if not os.path.isdir(self._outpath):
39 |             os.makedirs(self._outpath)
40 | 
41 |         for nfold in range(self._nb_folds):
42 |             print('Generating scene audio for fold {}'.format(nfold+1))
43 | 
44 |             rooms = self._mixtures[nfold][0]['roomidx']
45 |             nb_rooms_in_fold = len(rooms)
46 |             for nr in range(nb_rooms_in_fold):
47 | 
48 |                 nroom = rooms[nr]
49 |                 nb_mixtures = len(self._mixtures[nfold][nr]['mixture'])
50 |                 print('Loading RIRs for room {}'.format(nroom))
51 | 
52 |                 room_idx = rirdata2room_idx[nroom]
53 |                 if nroom > 9:
54 |                     struct_name = 'rirs_{}_{}'.format(nroom,self._room_names[room_idx])
55 |                 else:
56 |                     struct_name = 'rirs_0{}_{}'.format(nroom,self._room_names[room_idx])
57 |                 path = self._rirpath + '/' + struct_name + '.mat'
58 |                 rirs = mat73.loadmat(path)
59 |                 rirs = rirs['rirs'][self._audio_format]
60 |                 # stack all the RIRs for all heights to make one large trajectory
61 |                 print('Stacking same trajectory RIRs')
62 |                 lRir = len(rirs[0][0])
63 |                 nCh = len(rirs[0][0][0])
64 | 
65 |                 n_traj = np.shape(self._rirdata[room_idx][0][2])[0]
66 |                 n_rirs_max = np.max(np.sum(self._rirdata[room_idx][0][3],axis=1))
67 | 
68 |                 channel_rirs = np.zeros((lRir, nCh, n_rirs_max, n_traj))
69 |                 for ntraj in range(n_traj):
70 |                     nHeights = np.sum(self._rirdata[room_idx][0][3][ntraj,:]>0)
71 | 
72 |                     nRirs_accum = 0
73 | 
74 |                     # flip the direction of every second height, so that a
75 |                     # movement can jump smoothly from the lower height to the higher one and
76 |                     # continue moving in the opposite direction
77 |                     flip = False
78 |                     for nheight in range(nHeights):
79 |                         nRirs_nh = self._rirdata[room_idx][0][3][ntraj,nheight]
80 |                         rir_l = len(rirs[ntraj][nheight][0,0,:])
81 |                         if flip:
82 |                             channel_rirs[:, :, nRirs_accum +
np.arange(0,nRirs_nh),ntraj] = rirs[ntraj][nheight][:,:,np.arange(rir_l-1,-1,-1)] 83 | else: 84 | channel_rirs[:, :, nRirs_accum + np.arange(0,nRirs_nh),ntraj] = rirs[ntraj][nheight] 85 | 86 | nRirs_accum += nRirs_nh 87 | flip = not flip 88 | 89 | del rirs #clear some memory 90 | 91 | for nmix in range(nb_mixtures): 92 | print('Writing mixture {}/{}'.format(nmix+1,nb_mixtures)) 93 | 94 | ### WRITE TARGETS EVENTS 95 | mixture_nm = self._mixtures[nfold][nr]['mixture'][nmix] 96 | try: 97 | nb_events = len(mixture_nm['class']) 98 | except TypeError: 99 | nb_events = 1 100 | 101 | mixsig = np.zeros((self._l_mix, 4)) 102 | for nev in range(nb_events): 103 | if not nb_events == 1: 104 | classidx = int(mixture_nm['class'][nev]) 105 | onoffset = mixture_nm['event_onoffsets'][nev,:] 106 | filename = mixture_nm['files'][nev] 107 | ntraj = int(mixture_nm['trajectory'][nev]) 108 | 109 | else: 110 | classidx = int(mixture_nm['class']) 111 | onoffset = mixture_nm['event_onoffsets'] 112 | filename = mixture_nm['files'] 113 | ntraj = int(mixture_nm['trajectory']) 114 | 115 | # load event audio and resample to match RIR sampling 116 | eventsig, fs_db = soundfile.read(self._db_path + '/' + filename) 117 | if len(np.shape(eventsig)) > 1: 118 | eventsig = eventsig[:,0] 119 | eventsig = signal.resample_poly(eventsig, self._fs_mix, fs_db) 120 | 121 | #spatialize audio 122 | riridx = mixture_nm['rirs'][nev] if nb_events > 1 else mixture_nm['rirs'] 123 | 124 | 125 | moving_condition = mixture_nm['isMoving'][nev] if nb_events > 1 else mixture_nm['isMoving'] 126 | if nb_events > 1 and not moving_condition: 127 | riridx = int(riridx[0]) if len(riridx)==1 else riridx.astype('int') 128 | if nb_events == 1 and type(riridx) != int: 129 | riridx = riridx[0] 130 | 131 | if moving_condition: 132 | nRirs_moving = len(riridx) if np.shape(riridx) else 1 133 | ir_times = self._time_idx100[np.arange(0,nRirs_moving)] 134 | mixeventsig = 481.6989*utils.ctf_ltv_direct(eventsig, channel_rirs[:, :, riridx, ntraj], ir_times, self._fs_mix, self._stft_winsize_moving) / float(len(eventsig)) 135 | else: 136 | mixeventsig0 = scipy.signal.convolve(eventsig, np.squeeze(channel_rirs[:, 0, riridx, ntraj]), mode='full', method='fft') 137 | mixeventsig1 = scipy.signal.convolve(eventsig, np.squeeze(channel_rirs[:, 1, riridx, ntraj]), mode='full', method='fft') 138 | mixeventsig2 = scipy.signal.convolve(eventsig, np.squeeze(channel_rirs[:, 2, riridx, ntraj]), mode='full', method='fft') 139 | mixeventsig3 = scipy.signal.convolve(eventsig, np.squeeze(channel_rirs[:, 3, riridx, ntraj]), mode='full', method='fft') 140 | 141 | mixeventsig = np.stack((mixeventsig0,mixeventsig1,mixeventsig2,mixeventsig3),axis=1) 142 | if self._apply_event_gains: 143 | # apply random gain to each event based on class gain, distribution given externally 144 | K=1000 145 | rand_energies_per_spec = utils.sample_from_quartiles(K, self._class_gains[classidx]) 146 | intr_quart_energies_per_sec = rand_energies_per_spec[K + np.arange(3*(K+1))] 147 | rand_energy_per_spec = intr_quart_energies_per_sec[np.random.randint(len(intr_quart_energies_per_sec))] 148 | sample_onoffsets = mixture_nm['sample_onoffsets'][nev] 149 | sample_active_time = sample_onoffsets[1] - sample_onoffsets[0] 150 | target_energy = rand_energy_per_spec*sample_active_time 151 | if self._audio_format == 'mic': 152 | event_omni_energy = np.sum(np.sum(mixeventsig,axis=1)**2) 153 | elif self._audio_format == 'foa': 154 | event_omni_energy = np.sum(mixeventsig[:,0]**2) 155 | 156 | norm_gain = np.sqrt(target_energy / 
event_omni_energy) 157 | mixeventsig = norm_gain * mixeventsig 158 | 159 | lMixeventsig = np.shape(mixeventsig)[0] 160 | if np.round(onoffset[0]*self._fs_mix) + lMixeventsig <= self._t_mix * self._fs_mix: 161 | mixsig[int(np.round(onoffset[0]*self._fs_mix)) + np.arange(0,lMixeventsig,dtype=int), :] += mixeventsig 162 | else: 163 | lMixeventsig_trunc = int(self._t_mix * self._fs_mix - int(np.round(onoffset[0]*self._fs_mix))) 164 | mixsig[int(np.round(onoffset[0]*self._fs_mix)) + np.arange(0,lMixeventsig_trunc,dtype=int), :] += mixeventsig[np.arange(0,lMixeventsig_trunc,dtype=int), :] 165 | 166 | # normalize 167 | gnorm = 0.5/np.max(np.max(np.abs(mixsig))) 168 | 169 | mixsig = gnorm*mixsig 170 | mixture_filename = 'fold{}_room{}_mix{:03}.wav'.format(nfold+1, nr+1, nmix+1) 171 | soundfile.write(self._outpath + '/' + mixture_filename, mixsig, self._fs_mix) 172 | 173 | 174 | 175 | 176 | 177 | 178 | 179 | -------------------------------------------------------------------------------- /db_config.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import scipy.io 3 | import csv 4 | import librosa 5 | import os 6 | 7 | class DBConfig(object): 8 | def __init__( 9 | self, params 10 | ): 11 | self._rirpath = params['rirpath'] 12 | self._mixturepath = params['mixturepath'] 13 | self._rirdata = self._load_rirdata() 14 | self._nb_folds = params['nb_folds'] 15 | self._rooms2fold = params['rooms2fold'] 16 | self._db_path = params['db_path'] 17 | self._db_name = params['db_name'] 18 | if self._db_name == 'fsd50k': 19 | self._fs = 44100 20 | self._classes = ['femaleSpeech', 'maleSpeech', 'clapping', 'telephone', 'laughter', 'domesticSounds', 'footsteps', 21 | 'doorCupboard', 'music', 'musicInstrument', 'waterTap', 'bell', 'knock'] 22 | self._nb_classes = len(self._classes) 23 | self._class_mobility = [2, 2, 2, 2, 2, 2, 1, 0, 0, 0, 0, 0, 0] 24 | self._apply_class_gains = True 25 | # self._class_gains = [[0, 0.2004, 0.8008, 6.8766, 357.8846], # femaleSpeech 26 | # [0.0060, 0.4901, 2.5097, 14.3011, 372.2183], # maleSpeech 27 | # [0.3607, 1.1029, 2.6719, 3.9629, 26.6442], #clapping 28 | # [0.0072, 0.8222, 2.3849, 34.1233, 168.5152], #telephone 29 | # [0.0273, 0.8911, 1.9856, 5.6164, 79.1070], #laughter 30 | # [0.0268, 0.1009, 1.8363, 13.9294, 83.2484], #domesticSounds 31 | # [0.0099, 0.3764, 1.2759, 5.4426, 318.8329], #footsteps 32 | # [0.0697, 0.4919, 2.7159, 28.0537, 313.8807], #doorCupboard 33 | # [0.0219, 0.3189, 0.7787, 2.3823, 355.9656], #music 34 | # [0.0160, 0.9563, 2.3413, 5.6720, 168.6679], #musicInstrument 35 | # [0.0972, 0.1828, 0.6304, 0.9522, 125.1975], #waterTap 36 | # [0.0160, 0.9563, 2.3413, 5.6720, 168.6679], #bell 37 | # [0.0697, 0.4919, 2.7159, 28.0537, 313.8807]] #knock 38 | self._class_gains = [[0.0791, 0.5330, 1.3132, 2.2365, 541.3376], # femaleSpeech 39 | [0.0116, 0.6913, 1.2199, 3.0048, 235.0189], # maleSpeech 40 | [0.5083, 2.2579, 3.0934, 7.6387, 100.1174], #clapping 41 | [0.0126, 0.3373, 0.7526, 2.1165, 18.5226], #telephone 42 | [0.1909, 1.4950, 3.2206, 8.2153, 221.2892], #laughter 43 | [0.0004, 1.8347, 3.4778, 5.9276, 555.4895], #domesticSounds 44 | [0.0099, 0.3969, 0.8870, 2.0800, 15.7529], #footsteps 45 | [0.0146, 0.9141, 7.8186, 109.0767, 3979.700], #doorCupboard 46 | [0.1153, 0.4313, 1.2903, 3.3541, 52.6977], #music 47 | [0.0596, 1.4146, 5.3529, 20.6286, 362.0704], #musicInstrument 48 | [0.0117, 0.5505, 1.4926, 2.1936, 44.9466], #waterTap 49 | [0.0596, 1.4146, 5.3529, 20.6286, 362.0704], #bell 50 | [2.4502, 2.4502, 
41.3609, 80.2716, 80.2716]] #knock
 51 |             self._samplelist = self._load_db_fileinfo_fsd()
 52 | 
 53 |         elif self._db_name == 'nigens':
 54 |             self._fs = 44100
 55 |             self._class_dict = {'alarm': 0,'baby': 1, 'crash': 2, 'dog': 3, 'engine': 4, 'femaleScream': 5, 'femaleSpeech': 6,
 56 |                                 'fire': 7, 'footsteps': 8, 'knock': 9, 'maleScream': 10, 'maleSpeech': 11,
 57 |                                 'phone': 12, 'piano': 13, 'general': 14}
 58 |             self._class_mobility = [0, 2, 0, 2, 2, 0, 2, 0, 1, 0, 0, 2, 2, 0, 0]
 59 |             self._classes = list(self._class_dict.keys())
 60 |             self._nb_classes = len(self._classes)
 61 |             self._samplelist = self._load_db_fileinfo_nigens()
 62 |             self._apply_class_gains = False
 63 |             self._class_gains = []
 64 | 
 65 | 
 66 |     def _load_rirdata(self):
 67 |         matdata = scipy.io.loadmat(self._rirpath + '/rirdata.mat')
 68 |         rirdata = matdata['rirdata']['room'][0][0]
 69 |         return rirdata
 70 | 
 71 |     def _load_db_fileinfo_fsd(self):
 72 |         samplelist_per_fold = []
 73 |         folds = self._make_selected_filelist()
 74 | 
 75 |         for nfold in range(self._nb_folds):
 76 |             print('Preparing sample list for fold {}'.format(str(nfold+1)))
 77 |             counter = 1
 78 |             samplelist = {'class': np.array([]), 'audiofile': np.array([]), 'duration': np.array([]), 'onoffset': [], 'nSamples': [],
 79 |                           'nSamplesPerClass': np.array([]), 'meanStdDurationPerClass': np.array([]), 'minMaxDurationPerClass': np.array([])}
 80 |             for ncl in range(self._nb_classes):
 81 |                 nb_samples_per_class = len(folds[ncl][nfold])
 82 | 
 83 |                 for ns in range(nb_samples_per_class):
 84 |                     samplelist['class'] = np.append(samplelist['class'], ncl)
 85 |                     samplelist['audiofile'] = np.append(samplelist['audiofile'], folds[ncl][nfold][ns])
 86 |                     audiopath = self._db_path + '/' + folds[ncl][nfold][ns]
 87 |                     audio, sr = librosa.load(audiopath)
 88 |                     duration = len(audio)/float(sr)
 89 |                     samplelist['duration'] = np.append(samplelist['duration'], duration)
 90 |                     samplelist['onoffset'].append(np.array([[0., duration],]))
 91 |                     samplelist['nSamples'].append(counter)
 92 |                     counter += 1
 93 |             samplelist['onoffset'] = np.squeeze(np.array(samplelist['onoffset'],dtype=object))
 94 |             for n_class in range(self._nb_classes):
 95 |                 class_idx = (samplelist['class'] == n_class)
 96 |                 samplelist['nSamplesPerClass'] = np.append(samplelist['nSamplesPerClass'], np.sum(class_idx))
 97 |                 if n_class == 0:
 98 |                     samplelist['meanStdDurationPerClass'] = np.array([[np.mean(samplelist['duration'][class_idx]), np.std(samplelist['duration'][class_idx])]])
 99 |                     samplelist['minMaxDurationPerClass'] = np.array([[np.min(samplelist['duration'][class_idx]), np.max(samplelist['duration'][class_idx])]])
100 |                 else:
101 |                     samplelist['meanStdDurationPerClass'] = np.vstack((samplelist['meanStdDurationPerClass'], np.array([np.mean(samplelist['duration'][class_idx]), np.std(samplelist['duration'][class_idx])])))
102 |                     samplelist['minMaxDurationPerClass'] = np.vstack((samplelist['minMaxDurationPerClass'], np.array([np.min(samplelist['duration'][class_idx]), np.max(samplelist['duration'][class_idx])])))
103 |             samplelist_per_fold.append(samplelist)
104 | 
105 | 
106 |         return samplelist_per_fold
107 | 
108 | 
109 |     def _load_db_fileinfo_nigens(self):
110 |         samplelist_per_fold = []
111 | 
112 |         for nfold in range(self._nb_folds):
113 |             print('Preparing sample list for fold {}'.format(str(nfold+1)))
114 |             foldlist_file = self._db_path + '/NIGENS_8-foldSplit_fold' + str(nfold+1) + '_wo_timit.flist'
115 |             filelist = []
116 |             with open(foldlist_file, newline = '') as flist:
117 |                 flist_reader = csv.reader(flist, delimiter='\t')
118 |                 for fline in flist_reader:
filelist.append(fline) 120 | flist_len = len(filelist) 121 | 122 | samplelist = {'class': np.array([]), 'audiofile': np.array([]), 'duration': np.array([]), 'onoffset': [], 'nSamples': flist_len, 123 | 'nSamplesPerClass': np.array([]), 'meanStdDurationPerClass': np.array([]), 'minMaxDurationPerClass': np.array([])} 124 | for file in range(flist_len): 125 | clsfilename = filelist[file][0].split('/') 126 | clsname = clsfilename[0] 127 | filename = clsfilename[1] 128 | 129 | samplelist['class'] = np.append(samplelist['class'], int(self._class_dict[clsname])) 130 | samplelist['audiofile'] = np.append(samplelist['audiofile'], clsname + '/' + filename) 131 | audiopath = self._db_path + '/' + clsname + '/' + filename 132 | #print(audiopath) 133 | #with contextlib.closing(wave.open(audiopath,'r')) as f: 134 | audio, sr = librosa.load(audiopath) 135 | samplelist['duration'] = np.append(samplelist['duration'], len(audio)/float(sr)) 136 | 137 | if clsname == 'general': 138 | onoffsets = [] 139 | onoffsets.append([0., samplelist['duration'][file]]) 140 | samplelist['onoffset'].append(np.array(onoffsets)) 141 | else: 142 | meta_file = self._db_path + '/' + clsname + '/' + filename + '.txt' 143 | onoffsets = [] 144 | with open(meta_file, newline = '') as meta: 145 | meta_reader = csv.reader(meta, delimiter='\t') 146 | for onoff in meta_reader: 147 | onoffsets.append([float(onoff[0]), float(onoff[1])]) 148 | 149 | samplelist['onoffset'].append(np.array(onoffsets)) 150 | samplelist['onoffset'] = np.squeeze(np.array(samplelist['onoffset'],dtype=object)) 151 | 152 | for n_class in range(self._nb_classes): 153 | class_idx = (samplelist['class'] == n_class) 154 | samplelist['nSamplesPerClass'] = np.append(samplelist['nSamplesPerClass'], np.sum(class_idx)) 155 | if n_class == 0: 156 | samplelist['meanStdDurationPerClass'] = np.array([[np.mean(samplelist['duration'][class_idx]), np.std(samplelist['duration'][class_idx])]]) 157 | samplelist['minMaxDurationPerClass'] = np.array([[np.min(samplelist['duration'][class_idx]), np.max(samplelist['duration'][class_idx])]]) 158 | else: 159 | samplelist['meanStdDurationPerClass'] = np.vstack((samplelist['meanStdDurationPerClass'], np.array([np.mean(samplelist['duration'][class_idx]), np.std(samplelist['duration'][class_idx])]))) 160 | samplelist['minMaxDurationPerClass'] = np.vstack((samplelist['minMaxDurationPerClass'], np.array([np.min(samplelist['duration'][class_idx]), np.max(samplelist['duration'][class_idx])]))) 161 | samplelist_per_fold.append(samplelist) 162 | 163 | return samplelist_per_fold 164 | 165 | def _make_selected_filelist(self): 166 | folds = [] 167 | folds_names = ['train', 'test'] #TODO: make it more generic 168 | nb_folds = len(folds_names) 169 | class_list = self._classes #list(self._classes.keys()) 170 | 171 | for ntc in range(self._nb_classes): 172 | classpath = self._db_path + '/' + class_list[ntc] 173 | 174 | per_fold = [] 175 | for nf in range(nb_folds): 176 | foldpath = classpath + '/' + folds_names[nf] 177 | foldcont = os.listdir(foldpath) 178 | nb_subdirs = len(foldcont) 179 | filelist = [] 180 | for ns in range(nb_subdirs): 181 | subfoldcont = os.listdir(foldpath + '/' + foldcont[ns]) 182 | for nfl in range(len(subfoldcont)): 183 | if subfoldcont[nfl][0] != '.' 
and subfoldcont[nfl].endswith('.wav'):
184 |                         filelist.append(class_list[ntc] + '/' + folds_names[nf] + '/' + foldcont[ns] + '/' + subfoldcont[nfl])
185 |                 per_fold.append(filelist)
186 |             folds.append(per_fold)
187 | 
188 |         return folds
189 | 
--------------------------------------------------------------------------------
/db_config_fsd.obj:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/danielkrause/DCASE2022-data-generator/01745f736be8c537076c42fa1642ecb5c3454714/db_config_fsd.obj
--------------------------------------------------------------------------------
/db_config_nigens.obj:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/danielkrause/DCASE2022-data-generator/01745f736be8c537076c42fa1642ecb5c3454714/db_config_nigens.obj
--------------------------------------------------------------------------------
/example_script_DCASE2021.py:
--------------------------------------------------------------------------------
 1 | import sys
 2 | import numpy as np
 3 | from db_config import DBConfig
 4 | from metadata_synthesizer import MetadataSynthesizer
 5 | from audio_synthesizer import AudioSynthesizer
 6 | from audio_mixer import AudioMixer
 7 | import pickle
 8 | from generation_parameters import get_params
 9 | 
10 | ##############
11 | ############## THIS IS AN EXAMPLE SCRIPT GENERATING DATA
12 | ############## SIMILAR TO THE DCASE2021 DATASET
13 | ##############
14 | 
15 | 
16 | # use parameter set defined by user
17 | task_id = '1' ### '1' - NIGENS, '2' - FSD50k
18 | 
19 | params = get_params(task_id)
20 | 
21 | ### Create database config based on params (e.g. filelist name etc.)
22 | #db_config = DBConfig(params)
23 | 
24 | # LOAD DB-config which is already done
25 | db_handler = open('db_config_nigens.obj','rb')
26 | db_config = pickle.load(db_handler)
27 | db_handler.close()
28 | 
29 | #create mixture synthesizer class
30 | noiselessSynth = MetadataSynthesizer(db_config, params, 'target_noiseless')
31 | 
32 | #create mixture targets
33 | mixtures_target, mixture_setup_target, foldlist_target = noiselessSynth.create_mixtures()
34 | 
35 | #calculate statistics and create metadata structure
36 | metadata, stats = noiselessSynth.prepare_metadata_and_stats()
37 | 
38 | #write metadata to text files
39 | noiselessSynth.write_metadata()
40 | 
41 | #create directional interference mixtures
42 | task_id_int = '3'
43 | params_interference = get_params(task_id_int)
44 | noiselessSynth_interference = MetadataSynthesizer(db_config, params_interference, 'target_interf')
45 | interference_target, interference_setup_target, foldlist_target_int = noiselessSynth_interference.create_mixtures()
46 | 
47 | if not params['audio_format'] == 'both': # create a dataset of only one data format (FOA or MIC)
48 |     #create audio synthesis class and synthesize audio files for given mixtures
49 |     noiselessAudioSynth = AudioSynthesizer(params, mixtures_target, mixture_setup_target, db_config, params['audio_format'])
50 |     noiselessAudioSynth.synthesize_mixtures()
51 | 
52 | 
53 |     #synthesize audio containing interference mixtures
54 |     noiselessAudioSynth_interference = AudioSynthesizer(params_interference, interference_target, interference_setup_target, db_config, params['audio_format'])
55 |     noiselessAudioSynth_interference.synthesize_mixtures()
56 | 
57 |     #mix the created audio mixtures with background noise and interference mixtures
58 |     audioMixer = AudioMixer(params, db_config, mixtures_target, mixture_setup_target,
params['audio_format'], 'target_interf_noisy') 59 | audioMixer.mix_audio() 60 | else: 61 | #create audio synthesis class and synthesize audio files for given mixtures 62 | noiselessAudioSynth = AudioSynthesizer(params, mixtures_target, mixture_setup_target, db_config, 'foa') 63 | noiselessAudioSynth.synthesize_mixtures() 64 | noiselessAudioSynth2 = AudioSynthesizer(params, mixtures_target, mixture_setup_target, db_config, 'mic') 65 | noiselessAudioSynth2.synthesize_mixtures() 66 | 67 | #synthesize audio containing interference mixtures 68 | noiselessAudioSynth_interference = AudioSynthesizer(params_interference, interference_target, interference_setup_target, db_config, 'foa') 69 | noiselessAudioSynth_interference.synthesize_mixtures() 70 | noiselessAudioSynth_interference2 = AudioSynthesizer(params_interference, interference_target, interference_setup_target, db_config, 'mic') 71 | noiselessAudioSynth_interference2.synthesize_mixtures() 72 | 73 | #mix the created audio mixtures with background noise and interference mixtures 74 | audioMixer = AudioMixer(params, db_config, mixtures_target, mixture_setup_target, 'foa', 'target_interf_noisy') 75 | audioMixer.mix_audio() 76 | audioMixer2 = AudioMixer(params, db_config, mixtures_target, mixture_setup_target, 'mic', 'target_interf_noisy') 77 | audioMixer2.mix_audio() -------------------------------------------------------------------------------- /example_script_DCASE2022.py: -------------------------------------------------------------------------------- 1 | import sys 2 | import numpy as np 3 | from db_config import DBConfig 4 | from metadata_synthesizer import MetadataSynthesizer 5 | from audio_synthesizer import AudioSynthesizer 6 | from audio_mixer import AudioMixer 7 | import pickle 8 | from generation_parameters import get_params 9 | 10 | 11 | ############## 12 | ############## THIS IS AN EXEMPLARY SCRIPT GENERATING DATA 13 | ############## SIMILAR TO THE DCASE2022 dataset 14 | ############## 15 | 16 | 17 | # use parameter set defined by user 18 | task_id = '2' 19 | 20 | params = get_params(task_id) 21 | 22 | ### Create database config based on params (e.g. filelist name etc.) 
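# Note: building DBConfig(params) from scratch scans the whole audio database and loads
# every file with librosa to measure durations, which can take a long time; the pickled
# db_config_fsd.obj loaded below caches the result of that step (see also the README).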
23 | #db_config = DBConfig(params) 24 | 25 | # LOAD DB-config which is already done 26 | db_handler = open('db_config_fsd.obj','rb') 27 | db_config = pickle.load(db_handler) 28 | db_handler.close() 29 | 30 | #create mixture synthesizer class 31 | noiselessSynth = MetadataSynthesizer(db_config, params, 'target_noiseless') 32 | 33 | #create mixture targets 34 | mixtures_target, mixture_setup_target, foldlist_target = noiselessSynth.create_mixtures() 35 | 36 | #calculate statistics and create metadata structure 37 | metadata, stats = noiselessSynth.prepare_metadata_and_stats() 38 | 39 | #write metadata to text files 40 | noiselessSynth.write_metadata() 41 | 42 | 43 | if not params['audio_format'] == 'both': # create a dataset of only one data format (FOA or MIC) 44 | #create audio synthesis class and synthesize audio files for given mixtures 45 | noiselessAudioSynth = AudioSynthesizer(params, mixtures_target, mixture_setup_target, db_config, params['audio_format']) 46 | noiselessAudioSynth.synthesize_mixtures() 47 | 48 | #mix the created audio mixtures with background noise 49 | audioMixer = AudioMixer(params, db_config, mixtures_target, mixture_setup_target, params['audio_format'], 'target_noisy') 50 | audioMixer.mix_audio() 51 | else: 52 | #create audio synthesis class and synthesize audio files for given mixtures 53 | noiselessAudioSynth = AudioSynthesizer(params, mixtures_target, mixture_setup_target, db_config, 'foa') 54 | noiselessAudioSynth.synthesize_mixtures() 55 | noiselessAudioSynth2 = AudioSynthesizer(params, mixtures_target, mixture_setup_target, db_config, 'mic') 56 | noiselessAudioSynth2.synthesize_mixtures() 57 | 58 | #mix the created audio mixtures with background noise 59 | audioMixer = AudioMixer(params, db_config, mixtures_target, mixture_setup_target, 'foa', 'target_noisy') 60 | audioMixer.mix_audio() 61 | audioMixer2 = AudioMixer(params, db_config, mixtures_target, mixture_setup_target, 'mic', 'target_noisy') 62 | audioMixer2.mix_audio() -------------------------------------------------------------------------------- /generation_parameters.py: -------------------------------------------------------------------------------- 1 | # Parameters used in the data generation process. 
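# Usage note: get_params('1') returns the NIGENS defaults defined below, get_params('2')
# switches to the FSD50K setup, and get_params('3') selects the NIGENS interference
# subset; any other id prints an error and exits.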
 2 | 
 3 | 
 4 | def get_params(argv='1'):
 5 |     print("SET: {}".format(argv))
 6 |     # ########### default parameters (NIGENS data) ##############
 7 |     params = dict(
 8 |         db_name = 'nigens',   # name of the audio dataset used for data generation
 9 |         rirpath = '/scratch/asignal/krauseda/DCASE_data_generator/RIR_DB',   # path containing Room Impulse Responses (RIRs)
10 |         mixturepath = 'E:/DCASE2022/TAU_Spatial_RIR_Database_2021/Dataset-NIGENS',   # output path for the generated dataset
11 |         noisepath = '/scratch/asignal/krauseda/DCASE_data_generator/Noise_DB',   # path containing background noise recordings
12 |         nb_folds = 2,   # number of folds (default 2 - training and testing)
13 |         rooms2fold = [[10, 6, 1, 4, 3, 8],   # FOLD 1, rooms assigned to each fold (0's are ignored)
14 |                       [9, 5, 2, 0, 0, 0]],   # FOLD 2
15 |         db_path = 'E:/DCASE2022/TAU_Spatial_RIR_Database_2021/Code/NIGENS',   # path containing audio events to be utilized during data generation
16 |         max_polyphony = 3,   # maximum number of overlapping sound events
17 |         active_classes = [0, 1, 2, 3, 5, 6, 8, 9, 10, 11, 12, 13],   # list of sound classes to be used for data generation
18 |         nb_mixtures_per_fold = [900, 300],   # if scalar, the same number of mixtures is used for each fold
19 |         mixture_duration = 60.,   # in seconds
20 |         event_time_per_layer = 40.,   # in seconds (should be less than mixture_duration)
21 |         audio_format = 'both',   # 'foa' (First Order Ambisonics) or 'mic' (four microphones) or 'both'
22 |         )
23 | 
24 | 
25 |     # ########### User defined parameters ##############
26 |     if argv == '1':
27 |         print("USING DEFAULT PARAMETERS FOR NIGENS DATA\n")
28 | 
29 |     elif argv == '2': ###### FSD50k DATA
30 |         params['db_name'] = 'fsd50k'
31 |         params['db_path'] = '/scratch/asignal/krauseda/DCASE_data_generator/Code/FSD50k'
32 |         params['mixturepath'] = '/scratch/asignal/krauseda/Data-FSD'
33 |         params['active_classes'] = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
34 |         params['max_polyphony'] = 2
35 | 
36 |     elif argv == '3': ###### NIGENS interference data
37 |         params['active_classes'] = [4, 7, 14]
38 |         params['max_polyphony'] = 1
39 | 
40 |     else:
41 |         print('ERROR: unknown argument {}'.format(argv))
42 |         exit()
43 | 
44 |     for key, value in params.items():
45 |         print("\t{}: {}".format(key, value))
46 |     return params
--------------------------------------------------------------------------------
/make_dataset.py:
--------------------------------------------------------------------------------
 1 | import sys
 2 | import numpy as np
 3 | from db_config import DBConfig
 4 | from metadata_synthesizer import MetadataSynthesizer
 5 | from audio_synthesizer import AudioSynthesizer
 6 | from audio_mixer import AudioMixer
 7 | import pickle
 8 | from generation_parameters import get_params
 9 | 
10 | 
11 | 
12 | 
13 | def main(argv):
14 |     """
15 |     Main wrapper for the whole data generation framework.
16 | 
17 |     :param argv: expects one optional input.
18 |         first input: task_id - (optional) chooses the generation parameter set from generation_parameters.py
19 |             (default: '2' - the FSD50K parameter set)
20 |     """
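    # Usage sketch (based on the parameter ids defined in generation_parameters.py):
    #   python make_dataset.py 1   -> NIGENS parameter set
    #   python make_dataset.py 2   -> FSD50K parameter set (the default when no id is given)
    #   python make_dataset.py 3   -> NIGENS interference parameter set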
21 |     print(argv)
22 |     if len(argv) != 2:
23 |         print('\n\n')
24 |         print('The code expects an optional input')
25 |         print('\t>> python make_dataset.py <task-id>')
26 |         print('\t\t<task-id> is used to choose the user-defined parameter set from generation_parameters.py')
27 |         print('Using default inputs for now')
28 |         print('\n\n')
29 | 
30 |     # use parameter set defined by user
31 |     task_id = '2' if len(argv) < 2 else argv[1]
32 | 
33 |     params = get_params(task_id)
34 | 
35 |     ### Create database config based on params (e.g. filelist name etc.)
36 |     db_config = DBConfig(params)
37 | 
38 |     # LOAD DB-config which is already done
39 |     # db_handler = open('db_config_fsd.obj','rb')
40 |     # db_config = pickle.load(db_handler)
41 |     # db_handler.close()
42 | 
43 |     #create mixture synthesizer class
44 |     noiselessSynth = MetadataSynthesizer(db_config, params, 'target_noiseless')
45 | 
46 |     #create mixture targets
47 |     mixtures_target, mixture_setup_target, foldlist_target = noiselessSynth.create_mixtures()
48 | 
49 |     #calculate statistics and create metadata structure
50 |     metadata, stats = noiselessSynth.prepare_metadata_and_stats()
51 | 
52 |     #write metadata to text files
53 |     noiselessSynth.write_metadata()
54 | 
55 |     #create audio synthesis class and synthesize audio files for given mixtures
56 |     if not params['audio_format'] == 'both': # create a dataset of only one data format (FOA or MIC)
57 |         noiselessAudioSynth = AudioSynthesizer(params, mixtures_target, mixture_setup_target, db_config, params['audio_format'])
58 |         noiselessAudioSynth.synthesize_mixtures()
59 | 
60 |         #mix the created audio mixtures with background noise
61 |         audioMixer = AudioMixer(params, db_config, mixtures_target, mixture_setup_target, params['audio_format'], 'target_noisy')
62 |         audioMixer.mix_audio()
63 |     else:
64 |         noiselessAudioSynth = AudioSynthesizer(params, mixtures_target, mixture_setup_target, db_config, 'foa')
65 |         noiselessAudioSynth.synthesize_mixtures()
66 |         noiselessAudioSynth2 = AudioSynthesizer(params, mixtures_target, mixture_setup_target, db_config, 'mic')
67 |         noiselessAudioSynth2.synthesize_mixtures()
68 | 
69 |         #mix the created audio mixtures with background noise
70 |         audioMixer = AudioMixer(params, db_config, mixtures_target, mixture_setup_target, 'foa', 'target_noisy')
71 |         audioMixer.mix_audio()
72 |         audioMixer2 = AudioMixer(params, db_config, mixtures_target, mixture_setup_target, 'mic', 'target_noisy')
73 |         audioMixer2.mix_audio()
74 | 
75 | 
76 | 
77 | if __name__ == "__main__":
78 |     try:
79 |         sys.exit(main(sys.argv))
80 |     except (ValueError, IOError) as e:
81 |         sys.exit(e)
82 | 
83 | 
--------------------------------------------------------------------------------
/metadata_synthesizer.py:
--------------------------------------------------------------------------------
 1 | import numpy as np
 2 | from utils import cart2sph
 3 | import os
 4 | import csv
 5 | 
 6 | class MetadataSynthesizer(object):
 7 |     def __init__(
 8 |             self, db_config, params, scenario_name
 9 |             ):
10 |         self._db_config = db_config
11 |         self._db = db_config._db_name
12 |         self._metadata_path = params['mixturepath'] + '/' + 'metadata'
13 |         self._classnames = db_config._classes
14 |         self._active_classes = np.sort(params['active_classes'])
15 |         self._nb_active_classes = len(self._active_classes)
16 |         self._class2activeClassmap = []
17 |         for cl in range(len(self._db_config._classes)):
18 |             if cl in self._active_classes:
19 |                 self._class2activeClassmap.append(cl)
20 |             else:
self._class2activeClassmap.append(0) 22 | 23 | self._class_mobility = db_config._class_mobility 24 | self._mixture_setup = {} 25 | self._mixture_setup['scenario'] = scenario_name 26 | self._mixture_setup['nb_folds'] = db_config._nb_folds 27 | self._mixture_setup['rooms2folds'] = db_config._rooms2fold 28 | self._mixture_setup['classnames'] = [] 29 | for cl in self._classnames: 30 | self._mixture_setup['classnames'].append(cl) 31 | self._mixture_setup['nb_classes'] = len(self._active_classes) 32 | self._mixture_setup['fs_mix'] = 24000 #fs of RIRs 33 | self._mixture_setup['mixture_duration'] = params['mixture_duration'] 34 | self._nb_mixtures_per_fold = params['nb_mixtures_per_fold'] 35 | self._nb_mixtures = self._mixture_setup['nb_folds'] * self._nb_mixtures_per_fold if np.isscalar(self._nb_mixtures_per_fold) else np.sum(self._nb_mixtures_per_fold) 36 | self._mixture_setup['total_duration'] = self._nb_mixtures * self._mixture_setup['mixture_duration'] 37 | self._mixture_setup['speed_set'] = [10., 20., 40.] 38 | self._mixture_setup['snr_set'] = np.arange(6.,31.) 39 | self._mixture_setup['time_idx_100ms'] = np.arange(0.,self._mixture_setup['mixture_duration'],0.1) 40 | self._mixture_setup['nOverlap'] = params['max_polyphony'] 41 | self._nb_frames = len(self._mixture_setup['time_idx_100ms']) 42 | self._rnd_generator = np.random.default_rng() 43 | 44 | self._rirdata = db_config._rirdata 45 | self._nb_classes = len(self._classnames) 46 | self._nb_speeds = len(self._mixture_setup['speed_set']) 47 | self._nb_snrs = len(self._mixture_setup['snr_set']) 48 | self._total_event_time_per_layer = params['event_time_per_layer'] 49 | self._total_silence_time_per_layer = self._mixture_setup['mixture_duration'] - self._total_event_time_per_layer 50 | self._min_gap_len = 1. # in seconds, minimum length of gaps between samples 51 | self._trim_threshold = 3. #in seconds, minimum length under which a trimmed event at end is discarded 52 | self._move_threshold = 3. 
#in seconds, minimum length over which events can be moving
 53 | 
 54 |     def create_mixtures(self):
 55 |         self._mixtures = []
 56 |         foldlist = []
 57 |         rirdata2room_idx = {1: 0, 2: 1, 3: 2, 4: 3, 5: 4, 6: 5, 8: 6, 9: 7, 10: 8} # room numbers in the rirdata array
 58 | 
 59 |         for nfold in range(self._mixture_setup['nb_folds']):
 60 |             print('Generating metadata for fold {}'.format(str(nfold+1)))
 61 | 
 62 |             foldlist_nff = {}
 63 |             rooms_nf = np.array(self._mixture_setup['rooms2folds'][nfold])
 64 |             rooms_nf = rooms_nf[rooms_nf>0]
 65 |             nb_rooms_nf = len(rooms_nf)
 66 | 
 67 | 
 68 |             idx_active = np.array([])
 69 |             for na in range(self._nb_active_classes):
 70 |                 idx_active = np.append(idx_active, np.nonzero(self._db_config._samplelist[nfold]['class'] == self._active_classes[na]))
 71 |             idx_active = idx_active.astype('int')
 72 | 
 73 |             foldlist_nff['class'] = self._db_config._samplelist[nfold]['class'][idx_active]
 74 |             foldlist_nff['audiofile'] = self._db_config._samplelist[nfold]['audiofile'][idx_active]
 75 |             foldlist_nff['duration'] = self._db_config._samplelist[nfold]['duration'][idx_active]
 76 |             foldlist_nff['onoffset'] = self._db_config._samplelist[nfold]['onoffset'][idx_active]
 77 |             nb_samples_nf = len(foldlist_nff['duration'])
 78 | 
 79 |             # randomly shuffle the samples in the target list to avoid samples of the same class appearing consecutively
 80 | 
 81 |             if len(np.shape(foldlist_nff['onoffset'])) == 1:
 82 |                 foldlist_nff['onoffset'] = np.expand_dims(foldlist_nff['onoffset'],axis=1)
 83 |             foldlist_nf = foldlist_nff
 84 |             foldlist.append(foldlist_nf)
 85 |             sampleperm = self._rnd_generator.permutation(nb_samples_nf)
 86 |             foldlist_nf['class'] = foldlist_nf['class'][sampleperm]
 87 |             foldlist_nf['audiofile'] = foldlist_nf['audiofile'][sampleperm]
 88 |             foldlist_nf['duration'] = foldlist_nf['duration'][sampleperm]
 89 |             foldlist_nf['onoffset'] = foldlist_nf['onoffset'][sampleperm]
 90 |             room_mixtures = []
 91 |             for nr in range(nb_rooms_nf):
 92 |                 fold_mixture = {'mixture': []}
 93 |                 fold_mixture['roomidx'] = rooms_nf
 94 |                 nroom = rooms_nf[nr]
 95 |                 print('Room {}\n'.format(nroom))
 96 |                 n_traj = np.shape(self._rirdata[rirdata2room_idx[nroom]][0][2])[0] #number of trajectories
 97 |                 traj_doas = []
 98 | 
 99 |                 for ntraj in range(n_traj):
100 |                     n_rirs = np.sum(self._rirdata[rirdata2room_idx[nroom]][0][3][ntraj,:])
101 |                     n_heights = np.sum(self._rirdata[rirdata2room_idx[nroom]][0][3][ntraj,:]>0)
102 |                     all_doas = np.zeros((n_rirs, 3))
103 |                     n_rirs_accum = 0
104 |                     flip = 0
105 | 
106 |                     for nheight in range(n_heights):
107 |                         n_rirs_nh = self._rirdata[rirdata2room_idx[nroom]][0][3][ntraj,nheight]
108 |                         doa_xyz = self._rirdata[rirdata2room_idx[nroom]][0][2][ntraj,nheight][0]
109 |                         # stack all doas of the trajectory together;
110 |                         # flip the direction of every second height, so that a
111 |                         # movement can jump smoothly from the lower height to the higher one and
112 |                         # continue moving in the opposite direction
113 |                         if flip:
114 |                             nb_doas = np.shape(doa_xyz)[0]
115 |                             all_doas[n_rirs_accum + np.arange(n_rirs_nh), :] = doa_xyz[np.flip(np.arange(nb_doas)), :]
116 |                         else:
117 |                             all_doas[n_rirs_accum + np.arange(n_rirs_nh), :] = doa_xyz
118 | 
119 |                         n_rirs_accum += n_rirs_nh
120 |                         flip = not flip
121 | 
122 |                     traj_doas.append(all_doas)
123 | 
124 |                 # start layering the mixtures for the specific room
125 |                 sample_counter = 0
126 |                 if np.isscalar(self._nb_mixtures_per_fold):
127 |                     nb_mixtures_per_fold_per_room = int(np.round(self._nb_mixtures_per_fold / float(nb_rooms_nf)))
128 |                 else:
129 |                     nb_mixtures_per_fold_per_room = int(np.round(self._nb_mixtures_per_fold[nfold] / float(nb_rooms_nf)))
130 | 
131 |                 for nmix in range(nb_mixtures_per_fold_per_room):
132 |                     print('Room {}, mixture {}'.format(nroom, nmix+1))
133 | 
134 |                     event_counter = 0
135 |                     nth_mixture = {'files': np.array([]), 'class': np.array([]), 'event_onoffsets': np.array([]),
136 |                                    'sample_onoffsets': np.array([]), 'trajectory': np.array([]), 'isMoving': np.array([]), 'isFlippedMoving': np.array([]),
137 |                                    'speed': np.array([]), 'rirs': [], 'doa_azel': np.array([],dtype=object)}
138 |                     nth_mixture['room'] = nroom
139 |                     nth_mixture['snr'] = self._mixture_setup['snr_set'][self._rnd_generator.integers(0,self._nb_snrs)]
140 | 
141 |                     for layer in range(self._mixture_setup['nOverlap']):
142 |                         print('Layer {}'.format(layer+1))
143 |                         #zero this flag (explained later)
144 |                         TRIMMED_SAMPLE_AT_END = 0
145 | 
146 |                         #fetch event samples till they add up to the target event time per layer
147 |                         event_time_in_layer = 0
148 |                         event_idx_in_layer = []
149 | 
150 |                         while event_time_in_layer < self._total_event_time_per_layer:
151 |                             #get event duration
152 |                             ev_duration = np.ceil(foldlist_nf['duration'][sample_counter]*10.)/10.
153 |                             event_time_in_layer += ev_duration
154 |                             event_idx_in_layer.append(sample_counter)
155 | 
156 |                             event_counter += 1
157 |                             sample_counter += 1
158 | 
159 |                             if sample_counter == nb_samples_nf:
160 |                                 sample_counter = 0
161 | 
162 |                         # the last sample is trimmed to fit the target event time per layer; it is kept
163 |                         # only if the trimmed length exceeds the trim threshold and still covers the sample onset, otherwise it is omitted
164 |                         trimmed_event_length = self._total_event_time_per_layer - (event_time_in_layer - ev_duration)
165 |                         #Temporary workaround - for some reason for interference classes the dict is packed with an additional dimension - check it
166 |                         if len(foldlist_nf['onoffset'][event_idx_in_layer[-1]]) == 1:
167 |                             ons = foldlist_nf['onoffset'][event_idx_in_layer[-1]][0][0,0] if self._db == 'nigens' else foldlist_nf['onoffset'][event_idx_in_layer[-1]][0][0]
168 |                         else:
169 |                             ons = foldlist_nf['onoffset'][event_idx_in_layer[-1]][0,0] if self._db == 'nigens' else foldlist_nf['onoffset'][event_idx_in_layer[-1]][0]
170 |                         if (trimmed_event_length > self._trim_threshold) and (trimmed_event_length > np.floor(ons*10.)/10.):
171 |                             TRIMMED_SAMPLE_AT_END = 1
172 |                         else:
173 |                             if len(event_idx_in_layer) == 1:
174 |                                 raise ValueError("STOP, we will get stuck here forever")
175 | 
176 |                             #remove from sample list
177 |                             event_idx_in_layer = event_idx_in_layer[:-1]
178 |                             # reduce sample count and events-in-recording by 1
179 |                             event_counter -= 1
180 |                             if sample_counter != 0:
181 |                                 sample_counter -= 1
182 |                             else:
183 |                                 # move sample counter to end of the list to re-use sample
184 |                                 sample_counter = nb_samples_nf-1
185 | 
186 |                         nb_samples_in_layer = len(event_idx_in_layer)
187 |                         # split the total silence time between events:
188 |                         # randomize N-1 split points uniformly for N events (in steps of 100 ms),
189 |                         # then turn them into N gaps via np.diff and enforce the minimum gap length below
190 |                         mult_silence = np.round(self._total_silence_time_per_layer*10.)
191 | 
192 |                         mult_min_gap_len = np.round(self._min_gap_len*10.)
193 |                         if nb_samples_in_layer > 1:
194 | 
195 |                             silence_splits = np.sort(self._rnd_generator.integers(1, mult_silence,nb_samples_in_layer-1))
196 |                             #force gaps smaller than _min_gap_len up to that minimum
197 |                             gaps = np.diff(np.concatenate(([0],silence_splits,[mult_silence])))
198 |                             smallgaps_idx = np.argwhere(gaps[:(nb_samples_in_layer-1)] < mult_min_gap_len)
199 |                             while np.any(smallgaps_idx):
200 |                                 temp = np.concatenate(([0], silence_splits))
201 |                                 silence_splits[smallgaps_idx] = temp[smallgaps_idx] + mult_min_gap_len
202 |                                 gaps = np.diff(np.concatenate(([0],silence_splits,[mult_silence])))
203 |                                 smallgaps_idx = np.argwhere(gaps[:(nb_samples_in_layer-1)] < mult_min_gap_len)
204 |                             if np.any(gaps < mult_min_gap_len):
205 |                                 min_idx = np.argwhere(gaps < mult_min_gap_len)
206 |                                 gaps[min_idx] = mult_min_gap_len
207 |                             # if gaps[nb_samples_in_layer-1] < mult_min_gap_len:
208 |                             #     gaps[nb_samples_in_layer-1] = mult_min_gap_len
209 | 
210 |                         else:
211 |                             gaps = np.array([mult_silence])
212 | 
213 |                         # shrink the larger gaps again so the gaps still sum to the total silence time of the layer
214 |                         while np.sum(gaps) > self._total_silence_time_per_layer*10.:
215 |                             silence_diff = np.sum(gaps) - self._total_silence_time_per_layer*10.
216 |                             picked_gaps = np.argwhere(gaps > np.mean(gaps))
217 |                             eq_subtract = silence_diff / len(picked_gaps)
218 |                             picked_gaps = np.argwhere((gaps - eq_subtract) > mult_min_gap_len)
219 |                             gaps[picked_gaps] -= eq_subtract
220 | 
221 |                         # distribute events in timeline
222 |                         time_idx = 0
223 |                         for nl in range(nb_samples_in_layer):
224 |                             #print('Sample {} in layer {}'.format(nl, layer))
225 |                             # advance by the gap to the event onset (quantized to 100 ms)
226 |                             gap_nl = gaps[nl]
227 |                             time_idx += gap_nl
228 |                             event_nl = event_idx_in_layer[nl]
229 |                             event_duration_nl = np.ceil(foldlist_nf['duration'][event_nl]*10.)
230 |                             event_class_nl = int(foldlist_nf['class'][event_nl])
231 |                             if len(foldlist_nf['onoffset'][event_nl]) == 1:
232 |                                 onoffsets = foldlist_nf['onoffset'][event_nl][0]
233 |                             else:
234 |                                 onoffsets = foldlist_nf['onoffset'][event_nl]
235 | 
236 |                             sample_onoffsets = np.zeros_like(onoffsets)
237 |                             if self._db == 'nigens':
238 |                                 sample_onoffsets[:, 0] = np.floor(onoffsets[:,0]*10.)/10.
239 |                                 sample_onoffsets[:, 1] = np.floor(onoffsets[:,1]*10.)/10.
240 |                                 #trim event duration if it's the trimmed sample
241 |                                 if (nl == nb_samples_in_layer-1) and TRIMMED_SAMPLE_AT_END:
242 |                                     event_duration_nl = len(self._mixture_setup['time_idx_100ms']) - time_idx - 1
243 |                                     # keep only onsets/offsets in the trimmed region
244 |                                     find_last_offset_mtx = (event_duration_nl/10.) > sample_onoffsets
245 |                                     sample_onoffsets = sample_onoffsets[:np.sum(find_last_offset_mtx[:,0]),:]
246 |                                     if sample_onoffsets[-1, 1] > event_duration_nl/10.:
247 |                                         sample_onoffsets[-1, 1] = event_duration_nl/10.
248 |                             else:
249 |                                 sample_onoffsets = np.floor(onoffsets*10.)/10.
250 |                                 #trim event duration if it's the trimmed sample
251 |                                 if (nl == nb_samples_in_layer-1) and TRIMMED_SAMPLE_AT_END:
252 |                                     event_duration_nl = len(self._mixture_setup['time_idx_100ms']) - time_idx - 1
253 |                                     # keep only onsets/offsets in the trimmed region
254 |                                     if sample_onoffsets[1] > event_duration_nl/10.:
255 |                                         sample_onoffsets[1] = event_duration_nl/10.
255 | 256 | # trajectory 257 | ev_traj = self._rnd_generator.integers(0, n_traj) 258 | nRirs = np.sum(self._rirdata[rirdata2room_idx[nroom]][0][3][ev_traj,:]) 259 | 260 | #if event is less than move_threshold long, make it static by default 261 | if event_duration_nl <= self._move_threshold*10: 262 | is_moving = 0 263 | else: 264 | if self._class_mobility[event_class_nl] == 2: 265 | # randomly moving or static 266 | is_moving = self._rnd_generator.integers(0,2) 267 | else: 268 | # only static or moving depending on class 269 | is_moving = self._class_mobility[event_class_nl] 270 | 271 | if is_moving: 272 | ev_nspeed = self._rnd_generator.integers(0,self._nb_speeds) 273 | ev_speed = self._mixture_setup['speed_set'][ev_nspeed] 274 | # check if with the current speed there are enough 275 | # RIRs in the trajectory to move through the full 276 | # duration of the event, otherwise, lower speed 277 | while len(np.arange(0,nRirs,ev_speed/10)) <= event_duration_nl: 278 | ev_nspeed = ev_nspeed-1 279 | if ev_nspeed == -1: 280 | break 281 | 282 | ev_speed = self._mixture_setup['speed_set'][ev_nspeed] 283 | 284 | is_flipped_moving = self._rnd_generator.integers(0,2) 285 | event_span_nl = event_duration_nl * ev_speed / 10. 286 | 287 | if is_flipped_moving: 288 | # sample length is shorter than all the RIRs 289 | # in the moving trajectory 290 | if ev_nspeed+1: 291 | end_idx = event_span_nl + self._rnd_generator.integers(0, nRirs-event_span_nl+1) 292 | start_idx = end_idx - event_span_nl 293 | riridx = start_idx + np.arange(0, event_span_nl, dtype=int) 294 | riridx = riridx[np.arange(0,len(riridx),ev_speed/10,dtype=int)] #pick every nth RIR based on speed 295 | riridx = np.flip(riridx) 296 | else: 297 | riridx = np.arange(event_span_nl,0,-1)-1 298 | riridx = riridx - (event_span_nl-nRirs) 299 | riridx = riridx[np.arange(0, len(riridx), ev_speed/10, dtype=int)] 300 | riridx[riridx<0] = 0 301 | else: 302 | if ev_nspeed+1: 303 | start_idx = self._rnd_generator.integers(0, nRirs-event_span_nl+1) 304 | riridx = start_idx + np.arange(0,event_span_nl,dtype=int) - 1 305 | riridx = riridx[np.arange(0,len(riridx),ev_speed/10,dtype=int)] 306 | else: 307 | riridx = np.arange(0,event_span_nl) 308 | riridx = riridx[np.arange(0,len(riridx),ev_speed/10,dtype=int)] 309 | riridx[riridx>nRirs-1] = nRirs-1 310 | else: 311 | is_flipped_moving = 0 312 | ev_speed = 0 313 | riridx = np.array([self._rnd_generator.integers(0,nRirs)]) 314 | riridx = riridx.astype('int') 315 | 316 | if nl == 0 and layer==0: 317 | nth_mixture['event_onoffsets'] = np.array([[time_idx/10., (time_idx+event_duration_nl)/10.]]) 318 | nth_mixture['doa_azel'] = [cart2sph(traj_doas[ev_traj][riridx,:])] 319 | nth_mixture['sample_onoffsets'] = [sample_onoffsets] 320 | else: 321 | nth_mixture['event_onoffsets'] = np.vstack((nth_mixture['event_onoffsets'], np.array([time_idx/10., (time_idx+event_duration_nl)/10.]))) 322 | nth_mixture['doa_azel'].append(cart2sph(traj_doas[ev_traj][riridx,:])) 323 | nth_mixture['sample_onoffsets'].append(sample_onoffsets) 324 | 325 | nth_mixture['files'] = np.append(nth_mixture['files'], foldlist_nf['audiofile'][event_nl]) 326 | nth_mixture['class'] = np.append(nth_mixture['class'], self._class2activeClassmap[int(foldlist_nf['class'][event_nl])]) 327 | nth_mixture['trajectory'] = np.append(nth_mixture['trajectory'], ev_traj) 328 | nth_mixture['isMoving'] = np.append(nth_mixture['isMoving'], is_moving) 329 | nth_mixture['isFlippedMoving'] = np.append(nth_mixture['isFlippedMoving'], is_flipped_moving) 330 | nth_mixture['speed'] = 
331 | nth_mixture['rirs'].append(riridx)
332 |
333 |
334 | time_idx += event_duration_nl
335 |
336 | # sort overlapped events by temporal appearance
337 | sort_idx = np.argsort(nth_mixture['event_onoffsets'][:,0])
338 | nth_mixture['files'] = nth_mixture['files'][sort_idx]
339 | nth_mixture['class'] = nth_mixture['class'][sort_idx]
340 | nth_mixture['event_onoffsets'] = nth_mixture['event_onoffsets'][sort_idx]
341 | #nth_mixture['sample_onoffsets'] = nth_mixture['sample_onoffsets'][sort_idx]
342 | nth_mixture['trajectory'] = nth_mixture['trajectory'][sort_idx]
343 | nth_mixture['isMoving'] = nth_mixture['isMoving'][sort_idx]
344 | nth_mixture['isFlippedMoving'] = nth_mixture['isFlippedMoving'][sort_idx]
345 | nth_mixture['speed'] = nth_mixture['speed'][sort_idx]
346 | nth_mixture['rirs'] = np.array(nth_mixture['rirs'],dtype=object)
347 | nth_mixture['rirs'] = nth_mixture['rirs'][sort_idx]
348 | new_doas = np.zeros(len(sort_idx),dtype=object)
349 | new_sample_onoffsets = np.zeros(len(sort_idx),dtype=object)
350 | upd_idx = 0
351 | for idx in sort_idx:
352 | new_doas[upd_idx] = nth_mixture['doa_azel'][idx].T
353 | new_sample_onoffsets[upd_idx] = nth_mixture['sample_onoffsets'][idx]
354 | upd_idx += 1
355 | nth_mixture['doa_azel'] = new_doas
356 | nth_mixture['sample_onoffsets'] = new_sample_onoffsets
357 |
358 | # accumulate mixtures for each room
359 | fold_mixture['mixture'].append(nth_mixture)
360 | # accumulate rooms
361 | room_mixtures.append(fold_mixture)
362 | # accumulate mixtures per fold
363 | self._mixtures.append(room_mixtures)
364 |
365 |
366 | return self._mixtures, self._mixture_setup, foldlist
367 |
368 | def prepare_metadata_and_stats(self):
369 | print('Calculate statistics and prepare metadata')
370 | stats = {}
371 | self._metadata = []
372 |
373 | stats['nFrames_total'] = self._mixture_setup['nb_folds'] * self._nb_mixtures_per_fold * self._nb_frames if np.isscalar(self._nb_mixtures_per_fold) else np.sum(self._nb_mixtures_per_fold) * self._nb_frames
374 | stats['class_multi_instance'] = np.zeros(self._nb_classes)
375 | stats['class_instances'] = np.zeros(self._nb_classes)
376 | stats['class_nEvents'] = np.zeros(self._nb_classes)
377 | stats['class_presence'] = np.zeros(self._nb_classes)
378 |
379 | stats['polyphony'] = np.zeros(self._mixture_setup['nOverlap']+1)
380 | stats['event_presence'] = 0
381 | stats['nEvents_total'] = 0
382 | stats['nEvents_static'] = 0
383 | stats['nEvents_moving'] = 0
384 |
385 | for nfold in range(self._mixture_setup['nb_folds']):
386 | print('Statistics and metadata for fold {}'.format(nfold+1))
387 | rooms = self._mixtures[nfold][0]['roomidx']
388 | nb_rooms = len(rooms)
389 | room_mixtures = []
390 | for nr in range(nb_rooms):
391 | nb_mixtures = len(self._mixtures[nfold][nr]['mixture'])
392 | per_room_mixtures = []
393 | for nmix in range(nb_mixtures):
394 | mixture = {'classid': np.array([]), 'trackid': np.array([]), 'eventtimetracks': np.array([]), 'eventdoatimetracks': np.array([])}
395 | mixture_nm = self._mixtures[nfold][nr]['mixture'][nmix]
396 | event_classes = mixture_nm['class']
397 | event_states = mixture_nm['isMoving']
398 |
399 | # idx of events and interferers
400 | nb_events = len(event_classes)
401 | nb_events_moving = np.sum(event_states)
402 | stats['nEvents_total'] += nb_events
403 | stats['nEvents_static'] += nb_events - nb_events_moving
404 | stats['nEvents_moving'] += nb_events_moving
405 |
406 | # number of events per class
407 | for nc in range(self._mixture_setup['nb_classes']):
408 | nb_class_events = np.sum(event_classes == nc)
409 | stats['class_nEvents'][nc] += nb_class_events
410 |
411 | # store a timeline for each event
412 | eventtimetracks = np.zeros((self._nb_frames, nb_events))
413 | eventdoatimetracks = np.nan*np.ones((self._nb_frames, 2, nb_events))
414 |
415 | # prepare metadata for synthesis
416 | for nev in range(nb_events):
417 | event_onoffset = mixture_nm['event_onoffsets'][nev,:]*10
418 | doa_azel = np.round(mixture_nm['doa_azel'][nev])
419 | # zero the activity according to perceptual onsets/offsets
420 | sample_onoffsets = mixture_nm['sample_onoffsets'][nev]
421 | ev_idx = np.arange(event_onoffset[0], event_onoffset[1]+0.1,dtype=int)
422 | activity_mask = np.zeros(len(ev_idx),dtype=int)
423 | sample_shape = np.shape(sample_onoffsets)
424 | if len(sample_shape) == 1:
425 | activity_mask[np.arange(int(np.round(sample_onoffsets[0]*10)),int(np.round(sample_onoffsets[1]*10)))] = 1
426 | else:
427 | for nseg in range(sample_shape[0]):
428 | ran = np.arange(int(np.round(sample_onoffsets[nseg,0]*10)),int(np.round((sample_onoffsets[nseg,1])*10)))
429 | activity_mask[ran] = 1
430 |
431 | if len(activity_mask) > len(ev_idx):
432 | activity_mask = activity_mask[0:len(ev_idx)]
433 |
434 | if np.shape(doa_azel)[0] == 1:
435 | # static event
436 | try:
437 | eventtimetracks[ev_idx, nev] = activity_mask
438 | eventdoatimetracks[ev_idx[activity_mask.astype(bool)],0,nev] = np.ones(np.sum(activity_mask==1))*doa_azel[0,0]
439 | eventdoatimetracks[ev_idx[activity_mask.astype(bool)],1,nev] = np.ones(np.sum(activity_mask==1))*doa_azel[0,1]
440 | except IndexError: # event runs past the end of the mixture; truncate
441 | excess_idx = len(np.argwhere(ev_idx >= self._nb_frames))
442 | ev_idx = ev_idx[:-excess_idx]
443 | if len(activity_mask) > len(ev_idx):
444 | activity_mask = activity_mask[0:len(ev_idx)]
445 | eventtimetracks[ev_idx, nev] = activity_mask
446 | eventdoatimetracks[ev_idx[activity_mask.astype(bool)],0,nev] = np.ones(np.sum(activity_mask==1))*doa_azel[0,0]
447 | eventdoatimetracks[ev_idx[activity_mask.astype(bool)],1,nev] = np.ones(np.sum(activity_mask==1))*doa_azel[0,1]
448 |
449 | else:
450 | # moving event
451 | nb_doas = np.shape(doa_azel)[0]
452 | ev_idx = ev_idx[:nb_doas]
453 | activity_mask = activity_mask[:nb_doas]
454 | try:
455 | eventtimetracks[ev_idx,nev] = activity_mask
456 | eventdoatimetracks[ev_idx[activity_mask.astype(bool)],:,nev] = doa_azel[activity_mask.astype(bool),:]
457 | except IndexError: # event runs past the end of the mixture; truncate
458 | excess_idx = len(np.argwhere(ev_idx >= self._nb_frames))
459 | ev_idx = ev_idx[:-excess_idx]
460 | if len(activity_mask) > len(ev_idx):
461 | activity_mask = activity_mask[0:len(ev_idx)]
462 | eventtimetracks[ev_idx,nev] = activity_mask
463 | eventdoatimetracks[ev_idx[activity_mask.astype(bool)],:,nev] = doa_azel[activity_mask.astype(bool),:]
464 |
465 | mixture['classid'] = event_classes
466 | mixture['trackid'] = np.arange(0,nb_events)
467 | mixture['eventtimetracks'] = eventtimetracks
468 | mixture['eventdoatimetracks'] = eventdoatimetracks
469 |
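# Editorial sketch (not original code): eventtimetracks is a (nb_frames x
# nb_events) 0/1 activity matrix, and eventdoatimetracks a (nb_frames x 2 x
# nb_events) array of [azimuth, elevation] in degrees, NaN where inactive.
# For a static event nev active over frames 12..17 at (azi, ele) = (30, -10):
#   eventtimetracks[12:18, nev] == 1
#   eventdoatimetracks[12:18, :, nev] == [30., -10.]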
470 | for nf in range(self._nb_frames):
471 | # find active events
472 | active_events = np.argwhere(eventtimetracks[nf,:] > 0)
473 | # find the classes of the active events
474 | active_classes = event_classes[active_events]
475 |
476 | if active_classes.size == 0: # no active events in this frame
477 | # add to zero polyphony
478 | stats['polyphony'][0] += 1
479 | else:
480 | # add to general event presence
481 | stats['event_presence'] += 1
482 | # number of simultaneous events
483 | nb_active = len(active_events)
484 |
485 | # add to respective polyphony
486 | try:
487 | stats['polyphony'][nb_active] += 1
488 | except IndexError:
489 | pass # TODO: workaround for <1% of border cases; needs a proper fix, though not very relevant
490 |
491 | # presence, instances and multi-instance for each class
492 |
493 | for nc in range(self._mixture_setup['nb_classes']):
494 | nb_instances = np.sum(active_classes == nc)
495 | if nb_instances > 0:
496 | stats['class_presence'][nc] += 1
497 | if nb_instances > 1:
498 | stats['class_multi_instance'][nc] += 1
499 | stats['class_instances'][nc] += nb_instances
500 | per_room_mixtures.append(mixture)
501 | room_mixtures.append(per_room_mixtures)
502 | self._metadata.append(room_mixtures)
503 |
504 | # compute average polyphony
505 | weighted_polyphony_sum = 0
506 | for nn in range(self._mixture_setup['nOverlap']):
507 | weighted_polyphony_sum += (nn+1) * stats['polyphony'][nn+1] # polyphony[k] counts frames with k simultaneous events
508 |
509 | stats['avg_polyphony'] = weighted_polyphony_sum / stats['event_presence']
510 |
511 | # event percentages
512 | stats['class_event_pc'] = np.round(stats['class_nEvents']*1000./stats['nEvents_total'])/10.
513 | stats['event_presence_pc'] = np.round(stats['event_presence']*1000./stats['nFrames_total'])/10.
514 | stats['class_presence_pc'] = np.round(stats['class_presence']*1000./stats['nFrames_total'])/10.
515 | # percentage of frames with same-class instances
516 | stats['multi_class_pc'] = np.round(np.sum(stats['class_multi_instance']*1000./stats['nFrames_total']))/10.
517 |
518 |
519 | return self._metadata, stats
520 |
521 | def write_metadata(self):
522 | if not os.path.isdir(self._metadata_path):
523 | os.makedirs(self._metadata_path)
524 |
525 | for nfold in range(self._mixture_setup['nb_folds']):
526 | print('Writing metadata files for fold {}'.format(nfold+1))
527 | nb_rooms = len(self._metadata[nfold])
528 | for nr in range(nb_rooms):
529 | nb_mixtures = len(self._metadata[nfold][nr])
530 | for nmix in range(nb_mixtures):
531 | print('Mixture {}'.format(nmix+1))
532 | metadata_nm = self._metadata[nfold][nr][nmix]
533 |
534 | # write to file, omitting non-active frames
535 | mixture_filename = 'fold{}_room{}_mix{:03}.csv'.format(nfold+1, nr+1, nmix+1)
536 | file_id = open(self._metadata_path + '/' + mixture_filename, 'w', newline="")
537 | metadata_writer = csv.writer(file_id,delimiter=',',quoting = csv.QUOTE_NONE)
538 | for nf in range(self._nb_frames):
539 | # find active events
540 | active_events = np.argwhere(metadata_nm['eventtimetracks'][nf, :]>0)
541 | nb_active = len(active_events)
542 |
543 | if nb_active > 0:
544 | # find the classes of active events
545 | active_classes = metadata_nm['classid'][active_events]
546 | active_tracks = metadata_nm['trackid'][active_events]
547 |
548 | # write to file
549 | for na in range(nb_active):
550 | classidx = int(active_classes[na][0]) # additional zero index since it's packed in an array
551 | trackidx = int(active_tracks[na][0])
552 |
553 | azim = int(metadata_nm['eventdoatimetracks'][nf,0,active_events][na][0])
554 | elev = int(metadata_nm['eventdoatimetracks'][nf,1,active_events][na][0])
555 | metadata_writer.writerow([nf,classidx,trackidx,azim,elev])
556 | file_id.close()
557 |
558 |
559 |
560 |
561 |
562 |
563 |
--------------------------------------------------------------------------------
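Editor's note: a usage sketch for the class above (hypothetical driver code,
not part of this file; get_params() from generation_parameters.py and the
pickled DBConfig object are assumed, following the example scripts):

    import pickle
    from metadata_synthesizer import MetadataSynthesizer
    from generation_parameters import get_params

    params = get_params('2')  # hypothetical preset id
    with open('db_config_fsd.obj', 'rb') as f:
        db_config = pickle.load(f)

    noiselessmeta = MetadataSynthesizer(db_config, params, 'target_noiseless')
    mixtures, mixture_setup, foldlist = noiselessmeta.create_mixtures()
    metadata, stats = noiselessmeta.prepare_metadata_and_stats()
    noiselessmeta.write_metadata()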
/utils.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | import scipy.fft
3 |
4 |
5 | def sample_from_quartiles(K, stats):
6 | minn = stats[0]
7 | maxx = stats[4]
8 | quart1 = stats[1]
9 | mediann = stats[2]
10 | quart3 = stats[3]
11 | samples = minn + (quart1 - minn)*np.random.rand(K, 1)
12 | samples = np.append(samples,quart1)
13 | samples = np.append(samples, quart1 + (mediann-quart1)*np.random.rand(K,1))
14 | samples = np.append(samples,mediann)
15 | samples = np.append(samples, mediann + (quart3-mediann)*np.random.rand(K,1))
16 | samples = np.append(samples, quart3)
17 | samples = np.append(samples, quart3 + (maxx-quart3)*np.random.rand(K,1))
18 |
19 | return samples
20 |
21 | def cart2sph(xyz):
22 | return_list = False
23 | if len(np.shape(xyz)) == 2:
24 | return_list = True
25 | x = xyz[:, 0]
26 | y = xyz[:, 1]
27 | z = xyz[:, 2]
28 | else:
29 | x = xyz[0]
30 | y = xyz[1]
31 | z = xyz[2]
32 |
33 | azimuth = np.arctan2(y, x) * 180. / np.pi
34 | elevation = np.arctan2(z, np.sqrt(x**2 + y**2)) * 180. / np.pi
35 | if return_list:
36 | return np.stack((azimuth,elevation),axis=0)
37 | else:
38 | return np.array([azimuth, elevation])
39 |
40 |
41 | def stft_ham(insig, winsize=256, fftsize=512, hopsize=128):
42 | nb_dim = len(np.shape(insig))
43 | lSig = int(np.shape(insig)[0])
44 | nCHin = int(np.shape(insig)[1]) if nb_dim > 1 else 1
45 | x = np.arange(0,winsize)
46 | nBins = int(fftsize/2 + 1)
47 | nWindows = int(np.ceil(lSig/(2.*hopsize)))
48 | nFrames = int(2*nWindows+1)
49 |
50 | winvec = np.zeros((len(x),nCHin))
51 | for i in range(nCHin):
52 | winvec[:,i] = np.sin(x*(np.pi/winsize))**2 # sin^2 (Hann) analysis window
53 |
54 | frontpad = winsize-hopsize
55 | backpad = nFrames*hopsize-lSig
56 |
57 | if nb_dim > 1:
58 | insig_pad = np.pad(insig,((frontpad,backpad),(0,0)),'constant')
59 | spectrum = np.zeros((nBins, nFrames, nCHin),dtype='complex')
60 | else:
61 | insig_pad = np.pad(insig,((frontpad,backpad)),'constant')
62 | spectrum = np.zeros((nBins, nFrames),dtype='complex')
63 |
64 | idx = 0
65 | nf = 0
66 | if nb_dim > 1:
67 | while nf <= nFrames-1:
68 | insig_win = np.multiply(winvec, insig_pad[idx+np.arange(0,winsize),:])
69 | inspec = scipy.fft.fft(insig_win,n=fftsize,norm='backward',axis=0)
70 | #inspec = scipy.fft.fft(insig_win,n=fftsize,axis=0)
71 | inspec = inspec[:nBins,:]
72 | spectrum[:,nf,:] = inspec
73 | idx += hopsize
74 | nf += 1
75 | else:
76 | while nf <= nFrames-1:
77 | insig_win = np.multiply(winvec[:,0], insig_pad[idx+np.arange(0,winsize)])
78 | inspec = scipy.fft.fft(insig_win,n=fftsize,norm='backward',axis=0)
79 | #inspec = scipy.fft.fft(insig_win,n=fftsize,axis=0)
80 | inspec = inspec[:nBins]
81 | spectrum[:,nf] = inspec
82 | idx += hopsize
83 | nf += 1
84 |
85 | return spectrum
86 |
87 |
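# Editorial usage sketch (not original code): for a mono signal, stft_ham
# returns a (bins, frames) complex spectrogram, e.g.
#
#   sig = np.random.randn(24000)
#   spec = stft_ham(sig, winsize=512, fftsize=1024, hopsize=256)
#   # spec.shape == (513, 95), since nFrames = 2*ceil(24000/(2*256)) + 1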
88 | def ctf_ltv_direct(sig, irs, ir_times, fs, win_size):
89 | convsig = []
90 | win_size = int(win_size)
91 | hop_size = int(win_size / 2)
92 | fft_size = win_size*2
93 | nBins = int(fft_size/2)+1
94 |
95 | # IRs
96 | ir_shape = np.shape(irs)
97 | sig_shape = np.shape(sig)
98 |
99 | lIr = ir_shape[0]
100 |
101 | if len(ir_shape) == 2:
102 | nIrs = ir_shape[1]
103 | nCHir = 1
104 | elif len(ir_shape) == 3:
105 | nIrs = ir_shape[2]
106 | nCHir = ir_shape[1]
107 |
108 | if nIrs != len(ir_times):
109 | raise ValueError('Bad ir times')
110 |
111 | # number of STFT frames for the IRs (half-window hopsize)
112 |
113 | nIrWindows = int(np.ceil(lIr/win_size))
114 | nIrFrames = 2*nIrWindows+1
115 | # number of STFT frames for the signal (half-window hopsize)
116 | lSig = sig_shape[0]
117 | nSigWindows = np.ceil(lSig/win_size)
118 | nSigFrames = 2*nSigWindows+1
119 |
120 | # quantize the timestamps of each IR to multiples of STFT frames (hopsizes)
121 | tStamps = np.round((ir_times*fs+hop_size)/hop_size)
122 |
123 | # create the two linear interpolator tracks, for the pairs of IRs between timestamps
124 | nIntFrames = int(tStamps[-1])
125 | Gint = np.zeros((nIntFrames, nIrs))
126 | for ni in range(nIrs-1):
127 | tpts = np.arange(tStamps[ni],tStamps[ni+1]+1,dtype=int)-1
128 | ntpts = len(tpts)
129 | ntpts_ratio = np.arange(0,ntpts)/(ntpts-1)
130 | Gint[tpts,ni] = 1-ntpts_ratio
131 | Gint[tpts,ni+1] = ntpts_ratio
132 |
133 | # compute spectra of irs
134 |
135 | if nCHir == 1:
136 | irspec = np.zeros((nBins, nIrFrames, nIrs),dtype=complex)
137 | else:
138 | temp_spec = stft_ham(irs[:, :, 0], winsize=win_size, fftsize=2*win_size,hopsize=win_size//2)
139 | irspec = np.zeros((nBins, np.shape(temp_spec)[1], nCHir, nIrs),dtype=complex)
140 |
141 | for ni in range(nIrs):
142 | if nCHir == 1:
143 | irspec[:, :, ni] = stft_ham(irs[:, ni], winsize=win_size, fftsize=2*win_size,hopsize=win_size//2)
144 | else:
145 | spec = stft_ham(irs[:, :, ni], winsize=win_size, fftsize=2*win_size,hopsize=win_size//2)
146 | irspec[:, :, :, ni] = spec # np.transpose(spec, (0, 2, 1))
147 |
148 | # compute input signal spectra
149 | sigspec = stft_ham(sig, winsize=win_size,fftsize=2*win_size,hopsize=win_size//2)
150 | # initialize interpolated time-variant ctf
151 | Gbuf = np.zeros((nIrFrames, nIrs))
152 | if nCHir == 1:
153 | ctf_ltv = np.zeros((nBins, nIrFrames),dtype=complex)
154 | else:
155 | ctf_ltv = np.zeros((nBins,nIrFrames,nCHir),dtype=complex)
156 |
157 | S = np.zeros((nBins, nIrFrames),dtype=complex)
158 |
159 | # processing loop
160 | idx = 0
161 | nf = 0
162 | inspec_pad = sigspec
163 | nFrames = int(np.min([np.shape(inspec_pad)[1], nIntFrames]))
164 |
165 | convsig = np.zeros((win_size//2 + nFrames*win_size//2 + fft_size-win_size, nCHir))
166 |
167 | while nf <= nFrames-1:
168 | # compute interpolated ctf
169 | Gbuf[1:, :] = Gbuf[:-1, :]
170 | Gbuf[0, :] = Gint[nf, :]
171 | if nCHir == 1:
172 | for nif in range(nIrFrames):
173 | ctf_ltv[:, nif] = np.matmul(irspec[:,nif,:], Gbuf[nif,:].astype(complex))
174 | else:
175 | for nch in range(nCHir):
176 | for nif in range(nIrFrames):
177 | ctf_ltv[:,nif,nch] = np.matmul(irspec[:,nif,nch,:],Gbuf[nif,:].astype(complex))
178 | inspec_nf = inspec_pad[:, nf]
179 | S[:,1:nIrFrames] = S[:, :nIrFrames-1]
180 | S[:, 0] = inspec_nf
181 |
182 | repS = np.tile(np.expand_dims(S,axis=2), [1, 1, nCHir])
183 | convspec_nf = np.squeeze(np.sum(repS * ctf_ltv,axis=1))
184 | first_dim = np.shape(convspec_nf)[0]
185 | convspec_nf = np.vstack((convspec_nf, np.conj(convspec_nf[np.arange(first_dim-1, 1, -1)-1,:])))
186 | convsig_nf = np.real(scipy.fft.ifft(convspec_nf, fft_size, norm='forward', axis=0)) # discard the residual imaginary part left by numerical error
187 | # convsig_nf = np.real(scipy.fft.ifft(convspec_nf, fft_size, axis=0))
188 | # overlap-add synthesis
189 | convsig[idx+np.arange(0,fft_size),:] += convsig_nf
190 | # advance sample pointer
191 | idx += hop_size
192 | nf += 1
193 |
194 | convsig = convsig[(win_size):(nFrames*win_size)//2,:]
195 |
196 | return convsig
--------------------------------------------------------------------------------
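Editor's note: a minimal usage sketch for ctf_ltv_direct (hypothetical values,
not part of the repository). The function renders a source through a
time-varying set of RIRs by linearly interpolating between consecutive IRs,
with ir_times giving the time in seconds at which each IR applies:

    import numpy as np
    fs = 24000
    sig = np.random.randn(fs)                 # 1 s mono source signal
    irs = 0.01 * np.random.randn(1024, 4, 10) # (ir_len, n_channels, n_irs)
    ir_times = np.linspace(0., 1., 10)        # IR timestamps in seconds
    out = ctf_ltv_direct(sig, irs, ir_times, fs, win_size=512)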