├── .gitignore
├── CODE_OF_CONDUCT.md
├── DNSMOS
│   ├── DNSMOS
│   │   ├── bak_ovr.onnx
│   │   ├── model_v8.onnx
│   │   ├── sig.onnx
│   │   └── sig_bak_ovr.onnx
│   ├── README.md
│   ├── dnsmos_local.py
│   └── pDNSMOS
│       └── sig_bak_ovr.onnx
├── LICENSE
├── LICENSE-CODE
├── README-DNS3.md
├── README.md
├── SECURITY.md
├── V5_DNS_Challenge_FinalResults.pdf
├── WAcc
│   └── WAcc.py
├── audiolib.py
├── docs
│   ├── CMT Instructions for uploading enhanced clips_ICASSP2022.pdf
│   ├── ICASSP_2021_DNS_challenge.pdf
│   └── ICASSP_2022_4th_Deep_Noise_Suppression_Challenge.pdf
├── download-dns-challenge-1.sh
├── download-dns-challenge-2.sh
├── download-dns-challenge-3.sh
├── download-dns-challenge-4-pdns.sh
├── download-dns-challenge-4.sh
├── download-dns-challenge-5-baseline.sh
├── download-dns-challenge-5-filelists-headset.sh
├── download-dns-challenge-5-filelists-speakerphone.sh
├── download-dns-challenge-5-headset-training.sh
├── download-dns-challenge-5-noise-ir.sh
├── download-dns-challenge-5-paralinguistic-train.sh
├── download-dns-challenge-5-speakerphone-training.sh
├── download-dns5-blind-testset.sh
├── download-dns5-dev-testset.sh
├── download_dns_v2_v3_blindset.sh
├── index.html
├── noisyspeech_synthesizer.cfg
├── noisyspeech_synthesizer_singleprocess.py
├── pdns_noisyspeech_synthesizer_singleprocess.py
├── pdns_synthesizer_icassp2023.cfg
├── requirements.txt
├── unit_tests_synthesizer.py
└── utils.py

--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
datasets/
datasets_fullband/
training_set/
training_set2/
training_set2_onlyrealrir/
training_set4/
training_set5/
logs/
test_set2/
training_set_sept11/
training_set_sept12/
__pycache__/
*.pyc
*~
/.vs/
/.vscode/
*.wav
*.tar.bz2
*.zip

--------------------------------------------------------------------------------
/CODE_OF_CONDUCT.md:
--------------------------------------------------------------------------------
# Microsoft Open Source Code of Conduct

This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/).

Resources:

- [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/)
- [Microsoft Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/)
- Contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with questions or concerns

--------------------------------------------------------------------------------
/DNSMOS/DNSMOS/bak_ovr.onnx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/DNS-Challenge/591184a9fcb2cbdec02520fed81a32bbbf9d73ff/DNSMOS/DNSMOS/bak_ovr.onnx

--------------------------------------------------------------------------------
/DNSMOS/DNSMOS/model_v8.onnx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/DNS-Challenge/591184a9fcb2cbdec02520fed81a32bbbf9d73ff/DNSMOS/DNSMOS/model_v8.onnx

--------------------------------------------------------------------------------
/DNSMOS/DNSMOS/sig.onnx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/DNS-Challenge/591184a9fcb2cbdec02520fed81a32bbbf9d73ff/DNSMOS/DNSMOS/sig.onnx

--------------------------------------------------------------------------------
/DNSMOS/DNSMOS/sig_bak_ovr.onnx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/DNS-Challenge/591184a9fcb2cbdec02520fed81a32bbbf9d73ff/DNSMOS/DNSMOS/sig_bak_ovr.onnx

--------------------------------------------------------------------------------
/DNSMOS/README.md:
--------------------------------------------------------------------------------
# DNSMOS: A non-intrusive perceptual objective speech quality metric to evaluate noise suppressors

Human subjective evaluation is the "gold standard" for evaluating speech quality optimized for human perception. Perceptual objective metrics serve as a proxy for subjective scores. The conventional and widely used metrics require a reference clean speech signal, which is unavailable in real recordings. The no-reference approaches correlate poorly with human ratings and are not widely adopted in the research community. One of the biggest use cases of these perceptual objective metrics is to evaluate noise suppression algorithms. DNSMOS generalizes well in challenging test conditions, with a high correlation to human ratings when stack-ranking noise suppression methods. More details can be found in the [DNSMOS paper](https://arxiv.org/pdf/2010.15258.pdf).

## Evaluation methodology:
Use the **dnsmos_local.py** script.
1. To compute a personalized MOS score (where an interfering speaker is penalized), provide the '-p' argument.
   Ex: python dnsmos_local.py -t C:\temp\SampleClips -o sample.csv -p
2. To compute a regular MOS score, omit the '-p' argument.
   Ex: python dnsmos_local.py -t C:\temp\SampleClips -o sample.csv

## Citation:
If you have used DNSMOS for research and development purposes, please cite the [DNSMOS paper](https://arxiv.org/pdf/2010.15258.pdf):
```BibTex
@inproceedings{reddy2021dnsmos,
  title={Dnsmos: A non-intrusive perceptual objective speech quality metric to evaluate noise suppressors},
  author={Reddy, Chandan KA and Gopal, Vishak and Cutler, Ross},
  booktitle={ICASSP 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  pages={6493--6497},
  year={2021},
  organization={IEEE}
}
```

If you used DNSMOS P.835, please cite the [DNSMOS P.835](https://arxiv.org/pdf/2110.01763.pdf) paper:

```BibTex
@inproceedings{reddy2022dnsmos,
  title={DNSMOS P.835: A non-intrusive perceptual objective speech quality metric to evaluate noise suppressors},
  author={Reddy, Chandan KA and Gopal, Vishak and Cutler, Ross},
  booktitle={ICASSP 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  year={2022},
  organization={IEEE}
}
```

--------------------------------------------------------------------------------
/DNSMOS/dnsmos_local.py:
--------------------------------------------------------------------------------
# Usage:
#   python dnsmos_local.py -t c:\temp\DNSChallenge4_Blindset -o DNSCh4_Blind.csv -p
#

import argparse
import concurrent.futures
import glob
import os

import librosa
import numpy as np
import onnxruntime as ort
import pandas as pd
import soundfile as sf
from tqdm import tqdm

SAMPLING_RATE = 16000
INPUT_LENGTH = 9.01


class ComputeScore:
    def __init__(self, primary_model_path, p808_model_path) -> None:
        self.onnx_sess = ort.InferenceSession(primary_model_path)
        self.p808_onnx_sess = ort.InferenceSession(p808_model_path)

    def audio_melspec(self, audio, n_mels=120, frame_size=320, hop_length=160, sr=16000, to_db=True):
        mel_spec = librosa.feature.melspectrogram(y=audio, sr=sr, n_fft=frame_size+1, hop_length=hop_length, n_mels=n_mels)
        if to_db:
            mel_spec = (librosa.power_to_db(mel_spec, ref=np.max)+40)/40
        return mel_spec.T

    def get_polyfit_val(self, sig, bak, ovr, is_personalized_MOS):
        # Map the raw model outputs onto the calibrated MOS scale.
        if is_personalized_MOS:
            p_ovr = np.poly1d([-0.00533021, 0.005101, 1.18058466, -0.11236046])
            p_sig = np.poly1d([-0.01019296, 0.02751166, 1.19576786, -0.24348726])
            p_bak = np.poly1d([-0.04976499, 0.44276479, -0.1644611, 0.96883132])
        else:
            p_ovr = np.poly1d([-0.06766283, 1.11546468, 0.04602535])
            p_sig = np.poly1d([-0.08397278, 1.22083953, 0.0052439])
            p_bak = np.poly1d([-0.13166888, 1.60915514, -0.39604546])

        sig_poly = p_sig(sig)
        bak_poly = p_bak(bak)
        ovr_poly = p_ovr(ovr)

        return sig_poly, bak_poly, ovr_poly

    def __call__(self, fpath, sampling_rate, is_personalized_MOS):
        aud, input_fs = sf.read(fpath)
        fs = sampling_rate
        if input_fs != fs:
            # librosa >= 0.10 requires keyword arguments for resample.
            audio = librosa.resample(aud, orig_sr=input_fs, target_sr=fs)
        else:
            audio = aud
        actual_audio_len = len(audio)
        len_samples = int(INPUT_LENGTH*fs)
        # Repeat clips that are shorter than the minimum model input length.
        while len(audio) < len_samples:
            audio = np.append(audio, audio)

        num_hops = int(np.floor(len(audio)/fs) - INPUT_LENGTH) + 1
        hop_len_samples = fs
        predicted_mos_sig_seg_raw = []
        predicted_mos_bak_seg_raw = []
        predicted_mos_ovr_seg_raw = []
        predicted_mos_sig_seg = []
        predicted_mos_bak_seg = []
        predicted_mos_ovr_seg = []
        predicted_p808_mos = []

        for idx in range(num_hops):
            audio_seg = audio[int(idx*hop_len_samples) : int((idx+INPUT_LENGTH)*hop_len_samples)]
            if len(audio_seg) < len_samples:
                continue

            input_features = np.array(audio_seg).astype('float32')[np.newaxis, :]
            p808_input_features = np.array(self.audio_melspec(audio=audio_seg[:-160])).astype('float32')[np.newaxis, :, :]
            oi = {'input_1': input_features}
            p808_oi = {'input_1': p808_input_features}
            p808_mos = self.p808_onnx_sess.run(None, p808_oi)[0][0][0]
            mos_sig_raw, mos_bak_raw, mos_ovr_raw = self.onnx_sess.run(None, oi)[0][0]
            mos_sig, mos_bak, mos_ovr = self.get_polyfit_val(mos_sig_raw, mos_bak_raw, mos_ovr_raw, is_personalized_MOS)
            predicted_mos_sig_seg_raw.append(mos_sig_raw)
            predicted_mos_bak_seg_raw.append(mos_bak_raw)
            predicted_mos_ovr_seg_raw.append(mos_ovr_raw)
            predicted_mos_sig_seg.append(mos_sig)
            predicted_mos_bak_seg.append(mos_bak)
            predicted_mos_ovr_seg.append(mos_ovr)
            predicted_p808_mos.append(p808_mos)

        clip_dict = {'filename': fpath, 'len_in_sec': actual_audio_len/fs, 'sr': fs}
        clip_dict['num_hops'] = num_hops
        clip_dict['OVRL_raw'] = np.mean(predicted_mos_ovr_seg_raw)
        clip_dict['SIG_raw'] = np.mean(predicted_mos_sig_seg_raw)
        clip_dict['BAK_raw'] = np.mean(predicted_mos_bak_seg_raw)
        clip_dict['OVRL'] = np.mean(predicted_mos_ovr_seg)
        clip_dict['SIG'] = np.mean(predicted_mos_sig_seg)
        clip_dict['BAK'] = np.mean(predicted_mos_bak_seg)
        clip_dict['P808_MOS'] = np.mean(predicted_p808_mos)
        return clip_dict


def main(args):
    models = glob.glob(os.path.join(args.testset_dir, "*"))
    p808_model_path = os.path.join('DNSMOS', 'model_v8.onnx')

    if args.personalized_MOS:
        primary_model_path = os.path.join('pDNSMOS', 'sig_bak_ovr.onnx')
    else:
        primary_model_path = os.path.join('DNSMOS', 'sig_bak_ovr.onnx')

    compute_score = ComputeScore(primary_model_path, p808_model_path)

    rows = []
    clips = glob.glob(os.path.join(args.testset_dir, "*.wav"))
    is_personalized_eval = args.personalized_MOS
    desired_fs = SAMPLING_RATE
    # Also collect clips from per-model subdirectories, descending one level
    # at a time until .wav files are found (bounded to avoid endless descent).
    for m in tqdm(models):
        max_recursion_depth = 10
        audio_path = m  # glob already returns paths prefixed with testset_dir
        audio_clips_list = glob.glob(os.path.join(audio_path, "*.wav"))
        while len(audio_clips_list) == 0 and max_recursion_depth > 0:
            audio_path = os.path.join(audio_path, "**")
            audio_clips_list = glob.glob(os.path.join(audio_path, "*.wav"))
            max_recursion_depth -= 1
        clips.extend(audio_clips_list)

    with concurrent.futures.ThreadPoolExecutor() as executor:
        future_to_clip = {executor.submit(compute_score, clip, desired_fs, is_personalized_eval): clip for clip in clips}
        for future in tqdm(concurrent.futures.as_completed(future_to_clip)):
            clip = future_to_clip[future]
            try:
                data = future.result()
            except Exception as exc:
                print('%r generated an exception: %s' % (clip, exc))
            else:
                rows.append(data)

    df = pd.DataFrame(rows)
    if args.csv_path:
        df.to_csv(args.csv_path)
    else:
        print(df.describe())


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument('-t', "--testset_dir", default='.',
                        help='Path to the dir containing audio clips in .wav to be evaluated')
    parser.add_argument('-o', "--csv_path", default=None, help='Path to the csv that saves the results')
    parser.add_argument('-p', "--personalized_MOS", action='store_true',
                        help='Flag to indicate if personalized MOS score is needed or regular')

    args = parser.parse_args()

    main(args)

--------------------------------------------------------------------------------
/DNSMOS/pDNSMOS/sig_bak_ovr.onnx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/DNS-Challenge/591184a9fcb2cbdec02520fed81a32bbbf9d73ff/DNSMOS/pDNSMOS/sig_bak_ovr.onnx
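The raw per-segment SIG/BAK/OVR predictions from the primary model are post-processed by the fixed calibration polynomials in `get_polyfit_val`. A minimal standalone sketch of the non-personalized mapping, with coefficients copied verbatim from `dnsmos_local.py` (the example raw scores are arbitrary illustration values, not real model outputs):

```python
import numpy as np

# Calibration polynomials from get_polyfit_val() in dnsmos_local.py
# (non-personalized branch). They map raw network outputs to the MOS scale.
p_ovr = np.poly1d([-0.06766283, 1.11546468, 0.04602535])
p_sig = np.poly1d([-0.08397278, 1.22083953, 0.0052439])
p_bak = np.poly1d([-0.13166888, 1.60915514, -0.39604546])

# Example: calibrate a hypothetical raw (SIG, BAK, OVR) triple of 3.0 each.
raw_sig, raw_bak, raw_ovr = 3.0, 3.0, 3.0
mos_sig, mos_bak, mos_ovr = p_sig(raw_sig), p_bak(raw_bak), p_ovr(raw_ovr)
```

The per-clip scores reported in the CSV are the means of these calibrated per-segment values.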
-------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Attribution 4.0 International 2 | 3 | ======================================================================= 4 | 5 | Creative Commons Corporation ("Creative Commons") is not a law firm and 6 | does not provide legal services or legal advice. Distribution of 7 | Creative Commons public licenses does not create a lawyer-client or 8 | other relationship. Creative Commons makes its licenses and related 9 | information available on an "as-is" basis. Creative Commons gives no 10 | warranties regarding its licenses, any material licensed under their 11 | terms and conditions, or any related information. Creative Commons 12 | disclaims all liability for damages resulting from their use to the 13 | fullest extent possible. 14 | 15 | Using Creative Commons Public Licenses 16 | 17 | Creative Commons public licenses provide a standard set of terms and 18 | conditions that creators and other rights holders may use to share 19 | original works of authorship and other material subject to copyright 20 | and certain other rights specified in the public license below. The 21 | following considerations are for informational purposes only, are not 22 | exhaustive, and do not form part of our licenses. 23 | 24 | Considerations for licensors: Our public licenses are 25 | intended for use by those authorized to give the public 26 | permission to use material in ways otherwise restricted by 27 | copyright and certain other rights. Our licenses are 28 | irrevocable. Licensors should read and understand the terms 29 | and conditions of the license they choose before applying it. 30 | Licensors should also secure all rights necessary before 31 | applying our licenses so that the public can reuse the 32 | material as expected. Licensors should clearly mark any 33 | material not subject to the license. 
This includes other CC- 34 | licensed material, or material used under an exception or 35 | limitation to copyright. More considerations for licensors: 36 | wiki.creativecommons.org/Considerations_for_licensors 37 | 38 | Considerations for the public: By using one of our public 39 | licenses, a licensor grants the public permission to use the 40 | licensed material under specified terms and conditions. If 41 | the licensor's permission is not necessary for any reason--for 42 | example, because of any applicable exception or limitation to 43 | copyright--then that use is not regulated by the license. Our 44 | licenses grant only permissions under copyright and certain 45 | other rights that a licensor has authority to grant. Use of 46 | the licensed material may still be restricted for other 47 | reasons, including because others have copyright or other 48 | rights in the material. A licensor may make special requests, 49 | such as asking that all changes be marked or described. 50 | Although not required by our licenses, you are encouraged to 51 | respect those requests where reasonable. More_considerations 52 | for the public: 53 | wiki.creativecommons.org/Considerations_for_licensees 54 | 55 | ======================================================================= 56 | 57 | Creative Commons Attribution 4.0 International Public License 58 | 59 | By exercising the Licensed Rights (defined below), You accept and agree 60 | to be bound by the terms and conditions of this Creative Commons 61 | Attribution 4.0 International Public License ("Public License"). To the 62 | extent this Public License may be interpreted as a contract, You are 63 | granted the Licensed Rights in consideration of Your acceptance of 64 | these terms and conditions, and the Licensor grants You such rights in 65 | consideration of benefits the Licensor receives from making the 66 | Licensed Material available under these terms and conditions. 67 | 68 | 69 | Section 1 -- Definitions. 70 | 71 | a. 
Adapted Material means material subject to Copyright and Similar 72 | Rights that is derived from or based upon the Licensed Material 73 | and in which the Licensed Material is translated, altered, 74 | arranged, transformed, or otherwise modified in a manner requiring 75 | permission under the Copyright and Similar Rights held by the 76 | Licensor. For purposes of this Public License, where the Licensed 77 | Material is a musical work, performance, or sound recording, 78 | Adapted Material is always produced where the Licensed Material is 79 | synched in timed relation with a moving image. 80 | 81 | b. Adapter's License means the license You apply to Your Copyright 82 | and Similar Rights in Your contributions to Adapted Material in 83 | accordance with the terms and conditions of this Public License. 84 | 85 | c. Copyright and Similar Rights means copyright and/or similar rights 86 | closely related to copyright including, without limitation, 87 | performance, broadcast, sound recording, and Sui Generis Database 88 | Rights, without regard to how the rights are labeled or 89 | categorized. For purposes of this Public License, the rights 90 | specified in Section 2(b)(1)-(2) are not Copyright and Similar 91 | Rights. 92 | 93 | d. Effective Technological Measures means those measures that, in the 94 | absence of proper authority, may not be circumvented under laws 95 | fulfilling obligations under Article 11 of the WIPO Copyright 96 | Treaty adopted on December 20, 1996, and/or similar international 97 | agreements. 98 | 99 | e. Exceptions and Limitations means fair use, fair dealing, and/or 100 | any other exception or limitation to Copyright and Similar Rights 101 | that applies to Your use of the Licensed Material. 102 | 103 | f. Licensed Material means the artistic or literary work, database, 104 | or other material to which the Licensor applied this Public 105 | License. 106 | 107 | g. 
Licensed Rights means the rights granted to You subject to the 108 | terms and conditions of this Public License, which are limited to 109 | all Copyright and Similar Rights that apply to Your use of the 110 | Licensed Material and that the Licensor has authority to license. 111 | 112 | h. Licensor means the individual(s) or entity(ies) granting rights 113 | under this Public License. 114 | 115 | i. Share means to provide material to the public by any means or 116 | process that requires permission under the Licensed Rights, such 117 | as reproduction, public display, public performance, distribution, 118 | dissemination, communication, or importation, and to make material 119 | available to the public including in ways that members of the 120 | public may access the material from a place and at a time 121 | individually chosen by them. 122 | 123 | j. Sui Generis Database Rights means rights other than copyright 124 | resulting from Directive 96/9/EC of the European Parliament and of 125 | the Council of 11 March 1996 on the legal protection of databases, 126 | as amended and/or succeeded, as well as other essentially 127 | equivalent rights anywhere in the world. 128 | 129 | k. You means the individual or entity exercising the Licensed Rights 130 | under this Public License. Your has a corresponding meaning. 131 | 132 | 133 | Section 2 -- Scope. 134 | 135 | a. License grant. 136 | 137 | 1. Subject to the terms and conditions of this Public License, 138 | the Licensor hereby grants You a worldwide, royalty-free, 139 | non-sublicensable, non-exclusive, irrevocable license to 140 | exercise the Licensed Rights in the Licensed Material to: 141 | 142 | a. reproduce and Share the Licensed Material, in whole or 143 | in part; and 144 | 145 | b. produce, reproduce, and Share Adapted Material. 146 | 147 | 2. Exceptions and Limitations. 
For the avoidance of doubt, where 148 | Exceptions and Limitations apply to Your use, this Public 149 | License does not apply, and You do not need to comply with 150 | its terms and conditions. 151 | 152 | 3. Term. The term of this Public License is specified in Section 153 | 6(a). 154 | 155 | 4. Media and formats; technical modifications allowed. The 156 | Licensor authorizes You to exercise the Licensed Rights in 157 | all media and formats whether now known or hereafter created, 158 | and to make technical modifications necessary to do so. The 159 | Licensor waives and/or agrees not to assert any right or 160 | authority to forbid You from making technical modifications 161 | necessary to exercise the Licensed Rights, including 162 | technical modifications necessary to circumvent Effective 163 | Technological Measures. For purposes of this Public License, 164 | simply making modifications authorized by this Section 2(a) 165 | (4) never produces Adapted Material. 166 | 167 | 5. Downstream recipients. 168 | 169 | a. Offer from the Licensor -- Licensed Material. Every 170 | recipient of the Licensed Material automatically 171 | receives an offer from the Licensor to exercise the 172 | Licensed Rights under the terms and conditions of this 173 | Public License. 174 | 175 | b. No downstream restrictions. You may not offer or impose 176 | any additional or different terms or conditions on, or 177 | apply any Effective Technological Measures to, the 178 | Licensed Material if doing so restricts exercise of the 179 | Licensed Rights by any recipient of the Licensed 180 | Material. 181 | 182 | 6. No endorsement. Nothing in this Public License constitutes or 183 | may be construed as permission to assert or imply that You 184 | are, or that Your use of the Licensed Material is, connected 185 | with, or sponsored, endorsed, or granted official status by, 186 | the Licensor or others designated to receive attribution as 187 | provided in Section 3(a)(1)(A)(i). 
188 | 189 | b. Other rights. 190 | 191 | 1. Moral rights, such as the right of integrity, are not 192 | licensed under this Public License, nor are publicity, 193 | privacy, and/or other similar personality rights; however, to 194 | the extent possible, the Licensor waives and/or agrees not to 195 | assert any such rights held by the Licensor to the limited 196 | extent necessary to allow You to exercise the Licensed 197 | Rights, but not otherwise. 198 | 199 | 2. Patent and trademark rights are not licensed under this 200 | Public License. 201 | 202 | 3. To the extent possible, the Licensor waives any right to 203 | collect royalties from You for the exercise of the Licensed 204 | Rights, whether directly or through a collecting society 205 | under any voluntary or waivable statutory or compulsory 206 | licensing scheme. In all other cases the Licensor expressly 207 | reserves any right to collect such royalties. 208 | 209 | 210 | Section 3 -- License Conditions. 211 | 212 | Your exercise of the Licensed Rights is expressly made subject to the 213 | following conditions. 214 | 215 | a. Attribution. 216 | 217 | 1. If You Share the Licensed Material (including in modified 218 | form), You must: 219 | 220 | a. retain the following if it is supplied by the Licensor 221 | with the Licensed Material: 222 | 223 | i. identification of the creator(s) of the Licensed 224 | Material and any others designated to receive 225 | attribution, in any reasonable manner requested by 226 | the Licensor (including by pseudonym if 227 | designated); 228 | 229 | ii. a copyright notice; 230 | 231 | iii. a notice that refers to this Public License; 232 | 233 | iv. a notice that refers to the disclaimer of 234 | warranties; 235 | 236 | v. a URI or hyperlink to the Licensed Material to the 237 | extent reasonably practicable; 238 | 239 | b. indicate if You modified the Licensed Material and 240 | retain an indication of any previous modifications; and 241 | 242 | c. 
indicate the Licensed Material is licensed under this 243 | Public License, and include the text of, or the URI or 244 | hyperlink to, this Public License. 245 | 246 | 2. You may satisfy the conditions in Section 3(a)(1) in any 247 | reasonable manner based on the medium, means, and context in 248 | which You Share the Licensed Material. For example, it may be 249 | reasonable to satisfy the conditions by providing a URI or 250 | hyperlink to a resource that includes the required 251 | information. 252 | 253 | 3. If requested by the Licensor, You must remove any of the 254 | information required by Section 3(a)(1)(A) to the extent 255 | reasonably practicable. 256 | 257 | 4. If You Share Adapted Material You produce, the Adapter's 258 | License You apply must not prevent recipients of the Adapted 259 | Material from complying with this Public License. 260 | 261 | 262 | Section 4 -- Sui Generis Database Rights. 263 | 264 | Where the Licensed Rights include Sui Generis Database Rights that 265 | apply to Your use of the Licensed Material: 266 | 267 | a. for the avoidance of doubt, Section 2(a)(1) grants You the right 268 | to extract, reuse, reproduce, and Share all or a substantial 269 | portion of the contents of the database; 270 | 271 | b. if You include all or a substantial portion of the database 272 | contents in a database in which You have Sui Generis Database 273 | Rights, then the database in which You have Sui Generis Database 274 | Rights (but not its individual contents) is Adapted Material; and 275 | 276 | c. You must comply with the conditions in Section 3(a) if You Share 277 | all or a substantial portion of the contents of the database. 278 | 279 | For the avoidance of doubt, this Section 4 supplements and does not 280 | replace Your obligations under this Public License where the Licensed 281 | Rights include other Copyright and Similar Rights. 282 | 283 | 284 | Section 5 -- Disclaimer of Warranties and Limitation of Liability. 285 | 286 | a. 
UNLESS OTHERWISE SEPARATELY UNDERTAKEN BY THE LICENSOR, TO THE 287 | EXTENT POSSIBLE, THE LICENSOR OFFERS THE LICENSED MATERIAL AS-IS 288 | AND AS-AVAILABLE, AND MAKES NO REPRESENTATIONS OR WARRANTIES OF 289 | ANY KIND CONCERNING THE LICENSED MATERIAL, WHETHER EXPRESS, 290 | IMPLIED, STATUTORY, OR OTHER. THIS INCLUDES, WITHOUT LIMITATION, 291 | WARRANTIES OF TITLE, MERCHANTABILITY, FITNESS FOR A PARTICULAR 292 | PURPOSE, NON-INFRINGEMENT, ABSENCE OF LATENT OR OTHER DEFECTS, 293 | ACCURACY, OR THE PRESENCE OR ABSENCE OF ERRORS, WHETHER OR NOT 294 | KNOWN OR DISCOVERABLE. WHERE DISCLAIMERS OF WARRANTIES ARE NOT 295 | ALLOWED IN FULL OR IN PART, THIS DISCLAIMER MAY NOT APPLY TO YOU. 296 | 297 | b. TO THE EXTENT POSSIBLE, IN NO EVENT WILL THE LICENSOR BE LIABLE 298 | TO YOU ON ANY LEGAL THEORY (INCLUDING, WITHOUT LIMITATION, 299 | NEGLIGENCE) OR OTHERWISE FOR ANY DIRECT, SPECIAL, INDIRECT, 300 | INCIDENTAL, CONSEQUENTIAL, PUNITIVE, EXEMPLARY, OR OTHER LOSSES, 301 | COSTS, EXPENSES, OR DAMAGES ARISING OUT OF THIS PUBLIC LICENSE OR 302 | USE OF THE LICENSED MATERIAL, EVEN IF THE LICENSOR HAS BEEN 303 | ADVISED OF THE POSSIBILITY OF SUCH LOSSES, COSTS, EXPENSES, OR 304 | DAMAGES. WHERE A LIMITATION OF LIABILITY IS NOT ALLOWED IN FULL OR 305 | IN PART, THIS LIMITATION MAY NOT APPLY TO YOU. 306 | 307 | c. The disclaimer of warranties and limitation of liability provided 308 | above shall be interpreted in a manner that, to the extent 309 | possible, most closely approximates an absolute disclaimer and 310 | waiver of all liability. 311 | 312 | 313 | Section 6 -- Term and Termination. 314 | 315 | a. This Public License applies for the term of the Copyright and 316 | Similar Rights licensed here. However, if You fail to comply with 317 | this Public License, then Your rights under this Public License 318 | terminate automatically. 319 | 320 | b. Where Your right to use the Licensed Material has terminated under 321 | Section 6(a), it reinstates: 322 | 323 | 1. 
automatically as of the date the violation is cured, provided 324 | it is cured within 30 days of Your discovery of the 325 | violation; or 326 | 327 | 2. upon express reinstatement by the Licensor. 328 | 329 | For the avoidance of doubt, this Section 6(b) does not affect any 330 | right the Licensor may have to seek remedies for Your violations 331 | of this Public License. 332 | 333 | c. For the avoidance of doubt, the Licensor may also offer the 334 | Licensed Material under separate terms or conditions or stop 335 | distributing the Licensed Material at any time; however, doing so 336 | will not terminate this Public License. 337 | 338 | d. Sections 1, 5, 6, 7, and 8 survive termination of this Public 339 | License. 340 | 341 | 342 | Section 7 -- Other Terms and Conditions. 343 | 344 | a. The Licensor shall not be bound by any additional or different 345 | terms or conditions communicated by You unless expressly agreed. 346 | 347 | b. Any arrangements, understandings, or agreements regarding the 348 | Licensed Material not stated herein are separate from and 349 | independent of the terms and conditions of this Public License. 350 | 351 | 352 | Section 8 -- Interpretation. 353 | 354 | a. For the avoidance of doubt, this Public License does not, and 355 | shall not be interpreted to, reduce, limit, restrict, or impose 356 | conditions on any use of the Licensed Material that could lawfully 357 | be made without permission under this Public License. 358 | 359 | b. To the extent possible, if any provision of this Public License is 360 | deemed unenforceable, it shall be automatically reformed to the 361 | minimum extent necessary to make it enforceable. If the provision 362 | cannot be reformed, it shall be severed from this Public License 363 | without affecting the enforceability of the remaining terms and 364 | conditions. 365 | 366 | c. 
No term or condition of this Public License will be waived and no 367 | failure to comply consented to unless expressly agreed to by the 368 | Licensor. 369 | 370 | d. Nothing in this Public License constitutes or may be interpreted 371 | as a limitation upon, or waiver of, any privileges and immunities 372 | that apply to the Licensor or You, including from the legal 373 | processes of any jurisdiction or authority. 374 | 375 | 376 | ======================================================================= 377 | 378 | Creative Commons is not a party to its public 379 | licenses. Notwithstanding, Creative Commons may elect to apply one of 380 | its public licenses to material it publishes and in those instances 381 | will be considered the “Licensor.” The text of the Creative Commons 382 | public licenses is dedicated to the public domain under the CC0 Public 383 | Domain Dedication. Except for the limited purpose of indicating that 384 | material is shared under a Creative Commons public license or as 385 | otherwise permitted by the Creative Commons policies published at 386 | creativecommons.org/policies, Creative Commons does not authorize the 387 | use of the trademark "Creative Commons" or any other trademark or logo 388 | of Creative Commons without its prior written consent including, 389 | without limitation, in connection with any unauthorized modifications 390 | to any of its public licenses or any other arrangements, 391 | understandings, or agreements concerning use of licensed material. For 392 | the avoidance of doubt, this paragraph does not form part of the 393 | public licenses. 394 | 395 | Creative Commons may be contacted at creativecommons.org. -------------------------------------------------------------------------------- /LICENSE-CODE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) Microsoft Corporation. 
4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE 22 | -------------------------------------------------------------------------------- /README-DNS3.md: -------------------------------------------------------------------------------- 1 | 2 | # Deep Noise Suppression (DNS) Challenge 3 - INTERSPEECH 2021 3 | 4 | **NOTE:** This README describes the **PAST** DNS Challenge! 5 | 6 | The data for it is still available, and is described below. If you are interested in the latest DNS 7 | Challenge, please refer to the main [README.md](README.md) file. 8 | 9 | ## In this repository 10 | 11 | This repository contains the datasets and scripts required for INTERSPEECH 2021 DNS Challenge, AKA 12 | DNS Challenge 3, or DNS3. 
For more details about the challenge, please see our
[paper](https://arxiv.org/pdf/2101.01902.pdf) and the challenge
[website](https://www.microsoft.com/en-us/research/academic-program/deep-noise-suppression-challenge-interspeech-2021/).
For more details on the testing framework, please visit [P.835](https://github.com/microsoft/P.808).

## Details

* The **datasets** directory is a placeholder for the wideband datasets. That is, our data
  downloader script by default will place the downloaded audio data here. After the download, this
  directory will contain the clean speech, noise, and room impulse responses required for creating
  the training data for the wideband scenario. The script will also download here the test set that
  participants can use during the development stages.
* The **datasets_fullband** directory is a placeholder for the fullband audio data. The downloader
  script will download here the datasets that contain the clean speech and noise audio clips
  required for creating the training data for the fullband scenario.
* The **NSNet2-baseline** directory contains the inference scripts and the ONNX model for the
  baseline speech enhancement method for wideband.
* **download-dns-challenge-3.sh** - this is the script to download the data. By default, the data
  will be placed into the `datasets/` and `datasets_fullband/` directories. Please take a look at
  the script and uncomment the preferred download method. Unmodified, the script performs a dry
  run and retrieves only the HTTP headers for each archive.
* **noisyspeech_synthesizer_singleprocess.py** - is used to synthesize noisy-clean speech pairs for
  training purposes.
* **noisyspeech_synthesizer.cfg** - is the configuration file used to synthesize the data. Users are
  required to accurately specify the different parameters and provide the right paths to the
  datasets required to synthesize noisy speech.
38 | * **audiolib.py** - contains modules required to synthesize datasets. 39 | * **utils.py** - contains some utility functions required to synthesize the data. 40 | * **unit_tests_synthesizer.py** - contains the unit tests to ensure sanity of the data. 41 | * **requirements.txt** - contains all the libraries required for synthesizing the data. 42 | 43 | ## Datasets 44 | 45 | The default directory structure and the sizes of the datasets available for DNS Challenge are: 46 | 47 | ``` 48 | datasets 229G 49 | ├── clean 204G 50 | │   ├── emotional_speech 403M 51 | │   ├── french_data 21G 52 | │   ├── german_speech 66G 53 | │   ├── italian_speech 14G 54 | │   ├── mandarin_speech 21G 55 | │   ├── read_speech 61G 56 | │   ├── russian_speech 5.1G 57 | │   ├── singing_voice 979M 58 | │   └── spanish_speech 17G 59 | ├── dev_testset 211M 60 | ├── impulse_responses 4.3G 61 | │   ├── SLR26 2.1G 62 | │   └── SLR28 2.3G 63 | └── noise 20G 64 | ``` 65 | 66 | And, for the fullband data, 67 | ``` 68 | datasets_fullband 600G 69 | ├── clean_fullband 542G 70 | │   ├── VocalSet_48kHz_mono 974M 71 | │   ├── emotional_speech 1.2G 72 | │   ├── french_data 62G 73 | │   ├── german_speech 194G 74 | │   ├── italian_speech 42G 75 | │   ├── read_speech 182G 76 | │   ├── russian_speech 12G 77 | │   └── spanish_speech 50G 78 | ├── dev_testset_fullband 630M 79 | └── noise_fullband 58G 80 | ``` 81 | 82 | ## Code prerequisites 83 | - Python 3.6 and above 84 | - Python libraries: soundfile, librosa 85 | 86 | **NOTE:** git LFS is *no longer required* for DNS Challenge. Please use the 87 | `download-dns-challenge-3.sh` script in this repo to download the data. 88 | 89 | ## Usage: 90 | 91 | 1. Install Python libraries 92 | ```bash 93 | pip3 install soundfile librosa 94 | ``` 95 | 2. Clone the repository. 96 | ```bash 97 | git clone https://github.com/microsoft/DNS-Challenge 98 | ``` 99 | 100 | 3. 
Edit **noisyspeech_synthesizer.cfg** to specify the required parameters described in the file and
   include the paths to the clean speech, noise, and impulse response related CSV files. Also,
   specify the paths to the destination directories where the synthesized data and the logs will
   be stored.

4. Create the dataset
```bash
python3 noisyspeech_synthesizer_singleprocess.py
```

## Citation:
If you use this dataset in a publication, please cite the following paper:
111 | 112 | ```BibTex 113 | @inproceedings{reddy2021interspeech, 114 | title={INTERSPEECH 2021 Deep Noise Suppression Challenge}, 115 | author={Reddy, Chandan KA and Dubey, Harishchandra and Koishida, Kazuhito and Nair, Arun and Gopal, Vishak and Cutler, Ross and Braun, Sebastian and Gamper, Hannes and Aichner, Robert and Srinivasan, Sriram}, 116 | booktitle={INTERSPEECH}, 117 | year={2021} 118 | } 119 | ``` 120 | 121 | The baseline NSNet noise suppression:
122 | ```BibTex 123 | @inproceedings{9054254, 124 | author={Y. {Xia} and S. {Braun} and C. K. A. {Reddy} and H. {Dubey} and R. {Cutler} and I. {Tashev}}, 125 | booktitle={ICASSP 2020 - 2020 IEEE International Conference on Acoustics, 126 | Speech and Signal Processing (ICASSP)}, 127 | title={Weighted Speech Distortion Losses for Neural-Network-Based Real-Time Speech Enhancement}, 128 | year={2020}, volume={}, number={}, pages={871-875},} 129 | ``` 130 | 131 | ```BibTex 132 | @misc{braun2020data, 133 | title={Data augmentation and loss normalization for deep noise suppression}, 134 | author={Sebastian Braun and Ivan Tashev}, 135 | year={2020}, 136 | eprint={2008.06412}, 137 | archivePrefix={arXiv}, 138 | primaryClass={eess.AS} 139 | } 140 | ``` 141 | 142 | The P.835 test framework:
143 | ```BibTex 144 | @inproceedings{naderi2021crowdsourcing, 145 | title={Subjective Evaluation of Noise Suppression Algorithms in Crowdsourcing}, 146 | author={Naderi, Babak and Cutler, Ross}, 147 | booktitle={INTERSPEECH}, 148 | year={2021} 149 | } 150 | ``` 151 | 152 | DNSMOS API:
```BibTex
@inproceedings{reddy2021dnsmos,
  title={DNSMOS: A Non-Intrusive Perceptual Objective Speech Quality metric to evaluate Noise Suppressors},
  author={Reddy, Chandan KA and Gopal, Vishak and Cutler, Ross},
  booktitle={ICASSP},
  year={2021}
}
```

# Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a
Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us
the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a
CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions
provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/).
For more information see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) or
contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments.

# Legal Notices

Microsoft and any contributors grant you a license to the Microsoft documentation and other content
in this repository under the [Creative Commons Attribution 4.0 International Public License](https://creativecommons.org/licenses/by/4.0/legalcode),
see the [LICENSE](LICENSE) file, and grant you a license to any code in the repository under the [MIT License](https://opensource.org/licenses/MIT), see the
[LICENSE-CODE](LICENSE-CODE) file.
Microsoft, Windows, Microsoft Azure and/or other Microsoft products and services referenced in the
documentation may be either trademarks or registered trademarks of Microsoft in the United States
and/or other countries. The licenses for this project do not grant you rights to use any Microsoft
names, logos, or trademarks. Microsoft's general trademark guidelines can be found at
http://go.microsoft.com/fwlink/?LinkID=254653.

Privacy information can be found at https://privacy.microsoft.com/en-us/

Microsoft and any contributors reserve all other rights, whether under their respective copyrights, patents,
or trademarks, whether by implication, estoppel or otherwise.


## Dataset licenses
MICROSOFT PROVIDES THE DATASETS ON AN "AS IS" BASIS. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, GUARANTEES OR CONDITIONS WITH RESPECT TO YOUR USE OF THE DATASETS. TO THE EXTENT PERMITTED UNDER YOUR LOCAL LAW, MICROSOFT DISCLAIMS ALL LIABILITY FOR ANY DAMAGES OR LOSSES, INCLUDING DIRECT, CONSEQUENTIAL, SPECIAL, INDIRECT, INCIDENTAL OR PUNITIVE, RESULTING FROM YOUR USE OF THE DATASETS.

The datasets are provided under the original terms under which Microsoft received such datasets. See below for more information about each dataset.

The datasets used in this project are licensed as follows:
1.
Clean speech: 202 | * https://librivox.org/; License: https://librivox.org/pages/public-domain/ 203 | * PTDB-TUG: Pitch Tracking Database from Graz University of Technology https://www.spsc.tugraz.at/databases-and-tools/ptdb-tug-pitch-tracking-database-from-graz-university-of-technology.html; License: http://opendatacommons.org/licenses/odbl/1.0/ 204 | * Edinburgh 56 speaker dataset: https://datashare.is.ed.ac.uk/handle/10283/2791; License: https://datashare.is.ed.ac.uk/bitstream/handle/10283/2791/license_text?sequence=11&isAllowed=y 205 | * VocalSet: A Singing Voice Dataset https://zenodo.org/record/1193957#.X1hkxYtlCHs; License: Creative Commons Attribution 4.0 International 206 | * Emotion data corpus: CREMA-D (Crowd-sourced Emotional Multimodal Actors Dataset) 207 | https://github.com/CheyneyComputerScience/CREMA-D; License: http://opendatacommons.org/licenses/dbcl/1.0/ 208 | * The VoxCeleb2 Dataset http://www.robots.ox.ac.uk/~vgg/data/voxceleb/vox2.html; License: http://www.robots.ox.ac.uk/~vgg/data/voxceleb/ 209 | The VoxCeleb dataset is available to download for commercial/research purposes under a Creative Commons Attribution 4.0 International License. The copyright remains with the original owners of the video. A complete version of the license can be found here. 210 | * VCTK Dataset: https://homepages.inf.ed.ac.uk/jyamagis/page3/page58/page58.html; License: This corpus is licensed under Open Data Commons Attribution License (ODC-By) v1.0. 211 | http://opendatacommons.org/licenses/by/1.0/ 212 | 213 | 2. Noise: 214 | * Audioset: https://research.google.com/audioset/index.html; License: https://creativecommons.org/licenses/by/4.0/ 215 | * Freesound: https://freesound.org/ Only files with CC0 licenses were selected; License: https://creativecommons.org/publicdomain/zero/1.0/ 216 | * Demand: https://zenodo.org/record/1227121#.XRKKxYhKiUk; License: https://creativecommons.org/licenses/by-sa/3.0/deed.en_CA 217 | 218 | 3. 
RIR datasets: OpenSLR26 and OpenSLR28: 219 | * http://www.openslr.org/26/ 220 | * http://www.openslr.org/28/ 221 | * License: Apache 2.0 222 | 223 | ## Code license 224 | MIT License 225 | 226 | Copyright (c) Microsoft Corporation. 227 | 228 | Permission is hereby granted, free of charge, to any person obtaining a copy 229 | of this software and associated documentation files (the "Software"), to deal 230 | in the Software without restriction, including without limitation the rights 231 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 232 | copies of the Software, and to permit persons to whom the Software is 233 | furnished to do so, subject to the following conditions: 234 | 235 | The above copyright notice and this permission notice shall be included in all 236 | copies or substantial portions of the Software. 237 | 238 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 239 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 240 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 241 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 242 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 243 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 244 | SOFTWARE 245 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # ICASSP 2023 Deep Noise Suppression Challenge 2 | Website: https://aka.ms/dns-challenge 3 | Git Repo: https://github.com/microsoft/DNS-Challenge 4 | Challenge Paper: 5 | 6 | ## Important features of this challenge 7 | 1. Along with noise suppression, it includes de-reverberation and suppression of interfering talkers for headset and speakerphone scenarios. 8 | 2. 
The challenge has two tracks: (i) Headset (wired/wireless headphones, earbuds such as AirPods, etc.) speech enhancement; (ii) Non-headset (speakerphone, built-in mic in a laptop/desktop/mobile phone/other meeting devices, etc.) speech enhancement.
3. This challenge adopts the ITU-T P.835 subjective test framework to measure speech quality (SIG), background noise quality (BAK), and overall audio quality (OVRL). We modified ITU-T P.835 to make it reliable for test clips with interfering (undesired neighboring) talkers. Along with P.835 scores, Word Accuracy (WAcc) is used to measure the performance of models.
4. Please NOTE that the intellectual property (IP) is not transferred to the challenge organizers, i.e., if code is shared/submitted, the participants remain the owners of their code (when the code is made publicly available, an appropriate license should be added).
5. There are new requirements for model-related latency. Please check all requirements listed at https://www.microsoft.com/en-us/research/academic-program/deep-noise-suppression-challenge-icassp-2023/

## Baseline Speaker Embeddings
This challenge adopted the pretrained ECAPA-TDNN model available in SpeechBrain as the baseline speaker embedding model, available at https://huggingface.co/speechbrain/spkrec-ecapa-voxceleb. Participants can use any other publicly available speaker embedding model or develop their own speaker embedding extractor. Participants are encouraged to explore the RawNet3 models available at https://github.com/jungjee/RawNet

The previous DNS Challenge used RawNet2 speaker embeddings. So far, the impact of different speaker embeddings on personalized speech enhancement has not been studied in sufficient depth.
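Whichever embedding extractor is used, enrollment and test embeddings are typically compared with cosine similarity. The sketch below is only an illustration of that comparison: the 4-dimensional vectors are made-up stand-ins for real embeddings (which are much higher-dimensional, e.g. 192-dim for ECAPA-TDNN), and the threshold behavior shown is an assumption, not part of the challenge rules.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy stand-ins for real speaker embeddings (hypothetical values for illustration).
enrollment = [0.9, 0.1, 0.3, 0.2]      # embedding of the target speaker's enrollment clip
same_speaker = [0.8, 0.2, 0.25, 0.3]   # embedding of another clip by the same speaker
other_speaker = [0.1, 0.9, 0.2, 0.1]   # embedding of an interfering talker

print(cosine_similarity(enrollment, same_speaker))   # high similarity: likely the target speaker
print(cosine_similarity(enrollment, other_speaker))  # low similarity: likely a different speaker
```

In a personalized noise suppressor, such a similarity score (or the raw enrollment embedding itself) would typically be used to condition the model on the target speaker.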
Install SpeechBrain with the command below:
```bash
pip install speechbrain
```

Compute speaker embeddings for your wav file as follows:
```python
import torchaudio
from speechbrain.pretrained import EncoderClassifier

classifier = EncoderClassifier.from_hparams(source="speechbrain/spkrec-ecapa-voxceleb")
signal, fs = torchaudio.load('tests/samples/ASR/spk1_snt1.wav')
embeddings = classifier.encode_batch(signal)
```

## In this repository

This repository contains the datasets and scripts required for the 5th DNS Challenge at ICASSP 2023, aka
DNS Challenge 5, or simply **DNS5**. For more details about the challenge, please see our
[website](https://www.microsoft.com/en-us/research/academic-program/deep-noise-suppression-challenge-icassp-2023/) and [paper](docs/ICASSP2023_5th_DNS_Challenge.pdf). For more details on the testing framework, please visit [P.835](https://github.com/microsoft/P.808).

## Details

* The **datasets_fullband** folder is a placeholder for the datasets. That is, our data downloader
  script by default will place the downloaded audio data there. After the download, it will contain
  clean speech, noise, and room impulse responses required for creating the training data.

* The **Baseline** directory contains the enhanced clips from the dev testset for both tracks.

* **download-dns-challenge-5-headset-training.sh** - this is the script to download the data for the headset track (Track 1). By default, the data will be placed into the `./datasets_fullband/` folder. Please take a look at the script and **uncomment** the preferred download method. Unmodified, the script performs a dry run and retrieves only the HTTP headers for each archive.

* **download-dns-challenge-5-speakerphone-training.sh** - this is the script to download the data for the speakerphone track (Track 2).
* **noisyspeech_synthesizer_singleprocess.py** - is used to synthesize noisy-clean speech pairs for
  training purposes.

* **noisyspeech_synthesizer.cfg** - is the configuration file used to synthesize the data. Users are
  required to accurately specify the different parameters and provide the right paths to the datasets required to synthesize noisy speech.

* **audiolib.py** - contains modules required to synthesize datasets.
* **utils.py** - contains some utility functions required to synthesize the data.
* **unit_tests_synthesizer.py** - contains the unit tests to ensure sanity of the data.
* **requirements.txt** - contains all the libraries required for synthesizing the data.

## Datasets
**V5_dev_testset**: directory containing the dev testsets for both tracks. Each test clip is 10 seconds long, and the corresponding enrollment clips are 30 seconds long.

**BLIND testset**:

## WAcc script
https://github.com/microsoft/DNS-Challenge/tree/master/WAcc

## WAcc ground-truth transcript
Dev testset: available only for the speakerphone track; see the **V5_dev_testset** directory. For the headset track, we are providing the ASR output and a list of the prompts read during the recording of the test clips. Participants can help correct the ASR output to generate the ground-truth transcripts.
68 | Blind testset: 69 | 70 | ### Data info 71 | 72 | The default directory structure and the sizes of the datasets of the 5th DNS 73 | Challenge are: 74 | 75 | ``` 76 | datasets_fullband 77 | +-- dev_testset 78 | +-- impulse_responses 5.9G 79 | +-- noise_fullband 58G 80 | \-- clean_fullband 827G 81 | +-- emotional_speech 2.4G 82 | +-- french_speech 62G 83 | +-- german_speech 319G 84 | +-- italian_speech 42G 85 | +-- read_speech 299G 86 | +-- russian_speech 12G 87 | +-- spanish_speech 65G 88 | +-- vctk_wav48_silence_trimmed 27G 89 | \-- VocalSet_48kHz_mono 974M 90 | ``` 91 | 92 | In all, you will need about 1TB to store the _unpacked_ data. Archived, the same data takes about 93 | 550GB total. 94 | 95 | ### Headset DNS track 96 | ### Data checksums 97 | 98 | A CSV file containing file sizes and SHA1 checksums for audio clips in both Real-time *and* 99 | Personalized DNS datasets is available at: 100 | [dns5-datasets-files-sha1.csv.bz2](https://dns4public.blob.core.windows.net/dns4archive/dns5-datasets-files-sha1.csv.bz2). 101 | The archive is 41.3MB in size and can be read in Python like this: 102 | ```python 103 | import pandas as pd 104 | 105 | sha1sums = pd.read_csv("dns5-datasets-files-sha1.csv.bz2", names=["size", "sha1", "path"]) 106 | ``` 107 | 108 | ## Code prerequisites 109 | - Python 3.6 and above 110 | - Python libraries: soundfile, librosa 111 | 112 | **NOTE:** git LFS is *no longer required* for DNS Challenge. Please use the 113 | `download-dns-challenge-5*.sh` scripts in this repo to download the data. 114 | 115 | ## Usage: 116 | 117 | 1. Install Python libraries 118 | ```bash 119 | pip3 install soundfile librosa 120 | ``` 121 | 2. Clone the repository. 122 | ```bash 123 | git clone https://github.com/microsoft/DNS-Challenge 124 | ``` 125 | 126 | 3. Edit **noisyspeech_synthesizer.cfg** to specify the required parameters described in the file and 127 | include the paths to clean speech, noise and impulse response related csv files. 
Also, specify
   the paths to the destination directories where the synthesized data and the logs will be stored.

4. Create the dataset
```bash
python3 noisyspeech_synthesizer_singleprocess.py
```

## Citation:
If you use this dataset in a publication, please cite the following paper:
137 | 138 | ```BibTex 139 | @inproceedings{dubey2023icassp, 140 | title={ICASSP 2023 Deep Noise Suppression Challenge}, 141 | author={ 142 | Dubey, Harishchandra and Aazami, Ashkan and Gopal, Vishak and Naderi, Babak and Braun, Sebastian and Cutler, Ross and Gamper, Hannes and Golestaneh, Mehrsa and Aichner, Robert}, 143 | booktitle={ICASSP}, 144 | year={2023} 145 | } 146 | ``` 147 | 148 | The previous challenges were: 149 | ```BibTex 150 | @inproceedings{dubey2022icassp, 151 | title={ICASSP 2022 Deep Noise Suppression Challenge}, 152 | author={Dubey, Harishchandra and Gopal, Vishak and Cutler, Ross and Matusevych, Sergiy and Braun, Sebastian and Eskimez, Emre Sefik and Thakker, Manthan and Yoshioka, Takuya and Gamper, Hannes and Aichner, Robert}, 153 | booktitle={ICASSP}, 154 | year={2022} 155 | } 156 | 157 | @inproceedings{reddy2021interspeech, 158 | title={INTERSPEECH 2021 Deep Noise Suppression Challenge}, 159 | author={Reddy, Chandan KA and Dubey, Harishchandra and Koishida, Kazuhito and Nair, Arun and Gopal, Vishak and Cutler, Ross and Braun, Sebastian and Gamper, Hannes and Aichner, Robert and Srinivasan, Sriram}, 160 | booktitle={INTERSPEECH}, 161 | year={2021} 162 | } 163 | ``` 164 | ```BibTex 165 | @inproceedings{reddy2021icassp, 166 | title={ICASSP 2021 deep noise suppression challenge}, 167 | author={Reddy, Chandan KA and Dubey, Harishchandra and Gopal, Vishak and Cutler, Ross and Braun, Sebastian and Gamper, Hannes and Aichner, Robert and Srinivasan, Sriram}, 168 | booktitle={ICASSP}, 169 | year={2021}, 170 | } 171 | ``` 172 | ```BibTex 173 | @inproceedings{reddy2020interspeech, 174 | title={The INTERSPEECH 2020 deep noise suppression challenge: Datasets, subjective testing framework, and challenge results}, 175 | author={Reddy, Chandan KA and Gopal, Vishak and Cutler, Ross and Beyrami, Ebrahim and Cheng, Roger and Dubey, Harishchandra and Matusevych, Sergiy and Aichner, Robert and Aazami, Ashkan and Braun, Sebastian and others}, 176 | 
booktitle={INTERSPEECH}, 177 | year={2020} 178 | } 179 | ``` 180 | 181 | The baseline NSNet noise suppression:
182 | ```BibTex 183 | @inproceedings{9054254, 184 | author={Y. {Xia} and S. {Braun} and C. K. A. {Reddy} and H. {Dubey} and R. {Cutler} and I. {Tashev}}, 185 | booktitle={ICASSP 2020 - 2020 IEEE International Conference on Acoustics, 186 | Speech and Signal Processing (ICASSP)}, 187 | title={Weighted Speech Distortion Losses for Neural-Network-Based Real-Time Speech Enhancement}, 188 | year={2020}, volume={}, number={}, pages={871-875},} 189 | ``` 190 | 191 | ```BibTex 192 | @misc{braun2020data, 193 | title={Data augmentation and loss normalization for deep noise suppression}, 194 | author={Sebastian Braun and Ivan Tashev}, 195 | year={2020}, 196 | eprint={2008.06412}, 197 | archivePrefix={arXiv}, 198 | primaryClass={eess.AS} 199 | } 200 | ``` 201 | 202 | The P.835 test framework:
203 | ```BibTex 204 | @inproceedings{naderi2021crowdsourcing, 205 | title={Subjective Evaluation of Noise Suppression Algorithms in Crowdsourcing}, 206 | author={Naderi, Babak and Cutler, Ross}, 207 | booktitle={INTERSPEECH}, 208 | year={2021} 209 | } 210 | ``` 211 | 212 | DNSMOS API:
213 | ```BibTex 214 | @inproceedings{reddy2021dnsmos, 215 | title={DNSMOS: A Non-Intrusive Perceptual Objective Speech Quality metric to evaluate Noise Suppressors}, 216 | author={Reddy, Chandan KA and Gopal, Vishak and Cutler, Ross}, 217 | booktitle={ICASSP}, 218 | year={2021} 219 | } 220 | ``` 221 | 222 | ```BibTex 223 | @inproceedings{reddy2022dnsmos, 224 | title={DNSMOS P.835: A non-intrusive perceptual objective speech quality metric to evaluate noise suppressors}, 225 | author={Reddy, Chandan KA and Gopal, Vishak and Cutler, Ross}, 226 | booktitle={ICASSP}, 227 | year={2022} 228 | } 229 | ``` 230 | 231 | # Contributing 232 | 233 | This project welcomes contributions and suggestions. Most contributions require you to agree to a 234 | Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us 235 | the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com. 236 | 237 | When you submit a pull request, a CLA bot will automatically determine whether you need to provide a 238 | CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions 239 | provided by the bot. You will only need to do this once across all repos using our CLA. 240 | 241 | This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/). 242 | For more information see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) or 243 | contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments. 
# Legal Notices

Microsoft and any contributors grant you a license to the Microsoft documentation and other content
in this repository under the [Creative Commons Attribution 4.0 International Public License](https://creativecommons.org/licenses/by/4.0/legalcode),
see the [LICENSE](LICENSE) file, and grant you a license to any code in the repository under the [MIT License](https://opensource.org/licenses/MIT), see the
[LICENSE-CODE](LICENSE-CODE) file.

Microsoft, Windows, Microsoft Azure and/or other Microsoft products and services referenced in the
documentation may be either trademarks or registered trademarks of Microsoft in the United States
and/or other countries. The licenses for this project do not grant you rights to use any Microsoft
names, logos, or trademarks. Microsoft's general trademark guidelines can be found at
http://go.microsoft.com/fwlink/?LinkID=254653.

Privacy information can be found at https://privacy.microsoft.com/en-us/

Microsoft and any contributors reserve all other rights, whether under their respective copyrights, patents,
or trademarks, whether by implication, estoppel or otherwise.


## Dataset licenses
MICROSOFT PROVIDES THE DATASETS ON AN "AS IS" BASIS. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, GUARANTEES OR CONDITIONS WITH RESPECT TO YOUR USE OF THE DATASETS. TO THE EXTENT PERMITTED UNDER YOUR LOCAL LAW, MICROSOFT DISCLAIMS ALL LIABILITY FOR ANY DAMAGES OR LOSSES, INCLUDING DIRECT, CONSEQUENTIAL, SPECIAL, INDIRECT, INCIDENTAL OR PUNITIVE, RESULTING FROM YOUR USE OF THE DATASETS.

The datasets are provided under the original terms under which Microsoft received such datasets. See below for more information about each dataset.

The datasets used in this project are licensed as follows:
1.
Clean speech: 271 | * https://librivox.org/; License: https://librivox.org/pages/public-domain/ 272 | * PTDB-TUG: Pitch Tracking Database from Graz University of Technology https://www.spsc.tugraz.at/databases-and-tools/ptdb-tug-pitch-tracking-database-from-graz-university-of-technology.html; License: http://opendatacommons.org/licenses/odbl/1.0/ 273 | * Edinburgh 56 speaker dataset: https://datashare.is.ed.ac.uk/handle/10283/2791; License: https://datashare.is.ed.ac.uk/bitstream/handle/10283/2791/license_text?sequence=11&isAllowed=y 274 | * VocalSet: A Singing Voice Dataset https://zenodo.org/record/1193957#.X1hkxYtlCHs; License: Creative Commons Attribution 4.0 International 275 | * Emotion data corpus: CREMA-D (Crowd-sourced Emotional Multimodal Actors Dataset) 276 | https://github.com/CheyneyComputerScience/CREMA-D; License: http://opendatacommons.org/licenses/dbcl/1.0/ 277 | * The VoxCeleb2 Dataset http://www.robots.ox.ac.uk/~vgg/data/voxceleb/vox2.html; License: http://www.robots.ox.ac.uk/~vgg/data/voxceleb/ 278 | The VoxCeleb dataset is available to download for commercial/research purposes under a Creative Commons Attribution 4.0 International License. The copyright remains with the original owners of the video. A complete version of the license can be found here. 279 | * VCTK Dataset: https://homepages.inf.ed.ac.uk/jyamagis/page3/page58/page58.html; License: This corpus is licensed under Open Data Commons Attribution License (ODC-By) v1.0. 280 | http://opendatacommons.org/licenses/by/1.0/ 281 | 282 | 2. Noise: 283 | * Audioset: https://research.google.com/audioset/index.html; License: https://creativecommons.org/licenses/by/4.0/ 284 | * Freesound: https://freesound.org/ Only files with CC0 licenses were selected; License: https://creativecommons.org/publicdomain/zero/1.0/ 285 | * Demand: https://zenodo.org/record/1227121#.XRKKxYhKiUk; License: https://creativecommons.org/licenses/by-sa/3.0/deed.en_CA 286 | 287 | 3. 
RIR datasets: OpenSLR26 and OpenSLR28: 288 | * http://www.openslr.org/26/ 289 | * http://www.openslr.org/28/ 290 | * License: Apache 2.0 291 | 292 | ## Code license 293 | MIT License 294 | 295 | Copyright (c) Microsoft Corporation. 296 | 297 | Permission is hereby granted, free of charge, to any person obtaining a copy 298 | of this software and associated documentation files (the "Software"), to deal 299 | in the Software without restriction, including without limitation the rights 300 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 301 | copies of the Software, and to permit persons to whom the Software is 302 | furnished to do so, subject to the following conditions: 303 | 304 | The above copyright notice and this permission notice shall be included in all 305 | copies or substantial portions of the Software. 306 | 307 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 308 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 309 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE
--------------------------------------------------------------------------------
/SECURITY.md:
--------------------------------------------------------------------------------

## Security

Microsoft takes the security of our software products and services seriously, which includes all source code repositories managed through our GitHub organizations, which include [Microsoft](https://github.com/Microsoft), [Azure](https://github.com/Azure), [DotNet](https://github.com/dotnet), [AspNet](https://github.com/aspnet), [Xamarin](https://github.com/xamarin), and [our GitHub organizations](https://opensource.microsoft.com/).

If you believe you have found a security vulnerability in any Microsoft-owned repository that meets [Microsoft's definition of a security vulnerability](https://docs.microsoft.com/en-us/previous-versions/tn-archive/cc751383(v=technet.10)), please report it to us as described below.

## Reporting Security Issues

**Please do not report security vulnerabilities through public GitHub issues.**

Instead, please report them to the Microsoft Security Response Center (MSRC) at [https://msrc.microsoft.com/create-report](https://msrc.microsoft.com/create-report).

If you prefer to submit without logging in, send email to [secure@microsoft.com](mailto:secure@microsoft.com). If possible, encrypt your message with our PGP key; please download it from the [Microsoft Security Response Center PGP Key page](https://www.microsoft.com/en-us/msrc/pgp-key-msrc).

You should receive a response within 24 hours.
If for some reason you do not, please follow up via email to ensure we received your original message. Additional information can be found at [microsoft.com/msrc](https://www.microsoft.com/msrc). 18 | 19 | Please include the requested information listed below (as much as you can provide) to help us better understand the nature and scope of the possible issue: 20 | 21 | * Type of issue (e.g. buffer overflow, SQL injection, cross-site scripting, etc.) 22 | * Full paths of source file(s) related to the manifestation of the issue 23 | * The location of the affected source code (tag/branch/commit or direct URL) 24 | * Any special configuration required to reproduce the issue 25 | * Step-by-step instructions to reproduce the issue 26 | * Proof-of-concept or exploit code (if possible) 27 | * Impact of the issue, including how an attacker might exploit the issue 28 | 29 | This information will help us triage your report more quickly. 30 | 31 | If you are reporting for a bug bounty, more complete reports can contribute to a higher bounty award. Please visit our [Microsoft Bug Bounty Program](https://microsoft.com/msrc/bounty) page for more details about our active programs. 32 | 33 | ## Preferred Languages 34 | 35 | We prefer all communications to be in English. 36 | 37 | ## Policy 38 | 39 | Microsoft follows the principle of [Coordinated Vulnerability Disclosure](https://www.microsoft.com/en-us/msrc/cvd). 
40 | 41 | 42 | -------------------------------------------------------------------------------- /V5_DNS_Challenge_FinalResults.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/microsoft/DNS-Challenge/591184a9fcb2cbdec02520fed81a32bbbf9d73ff/V5_DNS_Challenge_FinalResults.pdf -------------------------------------------------------------------------------- /WAcc/WAcc.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import glob 3 | import os 4 | 5 | import librosa 6 | import numpy as np 7 | import pandas as pd 8 | import requests 9 | import soundfile as sf 10 | 11 | WACC_SERVICE_URL = 'https://wacc.azurewebsites.net/api/TriggerEvaluation?code=K2XN7ouruRN/2k1HNyS79ET39rEMZ9jOOCnFtodPDj42WJFjG9LWXg==' 12 | SUPPORTED_SAMPLING_RATE = 16000 13 | TRANSCRIPTIONS_FILE = 'DNSChallenge4_devtest.tsv' 14 | 15 | def main(args): 16 | audio_clips_list = glob.glob(os.path.join(args.testset_dir, "*.wav")) 17 | transcriptions_df = pd.read_csv(TRANSCRIPTIONS_FILE, sep="\t") 18 | scores = [] 19 | for fpath in audio_clips_list: 20 | if os.path.basename(fpath) not in transcriptions_df['filename'].unique(): 21 | continue 22 | original_audio, fs = sf.read(fpath) 23 | if fs != SUPPORTED_SAMPLING_RATE: 24 | print('Only a sampling rate of 16000 Hz is supported for now; resampling audio') 25 | audio = librosa.resample(original_audio, orig_sr=fs, target_sr=SUPPORTED_SAMPLING_RATE) 26 | sf.write(fpath, audio, SUPPORTED_SAMPLING_RATE) 27 | 28 | wacc = None 29 | try: 30 | with open(fpath, 'rb') as f: 31 | resp = requests.post(WACC_SERVICE_URL, files={'audiodata': f}) 32 | wacc = resp.json() 33 | except Exception as e: # request failure or non-JSON response 34 | print('Error occurred during scoring:', e) 35 | sf.write(fpath, original_audio, fs) # restore the original clip 36 | if wacc is None: 37 | continue 38 | score_dict = {'file_name': os.path.basename(fpath), 'wacc': wacc} 39 | scores.append(score_dict) 40 | 41 | df = pd.DataFrame(scores) 42 | print('Mean WAcc for the files is ', np.mean(df['wacc'])) 43 | 44 | if args.score_file: 45 | df.to_csv(args.score_file) 46 | 47 | if __name__=="__main__": 48 | parser = argparse.ArgumentParser() 49 | parser.add_argument("--testset_dir", required=True, 50 | help='Path to the dir containing audio clips to be evaluated') 51 | parser.add_argument('--score_file', help='If you want the scores in a CSV file, provide the full path') 52 | 53 | args = parser.parse_args() 54 | main(args) 55 | -------------------------------------------------------------------------------- /audiolib.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | """ 3 | @author: chkarada 4 | """ 5 | import os 6 | import numpy as np 7 | import soundfile as sf 8 | import subprocess 9 | import glob 10 | import librosa 11 | import random 12 | import tempfile 13 | 14 | EPS = np.finfo(float).eps 15 | np.random.seed(0) 16 | 17 | def is_clipped(audio, clipping_threshold=0.99): 18 | return any(abs(audio) > clipping_threshold) 19 | 20 | def normalize(audio, target_level=-25): 21 | '''Normalize the signal to the target level''' 22 | rms = (audio ** 2).mean() ** 0.5 23 | scalar = 10 ** (target_level / 20) / (rms+EPS) 24 | audio = audio * scalar 25 | return audio 26 | 27 | def normalize_segmental_rms(audio, rms, target_level=-25): 28 | '''Normalize the signal to the target level 29 | based on segmental RMS''' 30 | scalar = 10 ** (target_level / 20) / (rms+EPS) 31 | audio = audio * scalar 32 | return audio 33 | 34 | def audioread(path, norm=False, start=0, stop=None, target_level=-25): 35 | '''Function to read audio''' 36 | 37 | path = os.path.abspath(path) 38 | if not os.path.exists(path): 39 | raise ValueError("[{}] does not exist!".format(path)) 40 | try: 41 | audio, sample_rate = sf.read(path, start=start, stop=stop) 42 | except RuntimeError: # fix for sph pcm-embedded shortened v2 43 | print('WARNING: Audio type not supported') 44 | return (None, None) 45 | 46 | if
len(audio.shape) == 1: # mono 47 | if norm: 48 | rms = (audio ** 2).mean() ** 0.5 49 | scalar = 10 ** (target_level / 20) / (rms+EPS) 50 | audio = audio * scalar 51 | else: # multi-channel 52 | audio = audio.T 53 | audio = audio.sum(axis=0)/audio.shape[0] 54 | if norm: 55 | audio = normalize(audio, target_level) 56 | 57 | return audio, sample_rate 58 | 59 | 60 | def audiowrite(destpath, audio, sample_rate=16000, norm=False, target_level=-25, \ 61 | clipping_threshold=0.99, clip_test=False): 62 | '''Function to write audio''' 63 | 64 | if clip_test: 65 | if is_clipped(audio, clipping_threshold=clipping_threshold): 66 | raise ValueError("Clipping detected in audiowrite()! " + \ 67 | destpath + " file not written to disk.") 68 | 69 | if norm: 70 | audio = normalize(audio, target_level) 71 | max_amp = max(abs(audio)) 72 | if max_amp >= clipping_threshold: 73 | audio = audio/max_amp * (clipping_threshold-EPS) 74 | 75 | destpath = os.path.abspath(destpath) 76 | destdir = os.path.dirname(destpath) 77 | 78 | if not os.path.exists(destdir): 79 | os.makedirs(destdir) 80 | 81 | sf.write(destpath, audio, sample_rate) 82 | return 83 | 84 | 85 | def add_reverb(sasxExe, input_wav, filter_file, output_wav): 86 | ''' Function to add reverb''' 87 | command_sasx_apply_reverb = "{0} -r {1} \ 88 | -f {2} -o {3}".format(sasxExe, input_wav, filter_file, output_wav) 89 | 90 | subprocess.call(command_sasx_apply_reverb) 91 | return output_wav 92 | 93 | 94 | def add_clipping(audio, max_thresh_perc=0.8): 95 | '''Function to add clipping''' 96 | threshold = max(abs(audio))*max_thresh_perc 97 | audioclipped = np.clip(audio, -threshold, threshold) 98 | return audioclipped 99 | 100 | 101 | def adsp_filter(Adspvqe, nearEndInput, nearEndOutput, farEndInput): 102 | 103 | command_adsp_clean = "{0} --breakOnErrors 0 --sampleRate 16000 --useEchoCancellation 0 \ 104 | --operatingMode 2 --useDigitalAgcNearend 0 --useDigitalAgcFarend 0 \ 105 | --useVirtualAGC 0 --useComfortNoiseGenerator 0 
--useAnalogAutomaticGainControl 0 \ 106 | --useNoiseReduction 0 --loopbackInputFile {1} --farEndInputFile {2} \ 107 | --nearEndInputFile {3} --nearEndOutputFile {4}".format(Adspvqe, 108 | farEndInput, farEndInput, nearEndInput, nearEndOutput) 109 | subprocess.call(command_adsp_clean) 110 | 111 | 112 | def snr_mixer(params, clean, noise, snr, target_level=-25, clipping_threshold=0.99): 113 | '''Function to mix clean speech and noise at various SNR levels''' 114 | cfg = params['cfg'] 115 | if len(clean) > len(noise): 116 | noise = np.append(noise, np.zeros(len(clean)-len(noise))) 117 | else: 118 | clean = np.append(clean, np.zeros(len(noise)-len(clean))) 119 | 120 | # Normalizing to -25 dB FS 121 | clean = clean/(max(abs(clean))+EPS) 122 | clean = normalize(clean, target_level) 123 | rmsclean = (clean**2).mean()**0.5 124 | 125 | noise = noise/(max(abs(noise))+EPS) 126 | noise = normalize(noise, target_level) 127 | rmsnoise = (noise**2).mean()**0.5 128 | 129 | # Set the noise level for a given SNR 130 | noisescalar = rmsclean / (10**(snr/20)) / (rmsnoise+EPS) 131 | noisenewlevel = noise * noisescalar 132 | 133 | # Mix noise and clean speech 134 | noisyspeech = clean + noisenewlevel 135 | 136 | # Randomly select an RMS value between -15 dBFS and -35 dBFS and normalize noisyspeech to that level 137 | # Clipping can occur here with very low probability, which is not a major issue. 138 | noisy_rms_level = np.random.randint(params['target_level_lower'], params['target_level_upper']) 139 | rmsnoisy = (noisyspeech**2).mean()**0.5 140 | scalarnoisy = 10 ** (noisy_rms_level / 20) / (rmsnoisy+EPS) 141 | noisyspeech = noisyspeech * scalarnoisy 142 | clean = clean * scalarnoisy 143 | noisenewlevel = noisenewlevel * scalarnoisy 144 | 145 | # Final check to see if there are any amplitudes exceeding +/- 1.
If so, normalize all the signals accordingly 146 | if is_clipped(noisyspeech): 147 | noisyspeech_maxamplevel = max(abs(noisyspeech))/(clipping_threshold-EPS) 148 | noisyspeech = noisyspeech/noisyspeech_maxamplevel 149 | clean = clean/noisyspeech_maxamplevel 150 | noisenewlevel = noisenewlevel/noisyspeech_maxamplevel 151 | noisy_rms_level = int(20*np.log10(scalarnoisy/noisyspeech_maxamplevel*(rmsnoisy+EPS))) 152 | 153 | return clean, noisenewlevel, noisyspeech, noisy_rms_level 154 | 155 | 156 | def segmental_snr_mixer(params, clean, noise, snr, target_level=-25, clipping_threshold=0.99): 157 | '''Function to mix clean speech and noise at various segmental SNR levels''' 158 | cfg = params['cfg'] 159 | if len(clean) > len(noise): 160 | noise = np.append(noise, np.zeros(len(clean)-len(noise))) 161 | else: 162 | clean = np.append(clean, np.zeros(len(noise)-len(clean))) 163 | clean = clean/(max(abs(clean))+EPS) 164 | noise = noise/(max(abs(noise))+EPS) 165 | rmsclean, rmsnoise = active_rms(clean=clean, noise=noise) 166 | clean = normalize_segmental_rms(clean, rms=rmsclean, target_level=target_level) 167 | noise = normalize_segmental_rms(noise, rms=rmsnoise, target_level=target_level) 168 | # Set the noise level for a given SNR 169 | noisescalar = rmsclean / (10**(snr/20)) / (rmsnoise+EPS) 170 | noisenewlevel = noise * noisescalar 171 | 172 | # Mix noise and clean speech 173 | noisyspeech = clean + noisenewlevel 174 | # Randomly select an RMS value between -15 dBFS and -35 dBFS and normalize noisyspeech to that level 175 | # Clipping can occur here with very low probability, which is not a major issue.
176 | noisy_rms_level = np.random.randint(params['target_level_lower'], params['target_level_upper']) 177 | rmsnoisy = (noisyspeech**2).mean()**0.5 178 | scalarnoisy = 10 ** (noisy_rms_level / 20) / (rmsnoisy+EPS) 179 | noisyspeech = noisyspeech * scalarnoisy 180 | clean = clean * scalarnoisy 181 | noisenewlevel = noisenewlevel * scalarnoisy 182 | # Final check to see if there are any amplitudes exceeding +/- 1. If so, normalize all the signals accordingly 183 | if is_clipped(noisyspeech): 184 | noisyspeech_maxamplevel = max(abs(noisyspeech))/(clipping_threshold-EPS) 185 | noisyspeech = noisyspeech/noisyspeech_maxamplevel 186 | clean = clean/noisyspeech_maxamplevel 187 | noisenewlevel = noisenewlevel/noisyspeech_maxamplevel 188 | noisy_rms_level = int(20*np.log10(scalarnoisy/noisyspeech_maxamplevel*(rmsnoisy+EPS))) 189 | 190 | return clean, noisenewlevel, noisyspeech, noisy_rms_level 191 | 192 | 193 | def active_rms(clean, noise, fs=16000, energy_thresh=-50): 194 | '''Returns the RMS of the clean and noise signals, computed only over the active noise frames''' 195 | window_size = 100 # in ms 196 | window_samples = int(fs*window_size/1000) 197 | sample_start = 0 198 | noise_active_segs = [] 199 | clean_active_segs = [] 200 | 201 | while sample_start < len(noise): 202 | sample_end = min(sample_start + window_samples, len(noise)) 203 | noise_win = noise[sample_start:sample_end] 204 | clean_win = clean[sample_start:sample_end] 205 | noise_seg_rms = 20*np.log10((noise_win**2).mean()**0.5 + EPS) # frame RMS in dB 206 | # Consider only frames whose noise level is above the threshold (energy_thresh is in dB) 207 | if noise_seg_rms > energy_thresh: 208 | noise_active_segs = np.append(noise_active_segs, noise_win) 209 | clean_active_segs = np.append(clean_active_segs, clean_win) 210 | sample_start += window_samples 211 | 212 | if len(noise_active_segs)!=0: 213 | noise_rms = (noise_active_segs**2).mean()**0.5 214 | else: 215 | noise_rms = EPS 216 | 217 | if len(clean_active_segs)!=0: 218 | clean_rms = (clean_active_segs**2).mean()**0.5 219 | else: 220 | clean_rms = EPS 221
| 222 | return clean_rms, noise_rms 223 | 224 | 225 | def activitydetector(audio, fs=16000, energy_thresh=0.13, target_level=-25): 226 | '''Return the percentage of the time the audio signal is above an energy threshold''' 227 | 228 | audio = normalize(audio, target_level) 229 | window_size = 50 # in ms 230 | window_samples = int(fs*window_size/1000) 231 | sample_start = 0 232 | cnt = 0 233 | prev_energy_prob = 0 234 | active_frames = 0 235 | 236 | a = -1 237 | b = 0.2 238 | alpha_rel = 0.05 239 | alpha_att = 0.8 240 | 241 | while sample_start < len(audio): 242 | sample_end = min(sample_start + window_samples, len(audio)) 243 | audio_win = audio[sample_start:sample_end] 244 | frame_rms = 20*np.log10(sum(audio_win**2)+EPS) 245 | frame_energy_prob = 1./(1+np.exp(-(a+b*frame_rms))) 246 | 247 | if frame_energy_prob > prev_energy_prob: 248 | smoothed_energy_prob = frame_energy_prob*alpha_att + prev_energy_prob*(1-alpha_att) 249 | else: 250 | smoothed_energy_prob = frame_energy_prob*alpha_rel + prev_energy_prob*(1-alpha_rel) 251 | 252 | if smoothed_energy_prob > energy_thresh: 253 | active_frames += 1 254 | prev_energy_prob = frame_energy_prob 255 | sample_start += window_samples 256 | cnt += 1 257 | 258 | perc_active = active_frames/cnt 259 | return perc_active 260 | 261 | 262 | def resampler(input_dir, target_sr=16000, ext='*.wav'): 263 | '''Resamples the audio files in input_dir to target_sr''' 264 | files = glob.glob(f"{input_dir}/"+ext) 265 | for pathname in files: 266 | print(pathname) 267 | try: 268 | audio, fs = audioread(pathname) 269 | audio_resampled = librosa.resample(audio, orig_sr=fs, target_sr=target_sr) 270 | audiowrite(pathname, audio_resampled, target_sr) 271 | except Exception: # skip files that cannot be read or resampled 272 | continue 273 | 274 | 275 | def audio_segmenter(input_dir, dest_dir, segment_len=10, ext='*.wav'): 276 | '''Segments the audio clips in dir to segment_len in secs''' 277 | files = glob.glob(f"{input_dir}/"+ext) 278 | for i in range(len(files)): 279 | audio, fs = audioread(files[i]) 280 | 281 | if
len(audio) > (segment_len*fs) and len(audio)%(segment_len*fs) != 0: 282 | audio = np.append(audio, audio[0 : segment_len*fs - (len(audio)%(segment_len*fs))]) 283 | if len(audio) < (segment_len*fs): 284 | while len(audio) < (segment_len*fs): 285 | audio = np.append(audio, audio) 286 | audio = audio[:segment_len*fs] 287 | 288 | num_segments = int(len(audio)/(segment_len*fs)) 289 | audio_segments = np.split(audio, num_segments) 290 | 291 | basefilename = os.path.basename(files[i]) 292 | basename, ext = os.path.splitext(basefilename) 293 | 294 | for j in range(len(audio_segments)): 295 | newname = basename+'_'+str(j)+ext 296 | destpath = os.path.join(dest_dir,newname) 297 | audiowrite(destpath, audio_segments[j], fs) 298 | 299 | -------------------------------------------------------------------------------- /docs/CMT Instructions for uploading enhanced clips_ICASSP2022.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/microsoft/DNS-Challenge/591184a9fcb2cbdec02520fed81a32bbbf9d73ff/docs/CMT Instructions for uploading enhanced clips_ICASSP2022.pdf -------------------------------------------------------------------------------- /docs/ICASSP_2021_DNS_challenge.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/microsoft/DNS-Challenge/591184a9fcb2cbdec02520fed81a32bbbf9d73ff/docs/ICASSP_2021_DNS_challenge.pdf -------------------------------------------------------------------------------- /docs/ICASSP_2022_4th_Deep_Noise_Suppression_Challenge.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/microsoft/DNS-Challenge/591184a9fcb2cbdec02520fed81a32bbbf9d73ff/docs/ICASSP_2022_4th_Deep_Noise_Suppression_Challenge.pdf -------------------------------------------------------------------------------- /download-dns-challenge-1.sh: 
-------------------------------------------------------------------------------- 1 | #!/usr/bin/bash 2 | 3 | # ***** Datasets for INTERSPEECH 2020 DNS Challenge 1 ***** 4 | 5 | # NOTE: This data is for the *PAST* challenge! 6 | # Current DNS Challenge is ICASSP 2022 DNS Challenge 4, which 7 | # has its own download script, `download-dns-challenge-4.sh` 8 | 9 | ############################################################### 10 | 11 | AZURE_URL="https://dns3public.blob.core.windows.net/dns3archive" 12 | 13 | mkdir -p ./datasets/ 14 | 15 | BLOB="datasets-interspeech2020.tar.bz2" 16 | URL="$AZURE_URL/$BLOB" 17 | echo "Download: $BLOB" 18 | 19 | # DRY RUN: print HTTP headers WITHOUT downloading the files 20 | curl -s -I "$URL" 21 | 22 | # Actually download the archive - UNCOMMENT it when ready to download 23 | # curl "$URL" -o "$BLOB" 24 | 25 | # Same as above, but using wget 26 | # wget "$URL" -O "$BLOB" 27 | 28 | # Same, + unpack files on the fly 29 | # curl "$URL" | tar -f - -x -j 30 | -------------------------------------------------------------------------------- /download-dns-challenge-2.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/bash 2 | 3 | # ***** Datasets for ICASSP 2021 DNS Challenge 2 ***** 4 | 5 | # NOTE: This data is for the *PAST* challenge! 6 | # Current DNS Challenge is ICASSP 2022 DNS Challenge 4, which 7 | # has its own download script, `download-dns-challenge-4.sh` 8 | 9 | # NOTE: Before downloading, make sure you have enough space 10 | # on your local storage! 11 | 12 | # In all, you will need at least 230GB to store UNPACKED data. 13 | # Archived, the same data takes 155GB total. 14 | 15 | # Please comment out the files you don't need before launching 16 | # the script. 17 | 18 | # NOTE: By default, the script *DOES NOT* DOWNLOAD ANY FILES! 19 | # Please scroll down and edit this script to pick the 20 | # downloading method that works best for you.
21 | 22 | # ------------------------------------------------------------- 23 | # The directory structure of the unpacked data is: 24 | 25 | # datasets 229G 26 | # +-- clean 204G 27 | # | +-- emotional_speech 403M 28 | # | +-- french_data 21G 29 | # | +-- german_speech 66G 30 | # | +-- italian_speech 14G 31 | # | +-- mandarin_speech 21G 32 | # | +-- read_speech 61G 33 | # | +-- russian_speech 5.1G 34 | # | +-- singing_voice 979M 35 | # | \-- spanish_speech 17G 36 | # +-- dev_testset 211M 37 | # +-- impulse_responses 4.3G 38 | # | +-- SLR26 2.1G 39 | # | \-- SLR28 2.3G 40 | # \-- noise 20G 41 | 42 | BLOB_NAMES=( 43 | 44 | # DEMAND dataset 45 | DEMAND.tar.bz2 46 | 47 | # Wideband clean speech 48 | datasets/datasets.clean.read_speech.tar.bz2 49 | 50 | # Wideband emotional speech 51 | datasets/datasets.clean.emotional_speech.tar.bz2 52 | 53 | # Wideband non-English clean speech 54 | datasets/datasets.clean.french_data.tar.bz2 55 | datasets/datasets.clean.german_speech.tar.bz2 56 | datasets/datasets.clean.italian_speech.tar.bz2 57 | datasets/datasets.clean.mandarin_speech.tar.bz2 58 | datasets/datasets.clean.russian_speech.tar.bz2 59 | datasets/datasets.clean.singing_voice.tar.bz2 60 | datasets/datasets.clean.spanish_speech.tar.bz2 61 | 62 | # Wideband noise, IR, and test data 63 | datasets/datasets.impulse_responses.tar.bz2 64 | datasets/datasets.noise.tar.bz2 65 | datasets/datasets.dev_testset.tar.bz2 66 | ) 67 | 68 | ############################################################### 69 | 70 | AZURE_URL="https://dns3public.blob.core.windows.net/dns3archive" 71 | 72 | mkdir -p ./datasets 73 | 74 | for BLOB in ${BLOB_NAMES[@]} 75 | do 76 | URL="$AZURE_URL/$BLOB" 77 | echo "Download: $BLOB" 78 | 79 | # DRY RUN: print HTTP headers WITHOUT downloading the files 80 | curl -s -I "$URL" | head -n 1 81 | 82 | # Actually download the files - UNCOMMENT it when ready to download 83 | # curl "$URL" -o "$BLOB" 84 | 85 | # Same as above, but using wget 86 | # wget "$URL" -O "$BLOB" 87 | 
88 | # Same, + unpack files on the fly 89 | # curl "$URL" | tar -f - -x -j 90 | done 91 | -------------------------------------------------------------------------------- /download-dns-challenge-3.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/bash 2 | 3 | # ***** Datasets for INTERSPEECH 2021 DNS Challenge 3 ***** 4 | 5 | # NOTE: This data is for the *PAST* challenge! 6 | # Current DNS Challenge is ICASSP 2022 DNS Challenge 4, which 7 | # has its own download script, `download-dns-challenge-4.sh` 8 | 9 | # NOTE: Before downloading, make sure you have enough space 10 | # on your local storage! 11 | 12 | # In all, you will need at least 830GB to store UNPACKED data. 13 | # Archived, the same data takes 512GB total. 14 | 15 | # Please comment out the files you don't need before launching 16 | # the script. 17 | 18 | # NOTE: By default, the script *DOES NOT* DOWNLOAD ANY FILES! 19 | # Please scroll down and edit this script to pick the 20 | # downloading method that works best for you.
21 | 22 | # ------------------------------------------------------------- 23 | # The directory structure of the unpacked data is: 24 | 25 | # *** Wideband data: *** 26 | 27 | # datasets 229G 28 | # +-- clean 204G 29 | # | +-- emotional_speech 403M 30 | # | +-- french_data 21G 31 | # | +-- german_speech 66G 32 | # | +-- italian_speech 14G 33 | # | +-- mandarin_speech 21G 34 | # | +-- read_speech 61G 35 | # | +-- russian_speech 5.1G 36 | # | +-- singing_voice 979M 37 | # | \-- spanish_speech 17G 38 | # +-- dev_testset 211M 39 | # +-- impulse_responses 4.3G 40 | # | +-- SLR26 2.1G 41 | # | \-- SLR28 2.3G 42 | # \-- noise 20G 43 | 44 | # *** Fullband data: *** 45 | 46 | # datasets_fullband 600G 47 | # +-- clean_fullband 542G 48 | # | +-- VocalSet_48kHz_mono 974M 49 | # | +-- emotional_speech 1.2G 50 | # | +-- french_data 62G 51 | # | +-- german_speech 194G 52 | # | +-- italian_speech 42G 53 | # | +-- read_speech 182G 54 | # | +-- russian_speech 12G 55 | # | \-- spanish_speech 50G 56 | # +-- dev_testset_fullband 630M 57 | # \-- noise_fullband 58G 58 | 59 | BLOB_NAMES=( 60 | 61 | # DEMAND dataset 62 | DEMAND.tar.bz2 63 | 64 | # Wideband clean speech 65 | datasets/datasets.clean.read_speech.tar.bz2 66 | 67 | # Wideband emotional speech 68 | datasets/datasets.clean.emotional_speech.tar.bz2 69 | 70 | # Wideband non-English clean speech 71 | datasets/datasets.clean.french_data.tar.bz2 72 | datasets/datasets.clean.german_speech.tar.bz2 73 | datasets/datasets.clean.italian_speech.tar.bz2 74 | datasets/datasets.clean.mandarin_speech.tar.bz2 75 | datasets/datasets.clean.russian_speech.tar.bz2 76 | datasets/datasets.clean.singing_voice.tar.bz2 77 | datasets/datasets.clean.spanish_speech.tar.bz2 78 | 79 | # Wideband noise, IR, and test data 80 | datasets/datasets.impulse_responses.tar.bz2 81 | datasets/datasets.noise.tar.bz2 82 | datasets/datasets.dev_testset.tar.bz2 83 | 84 | # --------------------------------------------------------- 85 | 86 | # Fullband clean speech 87 | 
datasets_fullband/datasets_fullband.clean_fullband.read_speech.0.tar.bz2 88 | datasets_fullband/datasets_fullband.clean_fullband.read_speech.1.tar.bz2 89 | datasets_fullband/datasets_fullband.clean_fullband.read_speech.2.tar.bz2 90 | datasets_fullband/datasets_fullband.clean_fullband.read_speech.3.tar.bz2 91 | datasets_fullband/datasets_fullband.clean_fullband.VocalSet_48kHz_mono.tar.bz2 92 | 93 | # Fullband emotional speech 94 | datasets_fullband/datasets_fullband.clean_fullband.emotional_speech.tar.bz2 95 | 96 | # Fullband non-English clean speech 97 | datasets_fullband/datasets_fullband.clean_fullband.french_data.tar.bz2 98 | datasets_fullband/datasets_fullband.clean_fullband.german_speech.0.tar.bz2 99 | datasets_fullband/datasets_fullband.clean_fullband.german_speech.1.tar.bz2 100 | datasets_fullband/datasets_fullband.clean_fullband.german_speech.2.tar.bz2 101 | datasets_fullband/datasets_fullband.clean_fullband.german_speech.3.tar.bz2 102 | datasets_fullband/datasets_fullband.clean_fullband.german_speech.4.tar.bz2 103 | datasets_fullband/datasets_fullband.clean_fullband.german_speech.5.tar.bz2 104 | datasets_fullband/datasets_fullband.clean_fullband.german_speech.6.tar.bz2 105 | datasets_fullband/datasets_fullband.clean_fullband.german_speech.7.tar.bz2 106 | datasets_fullband/datasets_fullband.clean_fullband.german_speech.8.tar.bz2 107 | datasets_fullband/datasets_fullband.clean_fullband.german_speech.9.tar.bz2 108 | datasets_fullband/datasets_fullband.clean_fullband.german_speech.10.tar.bz2 109 | datasets_fullband/datasets_fullband.clean_fullband.german_speech.11.tar.bz2 110 | datasets_fullband/datasets_fullband.clean_fullband.german_speech.12.tar.bz2 111 | datasets_fullband/datasets_fullband.clean_fullband.german_speech.13.tar.bz2 112 | datasets_fullband/datasets_fullband.clean_fullband.german_speech.14.tar.bz2 113 | datasets_fullband/datasets_fullband.clean_fullband.german_speech.15.tar.bz2 114 | 
datasets_fullband/datasets_fullband.clean_fullband.german_speech.16.tar.bz2 115 | datasets_fullband/datasets_fullband.clean_fullband.german_speech.17.tar.bz2 116 | datasets_fullband/datasets_fullband.clean_fullband.german_speech.18.tar.bz2 117 | datasets_fullband/datasets_fullband.clean_fullband.german_speech.19.tar.bz2 118 | datasets_fullband/datasets_fullband.clean_fullband.italian_speech.tar.bz2 119 | datasets_fullband/datasets_fullband.clean_fullband.russian_speech.tar.bz2 120 | datasets_fullband/datasets_fullband.clean_fullband.spanish_speech.tar.bz2 121 | 122 | # Fullband noise and test data 123 | datasets_fullband/datasets_fullband.noise_fullband.tar.bz2 124 | datasets_fullband/datasets_fullband.dev_testset_fullband.tar.bz2 125 | ) 126 | 127 | ############################################################### 128 | 129 | AZURE_URL="https://dns3public.blob.core.windows.net/dns3archive" 130 | 131 | mkdir -p ./datasets ./datasets_fullband 132 | 133 | for BLOB in ${BLOB_NAMES[@]} 134 | do 135 | URL="$AZURE_URL/$BLOB" 136 | echo "Download: $BLOB" 137 | 138 | # DRY RUN: print HTTP headers WITHOUT downloading the files 139 | curl -s -I "$URL" | head -n 1 140 | 141 | # Actually download the files - UNCOMMENT it when ready to download 142 | # curl "$URL" -o "$BLOB" 143 | 144 | # Same as above, but using wget 145 | # wget "$URL" -O "$BLOB" 146 | 147 | # Same, + unpack files on the fly 148 | # curl "$URL" | tar -f - -x -j 149 | done 150 | -------------------------------------------------------------------------------- /download-dns-challenge-4-pdns.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/bash 2 | 3 | # ***** Datasets for ICASSP 2022 DNS Challenge 4 - Personalized DNS Track ***** 4 | 5 | # NOTE: Before downloading, make sure you have enough space 6 | # on your local storage! 7 | 8 | # In all, you will need about 380GB to store the UNPACKED data. 9 | # Archived, the same data takes about 200GB total.
10 | 11 | # Please comment out the files you don't need before launching 12 | # the script. 13 | 14 | # NOTE: By default, the script *DOES NOT* DOWNLOAD ANY FILES! 15 | # Please scroll down and edit this script to pick the 16 | # downloading method that works best for you. 17 | 18 | # ------------------------------------------------------------- 19 | # The directory structure of the unpacked data is: 20 | 21 | # . 362G 22 | # +-- datasets_fullband 64G 23 | # | +-- impulse_responses 5.9G 24 | # | \-- noise_fullband 58G 25 | # +-- pdns_training_set 294G 26 | # | +-- enrollment_embeddings 115M 27 | # | +-- enrollment_wav 42G 28 | # | +-- raw/clean 252G 29 | # | +-- english 168G 30 | # | +-- french 2.1G 31 | # | +-- german 53G 32 | # | +-- italian 17G 33 | # | +-- russian 6.8G 34 | # | \-- spanish 5.4G 35 | # \-- personalized_dev_testset 3.3G 36 | 37 | BLOB_NAMES=( 38 | 39 | pdns_training_set/raw/pdns_training_set.raw.clean.english_000.tar.bz2 40 | pdns_training_set/raw/pdns_training_set.raw.clean.english_001.tar.bz2 41 | pdns_training_set/raw/pdns_training_set.raw.clean.english_002.tar.bz2 42 | pdns_training_set/raw/pdns_training_set.raw.clean.english_003.tar.bz2 43 | pdns_training_set/raw/pdns_training_set.raw.clean.english_004.tar.bz2 44 | pdns_training_set/raw/pdns_training_set.raw.clean.english_005.tar.bz2 45 | pdns_training_set/raw/pdns_training_set.raw.clean.english_006.tar.bz2 46 | pdns_training_set/raw/pdns_training_set.raw.clean.english_007.tar.bz2 47 | pdns_training_set/raw/pdns_training_set.raw.clean.english_008.tar.bz2 48 | pdns_training_set/raw/pdns_training_set.raw.clean.english_009.tar.bz2 49 | pdns_training_set/raw/pdns_training_set.raw.clean.english_010.tar.bz2 50 | pdns_training_set/raw/pdns_training_set.raw.clean.english_011.tar.bz2 51 | pdns_training_set/raw/pdns_training_set.raw.clean.english_012.tar.bz2 52 | pdns_training_set/raw/pdns_training_set.raw.clean.english_013.tar.bz2 53 |
pdns_training_set/raw/pdns_training_set.raw.clean.english_014.tar.bz2 54 | pdns_training_set/raw/pdns_training_set.raw.clean.english_015.tar.bz2 55 | pdns_training_set/raw/pdns_training_set.raw.clean.english_016.tar.bz2 56 | pdns_training_set/raw/pdns_training_set.raw.clean.english_017.tar.bz2 57 | pdns_training_set/raw/pdns_training_set.raw.clean.english_018.tar.bz2 58 | pdns_training_set/raw/pdns_training_set.raw.clean.english_019.tar.bz2 59 | pdns_training_set/raw/pdns_training_set.raw.clean.english_020.tar.bz2 60 | pdns_training_set/raw/pdns_training_set.raw.clean.french_000.tar.bz2 61 | pdns_training_set/raw/pdns_training_set.raw.clean.german_000.tar.bz2 62 | pdns_training_set/raw/pdns_training_set.raw.clean.german_001.tar.bz2 63 | pdns_training_set/raw/pdns_training_set.raw.clean.german_002.tar.bz2 64 | pdns_training_set/raw/pdns_training_set.raw.clean.german_003.tar.bz2 65 | pdns_training_set/raw/pdns_training_set.raw.clean.german_004.tar.bz2 66 | pdns_training_set/raw/pdns_training_set.raw.clean.german_005.tar.bz2 67 | pdns_training_set/raw/pdns_training_set.raw.clean.german_006.tar.bz2 68 | pdns_training_set/raw/pdns_training_set.raw.clean.german_007.tar.bz2 69 | pdns_training_set/raw/pdns_training_set.raw.clean.german_008.tar.bz2 70 | pdns_training_set/raw/pdns_training_set.raw.clean.italian_000.tar.bz2 71 | pdns_training_set/raw/pdns_training_set.raw.clean.italian_001.tar.bz2 72 | pdns_training_set/raw/pdns_training_set.raw.clean.italian_002.tar.bz2 73 | pdns_training_set/raw/pdns_training_set.raw.clean.russian_000.tar.bz2 74 | pdns_training_set/raw/pdns_training_set.raw.clean.spanish_000.tar.bz2 75 | pdns_training_set/raw/pdns_training_set.raw.clean.spanish_001.tar.bz2 76 | pdns_training_set/raw/pdns_training_set.raw.clean.spanish_002.tar.bz2 77 | 78 | pdns_training_set/enrollment_wav/pdns_training_set.enrollment_wav.english_000.tar.bz2 79 | pdns_training_set/enrollment_wav/pdns_training_set.enrollment_wav.english_001.tar.bz2 80 | 
pdns_training_set/enrollment_wav/pdns_training_set.enrollment_wav.english_002.tar.bz2 81 | pdns_training_set/enrollment_wav/pdns_training_set.enrollment_wav.english_003.tar.bz2 82 | pdns_training_set/enrollment_wav/pdns_training_set.enrollment_wav.english_004.tar.bz2 83 | pdns_training_set/enrollment_wav/pdns_training_set.enrollment_wav.french_000.tar.bz2 84 | pdns_training_set/enrollment_wav/pdns_training_set.enrollment_wav.german_000.tar.bz2 85 | pdns_training_set/enrollment_wav/pdns_training_set.enrollment_wav.german_001.tar.bz2 86 | pdns_training_set/enrollment_wav/pdns_training_set.enrollment_wav.italian_000.tar.bz2 87 | pdns_training_set/enrollment_wav/pdns_training_set.enrollment_wav.russian_000.tar.bz2 88 | pdns_training_set/enrollment_wav/pdns_training_set.enrollment_wav.spanish_000.tar.bz2 89 | 90 | pdns_training_set/pdns_training_set.enrollment_embeddings_000.tar.bz2 91 | 92 | datasets_fullband/noise_fullband/datasets_fullband.noise_fullband.audioset_000.tar.bz2 93 | datasets_fullband/noise_fullband/datasets_fullband.noise_fullband.audioset_001.tar.bz2 94 | datasets_fullband/noise_fullband/datasets_fullband.noise_fullband.audioset_002.tar.bz2 95 | datasets_fullband/noise_fullband/datasets_fullband.noise_fullband.audioset_003.tar.bz2 96 | datasets_fullband/noise_fullband/datasets_fullband.noise_fullband.audioset_004.tar.bz2 97 | datasets_fullband/noise_fullband/datasets_fullband.noise_fullband.audioset_005.tar.bz2 98 | datasets_fullband/noise_fullband/datasets_fullband.noise_fullband.audioset_006.tar.bz2 99 | 100 | datasets_fullband/noise_fullband/datasets_fullband.noise_fullband.freesound_000.tar.bz2 101 | datasets_fullband/noise_fullband/datasets_fullband.noise_fullband.freesound_001.tar.bz2 102 | 103 | datasets_fullband/datasets_fullband.impulse_responses_000.tar.bz2 104 | 105 | personalized_dev_testset/personalized_dev_testset.enrollment.tar.bz2 106 | personalized_dev_testset/personalized_dev_testset.noisy_testclips.tar.bz2 107 | ) 108 | 109 | 
############################################################### 110 | 111 | AZURE_URL="https://dns4public.blob.core.windows.net/dns4archive" 112 | 113 | OUTPUT_PATH="." 114 | 115 | mkdir -p "$OUTPUT_PATH"/{pdns_training_set/{raw,enrollment_wav},datasets_fullband/noise_fullband} 116 | 117 | for BLOB in "${BLOB_NAMES[@]}" 118 | do 119 | URL="$AZURE_URL/$BLOB" 120 | echo "Download: $BLOB" 121 | 122 | # DRY RUN: print HTTP response and Content-Length 123 | # WITHOUT downloading the files 124 | curl -s -I "$URL" | head -n 2 125 | 126 | # Actually download the files: UNCOMMENT when ready to download 127 | # curl "$URL" -o "$OUTPUT_PATH/$BLOB" 128 | 129 | # Same as above, but using wget 130 | # wget "$URL" -O "$OUTPUT_PATH/$BLOB" 131 | 132 | # Same, + unpack files on the fly 133 | # curl "$URL" | tar -C "$OUTPUT_PATH" -f - -x -j 134 | done 135 | -------------------------------------------------------------------------------- /download-dns-challenge-4.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/bash 2 | 3 | # ***** Datasets for ICASSP 2022 DNS Challenge 4 - Main (Real-Time) Track ***** 4 | 5 | # NOTE: Before downloading, make sure you have enough space 6 | # on your local storage! 7 | 8 | # In all, you will need about 1TB to store the UNPACKED data. 9 | # Archived, the same data takes about 550GB total. 10 | 11 | # Please comment out the files you don't need before launching 12 | # the script. 13 | 14 | # NOTE: By default, the script *DOES NOT* DOWNLOAD ANY FILES! 15 | # Please scroll down and edit this script to pick the 16 | # downloading method that works best for you. 
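The dry-run loops in these scripts print each blob's HTTP status line and `Content-Length` header via `curl -s -I`. As a minimal sketch of how those headers could be totalled to estimate disk usage before committing to a download — the header text below is an illustrative stand-in for real `curl -s -I` output, not a live request:

```shell
# Sketch: extract Content-Length from a captured HTTP header block,
# as printed by the dry-run loop. The "headers" value is a synthetic
# stand-in for real curl output.
headers="HTTP/1.1 200 OK
Content-Length: 1048576
Content-Type: application/octet-stream"
# tr strips the \r that real HTTP headers carry; awk matches the
# header name case-insensitively and prints the byte count.
len=$(printf '%s\n' "$headers" | tr -d '\r' | awk 'tolower($1) == "content-length:" { print $2 }')
echo "$(( len / 1024 / 1024 )) MB"   # prints: 1 MB
```

Summing `len` across all blobs in a dry run gives a rough total before any archive is fetched.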
17 | 18 | # ------------------------------------------------------------- 19 | # The directory structure of the unpacked data is: 20 | 21 | # datasets_fullband 892G 22 | # +-- dev_testset 1.7G 23 | # +-- impulse_responses 5.9G 24 | # +-- noise_fullband 58G 25 | # \-- clean_fullband 827G 26 | # +-- emotional_speech 2.4G 27 | # +-- french_speech 62G 28 | # +-- german_speech 319G 29 | # +-- italian_speech 42G 30 | # +-- read_speech 299G 31 | # +-- russian_speech 12G 32 | # +-- spanish_speech 65G 33 | # +-- vctk_wav48_silence_trimmed 27G 34 | # \-- VocalSet_48kHz_mono 974M 35 | 36 | BLOB_NAMES=( 37 | 38 | clean_fullband/datasets_fullband.clean_fullband.VocalSet_48kHz_mono_000_NA_NA.tar.bz2 39 | 40 | clean_fullband/datasets_fullband.clean_fullband.emotional_speech_000_NA_NA.tar.bz2 41 | 42 | clean_fullband/datasets_fullband.clean_fullband.french_speech_000_NA_NA.tar.bz2 43 | clean_fullband/datasets_fullband.clean_fullband.french_speech_001_NA_NA.tar.bz2 44 | clean_fullband/datasets_fullband.clean_fullband.french_speech_002_NA_NA.tar.bz2 45 | clean_fullband/datasets_fullband.clean_fullband.french_speech_003_NA_NA.tar.bz2 46 | clean_fullband/datasets_fullband.clean_fullband.french_speech_004_NA_NA.tar.bz2 47 | clean_fullband/datasets_fullband.clean_fullband.french_speech_005_NA_NA.tar.bz2 48 | clean_fullband/datasets_fullband.clean_fullband.french_speech_006_NA_NA.tar.bz2 49 | clean_fullband/datasets_fullband.clean_fullband.french_speech_007_NA_NA.tar.bz2 50 | clean_fullband/datasets_fullband.clean_fullband.french_speech_008_NA_NA.tar.bz2 51 | 52 | clean_fullband/datasets_fullband.clean_fullband.german_speech_000_0.00_3.47.tar.bz2 53 | clean_fullband/datasets_fullband.clean_fullband.german_speech_001_3.47_3.64.tar.bz2 54 | clean_fullband/datasets_fullband.clean_fullband.german_speech_002_3.64_3.74.tar.bz2 55 | clean_fullband/datasets_fullband.clean_fullband.german_speech_003_3.74_3.81.tar.bz2 56 | 
clean_fullband/datasets_fullband.clean_fullband.german_speech_004_3.81_3.86.tar.bz2 57 | clean_fullband/datasets_fullband.clean_fullband.german_speech_005_3.86_3.91.tar.bz2 58 | clean_fullband/datasets_fullband.clean_fullband.german_speech_006_3.91_3.96.tar.bz2 59 | clean_fullband/datasets_fullband.clean_fullband.german_speech_007_3.96_4.00.tar.bz2 60 | clean_fullband/datasets_fullband.clean_fullband.german_speech_008_4.00_4.04.tar.bz2 61 | clean_fullband/datasets_fullband.clean_fullband.german_speech_009_4.04_4.08.tar.bz2 62 | clean_fullband/datasets_fullband.clean_fullband.german_speech_010_4.08_4.12.tar.bz2 63 | clean_fullband/datasets_fullband.clean_fullband.german_speech_011_4.12_4.16.tar.bz2 64 | clean_fullband/datasets_fullband.clean_fullband.german_speech_012_4.16_4.21.tar.bz2 65 | clean_fullband/datasets_fullband.clean_fullband.german_speech_013_4.21_4.26.tar.bz2 66 | clean_fullband/datasets_fullband.clean_fullband.german_speech_014_4.26_4.33.tar.bz2 67 | clean_fullband/datasets_fullband.clean_fullband.german_speech_015_4.33_4.43.tar.bz2 68 | clean_fullband/datasets_fullband.clean_fullband.german_speech_016_4.43_NA.tar.bz2 69 | clean_fullband/datasets_fullband.clean_fullband.german_speech_017_NA_NA.tar.bz2 70 | clean_fullband/datasets_fullband.clean_fullband.german_speech_018_NA_NA.tar.bz2 71 | clean_fullband/datasets_fullband.clean_fullband.german_speech_019_NA_NA.tar.bz2 72 | clean_fullband/datasets_fullband.clean_fullband.german_speech_020_NA_NA.tar.bz2 73 | clean_fullband/datasets_fullband.clean_fullband.german_speech_021_NA_NA.tar.bz2 74 | clean_fullband/datasets_fullband.clean_fullband.german_speech_022_NA_NA.tar.bz2 75 | clean_fullband/datasets_fullband.clean_fullband.german_speech_023_NA_NA.tar.bz2 76 | clean_fullband/datasets_fullband.clean_fullband.german_speech_024_NA_NA.tar.bz2 77 | clean_fullband/datasets_fullband.clean_fullband.german_speech_025_NA_NA.tar.bz2 78 | clean_fullband/datasets_fullband.clean_fullband.german_speech_026_NA_NA.tar.bz2 
79 | clean_fullband/datasets_fullband.clean_fullband.german_speech_027_NA_NA.tar.bz2 80 | clean_fullband/datasets_fullband.clean_fullband.german_speech_028_NA_NA.tar.bz2 81 | clean_fullband/datasets_fullband.clean_fullband.german_speech_029_NA_NA.tar.bz2 82 | clean_fullband/datasets_fullband.clean_fullband.german_speech_030_NA_NA.tar.bz2 83 | clean_fullband/datasets_fullband.clean_fullband.german_speech_031_NA_NA.tar.bz2 84 | clean_fullband/datasets_fullband.clean_fullband.german_speech_032_NA_NA.tar.bz2 85 | clean_fullband/datasets_fullband.clean_fullband.german_speech_033_NA_NA.tar.bz2 86 | clean_fullband/datasets_fullband.clean_fullband.german_speech_034_NA_NA.tar.bz2 87 | clean_fullband/datasets_fullband.clean_fullband.german_speech_035_NA_NA.tar.bz2 88 | clean_fullband/datasets_fullband.clean_fullband.german_speech_036_NA_NA.tar.bz2 89 | clean_fullband/datasets_fullband.clean_fullband.german_speech_037_NA_NA.tar.bz2 90 | clean_fullband/datasets_fullband.clean_fullband.german_speech_038_NA_NA.tar.bz2 91 | clean_fullband/datasets_fullband.clean_fullband.german_speech_039_NA_NA.tar.bz2 92 | clean_fullband/datasets_fullband.clean_fullband.german_speech_040_NA_NA.tar.bz2 93 | clean_fullband/datasets_fullband.clean_fullband.german_speech_041_NA_NA.tar.bz2 94 | clean_fullband/datasets_fullband.clean_fullband.german_speech_042_NA_NA.tar.bz2 95 | 96 | clean_fullband/datasets_fullband.clean_fullband.italian_speech_000_0.00_3.98.tar.bz2 97 | clean_fullband/datasets_fullband.clean_fullband.italian_speech_001_3.98_4.21.tar.bz2 98 | clean_fullband/datasets_fullband.clean_fullband.italian_speech_002_4.21_4.40.tar.bz2 99 | clean_fullband/datasets_fullband.clean_fullband.italian_speech_003_4.40_NA.tar.bz2 100 | clean_fullband/datasets_fullband.clean_fullband.italian_speech_004_NA_NA.tar.bz2 101 | clean_fullband/datasets_fullband.clean_fullband.italian_speech_005_NA_NA.tar.bz2 102 | 103 | clean_fullband/datasets_fullband.clean_fullband.read_speech_000_0.00_3.75.tar.bz2 104 | 
clean_fullband/datasets_fullband.clean_fullband.read_speech_001_3.75_3.88.tar.bz2 105 | clean_fullband/datasets_fullband.clean_fullband.read_speech_002_3.88_3.96.tar.bz2 106 | clean_fullband/datasets_fullband.clean_fullband.read_speech_003_3.96_4.02.tar.bz2 107 | clean_fullband/datasets_fullband.clean_fullband.read_speech_004_4.02_4.06.tar.bz2 108 | clean_fullband/datasets_fullband.clean_fullband.read_speech_005_4.06_4.10.tar.bz2 109 | clean_fullband/datasets_fullband.clean_fullband.read_speech_006_4.10_4.13.tar.bz2 110 | clean_fullband/datasets_fullband.clean_fullband.read_speech_007_4.13_4.16.tar.bz2 111 | clean_fullband/datasets_fullband.clean_fullband.read_speech_008_4.16_4.19.tar.bz2 112 | clean_fullband/datasets_fullband.clean_fullband.read_speech_009_4.19_4.21.tar.bz2 113 | clean_fullband/datasets_fullband.clean_fullband.read_speech_010_4.21_4.24.tar.bz2 114 | clean_fullband/datasets_fullband.clean_fullband.read_speech_011_4.24_4.26.tar.bz2 115 | clean_fullband/datasets_fullband.clean_fullband.read_speech_012_4.26_4.29.tar.bz2 116 | clean_fullband/datasets_fullband.clean_fullband.read_speech_013_4.29_4.31.tar.bz2 117 | clean_fullband/datasets_fullband.clean_fullband.read_speech_014_4.31_4.33.tar.bz2 118 | clean_fullband/datasets_fullband.clean_fullband.read_speech_015_4.33_4.35.tar.bz2 119 | clean_fullband/datasets_fullband.clean_fullband.read_speech_016_4.35_4.38.tar.bz2 120 | clean_fullband/datasets_fullband.clean_fullband.read_speech_017_4.38_4.40.tar.bz2 121 | clean_fullband/datasets_fullband.clean_fullband.read_speech_018_4.40_4.42.tar.bz2 122 | clean_fullband/datasets_fullband.clean_fullband.read_speech_019_4.42_4.45.tar.bz2 123 | clean_fullband/datasets_fullband.clean_fullband.read_speech_020_4.45_4.48.tar.bz2 124 | clean_fullband/datasets_fullband.clean_fullband.read_speech_021_4.48_4.52.tar.bz2 125 | clean_fullband/datasets_fullband.clean_fullband.read_speech_022_4.52_4.57.tar.bz2 126 | 
clean_fullband/datasets_fullband.clean_fullband.read_speech_023_4.57_4.67.tar.bz2 127 | clean_fullband/datasets_fullband.clean_fullband.read_speech_024_4.67_NA.tar.bz2 128 | clean_fullband/datasets_fullband.clean_fullband.read_speech_025_NA_NA.tar.bz2 129 | clean_fullband/datasets_fullband.clean_fullband.read_speech_026_NA_NA.tar.bz2 130 | clean_fullband/datasets_fullband.clean_fullband.read_speech_027_NA_NA.tar.bz2 131 | clean_fullband/datasets_fullband.clean_fullband.read_speech_028_NA_NA.tar.bz2 132 | clean_fullband/datasets_fullband.clean_fullband.read_speech_029_NA_NA.tar.bz2 133 | clean_fullband/datasets_fullband.clean_fullband.read_speech_030_NA_NA.tar.bz2 134 | clean_fullband/datasets_fullband.clean_fullband.read_speech_031_NA_NA.tar.bz2 135 | clean_fullband/datasets_fullband.clean_fullband.read_speech_032_NA_NA.tar.bz2 136 | clean_fullband/datasets_fullband.clean_fullband.read_speech_033_NA_NA.tar.bz2 137 | clean_fullband/datasets_fullband.clean_fullband.read_speech_034_NA_NA.tar.bz2 138 | clean_fullband/datasets_fullband.clean_fullband.read_speech_035_NA_NA.tar.bz2 139 | clean_fullband/datasets_fullband.clean_fullband.read_speech_036_NA_NA.tar.bz2 140 | clean_fullband/datasets_fullband.clean_fullband.read_speech_037_NA_NA.tar.bz2 141 | clean_fullband/datasets_fullband.clean_fullband.read_speech_038_NA_NA.tar.bz2 142 | clean_fullband/datasets_fullband.clean_fullband.read_speech_039_NA_NA.tar.bz2 143 | 144 | clean_fullband/datasets_fullband.clean_fullband.russian_speech_000_0.00_4.31.tar.bz2 145 | clean_fullband/datasets_fullband.clean_fullband.russian_speech_001_4.31_NA.tar.bz2 146 | 147 | clean_fullband/datasets_fullband.clean_fullband.spanish_speech_000_0.00_4.09.tar.bz2 148 | clean_fullband/datasets_fullband.clean_fullband.spanish_speech_001_4.09_NA.tar.bz2 149 | clean_fullband/datasets_fullband.clean_fullband.spanish_speech_002_NA_NA.tar.bz2 150 | clean_fullband/datasets_fullband.clean_fullband.spanish_speech_003_NA_NA.tar.bz2 151 | 
clean_fullband/datasets_fullband.clean_fullband.spanish_speech_004_NA_NA.tar.bz2 152 | clean_fullband/datasets_fullband.clean_fullband.spanish_speech_005_NA_NA.tar.bz2 153 | clean_fullband/datasets_fullband.clean_fullband.spanish_speech_006_NA_NA.tar.bz2 154 | clean_fullband/datasets_fullband.clean_fullband.spanish_speech_007_NA_NA.tar.bz2 155 | clean_fullband/datasets_fullband.clean_fullband.spanish_speech_008_NA_NA.tar.bz2 156 | 157 | clean_fullband/datasets_fullband.clean_fullband.vctk_wav48_silence_trimmed_000.tar.bz2 158 | clean_fullband/datasets_fullband.clean_fullband.vctk_wav48_silence_trimmed_001.tar.bz2 159 | clean_fullband/datasets_fullband.clean_fullband.vctk_wav48_silence_trimmed_002.tar.bz2 160 | clean_fullband/datasets_fullband.clean_fullband.vctk_wav48_silence_trimmed_003.tar.bz2 161 | clean_fullband/datasets_fullband.clean_fullband.vctk_wav48_silence_trimmed_004.tar.bz2 162 | 163 | noise_fullband/datasets_fullband.noise_fullband.audioset_000.tar.bz2 164 | noise_fullband/datasets_fullband.noise_fullband.audioset_001.tar.bz2 165 | noise_fullband/datasets_fullband.noise_fullband.audioset_002.tar.bz2 166 | noise_fullband/datasets_fullband.noise_fullband.audioset_003.tar.bz2 167 | noise_fullband/datasets_fullband.noise_fullband.audioset_004.tar.bz2 168 | noise_fullband/datasets_fullband.noise_fullband.audioset_005.tar.bz2 169 | noise_fullband/datasets_fullband.noise_fullband.audioset_006.tar.bz2 170 | 171 | noise_fullband/datasets_fullband.noise_fullband.freesound_000.tar.bz2 172 | noise_fullband/datasets_fullband.noise_fullband.freesound_001.tar.bz2 173 | 174 | datasets_fullband.dev_testset_000.tar.bz2 175 | 176 | datasets_fullband.impulse_responses_000.tar.bz2 177 | ) 178 | 179 | ############################################################### 180 | 181 | AZURE_URL="https://dns4public.blob.core.windows.net/dns4archive/datasets_fullband" 182 | 183 | OUTPUT_PATH="./datasets_fullband" 184 | 185 | mkdir -p $OUTPUT_PATH/{clean_fullband,noise_fullband} 186 | 
187 | for BLOB in "${BLOB_NAMES[@]}" 188 | do 189 | URL="$AZURE_URL/$BLOB" 190 | echo "Download: $BLOB" 191 | 192 | # DRY RUN: print HTTP response and Content-Length 193 | # WITHOUT downloading the files 194 | curl -s -I "$URL" | head -n 2 195 | 196 | # Actually download the files: UNCOMMENT when ready to download 197 | # curl "$URL" -o "$OUTPUT_PATH/$BLOB" 198 | 199 | # Same as above, but using wget 200 | # wget "$URL" -O "$OUTPUT_PATH/$BLOB" 201 | 202 | # Same, + unpack files on the fly 203 | # curl "$URL" | tar -C "$OUTPUT_PATH" -f - -x -j 204 | done 205 | -------------------------------------------------------------------------------- /download-dns-challenge-5-baseline.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/bash 2 | 3 | # ***** Baseline for the 5th DNS Challenge at ICASSP 2023 ***** 4 | 5 | # NOTE: Before downloading, make sure you have enough space 6 | # on your local storage! 7 | 8 | # Zip file is 1.4 GB. 9 | # ------------------------------------------------------------- 10 | 11 | URL="https://dnschallengepublic.blob.core.windows.net/dns5archive/Baseline.zip" 12 | echo "Download: $URL" 13 | # 14 | # DRY RUN: print HTTP header WITHOUT downloading the files 15 | curl -s -I "$URL" 16 | # 17 | # Download the archive (comment this out if you only want the dry run) 18 | curl "$URL" --output 'Baseline.zip' 19 | #wget --no-check-certificate "$URL" 20 | -------------------------------------------------------------------------------- /download-dns-challenge-5-filelists-headset.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/bash 2 | 3 | # ***** Filelists for the 5th DNS Challenge at ICASSP 2023 ***** 4 | 5 | # NOTE: Before downloading, make sure you have enough space 6 | # on your local storage! 7 | 8 | # Zip file is 1.5MB. 
9 | # It contains speaker ID filelists for headset training clean speech (Track 1) 10 | # ------------------------------------------------------------- 11 | 12 | URL="https://dnschallengepublic.blob.core.windows.net/dns5archive/filelists_headset.zip" 13 | echo "Download: $URL" 14 | # 15 | # DRY RUN: print HTTP header WITHOUT downloading the files 16 | curl -s -I "$URL" 17 | # 18 | # Download the archive (comment this out if you only want the dry run) 19 | curl "$URL" --output 'filelists_headset.zip' 20 | #wget --no-check-certificate "$URL" 21 | -------------------------------------------------------------------------------- /download-dns-challenge-5-filelists-speakerphone.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/bash 2 | 3 | # ***** Filelists for the 5th DNS Challenge at ICASSP 2023 ***** 4 | 5 | # NOTE: Before downloading, make sure you have enough space 6 | # on your local storage! 7 | 8 | # Zip file is 1.5MB. 
9 | # It contains speaker ID filelists for speakerphone training clean speech (Track 2) 10 | # ------------------------------------------------------------- 11 | 12 | URL="https://dnschallengepublic.blob.core.windows.net/dns5archive/filelists_speakerphone.zip" 13 | echo "Download: $URL" 14 | # 15 | # DRY RUN: print HTTP header WITHOUT downloading the files 16 | curl -s -I "$URL" 17 | # 18 | # Download the archive (comment this out if you only want the dry run) 19 | curl "$URL" --output 'filelists_speakerphone.zip' 20 | #wget --no-check-certificate "$URL" 21 | -------------------------------------------------------------------------------- /download-dns-challenge-5-headset-training.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/bash 2 | 3 | # ***** 5th DNS Challenge at ICASSP 2023 ***** 4 | # Track 1 Headset Clean speech: All Languages 5 | # ------------------------------------------------------------- 6 | # In all, you will need about 1TB to store the UNPACKED data. 7 | # Archived, the same data takes about 550GB total. 8 | 9 | # Please comment out the files you don't need before launching 10 | # the script. 11 | 12 | # NOTE: By default, the script *DOES NOT* DOWNLOAD ANY FILES! 13 | # Please scroll down and edit this script to pick the 14 | # downloading method that works best for you. 
15 | 16 | # ------------------------------------------------------------- 17 | # The directory structure of the unpacked data is: 18 | 19 | # datasets_fullband 20 | # \-- clean_fullband 827G 21 | # +-- emotional_speech 2.4G 22 | # +-- french_speech 62G 23 | # +-- german_speech 319G 24 | # +-- italian_speech 42G 25 | # +-- read_speech 299G 26 | # +-- russian_speech 12G 27 | # +-- spanish_speech 65G 28 | # +-- vctk_wav48_silence_trimmed 27G 29 | # \-- VocalSet_48kHz_mono 974M 30 | 31 | BLOB_NAMES=( 32 | 33 | Track1_Headset/VocalSet_48kHz_mono.tgz 34 | Track1_Headset/emotional_speech.tgz 35 | 36 | Track1_Headset/french_speech.tar.gz.partaa 37 | Track1_Headset/french_speech.tar.gz.partab 38 | Track1_Headset/french_speech.tar.gz.partac 39 | Track1_Headset/french_speech.tar.gz.partad 40 | Track1_Headset/french_speech.tar.gz.partae 41 | Track1_Headset/french_speech.tar.gz.partah 42 | 43 | Track1_Headset/german_speech.tgz.partaa 44 | Track1_Headset/german_speech.tgz.partab 45 | Track1_Headset/german_speech.tgz.partac 46 | Track1_Headset/german_speech.tgz.partad 47 | Track1_Headset/german_speech.tgz.partae 48 | Track1_Headset/german_speech.tgz.partaf 49 | Track1_Headset/german_speech.tgz.partag 50 | Track1_Headset/german_speech.tgz.partah 51 | Track1_Headset/german_speech.tgz.partaj 52 | Track1_Headset/german_speech.tgz.partal 53 | Track1_Headset/german_speech.tgz.partam 54 | Track1_Headset/german_speech.tgz.partan 55 | Track1_Headset/german_speech.tgz.partao 56 | Track1_Headset/german_speech.tgz.partap 57 | Track1_Headset/german_speech.tgz.partaq 58 | Track1_Headset/german_speech.tgz.partar 59 | Track1_Headset/german_speech.tgz.partas 60 | Track1_Headset/german_speech.tgz.partat 61 | Track1_Headset/german_speech.tgz.partau 62 | Track1_Headset/german_speech.tgz.partav 63 | Track1_Headset/german_speech.tgz.partaw 64 | 65 | Track1_Headset/italian_speech.tgz.partaa 66 | Track1_Headset/italian_speech.tgz.partab 67 | Track1_Headset/italian_speech.tgz.partac 68 | 
Track1_Headset/italian_speech.tgz.partad 69 | 70 | Track1_Headset/read_speech.tgz.partaa 71 | Track1_Headset/read_speech.tgz.partab 72 | Track1_Headset/read_speech.tgz.partac 73 | Track1_Headset/read_speech.tgz.partad 74 | Track1_Headset/read_speech.tgz.partae 75 | Track1_Headset/read_speech.tgz.partaf 76 | Track1_Headset/read_speech.tgz.partag 77 | Track1_Headset/read_speech.tgz.partah 78 | Track1_Headset/read_speech.tgz.partai 79 | Track1_Headset/read_speech.tgz.partaj 80 | Track1_Headset/read_speech.tgz.partak 81 | Track1_Headset/read_speech.tgz.partal 82 | Track1_Headset/read_speech.tgz.partam 83 | Track1_Headset/read_speech.tgz.partan 84 | Track1_Headset/read_speech.tgz.partao 85 | Track1_Headset/read_speech.tgz.partap 86 | Track1_Headset/read_speech.tgz.partaq 87 | Track1_Headset/read_speech.tgz.partar 88 | Track1_Headset/read_speech.tgz.partas 89 | Track1_Headset/read_speech.tgz.partat 90 | Track1_Headset/read_speech.tgz.partau 91 | 92 | Track1_Headset/russian_speech.tgz 93 | 94 | Track1_Headset/spanish_speech.tgz.partaa 95 | Track1_Headset/spanish_speech.tgz.partab 96 | Track1_Headset/spanish_speech.tgz.partac 97 | Track1_Headset/spanish_speech.tgz.partad 98 | Track1_Headset/spanish_speech.tgz.partae 99 | Track1_Headset/spanish_speech.tgz.partaf 100 | Track1_Headset/spanish_speech.tgz.partag 101 | 102 | Track1_Headset/vctk_wav48_silence_trimmed.tgz.partaa 103 | Track1_Headset/vctk_wav48_silence_trimmed.tgz.partab 104 | Track1_Headset/vctk_wav48_silence_trimmed.tgz.partac 105 | ) 106 | 107 | ############################################################### 108 | # this data is extracted from datasets used in Track 2. 
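The Track 1 archives listed above are split into fixed-size parts (`.tgz.partaa`, `.tgz.partab`, ...), and a single part cannot be unpacked on its own: the parts must be concatenated in order first. A minimal sketch of that reassembly step, demonstrated on a tiny synthetic archive rather than the real Track1_Headset downloads (all file names below are illustrative):

```shell
# Sketch: reassemble and unpack a split .tgz, using a synthetic
# archive as a stand-in for the real multi-part downloads.
mkdir -p demo_src demo_out
echo "hello" > demo_src/sample.txt
tar -czf demo.tgz -C demo_src sample.txt    # make a small .tgz
split -b 64 demo.tgz demo.tgz.part          # -> demo.tgz.partaa, demo.tgz.partab, ...
# The shell glob expands parts in lexicographic (i.e. correct) order;
# tar reads the concatenated stream from stdin (-f -).
cat demo.tgz.part* | tar -C demo_out -f - -x -z
cat demo_out/sample.txt                     # prints: hello
```

The same `cat <name>.tgz.part* | tar -C <dir> -f - -x -z` pattern applies once all parts of a real archive have finished downloading.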
109 | 110 | AZURE_URL="https://dnschallengepublic.blob.core.windows.net/dns5archive/V5_training_dataset" 111 | 112 | OUTPUT_PATH="./datasets_fullband" 113 | 114 | mkdir -p "$OUTPUT_PATH"/clean_fullband 115 | 116 | for BLOB in "${BLOB_NAMES[@]}" 117 | do 118 | URL="$AZURE_URL/$BLOB" 119 | echo "Download: $BLOB" 120 | 121 | # DRY RUN: print HTTP response and Content-Length 122 | # WITHOUT downloading the files 123 | curl -s -I "$URL" | head -n 2 124 | 125 | # Actually download the files: UNCOMMENT when ready to download 126 | # curl "$URL" -o "$OUTPUT_PATH/$BLOB" 127 | 128 | # Same as above, but using wget 129 | # wget "$URL" -O "$OUTPUT_PATH/$BLOB" 130 | 131 | # Same, + unpack single .tgz archives on the fly (these are gzip, hence -z; multi-part .tgz.partXX files must be concatenated first) 132 | # curl "$URL" | tar -C "$OUTPUT_PATH" -f - -x -z 133 | done 134 | -------------------------------------------------------------------------------- /download-dns-challenge-5-noise-ir.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/bash 2 | 3 | # ***** 5th DNS Challenge at ICASSP 2023 ***** 4 | # Noise data which is used in both tracks 5 | # Also download the impulse response data 6 | 7 | # All compressed noise files are ~39 GB 8 | # ------------------------------------------------------------- 9 | # ------------------------------------------------------------- 10 | # The directory structure of the unpacked data is: 11 | # +-- noise_fullband 12 | 13 | BLOB_NAMES=( 14 | noise_fullband/datasets_fullband.noise_fullband.audioset_000.tar.bz2 15 | noise_fullband/datasets_fullband.noise_fullband.audioset_001.tar.bz2 16 | noise_fullband/datasets_fullband.noise_fullband.audioset_002.tar.bz2 17 | noise_fullband/datasets_fullband.noise_fullband.audioset_003.tar.bz2 18 | noise_fullband/datasets_fullband.noise_fullband.audioset_004.tar.bz2 19 | noise_fullband/datasets_fullband.noise_fullband.audioset_005.tar.bz2 20 | noise_fullband/datasets_fullband.noise_fullband.audioset_006.tar.bz2 21 | 22 | 
noise_fullband/datasets_fullband.noise_fullband.freesound_000.tar.bz2 23 | noise_fullband/datasets_fullband.noise_fullband.freesound_001.tar.bz2 24 | 25 | datasets_fullband.impulse_responses_000.tar.bz2 26 | ) 27 | 28 | ############################################################### 29 | 30 | AZURE_URL="https://dnschallengepublic.blob.core.windows.net/dns5archive/V5_training_dataset" 31 | 32 | OUTPUT_PATH="./" 33 | 34 | mkdir -p "$OUTPUT_PATH"/noise_fullband 35 | 36 | for BLOB in "${BLOB_NAMES[@]}" 37 | do 38 | URL="$AZURE_URL/$BLOB" 39 | echo "Download: $BLOB" 40 | 41 | # DRY RUN: print HTTP response and Content-Length 42 | # WITHOUT downloading the files 43 | curl -s -I "$URL" | head -n 2 44 | 45 | # Actually download the files: UNCOMMENT when ready to download 46 | # curl "$URL" -o "$OUTPUT_PATH/$BLOB" 47 | 48 | # Same as above, but using wget 49 | # wget "$URL" -O "$OUTPUT_PATH/$BLOB" 50 | 51 | # Same, + unpack files on the fly 52 | # curl "$URL" | tar -C "$OUTPUT_PATH" -f - -x -j 53 | done 54 | -------------------------------------------------------------------------------- /download-dns-challenge-5-paralinguistic-train.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/bash 2 | 3 | # ***** Paralinguistic training data for the 5th DNS Challenge at ICASSP 2023 ***** 4 | 5 | # NOTE: Before downloading, make sure you have enough space 6 | # on your local storage! 7 | 8 | # Zip file is 181.8 MB. 
9 | # ------------------------------------------------------------- 10 | 11 | URL="https://dnschallengepublic.blob.core.windows.net/dns5archive/V5_training_dataset/paralinguistic_training.zip" 12 | echo "Download: $URL" 13 | # 14 | # DRY RUN: print HTTP header WITHOUT downloading the files 15 | curl -s -I "$URL" 16 | # 17 | # Download the archive (comment this out if you only want the dry run) 18 | curl "$URL" --output 'paralinguistic_training.zip' 19 | #wget --no-check-certificate "$URL" 20 | -------------------------------------------------------------------------------- /download-dns-challenge-5-speakerphone-training.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/bash 2 | 3 | # ***** 5th DNS Challenge at ICASSP 2023 ***** 4 | # Track 2 Speakerphone Clean speech: All Languages 5 | # ------------------------------------------------------------- 6 | # In all, you will need about 1TB to store the UNPACKED data. 7 | # Archived, the same data takes about 550GB total. 8 | 9 | # Please comment out the files you don't need before launching 10 | # the script. 11 | 12 | # NOTE: By default, the script *DOES NOT* DOWNLOAD ANY FILES! 13 | # Please scroll down and edit this script to pick the 14 | # downloading method that works best for you. 
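The Track 2 clean-speech blobs below are large `.tar.bz2` archives, so it may be worth checking an archive's integrity after download and before unpacking — a truncated transfer otherwise surfaces only as a mid-extraction tar error. A minimal sketch on a synthetic archive (file names here are illustrative, not part of the dataset):

```shell
# Sketch: verify a downloaded .tar.bz2 before unpacking. bzip2 -t
# decompresses to nowhere and exits non-zero on a truncated or
# corrupted file. The archive here is a synthetic stand-in.
echo "payload" > f.txt
tar -cjf f.tar.bz2 f.txt
if bzip2 -t f.tar.bz2; then
    echo "archive OK"          # prints: archive OK
else
    echo "re-download needed" >&2
fi
```

Running the same `bzip2 -t` check over each downloaded blob before `tar -x -j` avoids unpacking half an archive from an interrupted transfer.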
15 | 16 | # ------------------------------------------------------------- 17 | # The directory structure of the unpacked data is: 18 | 19 | # datasets_fullband 20 | # \-- clean_fullband 827G 21 | # +-- emotional_speech 2.4G 22 | # +-- french_speech 62G 23 | # +-- german_speech 319G 24 | # +-- italian_speech 42G 25 | # +-- read_speech 299G 26 | # +-- russian_speech 12G 27 | # +-- spanish_speech 65G 28 | # +-- vctk_wav48_silence_trimmed 27G 29 | # \-- VocalSet_48kHz_mono 974M 30 | 31 | BLOB_NAMES=( 32 | 33 | clean_fullband/datasets_fullband.clean_fullband.VocalSet_48kHz_mono_000_NA_NA.tar.bz2 34 | 35 | clean_fullband/datasets_fullband.clean_fullband.emotional_speech_000_NA_NA.tar.bz2 36 | 37 | clean_fullband/datasets_fullband.clean_fullband.french_speech_000_NA_NA.tar.bz2 38 | clean_fullband/datasets_fullband.clean_fullband.french_speech_001_NA_NA.tar.bz2 39 | clean_fullband/datasets_fullband.clean_fullband.french_speech_002_NA_NA.tar.bz2 40 | clean_fullband/datasets_fullband.clean_fullband.french_speech_003_NA_NA.tar.bz2 41 | clean_fullband/datasets_fullband.clean_fullband.french_speech_004_NA_NA.tar.bz2 42 | clean_fullband/datasets_fullband.clean_fullband.french_speech_005_NA_NA.tar.bz2 43 | clean_fullband/datasets_fullband.clean_fullband.french_speech_006_NA_NA.tar.bz2 44 | clean_fullband/datasets_fullband.clean_fullband.french_speech_007_NA_NA.tar.bz2 45 | clean_fullband/datasets_fullband.clean_fullband.french_speech_008_NA_NA.tar.bz2 46 | 47 | clean_fullband/datasets_fullband.clean_fullband.german_speech_000_0.00_3.47.tar.bz2 48 | clean_fullband/datasets_fullband.clean_fullband.german_speech_001_3.47_3.64.tar.bz2 49 | clean_fullband/datasets_fullband.clean_fullband.german_speech_002_3.64_3.74.tar.bz2 50 | clean_fullband/datasets_fullband.clean_fullband.german_speech_003_3.74_3.81.tar.bz2 51 | clean_fullband/datasets_fullband.clean_fullband.german_speech_004_3.81_3.86.tar.bz2 52 | clean_fullband/datasets_fullband.clean_fullband.german_speech_005_3.86_3.91.tar.bz2 
53 | clean_fullband/datasets_fullband.clean_fullband.german_speech_006_3.91_3.96.tar.bz2 54 | clean_fullband/datasets_fullband.clean_fullband.german_speech_007_3.96_4.00.tar.bz2 55 | clean_fullband/datasets_fullband.clean_fullband.german_speech_008_4.00_4.04.tar.bz2 56 | clean_fullband/datasets_fullband.clean_fullband.german_speech_009_4.04_4.08.tar.bz2 57 | clean_fullband/datasets_fullband.clean_fullband.german_speech_010_4.08_4.12.tar.bz2 58 | clean_fullband/datasets_fullband.clean_fullband.german_speech_011_4.12_4.16.tar.bz2 59 | clean_fullband/datasets_fullband.clean_fullband.german_speech_012_4.16_4.21.tar.bz2 60 | clean_fullband/datasets_fullband.clean_fullband.german_speech_013_4.21_4.26.tar.bz2 61 | clean_fullband/datasets_fullband.clean_fullband.german_speech_014_4.26_4.33.tar.bz2 62 | clean_fullband/datasets_fullband.clean_fullband.german_speech_015_4.33_4.43.tar.bz2 63 | clean_fullband/datasets_fullband.clean_fullband.german_speech_016_4.43_NA.tar.bz2 64 | clean_fullband/datasets_fullband.clean_fullband.german_speech_017_NA_NA.tar.bz2 65 | clean_fullband/datasets_fullband.clean_fullband.german_speech_018_NA_NA.tar.bz2 66 | clean_fullband/datasets_fullband.clean_fullband.german_speech_019_NA_NA.tar.bz2 67 | clean_fullband/datasets_fullband.clean_fullband.german_speech_020_NA_NA.tar.bz2 68 | clean_fullband/datasets_fullband.clean_fullband.german_speech_021_NA_NA.tar.bz2 69 | clean_fullband/datasets_fullband.clean_fullband.german_speech_022_NA_NA.tar.bz2 70 | clean_fullband/datasets_fullband.clean_fullband.german_speech_023_NA_NA.tar.bz2 71 | clean_fullband/datasets_fullband.clean_fullband.german_speech_024_NA_NA.tar.bz2 72 | clean_fullband/datasets_fullband.clean_fullband.german_speech_025_NA_NA.tar.bz2 73 | clean_fullband/datasets_fullband.clean_fullband.german_speech_026_NA_NA.tar.bz2 74 | clean_fullband/datasets_fullband.clean_fullband.german_speech_027_NA_NA.tar.bz2 75 | clean_fullband/datasets_fullband.clean_fullband.german_speech_028_NA_NA.tar.bz2 76 
| clean_fullband/datasets_fullband.clean_fullband.german_speech_029_NA_NA.tar.bz2 77 | clean_fullband/datasets_fullband.clean_fullband.german_speech_030_NA_NA.tar.bz2 78 | clean_fullband/datasets_fullband.clean_fullband.german_speech_031_NA_NA.tar.bz2 79 | clean_fullband/datasets_fullband.clean_fullband.german_speech_032_NA_NA.tar.bz2 80 | clean_fullband/datasets_fullband.clean_fullband.german_speech_033_NA_NA.tar.bz2 81 | clean_fullband/datasets_fullband.clean_fullband.german_speech_034_NA_NA.tar.bz2 82 | clean_fullband/datasets_fullband.clean_fullband.german_speech_035_NA_NA.tar.bz2 83 | clean_fullband/datasets_fullband.clean_fullband.german_speech_036_NA_NA.tar.bz2 84 | clean_fullband/datasets_fullband.clean_fullband.german_speech_037_NA_NA.tar.bz2 85 | clean_fullband/datasets_fullband.clean_fullband.german_speech_038_NA_NA.tar.bz2 86 | clean_fullband/datasets_fullband.clean_fullband.german_speech_039_NA_NA.tar.bz2 87 | clean_fullband/datasets_fullband.clean_fullband.german_speech_040_NA_NA.tar.bz2 88 | clean_fullband/datasets_fullband.clean_fullband.german_speech_041_NA_NA.tar.bz2 89 | clean_fullband/datasets_fullband.clean_fullband.german_speech_042_NA_NA.tar.bz2 90 | 91 | clean_fullband/datasets_fullband.clean_fullband.italian_speech_000_0.00_3.98.tar.bz2 92 | clean_fullband/datasets_fullband.clean_fullband.italian_speech_001_3.98_4.21.tar.bz2 93 | clean_fullband/datasets_fullband.clean_fullband.italian_speech_002_4.21_4.40.tar.bz2 94 | clean_fullband/datasets_fullband.clean_fullband.italian_speech_003_4.40_NA.tar.bz2 95 | clean_fullband/datasets_fullband.clean_fullband.italian_speech_004_NA_NA.tar.bz2 96 | clean_fullband/datasets_fullband.clean_fullband.italian_speech_005_NA_NA.tar.bz2 97 | 98 | clean_fullband/datasets_fullband.clean_fullband.read_speech_000_0.00_3.75.tar.bz2 99 | clean_fullband/datasets_fullband.clean_fullband.read_speech_001_3.75_3.88.tar.bz2 100 | clean_fullband/datasets_fullband.clean_fullband.read_speech_002_3.88_3.96.tar.bz2 101 | 
clean_fullband/datasets_fullband.clean_fullband.read_speech_003_3.96_4.02.tar.bz2 102 | clean_fullband/datasets_fullband.clean_fullband.read_speech_004_4.02_4.06.tar.bz2 103 | clean_fullband/datasets_fullband.clean_fullband.read_speech_005_4.06_4.10.tar.bz2 104 | clean_fullband/datasets_fullband.clean_fullband.read_speech_006_4.10_4.13.tar.bz2 105 | clean_fullband/datasets_fullband.clean_fullband.read_speech_007_4.13_4.16.tar.bz2 106 | clean_fullband/datasets_fullband.clean_fullband.read_speech_008_4.16_4.19.tar.bz2 107 | clean_fullband/datasets_fullband.clean_fullband.read_speech_009_4.19_4.21.tar.bz2 108 | clean_fullband/datasets_fullband.clean_fullband.read_speech_010_4.21_4.24.tar.bz2 109 | clean_fullband/datasets_fullband.clean_fullband.read_speech_011_4.24_4.26.tar.bz2 110 | clean_fullband/datasets_fullband.clean_fullband.read_speech_012_4.26_4.29.tar.bz2 111 | clean_fullband/datasets_fullband.clean_fullband.read_speech_013_4.29_4.31.tar.bz2 112 | clean_fullband/datasets_fullband.clean_fullband.read_speech_014_4.31_4.33.tar.bz2 113 | clean_fullband/datasets_fullband.clean_fullband.read_speech_015_4.33_4.35.tar.bz2 114 | clean_fullband/datasets_fullband.clean_fullband.read_speech_016_4.35_4.38.tar.bz2 115 | clean_fullband/datasets_fullband.clean_fullband.read_speech_017_4.38_4.40.tar.bz2 116 | clean_fullband/datasets_fullband.clean_fullband.read_speech_018_4.40_4.42.tar.bz2 117 | clean_fullband/datasets_fullband.clean_fullband.read_speech_019_4.42_4.45.tar.bz2 118 | clean_fullband/datasets_fullband.clean_fullband.read_speech_020_4.45_4.48.tar.bz2 119 | clean_fullband/datasets_fullband.clean_fullband.read_speech_021_4.48_4.52.tar.bz2 120 | clean_fullband/datasets_fullband.clean_fullband.read_speech_022_4.52_4.57.tar.bz2 121 | clean_fullband/datasets_fullband.clean_fullband.read_speech_023_4.57_4.67.tar.bz2 122 | clean_fullband/datasets_fullband.clean_fullband.read_speech_024_4.67_NA.tar.bz2 123 | 
clean_fullband/datasets_fullband.clean_fullband.read_speech_025_NA_NA.tar.bz2 124 | clean_fullband/datasets_fullband.clean_fullband.read_speech_026_NA_NA.tar.bz2 125 | clean_fullband/datasets_fullband.clean_fullband.read_speech_027_NA_NA.tar.bz2 126 | clean_fullband/datasets_fullband.clean_fullband.read_speech_028_NA_NA.tar.bz2 127 | clean_fullband/datasets_fullband.clean_fullband.read_speech_029_NA_NA.tar.bz2 128 | clean_fullband/datasets_fullband.clean_fullband.read_speech_030_NA_NA.tar.bz2 129 | clean_fullband/datasets_fullband.clean_fullband.read_speech_031_NA_NA.tar.bz2 130 | clean_fullband/datasets_fullband.clean_fullband.read_speech_032_NA_NA.tar.bz2 131 | clean_fullband/datasets_fullband.clean_fullband.read_speech_033_NA_NA.tar.bz2 132 | clean_fullband/datasets_fullband.clean_fullband.read_speech_034_NA_NA.tar.bz2 133 | clean_fullband/datasets_fullband.clean_fullband.read_speech_035_NA_NA.tar.bz2 134 | clean_fullband/datasets_fullband.clean_fullband.read_speech_036_NA_NA.tar.bz2 135 | clean_fullband/datasets_fullband.clean_fullband.read_speech_037_NA_NA.tar.bz2 136 | clean_fullband/datasets_fullband.clean_fullband.read_speech_038_NA_NA.tar.bz2 137 | clean_fullband/datasets_fullband.clean_fullband.read_speech_039_NA_NA.tar.bz2 138 | 139 | clean_fullband/datasets_fullband.clean_fullband.russian_speech_000_0.00_4.31.tar.bz2 140 | clean_fullband/datasets_fullband.clean_fullband.russian_speech_001_4.31_NA.tar.bz2 141 | 142 | clean_fullband/datasets_fullband.clean_fullband.spanish_speech_000_0.00_4.09.tar.bz2 143 | clean_fullband/datasets_fullband.clean_fullband.spanish_speech_001_4.09_NA.tar.bz2 144 | clean_fullband/datasets_fullband.clean_fullband.spanish_speech_002_NA_NA.tar.bz2 145 | clean_fullband/datasets_fullband.clean_fullband.spanish_speech_003_NA_NA.tar.bz2 146 | clean_fullband/datasets_fullband.clean_fullband.spanish_speech_004_NA_NA.tar.bz2 147 | clean_fullband/datasets_fullband.clean_fullband.spanish_speech_005_NA_NA.tar.bz2 148 | 
clean_fullband/datasets_fullband.clean_fullband.spanish_speech_006_NA_NA.tar.bz2 149 | clean_fullband/datasets_fullband.clean_fullband.spanish_speech_007_NA_NA.tar.bz2 150 | clean_fullband/datasets_fullband.clean_fullband.spanish_speech_008_NA_NA.tar.bz2 151 | 152 | clean_fullband/datasets_fullband.clean_fullband.vctk_wav48_silence_trimmed_000.tar.bz2 153 | clean_fullband/datasets_fullband.clean_fullband.vctk_wav48_silence_trimmed_001.tar.bz2 154 | clean_fullband/datasets_fullband.clean_fullband.vctk_wav48_silence_trimmed_002.tar.bz2 155 | clean_fullband/datasets_fullband.clean_fullband.vctk_wav48_silence_trimmed_003.tar.bz2 156 | clean_fullband/datasets_fullband.clean_fullband.vctk_wav48_silence_trimmed_004.tar.bz2 157 | 158 | ) 159 | 160 | ############################################################### 161 | # This data is identical to the clean speech of the non-personalized track of the 4th DNS Challenge. 162 | # It is recommended to re-download the data using this script. 163 | 164 | AZURE_URL="https://dns4public.blob.core.windows.net/dns4archive/datasets_fullband" 165 | 166 | OUTPUT_PATH="./datasets_fullband" 167 | 168 | mkdir -p "$OUTPUT_PATH"/{clean_fullband,noise_fullband} 169 | 170 | for BLOB in "${BLOB_NAMES[@]}" 171 | do 172 | URL="$AZURE_URL/$BLOB" 173 | echo "Download: $BLOB" 174 | 175 | # DRY RUN: print HTTP response and Content-Length 176 | # WITHOUT downloading the files 177 | curl -s -I "$URL" | head -n 2 178 | 179 | # Actually download the files: UNCOMMENT when ready to download 180 | # curl "$URL" -o "$OUTPUT_PATH/$BLOB" 181 | 182 | # Same as above, but using wget 183 | # wget "$URL" -O "$OUTPUT_PATH/$BLOB" 184 | 185 | # Same, + unpack files on the fly 186 | # curl "$URL" | tar -C "$OUTPUT_PATH" -f - -x -j 187 | done 188 | -------------------------------------------------------------------------------- /download-dns5-blind-testset.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/bash 2 | 3 | # ***** BLIND Testset for 5th DNS
Challenge at ICASSP 2023 ***** 4 | 5 | # NOTE: Before downloading, make sure you have enough space 6 | # on your local storage! 7 | 8 | # ------------------------------------------------------------- 9 | # The directory structure of the unpacked data is: 10 | 11 | # 12 | # +-- V5_BlindTestSet 13 | # | +-- Track1_Headset ---> (enrol, noisy) 14 | # | +-- Track2_Speakerphone ---> (enrol, noisy) 15 | 16 | URL="https://dnschallengepublic.blob.core.windows.net/dns5archive/V5_BlindTestSet.zip" 17 | 18 | echo "Download: $URL" 19 | # 20 | # DRY RUN: print HTTP header WITHOUT downloading the files 21 | curl -s -I "$URL" 22 | # 23 | # Download the archive (comment out the next line if you only want the dry run) 24 | wget "$URL" 25 | 26 | # Same as above, but using curl 27 | # curl "$URL" -o "V5_BlindTestSet.zip" 28 | 29 | # The archive is a ZIP file, so unpack it with unzip (not tar) after downloading 30 | # unzip V5_BlindTestSet.zip 31 | -------------------------------------------------------------------------------- /download-dns5-dev-testset.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/bash 2 | 3 | # ***** Dev Testset for 5th DNS Challenge at ICASSP 2023 ***** 4 | 5 | # NOTE: Before downloading, make sure you have enough space 6 | # on your local storage! 7 | 8 | # Zip file is 2.9 GB. Unzipped data is 4 GB.
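The dry run below only prints the raw HTTP response headers. As a sketch of how that output could be used (the `parse_content_length` helper is hypothetical and not part of the repository), the Content-Length header can be extracted so the archive size is known in bytes before committing to the real download:

```shell
# Hypothetical helper: read HTTP response headers on stdin and print the
# Content-Length value in bytes. tr strips the trailing CR that HTTP
# header lines carry; awk matches the header name case-insensitively.
parse_content_length() {
    tr -d '\r' | awk 'tolower($1) == "content-length:" { print $2; exit }'
}

# Usage with the dry run in this script (requires network access):
#   SIZE_BYTES=$(curl -s -I "$URL" | parse_content_length)
```

This keeps the existing `curl -s -I` dry run untouched and just post-processes its output.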
9 | 10 | # ------------------------------------------------------------- 11 | # The directory structure of the unpacked data is: 12 | 13 | # 14 | # +-- V5_dev_testset 64G 15 | # | +-- Track1_Headset ---> (enrol, noisy) 16 | # | +-- Track2_Speakerphone ---> (enrol, noisy) 17 | 18 | URL="https://dnschallengepublic.blob.core.windows.net/dns5archive/V5_dev_testset.zip" 19 | echo "Download: $URL" 20 | # 21 | # DRY RUN: print HTTP header WITHOUT downloading the files 22 | curl -s -I "$URL" 23 | # 24 | # Download the archive (comment out the next line if you only want the dry run) 25 | wget "$URL" 26 | 27 | # Same as above, but using curl 28 | # curl "$URL" -o "V5_dev_testset.zip" 29 | 30 | # The archive is a ZIP file, so unpack it with unzip (not tar) after downloading 31 | # unzip V5_dev_testset.zip 32 | -------------------------------------------------------------------------------- /download_dns_v2_v3_blindset.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/bash 2 | 3 | # ***** BLIND Testset for 2nd and 3rd DNS Challenges, combined with additional handpicked clips ***** 4 | 5 | # NOTE: Before downloading, make sure you have enough space 6 | # on your local storage!
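Each of these download scripts warns about local storage but leaves the check to the reader. A minimal sketch of an explicit guard (the `require_space` helper is mine, not part of the repository):

```shell
# Hypothetical helper: fail if the filesystem holding $2 (default ".") has
# fewer than $1 gigabytes available. `df -Pk` prints POSIX portable output
# in 1 KiB blocks; column 4 of the second line is the available space.
require_space() {
    need_gb="$1"
    dest="${2:-.}"
    avail_kb=$(df -Pk "$dest" | awk 'NR == 2 { print $4 }')
    avail_gb=$(( avail_kb / 1024 / 1024 ))
    if [ "$avail_gb" -lt "$need_gb" ]; then
        echo "Need ${need_gb} GB free at ${dest}, only ${avail_gb} GB available" >&2
        return 1
    fi
}

# Example: bail out early instead of failing mid-download
# require_space 10 . || exit 1
```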
7 | 8 | # ------------------------------------------------------------- 9 | # The directory structure of the unpacked data is: 10 | 11 | # 12 | # +-- V2_V3_Challenge_Combined_Blindset 13 | # | +-- handpicked_emotion_testclips_16k_600_withSNR ---> (600 emotional clips) 14 | # | +-- mouseclicks_testclips_withSNR_16k ---> (100 mouse-click clips) 15 | # | +-- noisy_blind_testset_v2_challenge_withSNR_16k ---> (700 blindset clips from V2 challenge) 16 | # | +-- noisy_blind_testset_v3_challenge_withSNR_16k ---> (600 blindset clips from V3 challenge) 17 | 18 | URL="https://dnschallengepublic.blob.core.windows.net/dns3archive/V2_V3_Challenge_Combined_Blindset.zip" 19 | 20 | echo "Download: $URL" 21 | # 22 | # DRY RUN: print HTTP header WITHOUT downloading the files 23 | curl -s -I "$URL" 24 | # 25 | # Download the archive (comment out the next line if you only want the dry run) 26 | wget "$URL" 27 | 28 | # Same as above, but using curl 29 | # curl "$URL" -o "V2_V3_Challenge_Combined_Blindset.zip" 30 | 31 | # The archive is a ZIP file, so unpack it with unzip (not tar) after downloading 32 | # unzip V2_V3_Challenge_Combined_Blindset.zip 33 | -------------------------------------------------------------------------------- /index.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | Model comparison 4 | 5 | 6 | 7 | 8 | 9 | 72 | 88 | 95 |
[The remaining index.html markup was stripped during text extraction. The surviving text indicates an "Audio Clips" section and a per-clip table with Index, Progress (e.g. "25%"), and Clipname columns.]
-------------------------------------------------------------------------------- /noisyspeech_synthesizer.cfg: -------------------------------------------------------------------------------- 1 | # Configuration for generating the Noisy Speech Dataset 2 | 3 | # - sampling_rate: Specify the sampling rate. Default is 16 kHz 4 | # - audioformat: Default is .wav 5 | # - audio_length: Minimum length of each audio clip (noisy and clean speech) in seconds that will be generated by augmenting utterances. 6 | # - silence_length: Duration of silence introduced between clean speech utterances. 7 | # - total_hours: Total number of hours of data required. Units are in hours. 8 | # - snr_lower: Lower bound for the required SNR (default: 0 dB) 9 | # - snr_upper: Upper bound for the required SNR (default: 40 dB) 10 | # - target_level_lower: Lower bound for the target audio level before audiowrite (default: -35 dB) 11 | # - target_level_upper: Upper bound for the target audio level before audiowrite (default: -15 dB) 12 | # - total_snrlevels: Number of SNR levels required (default: 5, which means there are 5 levels between snr_lower and snr_upper) 13 | # - clean_activity_threshold: Activity threshold for clean speech 14 | # - noise_activity_threshold: Activity threshold for noise 15 | # - fileindex_start: First file ID that will be used in filenames 16 | # - fileindex_end: Last file ID that will be used in filenames 17 | # - is_test_set: Set it to True for the test set, else False for the training set 18 | # - noise_dir: Specify the directory path to all noise files 19 | # - speech_dir: Specify the directory path to all clean speech files 20 | # - noisy_destination: Specify the path to the destination directory to store noisy speech 21 | # - clean_destination: Specify the path to the destination directory to store clean speech 22 | # - noise_destination: Specify the path to the destination directory to store noise 23 | # - log_dir: Specify path to the directory
to store all the log files 24 | 25 | # Configuration for unit tests 26 | # - snr_test: Set to True if SNR test is required, else False 27 | # - norm_test: Set to True if Normalization test is required, else False 28 | # - sampling_rate_test: Set to True if Sampling Rate test is required, else False 29 | # - clipping_test: Set to True if Clipping test is required, else False 30 | # - unit_tests_log_dir: Specify path to the directory where you want to store logs 31 | 32 | [noisy_speech] 33 | 34 | sampling_rate: 16000 35 | audioformat: *.wav 36 | audio_length: 30 37 | silence_length: 0.2 38 | total_hours: 500 39 | snr_lower: -5 40 | snr_upper: 20 41 | randomize_snr: True 42 | target_level_lower: -35 43 | target_level_upper: -15 44 | total_snrlevels: 21 45 | clean_activity_threshold: 0.6 46 | noise_activity_threshold: 0.0 47 | fileindex_start: None 48 | fileindex_end: None 49 | is_test_set: False 50 | 51 | noise_dir: datasets\noise 52 | speech_dir: datasets\clean\read_speech 53 | noise_types_excluded: None 54 | 55 | noisy_destination: datasets\training_set_sept12\noisy 56 | clean_destination: datasets\training_set_sept12\clean 57 | noise_destination: datasets\training_set_sept12\noise 58 | log_dir: logs 59 | 60 | # Config: add singing voice to clean speech 61 | use_singing_data=1 62 | # 0 for no, 1 for yes 63 | clean_singing: datasets\clean\singing_voice 64 | #datasets\clean_singing\VocalSet11\FULL 65 | singing_choice: 3 66 | # 1 for only male, 2 for only female, 3 (default) for both male and female 67 | 68 | # Config: add emotional data to clean speech 69 | use_emotion_data=1 70 | # 0 for no, 1 for yes 71 | clean_emotion: datasets\clean\emotional_speech 72 | 73 | # Config: add Chinese (mandarin) data to clean speech 74 | use_mandarin_data=1 75 | # 0 for no, 1 for yes 76 | clean_mandarin: datasets\clean\mandarin_speech 77 | 78 | # Config: add reverb to clean speech 79 | rir_choice: 3 80 | # 1 for only real rir, 2 for only synthetic rir, 3 (default) use both real and 
synthetic 81 | lower_t60: 0.3 82 | # lower bound of t60 range in seconds 83 | upper_t60: 1.3 84 | # upper bound of t60 range in seconds 85 | rir_table_csv: datasets\acoustic_params\RIR_table_simple.csv 86 | clean_speech_t60_csv: datasets\acoustic_params\cleanspeech_table_t60_c50.csv 87 | # percent_for_adding_reverb=0.5 # percentage of clean speech convolved with RIR 88 | 89 | # Unit tests config 90 | snr_test: True 91 | norm_test: True 92 | sampling_rate_test = True 93 | clipping_test = True 94 | 95 | unit_tests_log_dir: unittests_logs 96 | -------------------------------------------------------------------------------- /noisyspeech_synthesizer_singleprocess.py: -------------------------------------------------------------------------------- 1 | """ 2 | @author: chkarada 3 | """ 4 | 5 | # Note: This single process audio synthesizer will attempt to use each clean 6 | # speech sourcefile once, as it does not randomly sample from these files 7 | 8 | import os 9 | import sys 10 | import glob 11 | import argparse 12 | import ast 13 | import configparser as CP 14 | from random import shuffle 15 | import random 16 | 17 | import librosa 18 | import numpy as np 19 | from scipy import signal 20 | from audiolib import audioread, audiowrite, segmental_snr_mixer, activitydetector, is_clipped, add_clipping 21 | import utils 22 | 23 | import pandas as pd 24 | from pathlib import Path 25 | from scipy.io import wavfile 26 | 27 | MAXTRIES = 50 28 | MAXFILELEN = 100 29 | 30 | np.random.seed(5) 31 | random.seed(5) 32 | 33 | def add_pyreverb(clean_speech, rir): 34 | 35 | reverb_speech = signal.fftconvolve(clean_speech, rir, mode="full") 36 | 37 | # make reverb_speech same length as clean_speech 38 | reverb_speech = reverb_speech[0 : clean_speech.shape[0]] 39 | 40 | return reverb_speech 41 | 42 | def build_audio(is_clean, params, index, audio_samples_length=-1): 43 | '''Construct an audio signal from source files''' 44 | 45 | fs_output = params['fs'] 46 | silence_length = 
params['silence_length'] 47 | if audio_samples_length == -1: 48 | audio_samples_length = int(params['audio_length']*params['fs']) 49 | 50 | output_audio = np.zeros(0) 51 | remaining_length = audio_samples_length 52 | files_used = [] 53 | clipped_files = [] 54 | 55 | if is_clean: 56 | source_files = params['cleanfilenames'] 57 | idx = index 58 | else: 59 | if 'noisefilenames' in params.keys(): 60 | source_files = params['noisefilenames'] 61 | idx = index 62 | # if noise files are organized into individual subdirectories, pick a directory randomly 63 | else: 64 | noisedirs = params['noisedirs'] 65 | # pick a noise category randomly 66 | idx_n_dir = np.random.randint(0, np.size(noisedirs)) 67 | source_files = glob.glob(os.path.join(noisedirs[idx_n_dir], 68 | params['audioformat'])) 69 | shuffle(source_files) 70 | # pick a noise source file index randomly 71 | idx = np.random.randint(0, np.size(source_files)) 72 | 73 | # initialize silence 74 | silence = np.zeros(int(fs_output*silence_length)) 75 | 76 | # iterate through multiple clips until we have a long enough signal 77 | tries_left = MAXTRIES 78 | while remaining_length > 0 and tries_left > 0: 79 | 80 | # read next audio file and resample if necessary 81 | idx = (idx + 1) % np.size(source_files) 82 | input_audio, fs_input = audioread(source_files[idx]) 83 | if input_audio is None: 84 | sys.stderr.write("WARNING: Cannot read file: %s\n" % source_files[idx]) 85 | tries_left -= 1  # a failed read also consumes a try, so unreadable files cannot loop forever 86 | continue 87 | if fs_input != fs_output: 88 | input_audio = librosa.resample(input_audio, orig_sr=fs_input, target_sr=fs_output)  # keyword arguments required by librosa >= 0.10 89 | 90 | # if current file is longer than remaining desired length, and this is 91 | # noise generation or this is training set, subsample it randomly 92 | if len(input_audio) > remaining_length and (not is_clean or not params['is_test_set']): 93 | idx_seg = np.random.randint(0, len(input_audio)-remaining_length) 94 | input_audio = input_audio[idx_seg:idx_seg+remaining_length] 95 | 96 | # check for clipping, and if found move onto next file 97
| if is_clipped(input_audio): 98 | clipped_files.append(source_files[idx]) 99 | tries_left -= 1 100 | continue 101 | 102 | # concatenate current input audio to output audio stream 103 | files_used.append(source_files[idx]) 104 | output_audio = np.append(output_audio, input_audio) 105 | remaining_length -= len(input_audio) 106 | 107 | # add some silence if we have not reached desired audio length 108 | if remaining_length > 0: 109 | silence_len = min(remaining_length, len(silence)) 110 | output_audio = np.append(output_audio, silence[:silence_len]) 111 | remaining_length -= silence_len 112 | 113 | if tries_left == 0 and not is_clean and 'noisedirs' in params.keys(): 114 | print("There are not enough non-clipped files in the " + noisedirs[idx_n_dir] + \ 115 | " directory to complete the audio build") 116 | return [], [], clipped_files, idx 117 | 118 | return output_audio, files_used, clipped_files, idx 119 | 120 | 121 | def gen_audio(is_clean, params, index, audio_samples_length=-1): 122 | '''Calls build_audio() to get an audio signal, and verify that it meets the 123 | activity threshold''' 124 | 125 | clipped_files = [] 126 | low_activity_files = [] 127 | if audio_samples_length == -1: 128 | audio_samples_length = int(params['audio_length']*params['fs']) 129 | if is_clean: 130 | activity_threshold = params['clean_activity_threshold'] 131 | else: 132 | activity_threshold = params['noise_activity_threshold'] 133 | 134 | while True: 135 | audio, source_files, new_clipped_files, index = \ 136 | build_audio(is_clean, params, index, audio_samples_length) 137 | 138 | clipped_files += new_clipped_files 139 | if len(audio) < audio_samples_length: 140 | continue 141 | 142 | if activity_threshold == 0.0: 143 | break 144 | 145 | percactive = activitydetector(audio=audio) 146 | if percactive > activity_threshold: 147 | break 148 | else: 149 | low_activity_files += source_files 150 | 151 | return audio, source_files, clipped_files, low_activity_files, index 152 | 153 | 154 | def 
main_gen(params): 155 | '''Calls gen_audio() to generate the audio signals, verifies that they meet 156 | the requirements, and writes the files to storage''' 157 | 158 | clean_source_files = [] 159 | clean_clipped_files = [] 160 | clean_low_activity_files = [] 161 | noise_source_files = [] 162 | noise_clipped_files = [] 163 | noise_low_activity_files = [] 164 | 165 | clean_index = 0 166 | noise_index = 0 167 | file_num = params['fileindex_start'] 168 | 169 | while file_num <= params['fileindex_end']: 170 | # generate clean speech 171 | clean, clean_sf, clean_cf, clean_laf, clean_index = \ 172 | gen_audio(True, params, clean_index) 173 | 174 | # add reverb with a randomly selected RIR 175 | rir_index = random.randint(0, len(params['myrir'])-1) 176 | 177 | my_rir = os.path.normpath(os.path.join('datasets', 'impulse_responses', params['myrir'][rir_index])) 178 | (fs_rir, samples_rir) = wavfile.read(my_rir) 179 | 180 | my_channel = int(params['mychannel'][rir_index]) 181 | 182 | if samples_rir.ndim == 1: 183 | samples_rir_ch = np.array(samples_rir) 184 | else: 185 | # multi-channel RIR: keep the configured channel (1-based in the RIR table) 186 | samples_rir_ch = samples_rir[:, my_channel - 1] 187 | 188 | clean = add_pyreverb(clean, samples_rir_ch) 189 | 190 | # generate noise 191 | noise, noise_sf, noise_cf, noise_laf, noise_index = \ 192 | gen_audio(False, params, noise_index, len(clean)) 193 | 194 | clean_clipped_files += clean_cf 195 | clean_low_activity_files += clean_laf 196 | noise_clipped_files += noise_cf 197 | noise_low_activity_files += noise_laf 198 | 199 | # mix clean speech and noise 200 | # if specified, use the specified SNR value 201 | if not params['randomize_snr']: 202 | snr = params['snr'] 203 | # otherwise use a randomly sampled SNR value between the specified bounds 204 | else: 205 | snr = np.random.randint(params['snr_lower'], params['snr_upper'] + 1)  # +1 makes the upper bound inclusive 206 | 207 | clean_snr,
noise_snr, noisy_snr, target_level = segmental_snr_mixer(params=params, 214 | clean=clean, 215 | noise=noise, 216 | snr=snr) 217 | 222 | # unexpected clipping 223 | if is_clipped(clean_snr) or is_clipped(noise_snr) or is_clipped(noisy_snr): 224 | print("Warning: File #" + str(file_num) + " has unexpected clipping, " + \ 225 | "returning without writing audio to disk") 226 | continue 227 | 228 | clean_source_files += clean_sf 229 | noise_source_files += noise_sf 230 | 231 | # write resultant audio streams to files 232 | hyphen = '-' 233 | clean_source_filenamesonly = [i[:-4].split(os.path.sep)[-1] for i in clean_sf] 234 | clean_files_joined = hyphen.join(clean_source_filenamesonly)[:MAXFILELEN] 235 | noise_source_filenamesonly = [i[:-4].split(os.path.sep)[-1] for i in noise_sf] 236 | noise_files_joined = hyphen.join(noise_source_filenamesonly)[:MAXFILELEN] 237 | 238 | noisyfilename = clean_files_joined + '_' + noise_files_joined + '_snr' + \ 239 | str(snr) + '_tl' + str(target_level) + '_fileid_' + str(file_num) + '.wav' 240 | cleanfilename = 'clean_fileid_' + str(file_num) + '.wav' 241 | noisefilename = 'noise_fileid_' + str(file_num) + '.wav' 242 | 243 | noisypath = os.path.join(params['noisyspeech_dir'], noisyfilename) 244 | cleanpath = os.path.join(params['clean_proc_dir'], cleanfilename) 245 | noisepath = os.path.join(params['noise_proc_dir'], noisefilename) 246 | 247 | audio_signals = [noisy_snr, clean_snr, noise_snr] 248 | file_paths = [noisypath, cleanpath, noisepath] 249 | 250 | file_num += 1 251 | for i in range(len(audio_signals)): 252 | try: 253 | audiowrite(file_paths[i], audio_signals[i], params['fs']) 254 | except Exception as e: 255 | print(str(e)) 256 | 257 | 258 | return clean_source_files, clean_clipped_files,
clean_low_activity_files, \ 259 | noise_source_files, noise_clipped_files, noise_low_activity_files 260 | 261 | 262 | def main_body(): 263 | '''Main body of this file''' 264 | 265 | parser = argparse.ArgumentParser() 266 | 267 | # Configurations: read noisyspeech_synthesizer.cfg and gather inputs 268 | parser.add_argument('--cfg', default='noisyspeech_synthesizer.cfg', 269 | help='Read noisyspeech_synthesizer.cfg for all the details') 270 | parser.add_argument('--cfg_str', type=str, default='noisy_speech') 271 | args = parser.parse_args() 272 | 273 | params = dict() 274 | params['args'] = args 275 | cfgpath = os.path.join(os.path.dirname(__file__), args.cfg) 276 | assert os.path.exists(cfgpath), f'No configuration file found at [{cfgpath}]' 277 | 278 | cfg = CP.ConfigParser(interpolation=CP.ExtendedInterpolation()) 279 | 280 | cfg.read(cfgpath) 281 | params['cfg'] = cfg._sections[args.cfg_str] 282 | cfg = params['cfg'] 283 | 284 | clean_dir = os.path.join(os.path.dirname(__file__), 'datasets/clean') 285 | 286 | if cfg['speech_dir'] != 'None': 287 | clean_dir = cfg['speech_dir'] 288 | if not os.path.exists(clean_dir): 289 | assert False, 'Clean speech data is required' 290 | 291 | noise_dir = os.path.join(os.path.dirname(__file__), 'datasets/noise') 292 | 293 | if cfg['noise_dir'] != 'None': 294 | noise_dir = cfg['noise_dir'] 295 | if not os.path.exists(noise_dir): 296 | assert False, 'Noise data is required' 297 | 298 | params['fs'] = int(cfg['sampling_rate']) 299 | params['audioformat'] = cfg['audioformat'] 300 | params['audio_length'] = float(cfg['audio_length']) 301 | params['silence_length'] = float(cfg['silence_length']) 302 | params['total_hours'] = float(cfg['total_hours']) 303 | 304 | # clean singing speech 305 | params['use_singing_data'] = int(cfg['use_singing_data']) 306 | params['clean_singing'] = str(cfg['clean_singing']) 307 | params['singing_choice'] = int(cfg['singing_choice']) 308 | 309 | # clean emotional speech 310 | params['use_emotion_data'] =
int(cfg['use_emotion_data']) 311 | params['clean_emotion'] = str(cfg['clean_emotion']) 312 | 313 | # clean mandarin speech 314 | params['use_mandarin_data'] = int(cfg['use_mandarin_data']) 315 | params['clean_mandarin'] = str(cfg['clean_mandarin']) 316 | 317 | # rir 318 | params['rir_choice'] = int(cfg['rir_choice']) 319 | params['lower_t60'] = float(cfg['lower_t60']) 320 | params['upper_t60'] = float(cfg['upper_t60']) 321 | params['rir_table_csv'] = str(cfg['rir_table_csv']) 322 | params['clean_speech_t60_csv'] = str(cfg['clean_speech_t60_csv']) 323 | 324 | if cfg['fileindex_start'] != 'None' and cfg['fileindex_end'] != 'None': 325 | params['num_files'] = int(cfg['fileindex_end'])-int(cfg['fileindex_start']) 326 | params['fileindex_start'] = int(cfg['fileindex_start']) 327 | params['fileindex_end'] = int(cfg['fileindex_end']) 328 | else: 329 | params['num_files'] = int((params['total_hours']*60*60)/params['audio_length']) 330 | params['fileindex_start'] = 0 331 | params['fileindex_end'] = params['num_files'] 332 | 333 | print('Number of files to be synthesized:', params['num_files']) 334 | 335 | params['is_test_set'] = utils.str2bool(cfg['is_test_set']) 336 | params['clean_activity_threshold'] = float(cfg['clean_activity_threshold']) 337 | params['noise_activity_threshold'] = float(cfg['noise_activity_threshold']) 338 | params['snr_lower'] = int(cfg['snr_lower']) 339 | params['snr_upper'] = int(cfg['snr_upper']) 340 | 341 | params['randomize_snr'] = utils.str2bool(cfg['randomize_snr']) 342 | params['target_level_lower'] = int(cfg['target_level_lower']) 343 | params['target_level_upper'] = int(cfg['target_level_upper']) 344 | 345 | if 'snr' in cfg.keys(): 346 | params['snr'] = int(cfg['snr']) 347 | else: 348 | params['snr'] = int((params['snr_lower'] + params['snr_upper'])/2) 349 | 350 | params['noisyspeech_dir'] = utils.get_dir(cfg, 'noisy_destination', 'noisy') 351 | params['clean_proc_dir'] = utils.get_dir(cfg, 'clean_destination', 'clean') 352 | 
params['noise_proc_dir'] = utils.get_dir(cfg, 'noise_destination', 'noise') 353 | 354 | if 'speech_csv' in cfg.keys() and cfg['speech_csv'] != 'None': 355 | cleanfilenames = pd.read_csv(cfg['speech_csv']) 356 | cleanfilenames = cleanfilenames['filename'] 357 | else: 358 | cleanfilenames = [] 359 | for path in Path(clean_dir).rglob('*.wav'): 360 | cleanfilenames.append(str(path.resolve())) 361 | 362 | shuffle(cleanfilenames) 363 | # start from the clean speech list so all_cleanfiles is defined even when the optional corpora below are disabled 364 | all_cleanfiles = cleanfilenames 365 | 366 | # add singing voice to clean speech 367 | if params['use_singing_data'] == 1: 368 | all_singing = [] 369 | for path in Path(params['clean_singing']).rglob('*.wav'): 370 | all_singing.append(str(path.resolve())) 371 | 372 | if params['singing_choice'] == 1: # only male singers 373 | mysinging = [s for s in all_singing if ("male" in s and "female" not in s)] 374 | elif params['singing_choice'] == 2: # only female singers 375 | mysinging = [s for s in all_singing if "female" in s] 376 | else: # default (3): both male and female 377 | mysinging = all_singing 378 | 379 | shuffle(mysinging) 380 | all_cleanfiles = all_cleanfiles + mysinging 381 | 382 | # add emotion data to clean speech 383 | if params['use_emotion_data'] == 1: 384 | all_emotion = [] 385 | for path in Path(params['clean_emotion']).rglob('*.wav'): 386 | all_emotion.append(str(path.resolve())) 387 | 388 | shuffle(all_emotion) 389 | all_cleanfiles = all_cleanfiles + all_emotion 390 | else: 391 | print('NOT using emotion data for training!') 392 | 399 | # add mandarin data to clean speech 400 | if params['use_mandarin_data'] == 1: 401 | all_mandarin = [] 402 | for path in Path(params['clean_mandarin']).rglob('*.wav'): 403 | all_mandarin.append(str(path.resolve())) 404 | 405 | shuffle(all_mandarin) 406 | if
all_mandarin: 407 | all_cleanfiles = all_cleanfiles + all_mandarin 408 | else: 409 | print('NOT using non-english (Mandarin) data for training!') 410 | 411 | 412 | params['cleanfilenames'] = all_cleanfiles 413 | params['num_cleanfiles'] = len(params['cleanfilenames']) 414 | # If there are .wav files in the noise_dir directory, use those. 415 | # If not, the noise files are organized into subdirectories by type, 416 | # so get the names of the non-excluded subdirectories 417 | if 'noise_csv' in cfg.keys() and cfg['noise_csv'] != 'None': 418 | noisefilenames = pd.read_csv(cfg['noise_csv']) 419 | noisefilenames = noisefilenames['filename'] 420 | else: 421 | noisefilenames = glob.glob(os.path.join(noise_dir, params['audioformat'])) 422 | 423 | if len(noisefilenames) != 0: 424 | shuffle(noisefilenames) 425 | params['noisefilenames'] = noisefilenames 426 | else: 427 | noisedirs = glob.glob(os.path.join(noise_dir, '*')) 428 | if cfg['noise_types_excluded'] != 'None': 429 | dirstoexclude = cfg['noise_types_excluded'].split(',') 430 | for dirs in dirstoexclude: 431 | noisedirs.remove(dirs) 432 | shuffle(noisedirs) 433 | params['noisedirs'] = noisedirs 434 | 435 | # rir 436 | temp = pd.read_csv(params['rir_table_csv'], skiprows=[1], sep=',', header=None, names=['wavfile','channel','T60_WB','C50_WB','isRealRIR']) 437 | 440 | rir_wav = temp['wavfile'][1:] # 115413 441 | rir_channel = temp['channel'][1:] 442 | rir_t60 = temp['T60_WB'][1:] 443 | rir_isreal = temp['isRealRIR'][1:] 444 | 445 | rir_wav2 = [w.replace('\\', '/') for w in rir_wav] 446 | rir_channel2 = [w for w in rir_channel] 447 | rir_t60_2 = [w for w in rir_t60] 448 | rir_isreal2 = [w for w in rir_isreal] 449 | 450 | myrir = [] 451 | mychannel = [] 452 | myt60 = [] 453 | 454 | lower_t60 = params['lower_t60'] 455 | upper_t60 = params['upper_t60'] 456 | 457 | if params['rir_choice'] == 1: # real 3076 IRs 458 | real_indices = [i for i, x in enumerate(rir_isreal2) if x ==
"1"] 459 | 460 | chosen_i = [] 461 | for i in real_indices: 462 | if (float(rir_t60_2[i]) >= lower_t60) and (float(rir_t60_2[i]) <= upper_t60): 463 | chosen_i.append(i) 464 | 465 | myrir= [rir_wav2[i] for i in chosen_i] 466 | mychannel = [rir_channel2[i] for i in chosen_i] 467 | myt60 = [rir_t60_2[i] for i in chosen_i] 468 | 469 | 470 | elif params['rir_choice']==2: # synthetic 112337 IRs 471 | synthetic_indices= [i for i, x in enumerate(rir_isreal2) if x == "0"] 472 | 473 | chosen_i = [] 474 | for i in synthetic_indices: 475 | if (float(rir_t60_2[i]) >= lower_t60) and (float(rir_t60_2[i]) <= upper_t60): 476 | chosen_i.append(i) 477 | 478 | myrir= [rir_wav2[i] for i in chosen_i] 479 | mychannel = [rir_channel2[i] for i in chosen_i] 480 | myt60 = [rir_t60_2[i] for i in chosen_i] 481 | 482 | elif params['rir_choice']==3: # both real and synthetic 483 | all_indices= [i for i, x in enumerate(rir_isreal2)] 484 | 485 | chosen_i = [] 486 | for i in all_indices: 487 | if (float(rir_t60_2[i]) >= lower_t60) and (float(rir_t60_2[i]) <= upper_t60): 488 | chosen_i.append(i) 489 | 490 | myrir= [rir_wav2[i] for i in chosen_i] 491 | mychannel = [rir_channel2[i] for i in chosen_i] 492 | myt60 = [rir_t60_2[i] for i in chosen_i] 493 | 494 | else: # default both real and synthetic 495 | all_indices= [i for i, x in enumerate(rir_isreal2)] 496 | 497 | chosen_i = [] 498 | for i in all_indices: 499 | if (float(rir_t60_2[i]) >= lower_t60) and (float(rir_t60_2[i]) <= upper_t60): 500 | chosen_i.append(i) 501 | 502 | myrir= [rir_wav2[i] for i in chosen_i] 503 | mychannel = [rir_channel2[i] for i in chosen_i] 504 | myt60 = [rir_t60_2[i] for i in chosen_i] 505 | 506 | params['myrir'] = myrir 507 | params['mychannel'] = mychannel 508 | params['myt60'] = myt60 509 | 510 | # Call main_gen() to generate audio 511 | clean_source_files, clean_clipped_files, clean_low_activity_files, \ 512 | noise_source_files, noise_clipped_files, noise_low_activity_files = main_gen(params) 513 | 514 | # Create log 
directory if needed, and write log files of clipped and low activity files
515 |     log_dir = utils.get_dir(cfg, 'log_dir', 'Logs')
516 | 
517 |     utils.write_log_file(log_dir, 'source_files.csv', clean_source_files + noise_source_files)
518 |     utils.write_log_file(log_dir, 'clipped_files.csv', clean_clipped_files + noise_clipped_files)
519 |     utils.write_log_file(log_dir, 'low_activity_files.csv', \
520 |                          clean_low_activity_files + noise_low_activity_files)
521 | 
522 |     # Compute and print stats about the percentage of clipped and low activity files
523 |     total_clean = len(clean_source_files) + len(clean_clipped_files) + len(clean_low_activity_files)
524 |     total_noise = len(noise_source_files) + len(noise_clipped_files) + len(noise_low_activity_files)
525 |     pct_clean_clipped = round(len(clean_clipped_files)/total_clean*100, 1)
526 |     pct_noise_clipped = round(len(noise_clipped_files)/total_noise*100, 1)
527 |     pct_clean_low_activity = round(len(clean_low_activity_files)/total_clean*100, 1)
528 |     pct_noise_low_activity = round(len(noise_low_activity_files)/total_noise*100, 1)
529 | 
530 |     print("Of the " + str(total_clean) + " clean speech files analyzed, " + \
531 |           str(pct_clean_clipped) + "% had clipping, and " + str(pct_clean_low_activity) + \
532 |           "% had low activity " + "(below " + str(params['clean_activity_threshold']*100) + \
533 |           "% active percentage)")
534 |     print("Of the " + str(total_noise) + " noise files analyzed, " + str(pct_noise_clipped) + \
535 |           "% had clipping, and " + str(pct_noise_low_activity) + "% had low activity " + \
536 |           "(below " + str(params['noise_activity_threshold']*100) + "% active percentage)")
537 | 
538 | 
539 | if __name__ == '__main__':
540 | 
541 |     main_body()
542 | 
--------------------------------------------------------------------------------
/pdns_synthesizer_icassp2023.cfg:
--------------------------------------------------------------------------------
1 | # Configuration for generating the Noisy Speech Dataset
2 | 
3 | # - sampling_rate: 
Specify the sampling rate. Default is 16 kHz
4 | # - audioformat: Audio format of the files. Default is .wav
5 | # - audio_length: Minimum length, in seconds, of each generated clip (noisy and clean speech); utterances are concatenated until this length is reached
6 | # - silence_length: Duration, in seconds, of the silence inserted between clean speech utterances
7 | # - total_hours: Total number of hours of data required
8 | # - snr_lower: Lower bound for the SNR (default: 0 dB)
9 | # - snr_upper: Upper bound for the SNR (default: 40 dB)
10 | # - target_level_lower: Lower bound for the target audio level before audiowrite (default: -35 dB)
11 | # - target_level_upper: Upper bound for the target audio level before audiowrite (default: -15 dB)
12 | # - total_snrlevels: Number of SNR levels required (default: 5, i.e. 5 levels between snr_lower and snr_upper)
13 | # - clean_activity_threshold: Activity threshold for clean speech
14 | # - noise_activity_threshold: Activity threshold for noise
15 | # - fileindex_start: First file ID used in the generated filenames
16 | # - fileindex_end: Last file ID used in the generated filenames
17 | # - is_test_set: Set to True for the test set, False for the training set
18 | # - noise_dir: Directory containing all noise files
19 | # - speech_dir: Directory containing all clean speech files
20 | # - noisy_destination: Destination directory for the noisy speech
21 | # - clean_destination: Destination directory for the clean speech
22 | # - noise_destination: Destination directory for the noise
23 | # - log_dir: Directory for all the log files
24 | 
25 | # Configuration for unit tests
26 | # - snr_test: Set to True if the SNR test is required, else False
27 | # - norm_test: Set to True if the normalization test is required, else False
28 | # - sampling_rate_test: Set to True if the sampling rate test is required, else False
29 | 
# - clipping_test: Set to True if Clipping test is required, else False 30 | # - unit_tests_log_dir: Specify path to the directory where you want to store logs 31 | 32 | [noisy_speech] 33 | 34 | sampling_rate: 48000 35 | audioformat: *.wav 36 | audio_length: 30 37 | # 15, 12, 30 38 | silence_length: 0.2 39 | total_hours: 1000 40 | # 1000 41 | #200 42 | # 2.5, 500, 100 43 | snr_lower: -5 44 | #-5, 0 45 | snr_upper: 20 46 | # 25, 40 47 | randomize_snr: True 48 | target_level_lower: -35 49 | target_level_upper: -15 50 | total_snrlevels: 31 51 | # 5 52 | clean_activity_threshold: 0.0 53 | noise_activity_threshold: 0.2 54 | fileindex_start: None 55 | fileindex_end: None 56 | is_test_set: False 57 | # True, False 58 | 59 | noise_dir: /mnt/f/4th_DNSChallenge/ICASSP_2022/DNS-Challenge/datasets/noise 60 | #/mnt/f/4th_DNSChallenge/INTERSPEECH_2021/DNS-Challenge/datasets_fullband/noise 61 | #F:\4th_DNSChallenge\INTERSPEECH_2021\DNS-Challenge\datasets_fullband\noise 62 | #datasets\pdns_training_set\noise 63 | #\test_set2\Test_Noise 64 | # datasets\noise 65 | # \datasets\noise 66 | 67 | speech_dir: /mnt/f/4th_DNSChallenge/ICASSP_2022/DNS-Challenge/datasets/clean 68 | # D:\kanhawin_git\primary_speakers_VCTK_16k_for_synthesizer 69 | # datasets\test_set2\Singing_Voice\wav_16k 70 | # dir with secondary speaker clean speech 71 | speech_dir2: /mnt/f/4th_DNSChallenge/ICASSP_2022/DNS-Challenge/datasets/clean 72 | #D:\kanhawin_git\secondary_speakers_voxCeleb2_16k 73 | # datasets\test_set2\Singing_Voice\wav_16k 74 | 75 | spkid_csv: /mnt/f/4th_DNSChallenge/ICASSP_2022/DNS-Challenge/datasets/filelists/complete_ps_split.csv 76 | #/mnt/f/4th_DNSChallenge/ICASSP_2022/DNS-Challenge/datasets/filelists/vctk_spkid.csv 77 | # datasets\clean 78 | noise_types_excluded: None 79 | 80 | rir_dir: /mnt/f/4th_DNSChallenge/ICASSP_2022/DNS-Challenge/datasets/pdns_training_set/impulse_responses 81 | #/mnt/f/4th_DNSChallenge/ICASSP_2022/DNS-Challenge/datasets/impulse_responses 82 | # 
F:\4th_DNSChallenge\ICASSP_2022\DNS-Challenge\datasets\impulse_responses 83 | 84 | # \datasets\clean 85 | noisy_destination: /mnt/f/4th_DNSChallenge/ICASSP_2022/DNS-Challenge/datasets/pdns_training_set/mixed/noisy 86 | # datasets/training_data/noisy 87 | # datasets\test_set2\synthetic_personalizeddns\noisy 88 | #training_set2_onlyrealrir\noisy 89 | #\noisy 90 | clean_destination: /mnt/f/4th_DNSChallenge/ICASSP_2022/DNS-Challenge/datasets/pdns_training_set/mixed/clean 91 | #datasets\test_set2\synthetic_personalizeddns\clean 92 | # training_set2_onlyrealrir\clean 93 | # \clean 94 | noise_destination: /mnt/f/4th_DNSChallenge/ICASSP_2022/DNS-Challenge/datasets/pdns_training_set/mixed/noise 95 | # datasets/training_data/noise 96 | #datasets\test_set2\synthetic_personalizeddns\noise 97 | #training_set2_onlyrealrir\noise 98 | # \noise 99 | log_dir: logs 100 | # \logs 101 | 102 | # Config: add singing voice to clean speech 103 | clean_singing: datasets\clean_singing\VocalSet11\FULL 104 | singing_choice: 3 105 | # 1 for only male, 2 for only female, 3 (default) for both male and female 106 | 107 | # Config: add reverb to clean speech 108 | rir_choice: 1 109 | # 1 for only real rir, 2 for only synthetic rir, 3 (default) use both real and synthetic 110 | lower_t60: 0.3 111 | # lower bound of t60 range in seconds 112 | upper_t60: 1.3 113 | # upper bound of t60 range in seconds 114 | rir_table_csv: datasets\acoustic_params\RIR_table_simple.csv 115 | clean_speech_t60_csv: datasets\acoustic_params\cleanspeech_table_t60_c50.csv 116 | # percent_for_adding_reverb=0.5 # percentage of clean speech convolved with RIR 117 | 118 | # pdns testsets 119 | # primary_data: D:\kanhawin_git\primary_speakers_VCTK_16k 120 | #'D:\PersonalizedDNS_dataset\synthetic_primary' 121 | # secondary_data='D:\kanhawin_git\secondary_speakers_voxCeleb2_16k' 122 | #'D:\PersonalizedDNS_dataset\synthetic_secondary' 123 | # noise_data= datasets\test_set2\synthetic\noise 124 | # pdns_testset_clean= 
datasets\test_set2\pdns\clean
125 | # pdns_testset_noisy= datasets\test_set2\pdns\noisy
126 | 
127 | # adaptation_data_seconds=120
128 | # num_primary_spk=100
129 | # num_clips=600
130 | 
131 | # Unit tests config
132 | snr_test: True
133 | norm_test: True
134 | sampling_rate_test: True
135 | clipping_test: True
136 | 
137 | unit_tests_log_dir: unittests_logs
138 | 
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | numpy==1.22.4
2 | soundfile==0.9.0
3 | librosa==0.8.1
4 | configparser==5.3.0
5 | pandas==1.2.4
6 | onnxruntime==1.13.1
7 | torch==1.10.0
8 | torchvision==0.11.1
9 | torchaudio==0.10.0
10 | 
--------------------------------------------------------------------------------
/unit_tests_synthesizer.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | import soundfile as sf
3 | import glob
4 | import argparse
5 | import os
6 | import utils
7 | import configparser as CP
8 | 
9 | LOW_ENERGY_THRESH = -60
10 | 
11 | def test_snr(clean, noise, expected_snr, snrtolerance=2):
12 |     '''Test for SNR.
13 |     Note: not applicable for segmental SNR'''
14 |     rmsclean = (clean**2).mean()**0.5
15 |     rmsnoise = (noise**2).mean()**0.5
16 |     actual_snr = 20*np.log10(rmsclean/rmsnoise)
17 |     return abs(actual_snr - expected_snr) < snrtolerance
18 | 
19 | def test_normalization(audio, expected_rms=-25, normtolerance=2):
20 |     '''Test for normalization.
21 |     Note: set norm_test to False in the config if different target levels are used'''
22 |     rmsaudio = (audio**2).mean()**0.5
23 |     rmsaudiodb = 20*np.log10(rmsaudio)
24 |     return abs(rmsaudiodb - expected_rms) < normtolerance
25 | 
26 | def test_samplingrate(sr, expected_sr=16000):
27 |     '''Test to ensure all clips have the same sampling rate'''
28 |     return 
expected_sr == sr
29 | 
30 | def test_clipping(audio, num_consecutive_samples=3, clipping_threshold=0.01):
31 |     '''Detect clipping: a run of near-identical consecutive samples (flat top), or any sample at/above full scale'''
32 |     clipping = False
33 |     for i in range(0, len(audio)-num_consecutive_samples-1):
34 |         audioseg = audio[i:i+num_consecutive_samples]
35 |         if abs(max(audioseg)-min(audioseg)) < clipping_threshold or abs(max(audioseg)) >= 1:
36 |             clipping = True
37 |             break
38 |     return clipping
39 | 
40 | def test_zeros_beg_end(audio, num_zeros=16000, low_energy_thresh=LOW_ENERGY_THRESH):
41 |     '''Test whether the beginning and the end of the signal are (near-)silent'''
42 |     beg_segment_energy = 20*np.log10((audio[:num_zeros]**2).mean()**0.5)
43 |     end_segment_energy = 20*np.log10((audio[-num_zeros:]**2).mean()**0.5)
44 |     return beg_segment_energy < low_energy_thresh or end_segment_energy < low_energy_thresh
45 | 
46 | def adsp_filtering_test(adsp, without_adsp):
47 |     '''Test whether device (ADSP) processing changed the signal'''
48 |     diff = adsp - without_adsp
49 |     return any(abs(val) > 0.0001 for val in diff)
50 | 
51 | if __name__ == '__main__':
52 |     parser = argparse.ArgumentParser()
53 |     parser.add_argument('--cfg', default='noisyspeech_synthesizer.cfg')
54 |     parser.add_argument('--cfg_str', type=str, default='noisy_speech')
55 | 
56 |     args = parser.parse_args()
57 | 
58 |     cfgpath = os.path.join(os.path.dirname(__file__), args.cfg)
59 |     assert os.path.exists(cfgpath), f'No configuration file found at [{cfgpath}]'
60 | 
61 |     cfg = CP.ConfigParser()
62 |     cfg._interpolation = CP.ExtendedInterpolation()
63 |     cfg.read(cfgpath)
64 |     cfg = cfg._sections[args.cfg_str]
65 | 
66 |     noisydir = cfg['noisy_train']
67 |     cleandir = cfg['clean_train']
68 |     noisedir = cfg['noise_train']
69 |     audioformat = cfg['audioformat']
70 | 
71 |     # List of noisy speech files (only the first 10 clips are spot-checked)
72 |     noisy_speech_filenames_big = glob.glob(os.path.join(noisydir, audioformat))
73 |     noisy_speech_filenames = noisy_speech_filenames_big[0:10]
74 |     # Initialize the lists
75 |     noisy_filenames_list = []
76 |     clean_filenames_list = []
77 |     noise_filenames_list = []
78 |     snr_results_list = []
79 | 
clean_norm_results_list = [] 80 | noise_norm_results_list = [] 81 | noisy_norm_results_list = [] 82 | clean_sr_results_list = [] 83 | noise_sr_results_list = [] 84 | noisy_sr_results_list = [] 85 | clean_clipping_results_list = [] 86 | noise_clipping_results_list = [] 87 | noisy_clipping_results_list = [] 88 | 89 | skipped_string = 'Skipped' 90 | # Initialize the counters for stats 91 | total_clips = len(noisy_speech_filenames) 92 | 93 | 94 | for noisypath in noisy_speech_filenames: 95 | # To do: add right paths to clean filename and noise filename 96 | noisy_filename = os.path.basename(noisypath) 97 | clean_filename = 'clean_fileid_'+os.path.splitext(noisy_filename)[0].split('fileid_')[1]+'.wav' 98 | cleanpath = os.path.join(cleandir, clean_filename) 99 | noise_filename = 'noise_fileid_'+os.path.splitext(noisy_filename)[0].split('fileid_')[1]+'.wav' 100 | noisepath = os.path.join(noisedir, noise_filename) 101 | 102 | noisy_filenames_list.append(noisy_filename) 103 | clean_filenames_list.append(clean_filename) 104 | noise_filenames_list.append(noise_filename) 105 | 106 | # Read clean, noise and noisy signals 107 | clean_signal, fs_clean = sf.read(cleanpath) 108 | noise_signal, fs_noise = sf.read(noisepath) 109 | noisy_signal, fs_noisy = sf.read(noisypath) 110 | 111 | # SNR Test 112 | # To do: add right path split to extract SNR 113 | if utils.str2bool(cfg['snr_test']): 114 | snr = int(noisy_filename.split('_snr')[1].split('_')[0]) 115 | snr_results_list.append(str(test_snr(clean=clean_signal, \ 116 | noise=noise_signal, expected_snr=snr))) 117 | else: 118 | snr_results_list.append(skipped_string) 119 | 120 | # Normalization test 121 | if utils.str2bool(cfg['norm_test']): 122 | tl = int(noisy_filename.split('_tl')[1].split('_')[0]) 123 | clean_norm_results_list.append(str(test_normalization(clean_signal))) 124 | noise_norm_results_list.append(str(test_normalization(noise_signal))) 125 | noisy_norm_results_list.append(str(test_normalization(noisy_signal, 
expected_rms=tl))) 126 | else: 127 | clean_norm_results_list.append(skipped_string) 128 | noise_norm_results_list.append(skipped_string) 129 | noisy_norm_results_list.append(skipped_string) 130 | 131 | # Sampling rate test 132 | if utils.str2bool(cfg['sampling_rate_test']): 133 | clean_sr_results_list.append(str(test_samplingrate(sr=fs_clean))) 134 | noise_sr_results_list.append(str(test_samplingrate(sr=fs_noise))) 135 | noisy_sr_results_list.append(str(test_samplingrate(sr=fs_noisy))) 136 | else: 137 | clean_sr_results_list.append(skipped_string) 138 | noise_sr_results_list.append(skipped_string) 139 | noisy_sr_results_list.append(skipped_string) 140 | 141 | # Clipping test 142 | if utils.str2bool(cfg['clipping_test']): 143 | clean_clipping_results_list.append(str(test_clipping(audio=clean_signal))) 144 | noise_clipping_results_list.append(str(test_clipping(audio=noise_signal))) 145 | noisy_clipping_results_list.append(str(test_clipping(audio=noisy_signal))) 146 | else: 147 | clean_clipping_results_list.append(skipped_string) 148 | noise_clipping_results_list.append(skipped_string) 149 | noisy_clipping_results_list.append(skipped_string) 150 | 151 | # Stats 152 | pc_snr_passed = round(snr_results_list.count('True')/total_clips*100, 1) 153 | pc_clean_norm_passed = round(clean_norm_results_list.count('True')/total_clips*100, 1) 154 | pc_noise_norm_passed = round(noise_norm_results_list.count('True')/total_clips*100, 1) 155 | pc_noisy_norm_passed = round(noisy_norm_results_list.count('True')/total_clips*100, 1) 156 | pc_clean_sr_passed = round(clean_sr_results_list.count('True')/total_clips*100, 1) 157 | pc_noise_sr_passed = round(noise_sr_results_list.count('True')/total_clips*100, 1) 158 | pc_noisy_sr_passed = round(noisy_sr_results_list.count('True')/total_clips*100, 1) 159 | pc_clean_clipping_passed = round(clean_clipping_results_list.count('True')/total_clips*100, 1) 160 | pc_noise_clipping_passed = 
round(noise_clipping_results_list.count('True')/total_clips*100, 1)
161 |     pc_noisy_clipping_passed = round(noisy_clipping_results_list.count('True')/total_clips*100, 1)
162 | 
163 |     print('% clips that passed SNR test:', pc_snr_passed)
164 | 
165 |     print('% clean clips that passed Normalization tests:', pc_clean_norm_passed)
166 |     print('% noise clips that passed Normalization tests:', pc_noise_norm_passed)
167 |     print('% noisy clips that passed Normalization tests:', pc_noisy_norm_passed)
168 | 
169 |     print('% clean clips that passed Sampling Rate tests:', pc_clean_sr_passed)
170 |     print('% noise clips that passed Sampling Rate tests:', pc_noise_sr_passed)
171 |     print('% noisy clips that passed Sampling Rate tests:', pc_noisy_sr_passed)
172 | 
173 |     print('% clean clips that passed Clipping tests:', pc_clean_clipping_passed)
174 |     print('% noise clips that passed Clipping tests:', pc_noise_clipping_passed)
175 |     print('% noisy clips that passed Clipping tests:', pc_noisy_clipping_passed)
176 | 
177 |     log_dir = utils.get_dir(cfg, 'unit_tests_log_dir', 'Unit_tests_logs')
178 | 
179 |     # get_dir() already creates the directory when it does not exist,
180 |     # so no extra existence check is needed here.
181 | 
182 | 
183 |     utils.write_log_file(log_dir, 'unit_test_results.csv', [noisy_filenames_list, clean_filenames_list, \
184 |         noise_filenames_list, snr_results_list, clean_norm_results_list, noise_norm_results_list, \
185 |         noisy_norm_results_list, clean_sr_results_list, noise_sr_results_list, noisy_sr_results_list, \
186 |         clean_clipping_results_list, noise_clipping_results_list, noisy_clipping_results_list])
--------------------------------------------------------------------------------
/utils.py:
--------------------------------------------------------------------------------
1 | # -*- coding: utf-8 -*-
2 | """
3 | Created on Fri Nov  1 10:28:41 2019
4 | 
5 | @author: rocheng
6 | """
7 | import os
8 | import csv
9 | from shutil import copyfile
10 | import glob
11 | 
12 | def get_dir(cfg, param_name, new_dir_name):
13 |     '''Return the directory configured under param_name, falling back to
14 |     new_dir_name next to this script; create the directory if it is missing'''
15 | 
16 |     if param_name in cfg:
17 |         dir_name = cfg[param_name]
18 |     else:
19 |         dir_name = os.path.join(os.path.dirname(__file__), new_dir_name)
20 |     if not os.path.exists(dir_name):
21 |         os.makedirs(dir_name)
22 |     return dir_name
23 | 
24 | 
25 | def write_log_file(log_dir, log_filename, data):
26 |     '''Write the columns in data to a space-delimited log file, one row per entry'''
27 |     data = zip(*data)
28 |     with open(os.path.join(log_dir, log_filename), mode='w', newline='') as csvfile:
29 |         csvwriter = csv.writer(csvfile, delimiter=' ',
30 |                                quotechar='|', quoting=csv.QUOTE_MINIMAL)
31 |         for row in data:
32 |             csvwriter.writerow(row)
33 | 
34 | 
35 | def str2bool(string):
36 |     return string.lower() in ("yes", "true", "t", "1")
37 | 
38 | 
39 | def rename_copyfile(src_path, dest_dir, prefix='', ext='*.wav'):
40 |     srcfiles = glob.glob(f"{src_path}/"+ext)
41 |     for srcfile in srcfiles:
42 |         dest_path = os.path.join(dest_dir, prefix+'_'+os.path.basename(srcfile))
43 |         copyfile(srcfile, dest_path)
44 | 
45 | 
46 | 
47 | 
--------------------------------------------------------------------------------
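The SNR check in `unit_tests_synthesizer.py` boils down to comparing the clean-to-noise RMS ratio, in dB, against the SNR the synthesizer was asked to produce. The following is a minimal, self-contained sketch of that check in pure Python (no numpy); the `rms` and `snr_within_tolerance` helper names are illustrative and not part of the repo:

```python
import math
import random

def rms(samples):
    """Root-mean-square level of a sample sequence."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def snr_within_tolerance(clean, noise, expected_snr_db, tol_db=2):
    """Same idea as test_snr(): measure the clean/noise RMS ratio in dB
    and accept it if it lies within tol_db of the expected SNR."""
    actual_snr_db = 20 * math.log10(rms(clean) / rms(noise))
    return abs(actual_snr_db - expected_snr_db) < tol_db

# Build a toy mixture at an exactly known SNR of 10 dB:
sr = 16000
clean = [math.sin(2 * math.pi * 440 * n / sr) for n in range(sr)]  # 1 s of a 440 Hz tone
rng = random.Random(0)
noise = [rng.uniform(-1, 1) for _ in range(sr)]
# Scale the noise so that 20*log10(rms(clean)/rms(noise)) == 10 dB
scale = rms(clean) / (rms(noise) * 10 ** (10 / 20))
noise = [s * scale for s in noise]

print(snr_within_tolerance(clean, noise, 10))  # True: measured SNR is exactly 10 dB
print(snr_within_tolerance(clean, noise, 20))  # False: off by 10 dB, beyond the 2 dB tolerance
```

The same tolerance-band pattern underlies the normalization test, except that a single clip's RMS level in dB is compared against the target level instead of a clean/noise ratio.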