├── .gitignore
├── CODE_OF_CONDUCT.md
├── DNSMOS
│   ├── DNSMOS
│   │   ├── bak_ovr.onnx
│   │   ├── model_v8.onnx
│   │   ├── sig.onnx
│   │   └── sig_bak_ovr.onnx
│   ├── README.md
│   ├── dnsmos_local.py
│   └── pDNSMOS
│       └── sig_bak_ovr.onnx
├── LICENSE
├── LICENSE-CODE
├── README-DNS3.md
├── README.md
├── SECURITY.md
├── V5_DNS_Challenge_FinalResults.pdf
├── WAcc
│   └── WAcc.py
├── audiolib.py
├── docs
│   ├── CMT Instructions for uploading enhanced clips_ICASSP2022.pdf
│   ├── ICASSP_2021_DNS_challenge.pdf
│   └── ICASSP_2022_4th_Deep_Noise_Suppression_Challenge.pdf
├── download-dns-challenge-1.sh
├── download-dns-challenge-2.sh
├── download-dns-challenge-3.sh
├── download-dns-challenge-4-pdns.sh
├── download-dns-challenge-4.sh
├── download-dns-challenge-5-baseline.sh
├── download-dns-challenge-5-filelists-headset.sh
├── download-dns-challenge-5-filelists-speakerphone.sh
├── download-dns-challenge-5-headset-training.sh
├── download-dns-challenge-5-noise-ir.sh
├── download-dns-challenge-5-paralinguistic-train.sh
├── download-dns-challenge-5-speakerphone-training.sh
├── download-dns5-blind-testset.sh
├── download-dns5-dev-testset.sh
├── download_dns_v2_v3_blindset.sh
├── index.html
├── noisyspeech_synthesizer.cfg
├── noisyspeech_synthesizer_singleprocess.py
├── pdns_noisyspeech_synthesizer_singleprocess.py
├── pdns_synthesizer_icassp2023.cfg
├── requirements.txt
├── unit_tests_synthesizer.py
└── utils.py

--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
datasets/
datasets_fullband/
training_set/
training_set2/
training_set2_onlyrealrir/
training_set4/
training_set5/
logs/
test_set2/
training_set_sept11/
training_set_sept12/
__pycache__/
*.pyc
*~
/.vs/
/.vscode/
*.wav
*.tar.bz2
*.zip

--------------------------------------------------------------------------------
/CODE_OF_CONDUCT.md:
--------------------------------------------------------------------------------
# Microsoft Open Source Code of Conduct

This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/).

Resources:

- [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/)
- [Microsoft Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/)
- Contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with questions or concerns

--------------------------------------------------------------------------------
/DNSMOS/DNSMOS/bak_ovr.onnx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/DNS-Challenge/591184a9fcb2cbdec02520fed81a32bbbf9d73ff/DNSMOS/DNSMOS/bak_ovr.onnx

--------------------------------------------------------------------------------
/DNSMOS/DNSMOS/model_v8.onnx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/DNS-Challenge/591184a9fcb2cbdec02520fed81a32bbbf9d73ff/DNSMOS/DNSMOS/model_v8.onnx

--------------------------------------------------------------------------------
/DNSMOS/DNSMOS/sig.onnx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/DNS-Challenge/591184a9fcb2cbdec02520fed81a32bbbf9d73ff/DNSMOS/DNSMOS/sig.onnx

--------------------------------------------------------------------------------
/DNSMOS/DNSMOS/sig_bak_ovr.onnx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/DNS-Challenge/591184a9fcb2cbdec02520fed81a32bbbf9d73ff/DNSMOS/DNSMOS/sig_bak_ovr.onnx

--------------------------------------------------------------------------------
/DNSMOS/README.md:
--------------------------------------------------------------------------------
# DNSMOS: A non-intrusive perceptual objective speech quality metric to evaluate noise suppressors

Human subjective evaluation is the "gold standard" for evaluating speech quality optimized for human perception. Perceptual objective metrics serve as a proxy for subjective scores. The conventional and widely used metrics require a reference clean speech signal, which is unavailable in real recordings. The no-reference approaches correlate poorly with human ratings and are not widely adopted in the research community. One of the biggest use cases of these perceptual objective metrics is to evaluate noise suppression algorithms. DNSMOS generalizes well in challenging test conditions, with a high correlation to human ratings when stack-ranking noise suppression methods. More details can be found in the [DNSMOS paper](https://arxiv.org/pdf/2010.15258.pdf).

## Evaluation methodology:
Use the **dnsmos_local.py** script.
1. To compute a personalized MOS score (where an interfering speaker is penalized), provide the '-p' argument.
   Ex: python dnsmos_local.py -t C:\temp\SampleClips -o sample.csv -p
2. To compute a regular MOS score, omit the '-p' argument.
   Ex: python dnsmos_local.py -t C:\temp\SampleClips -o sample.csv

## Citation:
If you have used DNSMOS for research and development purposes, please cite the [DNSMOS paper](https://arxiv.org/pdf/2010.15258.pdf):
```BibTex
@inproceedings{reddy2021dnsmos,
  title={Dnsmos: A non-intrusive perceptual objective speech quality metric to evaluate noise suppressors},
  author={Reddy, Chandan KA and Gopal, Vishak and Cutler, Ross},
  booktitle={ICASSP 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  pages={6493--6497},
  year={2021},
  organization={IEEE}
}
```

If you used DNSMOS P.835, please cite the [DNSMOS P.835](https://arxiv.org/pdf/2110.01763.pdf) paper:

```BibTex
@inproceedings{reddy2022dnsmos,
  title={DNSMOS P.835: A non-intrusive perceptual objective speech quality metric to evaluate noise suppressors},
  author={Reddy, Chandan KA and Gopal, Vishak and Cutler, Ross},
  booktitle={ICASSP 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  year={2022},
  organization={IEEE}
}
```

--------------------------------------------------------------------------------
/DNSMOS/dnsmos_local.py:
--------------------------------------------------------------------------------
# Usage:
#   python dnsmos_local.py -t c:\temp\DNSChallenge4_Blindset -o DNSCh4_Blind.csv -p
#

import argparse
import concurrent.futures
import glob
import os

import librosa
import numpy as np
import onnxruntime as ort
import pandas as pd
import soundfile as sf
from tqdm import tqdm

SAMPLING_RATE = 16000
INPUT_LENGTH = 9.01


class ComputeScore:
    def __init__(self, primary_model_path, p808_model_path) -> None:
        self.onnx_sess = ort.InferenceSession(primary_model_path)
        self.p808_onnx_sess = ort.InferenceSession(p808_model_path)

    def audio_melspec(self, audio, n_mels=120, frame_size=320, hop_length=160, sr=16000, to_db=True):
        mel_spec = librosa.feature.melspectrogram(y=audio, sr=sr, n_fft=frame_size+1, hop_length=hop_length, n_mels=n_mels)
        if to_db:
            mel_spec = (librosa.power_to_db(mel_spec, ref=np.max)+40)/40
        return mel_spec.T

    def get_polyfit_val(self, sig, bak, ovr, is_personalized_MOS):
        # Map the raw model outputs onto the calibrated MOS scale.
        if is_personalized_MOS:
            p_ovr = np.poly1d([-0.00533021, 0.005101, 1.18058466, -0.11236046])
            p_sig = np.poly1d([-0.01019296, 0.02751166, 1.19576786, -0.24348726])
            p_bak = np.poly1d([-0.04976499, 0.44276479, -0.1644611, 0.96883132])
        else:
            p_ovr = np.poly1d([-0.06766283, 1.11546468, 0.04602535])
            p_sig = np.poly1d([-0.08397278, 1.22083953, 0.0052439])
            p_bak = np.poly1d([-0.13166888, 1.60915514, -0.39604546])

        sig_poly = p_sig(sig)
        bak_poly = p_bak(bak)
        ovr_poly = p_ovr(ovr)

        return sig_poly, bak_poly, ovr_poly

    def __call__(self, fpath, sampling_rate, is_personalized_MOS):
        aud, input_fs = sf.read(fpath)
        fs = sampling_rate
        if input_fs != fs:
            # librosa >= 0.10 requires keyword arguments for resample.
            audio = librosa.resample(aud, orig_sr=input_fs, target_sr=fs)
        else:
            audio = aud
        actual_audio_len = len(audio)
        len_samples = int(INPUT_LENGTH*fs)
        # Repeat clips that are shorter than the minimum model input length.
        while len(audio) < len_samples:
            audio = np.append(audio, audio)

        num_hops = int(np.floor(len(audio)/fs) - INPUT_LENGTH) + 1
        hop_len_samples = fs
        predicted_mos_sig_seg_raw = []
        predicted_mos_bak_seg_raw = []
        predicted_mos_ovr_seg_raw = []
        predicted_mos_sig_seg = []
        predicted_mos_bak_seg = []
        predicted_mos_ovr_seg = []
        predicted_p808_mos = []

        for idx in range(num_hops):
            audio_seg = audio[int(idx*hop_len_samples) : int((idx+INPUT_LENGTH)*hop_len_samples)]
            if len(audio_seg) < len_samples:
                continue

            input_features = np.array(audio_seg).astype('float32')[np.newaxis, :]
            p808_input_features = np.array(self.audio_melspec(audio=audio_seg[:-160])).astype('float32')[np.newaxis, :, :]
            oi = {'input_1': input_features}
            p808_oi = {'input_1': p808_input_features}
            p808_mos = self.p808_onnx_sess.run(None, p808_oi)[0][0][0]
            mos_sig_raw, mos_bak_raw, mos_ovr_raw = self.onnx_sess.run(None, oi)[0][0]
            mos_sig, mos_bak, mos_ovr = self.get_polyfit_val(mos_sig_raw, mos_bak_raw, mos_ovr_raw, is_personalized_MOS)
            predicted_mos_sig_seg_raw.append(mos_sig_raw)
            predicted_mos_bak_seg_raw.append(mos_bak_raw)
            predicted_mos_ovr_seg_raw.append(mos_ovr_raw)
            predicted_mos_sig_seg.append(mos_sig)
            predicted_mos_bak_seg.append(mos_bak)
            predicted_mos_ovr_seg.append(mos_ovr)
            predicted_p808_mos.append(p808_mos)

        clip_dict = {'filename': fpath, 'len_in_sec': actual_audio_len/fs, 'sr': fs}
        clip_dict['num_hops'] = num_hops
        clip_dict['OVRL_raw'] = np.mean(predicted_mos_ovr_seg_raw)
        clip_dict['SIG_raw'] = np.mean(predicted_mos_sig_seg_raw)
        clip_dict['BAK_raw'] = np.mean(predicted_mos_bak_seg_raw)
        clip_dict['OVRL'] = np.mean(predicted_mos_ovr_seg)
        clip_dict['SIG'] = np.mean(predicted_mos_sig_seg)
        clip_dict['BAK'] = np.mean(predicted_mos_bak_seg)
        clip_dict['P808_MOS'] = np.mean(predicted_p808_mos)
        return clip_dict


def main(args):
    models = glob.glob(os.path.join(args.testset_dir, "*"))
    p808_model_path = os.path.join('DNSMOS', 'model_v8.onnx')

    if args.personalized_MOS:
        primary_model_path = os.path.join('pDNSMOS', 'sig_bak_ovr.onnx')
    else:
        primary_model_path = os.path.join('DNSMOS', 'sig_bak_ovr.onnx')

    compute_score = ComputeScore(primary_model_path, p808_model_path)

    rows = []
    clips = glob.glob(os.path.join(args.testset_dir, "*.wav"))
    is_personalized_eval = args.personalized_MOS
    desired_fs = SAMPLING_RATE
    # Also collect clips from per-model subdirectories, descending one level
    # at a time until .wav files are found (bounded to avoid endless descent).
    for m in tqdm(models):
        max_recursion_depth = 10
        audio_path = m  # glob already returns paths prefixed with testset_dir
        audio_clips_list = glob.glob(os.path.join(audio_path, "*.wav"))
        while len(audio_clips_list) == 0 and max_recursion_depth > 0:
            audio_path = os.path.join(audio_path, "**")
            audio_clips_list = glob.glob(os.path.join(audio_path, "*.wav"))
            max_recursion_depth -= 1
        clips.extend(audio_clips_list)

    with concurrent.futures.ThreadPoolExecutor() as executor:
        future_to_clip = {executor.submit(compute_score, clip, desired_fs, is_personalized_eval): clip for clip in clips}
        for future in tqdm(concurrent.futures.as_completed(future_to_clip)):
            clip = future_to_clip[future]
            try:
                data = future.result()
            except Exception as exc:
                print('%r generated an exception: %s' % (clip, exc))
            else:
                rows.append(data)

    df = pd.DataFrame(rows)
    if args.csv_path:
        df.to_csv(args.csv_path)
    else:
        print(df.describe())


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument('-t', "--testset_dir", default='.',
                        help='Path to the dir containing audio clips in .wav to be evaluated')
    parser.add_argument('-o', "--csv_path", default=None, help='Path to the csv that saves the results')
    parser.add_argument('-p', "--personalized_MOS", action='store_true',
                        help='Flag to indicate if personalized MOS score is needed or regular')

    args = parser.parse_args()

    main(args)

--------------------------------------------------------------------------------
/DNSMOS/pDNSMOS/sig_bak_ovr.onnx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/DNS-Challenge/591184a9fcb2cbdec02520fed81a32bbbf9d73ff/DNSMOS/pDNSMOS/sig_bak_ovr.onnx
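The raw per-segment SIG/BAK/OVR predictions from the primary model are post-processed by the fixed calibration polynomials in `get_polyfit_val`. A minimal standalone sketch of the non-personalized mapping, with coefficients copied verbatim from `dnsmos_local.py` (the example raw scores are arbitrary illustration values, not real model outputs):

```python
import numpy as np

# Calibration polynomials from get_polyfit_val() in dnsmos_local.py
# (non-personalized branch). They map raw network outputs to the MOS scale.
p_ovr = np.poly1d([-0.06766283, 1.11546468, 0.04602535])
p_sig = np.poly1d([-0.08397278, 1.22083953, 0.0052439])
p_bak = np.poly1d([-0.13166888, 1.60915514, -0.39604546])

# Example: calibrate a hypothetical raw (SIG, BAK, OVR) triple of 3.0 each.
raw_sig, raw_bak, raw_ovr = 3.0, 3.0, 3.0
mos_sig, mos_bak, mos_ovr = p_sig(raw_sig), p_bak(raw_bak), p_ovr(raw_ovr)
```

The per-clip scores reported in the CSV are the means of these calibrated per-segment values.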
-------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Attribution 4.0 International 2 | 3 | ======================================================================= 4 | 5 | Creative Commons Corporation ("Creative Commons") is not a law firm and 6 | does not provide legal services or legal advice. Distribution of 7 | Creative Commons public licenses does not create a lawyer-client or 8 | other relationship. Creative Commons makes its licenses and related 9 | information available on an "as-is" basis. Creative Commons gives no 10 | warranties regarding its licenses, any material licensed under their 11 | terms and conditions, or any related information. Creative Commons 12 | disclaims all liability for damages resulting from their use to the 13 | fullest extent possible. 14 | 15 | Using Creative Commons Public Licenses 16 | 17 | Creative Commons public licenses provide a standard set of terms and 18 | conditions that creators and other rights holders may use to share 19 | original works of authorship and other material subject to copyright 20 | and certain other rights specified in the public license below. The 21 | following considerations are for informational purposes only, are not 22 | exhaustive, and do not form part of our licenses. 23 | 24 | Considerations for licensors: Our public licenses are 25 | intended for use by those authorized to give the public 26 | permission to use material in ways otherwise restricted by 27 | copyright and certain other rights. Our licenses are 28 | irrevocable. Licensors should read and understand the terms 29 | and conditions of the license they choose before applying it. 30 | Licensors should also secure all rights necessary before 31 | applying our licenses so that the public can reuse the 32 | material as expected. Licensors should clearly mark any 33 | material not subject to the license. 
This includes other CC- 34 | licensed material, or material used under an exception or 35 | limitation to copyright. More considerations for licensors: 36 | wiki.creativecommons.org/Considerations_for_licensors 37 | 38 | Considerations for the public: By using one of our public 39 | licenses, a licensor grants the public permission to use the 40 | licensed material under specified terms and conditions. If 41 | the licensor's permission is not necessary for any reason--for 42 | example, because of any applicable exception or limitation to 43 | copyright--then that use is not regulated by the license. Our 44 | licenses grant only permissions under copyright and certain 45 | other rights that a licensor has authority to grant. Use of 46 | the licensed material may still be restricted for other 47 | reasons, including because others have copyright or other 48 | rights in the material. A licensor may make special requests, 49 | such as asking that all changes be marked or described. 50 | Although not required by our licenses, you are encouraged to 51 | respect those requests where reasonable. More_considerations 52 | for the public: 53 | wiki.creativecommons.org/Considerations_for_licensees 54 | 55 | ======================================================================= 56 | 57 | Creative Commons Attribution 4.0 International Public License 58 | 59 | By exercising the Licensed Rights (defined below), You accept and agree 60 | to be bound by the terms and conditions of this Creative Commons 61 | Attribution 4.0 International Public License ("Public License"). To the 62 | extent this Public License may be interpreted as a contract, You are 63 | granted the Licensed Rights in consideration of Your acceptance of 64 | these terms and conditions, and the Licensor grants You such rights in 65 | consideration of benefits the Licensor receives from making the 66 | Licensed Material available under these terms and conditions. 67 | 68 | 69 | Section 1 -- Definitions. 70 | 71 | a. 
Adapted Material means material subject to Copyright and Similar 72 | Rights that is derived from or based upon the Licensed Material 73 | and in which the Licensed Material is translated, altered, 74 | arranged, transformed, or otherwise modified in a manner requiring 75 | permission under the Copyright and Similar Rights held by the 76 | Licensor. For purposes of this Public License, where the Licensed 77 | Material is a musical work, performance, or sound recording, 78 | Adapted Material is always produced where the Licensed Material is 79 | synched in timed relation with a moving image. 80 | 81 | b. Adapter's License means the license You apply to Your Copyright 82 | and Similar Rights in Your contributions to Adapted Material in 83 | accordance with the terms and conditions of this Public License. 84 | 85 | c. Copyright and Similar Rights means copyright and/or similar rights 86 | closely related to copyright including, without limitation, 87 | performance, broadcast, sound recording, and Sui Generis Database 88 | Rights, without regard to how the rights are labeled or 89 | categorized. For purposes of this Public License, the rights 90 | specified in Section 2(b)(1)-(2) are not Copyright and Similar 91 | Rights. 92 | 93 | d. Effective Technological Measures means those measures that, in the 94 | absence of proper authority, may not be circumvented under laws 95 | fulfilling obligations under Article 11 of the WIPO Copyright 96 | Treaty adopted on December 20, 1996, and/or similar international 97 | agreements. 98 | 99 | e. Exceptions and Limitations means fair use, fair dealing, and/or 100 | any other exception or limitation to Copyright and Similar Rights 101 | that applies to Your use of the Licensed Material. 102 | 103 | f. Licensed Material means the artistic or literary work, database, 104 | or other material to which the Licensor applied this Public 105 | License. 106 | 107 | g. 
Licensed Rights means the rights granted to You subject to the 108 | terms and conditions of this Public License, which are limited to 109 | all Copyright and Similar Rights that apply to Your use of the 110 | Licensed Material and that the Licensor has authority to license. 111 | 112 | h. Licensor means the individual(s) or entity(ies) granting rights 113 | under this Public License. 114 | 115 | i. Share means to provide material to the public by any means or 116 | process that requires permission under the Licensed Rights, such 117 | as reproduction, public display, public performance, distribution, 118 | dissemination, communication, or importation, and to make material 119 | available to the public including in ways that members of the 120 | public may access the material from a place and at a time 121 | individually chosen by them. 122 | 123 | j. Sui Generis Database Rights means rights other than copyright 124 | resulting from Directive 96/9/EC of the European Parliament and of 125 | the Council of 11 March 1996 on the legal protection of databases, 126 | as amended and/or succeeded, as well as other essentially 127 | equivalent rights anywhere in the world. 128 | 129 | k. You means the individual or entity exercising the Licensed Rights 130 | under this Public License. Your has a corresponding meaning. 131 | 132 | 133 | Section 2 -- Scope. 134 | 135 | a. License grant. 136 | 137 | 1. Subject to the terms and conditions of this Public License, 138 | the Licensor hereby grants You a worldwide, royalty-free, 139 | non-sublicensable, non-exclusive, irrevocable license to 140 | exercise the Licensed Rights in the Licensed Material to: 141 | 142 | a. reproduce and Share the Licensed Material, in whole or 143 | in part; and 144 | 145 | b. produce, reproduce, and Share Adapted Material. 146 | 147 | 2. Exceptions and Limitations. 
For the avoidance of doubt, where 148 | Exceptions and Limitations apply to Your use, this Public 149 | License does not apply, and You do not need to comply with 150 | its terms and conditions. 151 | 152 | 3. Term. The term of this Public License is specified in Section 153 | 6(a). 154 | 155 | 4. Media and formats; technical modifications allowed. The 156 | Licensor authorizes You to exercise the Licensed Rights in 157 | all media and formats whether now known or hereafter created, 158 | and to make technical modifications necessary to do so. The 159 | Licensor waives and/or agrees not to assert any right or 160 | authority to forbid You from making technical modifications 161 | necessary to exercise the Licensed Rights, including 162 | technical modifications necessary to circumvent Effective 163 | Technological Measures. For purposes of this Public License, 164 | simply making modifications authorized by this Section 2(a) 165 | (4) never produces Adapted Material. 166 | 167 | 5. Downstream recipients. 168 | 169 | a. Offer from the Licensor -- Licensed Material. Every 170 | recipient of the Licensed Material automatically 171 | receives an offer from the Licensor to exercise the 172 | Licensed Rights under the terms and conditions of this 173 | Public License. 174 | 175 | b. No downstream restrictions. You may not offer or impose 176 | any additional or different terms or conditions on, or 177 | apply any Effective Technological Measures to, the 178 | Licensed Material if doing so restricts exercise of the 179 | Licensed Rights by any recipient of the Licensed 180 | Material. 181 | 182 | 6. No endorsement. Nothing in this Public License constitutes or 183 | may be construed as permission to assert or imply that You 184 | are, or that Your use of the Licensed Material is, connected 185 | with, or sponsored, endorsed, or granted official status by, 186 | the Licensor or others designated to receive attribution as 187 | provided in Section 3(a)(1)(A)(i). 
188 | 189 | b. Other rights. 190 | 191 | 1. Moral rights, such as the right of integrity, are not 192 | licensed under this Public License, nor are publicity, 193 | privacy, and/or other similar personality rights; however, to 194 | the extent possible, the Licensor waives and/or agrees not to 195 | assert any such rights held by the Licensor to the limited 196 | extent necessary to allow You to exercise the Licensed 197 | Rights, but not otherwise. 198 | 199 | 2. Patent and trademark rights are not licensed under this 200 | Public License. 201 | 202 | 3. To the extent possible, the Licensor waives any right to 203 | collect royalties from You for the exercise of the Licensed 204 | Rights, whether directly or through a collecting society 205 | under any voluntary or waivable statutory or compulsory 206 | licensing scheme. In all other cases the Licensor expressly 207 | reserves any right to collect such royalties. 208 | 209 | 210 | Section 3 -- License Conditions. 211 | 212 | Your exercise of the Licensed Rights is expressly made subject to the 213 | following conditions. 214 | 215 | a. Attribution. 216 | 217 | 1. If You Share the Licensed Material (including in modified 218 | form), You must: 219 | 220 | a. retain the following if it is supplied by the Licensor 221 | with the Licensed Material: 222 | 223 | i. identification of the creator(s) of the Licensed 224 | Material and any others designated to receive 225 | attribution, in any reasonable manner requested by 226 | the Licensor (including by pseudonym if 227 | designated); 228 | 229 | ii. a copyright notice; 230 | 231 | iii. a notice that refers to this Public License; 232 | 233 | iv. a notice that refers to the disclaimer of 234 | warranties; 235 | 236 | v. a URI or hyperlink to the Licensed Material to the 237 | extent reasonably practicable; 238 | 239 | b. indicate if You modified the Licensed Material and 240 | retain an indication of any previous modifications; and 241 | 242 | c. 
indicate the Licensed Material is licensed under this 243 | Public License, and include the text of, or the URI or 244 | hyperlink to, this Public License. 245 | 246 | 2. You may satisfy the conditions in Section 3(a)(1) in any 247 | reasonable manner based on the medium, means, and context in 248 | which You Share the Licensed Material. For example, it may be 249 | reasonable to satisfy the conditions by providing a URI or 250 | hyperlink to a resource that includes the required 251 | information. 252 | 253 | 3. If requested by the Licensor, You must remove any of the 254 | information required by Section 3(a)(1)(A) to the extent 255 | reasonably practicable. 256 | 257 | 4. If You Share Adapted Material You produce, the Adapter's 258 | License You apply must not prevent recipients of the Adapted 259 | Material from complying with this Public License. 260 | 261 | 262 | Section 4 -- Sui Generis Database Rights. 263 | 264 | Where the Licensed Rights include Sui Generis Database Rights that 265 | apply to Your use of the Licensed Material: 266 | 267 | a. for the avoidance of doubt, Section 2(a)(1) grants You the right 268 | to extract, reuse, reproduce, and Share all or a substantial 269 | portion of the contents of the database; 270 | 271 | b. if You include all or a substantial portion of the database 272 | contents in a database in which You have Sui Generis Database 273 | Rights, then the database in which You have Sui Generis Database 274 | Rights (but not its individual contents) is Adapted Material; and 275 | 276 | c. You must comply with the conditions in Section 3(a) if You Share 277 | all or a substantial portion of the contents of the database. 278 | 279 | For the avoidance of doubt, this Section 4 supplements and does not 280 | replace Your obligations under this Public License where the Licensed 281 | Rights include other Copyright and Similar Rights. 282 | 283 | 284 | Section 5 -- Disclaimer of Warranties and Limitation of Liability. 285 | 286 | a. 
UNLESS OTHERWISE SEPARATELY UNDERTAKEN BY THE LICENSOR, TO THE 287 | EXTENT POSSIBLE, THE LICENSOR OFFERS THE LICENSED MATERIAL AS-IS 288 | AND AS-AVAILABLE, AND MAKES NO REPRESENTATIONS OR WARRANTIES OF 289 | ANY KIND CONCERNING THE LICENSED MATERIAL, WHETHER EXPRESS, 290 | IMPLIED, STATUTORY, OR OTHER. THIS INCLUDES, WITHOUT LIMITATION, 291 | WARRANTIES OF TITLE, MERCHANTABILITY, FITNESS FOR A PARTICULAR 292 | PURPOSE, NON-INFRINGEMENT, ABSENCE OF LATENT OR OTHER DEFECTS, 293 | ACCURACY, OR THE PRESENCE OR ABSENCE OF ERRORS, WHETHER OR NOT 294 | KNOWN OR DISCOVERABLE. WHERE DISCLAIMERS OF WARRANTIES ARE NOT 295 | ALLOWED IN FULL OR IN PART, THIS DISCLAIMER MAY NOT APPLY TO YOU. 296 | 297 | b. TO THE EXTENT POSSIBLE, IN NO EVENT WILL THE LICENSOR BE LIABLE 298 | TO YOU ON ANY LEGAL THEORY (INCLUDING, WITHOUT LIMITATION, 299 | NEGLIGENCE) OR OTHERWISE FOR ANY DIRECT, SPECIAL, INDIRECT, 300 | INCIDENTAL, CONSEQUENTIAL, PUNITIVE, EXEMPLARY, OR OTHER LOSSES, 301 | COSTS, EXPENSES, OR DAMAGES ARISING OUT OF THIS PUBLIC LICENSE OR 302 | USE OF THE LICENSED MATERIAL, EVEN IF THE LICENSOR HAS BEEN 303 | ADVISED OF THE POSSIBILITY OF SUCH LOSSES, COSTS, EXPENSES, OR 304 | DAMAGES. WHERE A LIMITATION OF LIABILITY IS NOT ALLOWED IN FULL OR 305 | IN PART, THIS LIMITATION MAY NOT APPLY TO YOU. 306 | 307 | c. The disclaimer of warranties and limitation of liability provided 308 | above shall be interpreted in a manner that, to the extent 309 | possible, most closely approximates an absolute disclaimer and 310 | waiver of all liability. 311 | 312 | 313 | Section 6 -- Term and Termination. 314 | 315 | a. This Public License applies for the term of the Copyright and 316 | Similar Rights licensed here. However, if You fail to comply with 317 | this Public License, then Your rights under this Public License 318 | terminate automatically. 319 | 320 | b. Where Your right to use the Licensed Material has terminated under 321 | Section 6(a), it reinstates: 322 | 323 | 1. 
automatically as of the date the violation is cured, provided 324 | it is cured within 30 days of Your discovery of the 325 | violation; or 326 | 327 | 2. upon express reinstatement by the Licensor. 328 | 329 | For the avoidance of doubt, this Section 6(b) does not affect any 330 | right the Licensor may have to seek remedies for Your violations 331 | of this Public License. 332 | 333 | c. For the avoidance of doubt, the Licensor may also offer the 334 | Licensed Material under separate terms or conditions or stop 335 | distributing the Licensed Material at any time; however, doing so 336 | will not terminate this Public License. 337 | 338 | d. Sections 1, 5, 6, 7, and 8 survive termination of this Public 339 | License. 340 | 341 | 342 | Section 7 -- Other Terms and Conditions. 343 | 344 | a. The Licensor shall not be bound by any additional or different 345 | terms or conditions communicated by You unless expressly agreed. 346 | 347 | b. Any arrangements, understandings, or agreements regarding the 348 | Licensed Material not stated herein are separate from and 349 | independent of the terms and conditions of this Public License. 350 | 351 | 352 | Section 8 -- Interpretation. 353 | 354 | a. For the avoidance of doubt, this Public License does not, and 355 | shall not be interpreted to, reduce, limit, restrict, or impose 356 | conditions on any use of the Licensed Material that could lawfully 357 | be made without permission under this Public License. 358 | 359 | b. To the extent possible, if any provision of this Public License is 360 | deemed unenforceable, it shall be automatically reformed to the 361 | minimum extent necessary to make it enforceable. If the provision 362 | cannot be reformed, it shall be severed from this Public License 363 | without affecting the enforceability of the remaining terms and 364 | conditions. 365 | 366 | c. 
No term or condition of this Public License will be waived and no 367 | failure to comply consented to unless expressly agreed to by the 368 | Licensor. 369 | 370 | d. Nothing in this Public License constitutes or may be interpreted 371 | as a limitation upon, or waiver of, any privileges and immunities 372 | that apply to the Licensor or You, including from the legal 373 | processes of any jurisdiction or authority. 374 | 375 | 376 | ======================================================================= 377 | 378 | Creative Commons is not a party to its public 379 | licenses. Notwithstanding, Creative Commons may elect to apply one of 380 | its public licenses to material it publishes and in those instances 381 | will be considered the “Licensor.” The text of the Creative Commons 382 | public licenses is dedicated to the public domain under the CC0 Public 383 | Domain Dedication. Except for the limited purpose of indicating that 384 | material is shared under a Creative Commons public license or as 385 | otherwise permitted by the Creative Commons policies published at 386 | creativecommons.org/policies, Creative Commons does not authorize the 387 | use of the trademark "Creative Commons" or any other trademark or logo 388 | of Creative Commons without its prior written consent including, 389 | without limitation, in connection with any unauthorized modifications 390 | to any of its public licenses or any other arrangements, 391 | understandings, or agreements concerning use of licensed material. For 392 | the avoidance of doubt, this paragraph does not form part of the 393 | public licenses. 394 | 395 | Creative Commons may be contacted at creativecommons.org. -------------------------------------------------------------------------------- /LICENSE-CODE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) Microsoft Corporation. 
4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE 22 | -------------------------------------------------------------------------------- /README-DNS3.md: -------------------------------------------------------------------------------- 1 | 2 | # Deep Noise Suppression (DNS) Challenge 3 - INTERSPEECH 2021 3 | 4 | **NOTE:** This README describes the **PAST** DNS Challenge! 5 | 6 | The data for it is still available, and is described below. If you are interested in the latest DNS 7 | Challenge, please refer to the main [README.md](README.md) file. 8 | 9 | ## In this repository 10 | 11 | This repository contains the datasets and scripts required for INTERSPEECH 2021 DNS Challenge, AKA 12 | DNS Challenge 3, or DNS3. 
For more details about the challenge, please see our
[paper](https://arxiv.org/pdf/2101.01902.pdf) and the challenge
[website](https://www.microsoft.com/en-us/research/academic-program/deep-noise-suppression-challenge-interspeech-2021/).
For more details on the testing framework, please visit [P.835](https://github.com/microsoft/P.808).

## Details

* The **datasets** directory is a placeholder for the wideband datasets. That is, our data
  downloader script by default will place the downloaded audio data here. After the download, this
  directory will contain the clean speech, noise, and room impulse responses required for creating
  the training data for the wideband scenario. The script will also download here the test set that
  participants can use during the development stages.
* The **datasets_fullband** directory is a placeholder for the fullband audio data. The downloader
  script will download here the datasets that contain the clean speech and noise audio clips
  required for creating the training data for the fullband scenario.
* The **NSNet2-baseline** directory contains the inference scripts and the ONNX model for the
  baseline speech enhancement method for wideband.
* **download-dns-challenge-3.sh** - this is the script to download the data. By default, the data
  will be placed into the `datasets/` and `datasets_fullband/` directories. Please take a look at
  the script and uncomment the preferred download method. Unmodified, the script performs a dry
  run and retrieves only the HTTP headers for each archive.
* **noisyspeech_synthesizer_singleprocess.py** - is used to synthesize noisy-clean speech pairs for
  training purposes.
* **noisyspeech_synthesizer.cfg** - is the configuration file used to synthesize the data. Users are
  required to accurately specify the different parameters and provide the right paths to the
  datasets required to synthesize noisy speech.
38 | * **audiolib.py** - contains modules required to synthesize datasets. 39 | * **utils.py** - contains some utility functions required to synthesize the data. 40 | * **unit_tests_synthesizer.py** - contains the unit tests to ensure sanity of the data. 41 | * **requirements.txt** - contains all the libraries required for synthesizing the data. 42 | 43 | ## Datasets 44 | 45 | The default directory structure and the sizes of the datasets available for DNS Challenge are: 46 | 47 | ``` 48 | datasets 229G 49 | ├── clean 204G 50 | │   ├── emotional_speech 403M 51 | │   ├── french_data 21G 52 | │   ├── german_speech 66G 53 | │   ├── italian_speech 14G 54 | │   ├── mandarin_speech 21G 55 | │   ├── read_speech 61G 56 | │   ├── russian_speech 5.1G 57 | │   ├── singing_voice 979M 58 | │   └── spanish_speech 17G 59 | ├── dev_testset 211M 60 | ├── impulse_responses 4.3G 61 | │   ├── SLR26 2.1G 62 | │   └── SLR28 2.3G 63 | └── noise 20G 64 | ``` 65 | 66 | And, for the fullband data, 67 | ``` 68 | datasets_fullband 600G 69 | ├── clean_fullband 542G 70 | │   ├── VocalSet_48kHz_mono 974M 71 | │   ├── emotional_speech 1.2G 72 | │   ├── french_data 62G 73 | │   ├── german_speech 194G 74 | │   ├── italian_speech 42G 75 | │   ├── read_speech 182G 76 | │   ├── russian_speech 12G 77 | │   └── spanish_speech 50G 78 | ├── dev_testset_fullband 630M 79 | └── noise_fullband 58G 80 | ``` 81 | 82 | ## Code prerequisites 83 | - Python 3.6 and above 84 | - Python libraries: soundfile, librosa 85 | 86 | **NOTE:** git LFS is *no longer required* for DNS Challenge. Please use the 87 | `download-dns-challenge-3.sh` script in this repo to download the data. 88 | 89 | ## Usage: 90 | 91 | 1. Install Python libraries 92 | ```bash 93 | pip3 install soundfile librosa 94 | ``` 95 | 2. Clone the repository. 96 | ```bash 97 | git clone https://github.com/microsoft/DNS-Challenge 98 | ``` 99 | 100 | 3. 
Edit **noisyspeech_synthesizer.cfg** to specify the required parameters described in the file and
   include the paths to the clean speech, noise, and impulse response related CSV files. Also,
   specify the paths to the destination directories where the synthesized data and the logs will
   be stored.

4. Create the dataset
```bash
python3 noisyspeech_synthesizer_singleprocess.py
```

## Citation:
If you use this dataset in a publication, please cite the following paper:
111 | 112 | ```BibTex 113 | @inproceedings{reddy2021interspeech, 114 | title={INTERSPEECH 2021 Deep Noise Suppression Challenge}, 115 | author={Reddy, Chandan KA and Dubey, Harishchandra and Koishida, Kazuhito and Nair, Arun and Gopal, Vishak and Cutler, Ross and Braun, Sebastian and Gamper, Hannes and Aichner, Robert and Srinivasan, Sriram}, 116 | booktitle={INTERSPEECH}, 117 | year={2021} 118 | } 119 | ``` 120 | 121 | The baseline NSNet noise suppression:
122 | ```BibTex 123 | @inproceedings{9054254, 124 | author={Y. {Xia} and S. {Braun} and C. K. A. {Reddy} and H. {Dubey} and R. {Cutler} and I. {Tashev}}, 125 | booktitle={ICASSP 2020 - 2020 IEEE International Conference on Acoustics, 126 | Speech and Signal Processing (ICASSP)}, 127 | title={Weighted Speech Distortion Losses for Neural-Network-Based Real-Time Speech Enhancement}, 128 | year={2020}, volume={}, number={}, pages={871-875},} 129 | ``` 130 | 131 | ```BibTex 132 | @misc{braun2020data, 133 | title={Data augmentation and loss normalization for deep noise suppression}, 134 | author={Sebastian Braun and Ivan Tashev}, 135 | year={2020}, 136 | eprint={2008.06412}, 137 | archivePrefix={arXiv}, 138 | primaryClass={eess.AS} 139 | } 140 | ``` 141 | 142 | The P.835 test framework:
143 | ```BibTex 144 | @inproceedings{naderi2021crowdsourcing, 145 | title={Subjective Evaluation of Noise Suppression Algorithms in Crowdsourcing}, 146 | author={Naderi, Babak and Cutler, Ross}, 147 | booktitle={INTERSPEECH}, 148 | year={2021} 149 | } 150 | ``` 151 | 152 | DNSMOS API:
```BibTex
@inproceedings{reddy2021dnsmos,
  title={DNSMOS: A Non-Intrusive Perceptual Objective Speech Quality metric to evaluate Noise Suppressors},
  author={Reddy, Chandan KA and Gopal, Vishak and Cutler, Ross},
  booktitle={ICASSP},
  year={2021}
}
```

# Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a
Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us
the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a
CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions
provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/).
For more information see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) or
contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments.

# Legal Notices

Microsoft and any contributors grant you a license to the Microsoft documentation and other content
in this repository under the [Creative Commons Attribution 4.0 International Public License](https://creativecommons.org/licenses/by/4.0/legalcode),
see the [LICENSE](LICENSE) file, and grant you a license to any code in the repository under the [MIT License](https://opensource.org/licenses/MIT), see the
[LICENSE-CODE](LICENSE-CODE) file.
Microsoft, Windows, Microsoft Azure and/or other Microsoft products and services referenced in the
documentation may be either trademarks or registered trademarks of Microsoft in the United States
and/or other countries. The licenses for this project do not grant you rights to use any Microsoft
names, logos, or trademarks. Microsoft's general trademark guidelines can be found at
http://go.microsoft.com/fwlink/?LinkID=254653.

Privacy information can be found at https://privacy.microsoft.com/en-us/

Microsoft and any contributors reserve all other rights, whether under their respective copyrights, patents,
or trademarks, whether by implication, estoppel or otherwise.


## Dataset licenses
MICROSOFT PROVIDES THE DATASETS ON AN "AS IS" BASIS. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, GUARANTEES OR CONDITIONS WITH RESPECT TO YOUR USE OF THE DATASETS. TO THE EXTENT PERMITTED UNDER YOUR LOCAL LAW, MICROSOFT DISCLAIMS ALL LIABILITY FOR ANY DAMAGES OR LOSSES, INCLUDING DIRECT, CONSEQUENTIAL, SPECIAL, INDIRECT, INCIDENTAL OR PUNITIVE, RESULTING FROM YOUR USE OF THE DATASETS.

The datasets are provided under the original terms under which Microsoft received such datasets. See below for more information about each dataset.

The datasets used in this project are licensed as follows:
1.
Clean speech: 202 | * https://librivox.org/; License: https://librivox.org/pages/public-domain/ 203 | * PTDB-TUG: Pitch Tracking Database from Graz University of Technology https://www.spsc.tugraz.at/databases-and-tools/ptdb-tug-pitch-tracking-database-from-graz-university-of-technology.html; License: http://opendatacommons.org/licenses/odbl/1.0/ 204 | * Edinburgh 56 speaker dataset: https://datashare.is.ed.ac.uk/handle/10283/2791; License: https://datashare.is.ed.ac.uk/bitstream/handle/10283/2791/license_text?sequence=11&isAllowed=y 205 | * VocalSet: A Singing Voice Dataset https://zenodo.org/record/1193957#.X1hkxYtlCHs; License: Creative Commons Attribution 4.0 International 206 | * Emotion data corpus: CREMA-D (Crowd-sourced Emotional Multimodal Actors Dataset) 207 | https://github.com/CheyneyComputerScience/CREMA-D; License: http://opendatacommons.org/licenses/dbcl/1.0/ 208 | * The VoxCeleb2 Dataset http://www.robots.ox.ac.uk/~vgg/data/voxceleb/vox2.html; License: http://www.robots.ox.ac.uk/~vgg/data/voxceleb/ 209 | The VoxCeleb dataset is available to download for commercial/research purposes under a Creative Commons Attribution 4.0 International License. The copyright remains with the original owners of the video. A complete version of the license can be found here. 210 | * VCTK Dataset: https://homepages.inf.ed.ac.uk/jyamagis/page3/page58/page58.html; License: This corpus is licensed under Open Data Commons Attribution License (ODC-By) v1.0. 211 | http://opendatacommons.org/licenses/by/1.0/ 212 | 213 | 2. Noise: 214 | * Audioset: https://research.google.com/audioset/index.html; License: https://creativecommons.org/licenses/by/4.0/ 215 | * Freesound: https://freesound.org/ Only files with CC0 licenses were selected; License: https://creativecommons.org/publicdomain/zero/1.0/ 216 | * Demand: https://zenodo.org/record/1227121#.XRKKxYhKiUk; License: https://creativecommons.org/licenses/by-sa/3.0/deed.en_CA 217 | 218 | 3. 
RIR datasets: OpenSLR26 and OpenSLR28: 219 | * http://www.openslr.org/26/ 220 | * http://www.openslr.org/28/ 221 | * License: Apache 2.0 222 | 223 | ## Code license 224 | MIT License 225 | 226 | Copyright (c) Microsoft Corporation. 227 | 228 | Permission is hereby granted, free of charge, to any person obtaining a copy 229 | of this software and associated documentation files (the "Software"), to deal 230 | in the Software without restriction, including without limitation the rights 231 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 232 | copies of the Software, and to permit persons to whom the Software is 233 | furnished to do so, subject to the following conditions: 234 | 235 | The above copyright notice and this permission notice shall be included in all 236 | copies or substantial portions of the Software. 237 | 238 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 239 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 240 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 241 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 242 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 243 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 244 | SOFTWARE 245 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # ICASSP 2023 Deep Noise Suppression Challenge 2 | Website: https://aka.ms/dns-challenge 3 | Git Repo: https://github.com/microsoft/DNS-Challenge 4 | Challenge Paper: 5 | 6 | ## Important features of this challenge 7 | 1. Along with noise suppression, it includes de-reverberation and suppression of interfering talkers for headset and speakerphone scenarios. 8 | 2. 
The challenge has two tracks: (i) Headset (wired/wireless headphones, earbuds such as AirPods, etc.) speech enhancement; (ii) Non-headset (speakerphone, built-in mic in a laptop/desktop/mobile phone/other meeting devices, etc.) speech enhancement.
3. This challenge adopts the ITU-T P.835 subjective test framework to measure speech quality (SIG), background noise quality (BAK), and overall audio quality (OVRL). We modified ITU-T P.835 to make it reliable for test clips with interfering (undesired neighboring) talkers. Along with P.835 scores, Word Accuracy (WAcc) is used to measure the performance of models.
4. Please NOTE that the intellectual property (IP) is not transferred to the challenge organizers, i.e., if code is shared/submitted, the participants remain the owners of their code (when the code is made publicly available, an appropriate license should be added).
5. There are new requirements for model-related latency. Please check all requirements listed at https://www.microsoft.com/en-us/research/academic-program/deep-noise-suppression-challenge-icassp-2023/

## Baseline Speaker Embeddings
This challenge adopted the pretrained ECAPA-TDNN model available in SpeechBrain as the baseline speaker embedding model, available at https://huggingface.co/speechbrain/spkrec-ecapa-voxceleb. Participants can use any other publicly available speaker embedding model or develop their own speaker embedding extractor. Participants are encouraged to explore the RawNet3 models available at https://github.com/jungjee/RawNet

The previous DNS Challenge used RawNet2 speaker embeddings. So far, the impact of different speaker embeddings on personalized speech enhancement has not been studied in sufficient depth.
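Whichever embedding extractor is used, enrollment and test embeddings are typically compared with cosine similarity. The sketch below is only an illustration of that comparison: the 4-dimensional vectors are made-up stand-ins for real embeddings (which are much higher-dimensional, e.g. 192-dim for ECAPA-TDNN), and the threshold behavior shown is an assumption, not part of the challenge rules.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy stand-ins for real speaker embeddings (hypothetical values for illustration).
enrollment = [0.9, 0.1, 0.3, 0.2]      # embedding of the target speaker's enrollment clip
same_speaker = [0.8, 0.2, 0.25, 0.3]   # embedding of another clip by the same speaker
other_speaker = [0.1, 0.9, 0.2, 0.1]   # embedding of an interfering talker

print(cosine_similarity(enrollment, same_speaker))   # high similarity: likely the target speaker
print(cosine_similarity(enrollment, other_speaker))  # low similarity: likely a different speaker
```

In a personalized noise suppressor, such a similarity score (or the raw enrollment embedding itself) would typically be used to condition the model on the target speaker.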
Install SpeechBrain with the command below:
```bash
pip install speechbrain
```

Compute speaker embeddings for your wav file as follows:
```python
import torchaudio
from speechbrain.pretrained import EncoderClassifier

classifier = EncoderClassifier.from_hparams(source="speechbrain/spkrec-ecapa-voxceleb")
signal, fs = torchaudio.load('tests/samples/ASR/spk1_snt1.wav')
embeddings = classifier.encode_batch(signal)
```

## In this repository

This repository contains the datasets and scripts required for the 5th DNS Challenge at ICASSP 2023, aka
DNS Challenge 5, or simply **DNS5**. For more details about the challenge, please see our
[website](https://www.microsoft.com/en-us/research/academic-program/deep-noise-suppression-challenge-icassp-2023/) and [paper](docs/ICASSP2023_5th_DNS_Challenge.pdf). For more details on the testing framework, please visit [P.835](https://github.com/microsoft/P.808).

## Details

* The **datasets_fullband** folder is a placeholder for the datasets. That is, our data downloader
  script by default will place the downloaded audio data there. After the download, it will contain
  clean speech, noise, and room impulse responses required for creating the training data.

* The **Baseline** directory contains the enhanced clips from the dev testset for both tracks.

* **download-dns-challenge-5-headset-training.sh** - this is the script to download the data for the headset track (Track 1). By default, the data will be placed into the `./datasets_fullband/` folder. Please take a look at the script and **uncomment** the preferred download method. Unmodified, the script performs a dry run and retrieves only the HTTP headers for each archive.

* **download-dns-challenge-5-speakerphone-training.sh** - this is the script to download the data for the speakerphone track (Track 2).
* **noisyspeech_synthesizer_singleprocess.py** - is used to synthesize noisy-clean speech pairs for
  training purposes.

* **noisyspeech_synthesizer.cfg** - is the configuration file used to synthesize the data. Users are
  required to accurately specify the different parameters and provide the right paths to the datasets required to synthesize noisy speech.

* **audiolib.py** - contains modules required to synthesize datasets.
* **utils.py** - contains some utility functions required to synthesize the data.
* **unit_tests_synthesizer.py** - contains the unit tests to ensure sanity of the data.
* **requirements.txt** - contains all the libraries required for synthesizing the data.

## Datasets
**V5_dev_testset**: directory containing the dev testsets for both tracks. Each test clip is 10 seconds long, and the corresponding enrollment clips are 30 seconds long.

**BLIND testset**:

## WAcc script
https://github.com/microsoft/DNS-Challenge/tree/master/WAcc

## WAcc ground-truth transcript
Dev testset: available only for the speakerphone track; see the **V5_dev_testset** directory. For the headset track, we are providing the ASR output and a list of the prompts read during the recording of the test clips. Participants can help correct the ASR output to generate the ground-truth transcripts.
68 | Blind testset: 69 | 70 | ### Data info 71 | 72 | The default directory structure and the sizes of the datasets of the 5th DNS 73 | Challenge are: 74 | 75 | ``` 76 | datasets_fullband 77 | +-- dev_testset 78 | +-- impulse_responses 5.9G 79 | +-- noise_fullband 58G 80 | \-- clean_fullband 827G 81 | +-- emotional_speech 2.4G 82 | +-- french_speech 62G 83 | +-- german_speech 319G 84 | +-- italian_speech 42G 85 | +-- read_speech 299G 86 | +-- russian_speech 12G 87 | +-- spanish_speech 65G 88 | +-- vctk_wav48_silence_trimmed 27G 89 | \-- VocalSet_48kHz_mono 974M 90 | ``` 91 | 92 | In all, you will need about 1TB to store the _unpacked_ data. Archived, the same data takes about 93 | 550GB total. 94 | 95 | ### Headset DNS track 96 | ### Data checksums 97 | 98 | A CSV file containing file sizes and SHA1 checksums for audio clips in both Real-time *and* 99 | Personalized DNS datasets is available at: 100 | [dns5-datasets-files-sha1.csv.bz2](https://dns4public.blob.core.windows.net/dns4archive/dns5-datasets-files-sha1.csv.bz2). 101 | The archive is 41.3MB in size and can be read in Python like this: 102 | ```python 103 | import pandas as pd 104 | 105 | sha1sums = pd.read_csv("dns5-datasets-files-sha1.csv.bz2", names=["size", "sha1", "path"]) 106 | ``` 107 | 108 | ## Code prerequisites 109 | - Python 3.6 and above 110 | - Python libraries: soundfile, librosa 111 | 112 | **NOTE:** git LFS is *no longer required* for DNS Challenge. Please use the 113 | `download-dns-challenge-5*.sh` scripts in this repo to download the data. 114 | 115 | ## Usage: 116 | 117 | 1. Install Python libraries 118 | ```bash 119 | pip3 install soundfile librosa 120 | ``` 121 | 2. Clone the repository. 122 | ```bash 123 | git clone https://github.com/microsoft/DNS-Challenge 124 | ``` 125 | 126 | 3. Edit **noisyspeech_synthesizer.cfg** to specify the required parameters described in the file and 127 | include the paths to clean speech, noise and impulse response related csv files. 
Also, specify
   the paths to the destination directories where the synthesized data and the logs will be stored.

4. Create the dataset
```bash
python3 noisyspeech_synthesizer_singleprocess.py
```

## Citation:
If you use this dataset in a publication, please cite the following paper:
137 | 138 | ```BibTex 139 | @inproceedings{dubey2023icassp, 140 | title={ICASSP 2023 Deep Noise Suppression Challenge}, 141 | author={ 142 | Dubey, Harishchandra and Aazami, Ashkan and Gopal, Vishak and Naderi, Babak and Braun, Sebastian and Cutler, Ross and Gamper, Hannes and Golestaneh, Mehrsa and Aichner, Robert}, 143 | booktitle={ICASSP}, 144 | year={2023} 145 | } 146 | ``` 147 | 148 | The previous challenges were: 149 | ```BibTex 150 | @inproceedings{dubey2022icassp, 151 | title={ICASSP 2022 Deep Noise Suppression Challenge}, 152 | author={Dubey, Harishchandra and Gopal, Vishak and Cutler, Ross and Matusevych, Sergiy and Braun, Sebastian and Eskimez, Emre Sefik and Thakker, Manthan and Yoshioka, Takuya and Gamper, Hannes and Aichner, Robert}, 153 | booktitle={ICASSP}, 154 | year={2022} 155 | } 156 | 157 | @inproceedings{reddy2021interspeech, 158 | title={INTERSPEECH 2021 Deep Noise Suppression Challenge}, 159 | author={Reddy, Chandan KA and Dubey, Harishchandra and Koishida, Kazuhito and Nair, Arun and Gopal, Vishak and Cutler, Ross and Braun, Sebastian and Gamper, Hannes and Aichner, Robert and Srinivasan, Sriram}, 160 | booktitle={INTERSPEECH}, 161 | year={2021} 162 | } 163 | ``` 164 | ```BibTex 165 | @inproceedings{reddy2021icassp, 166 | title={ICASSP 2021 deep noise suppression challenge}, 167 | author={Reddy, Chandan KA and Dubey, Harishchandra and Gopal, Vishak and Cutler, Ross and Braun, Sebastian and Gamper, Hannes and Aichner, Robert and Srinivasan, Sriram}, 168 | booktitle={ICASSP}, 169 | year={2021}, 170 | } 171 | ``` 172 | ```BibTex 173 | @inproceedings{reddy2020interspeech, 174 | title={The INTERSPEECH 2020 deep noise suppression challenge: Datasets, subjective testing framework, and challenge results}, 175 | author={Reddy, Chandan KA and Gopal, Vishak and Cutler, Ross and Beyrami, Ebrahim and Cheng, Roger and Dubey, Harishchandra and Matusevych, Sergiy and Aichner, Robert and Aazami, Ashkan and Braun, Sebastian and others}, 176 | 
booktitle={INTERSPEECH}, 177 | year={2020} 178 | } 179 | ``` 180 | 181 | The baseline NSNet noise suppression:
182 | ```BibTex 183 | @inproceedings{9054254, 184 | author={Y. {Xia} and S. {Braun} and C. K. A. {Reddy} and H. {Dubey} and R. {Cutler} and I. {Tashev}}, 185 | booktitle={ICASSP 2020 - 2020 IEEE International Conference on Acoustics, 186 | Speech and Signal Processing (ICASSP)}, 187 | title={Weighted Speech Distortion Losses for Neural-Network-Based Real-Time Speech Enhancement}, 188 | year={2020}, volume={}, number={}, pages={871-875},} 189 | ``` 190 | 191 | ```BibTex 192 | @misc{braun2020data, 193 | title={Data augmentation and loss normalization for deep noise suppression}, 194 | author={Sebastian Braun and Ivan Tashev}, 195 | year={2020}, 196 | eprint={2008.06412}, 197 | archivePrefix={arXiv}, 198 | primaryClass={eess.AS} 199 | } 200 | ``` 201 | 202 | The P.835 test framework:
203 | ```BibTex 204 | @inproceedings{naderi2021crowdsourcing, 205 | title={Subjective Evaluation of Noise Suppression Algorithms in Crowdsourcing}, 206 | author={Naderi, Babak and Cutler, Ross}, 207 | booktitle={INTERSPEECH}, 208 | year={2021} 209 | } 210 | ``` 211 | 212 | DNSMOS API:
213 | ```BibTex 214 | @inproceedings{reddy2021dnsmos, 215 | title={DNSMOS: A Non-Intrusive Perceptual Objective Speech Quality metric to evaluate Noise Suppressors}, 216 | author={Reddy, Chandan KA and Gopal, Vishak and Cutler, Ross}, 217 | booktitle={ICASSP}, 218 | year={2021} 219 | } 220 | ``` 221 | 222 | ```BibTex 223 | @inproceedings{reddy2022dnsmos, 224 | title={DNSMOS P.835: A non-intrusive perceptual objective speech quality metric to evaluate noise suppressors}, 225 | author={Reddy, Chandan KA and Gopal, Vishak and Cutler, Ross}, 226 | booktitle={ICASSP}, 227 | year={2022} 228 | } 229 | ``` 230 | 231 | # Contributing 232 | 233 | This project welcomes contributions and suggestions. Most contributions require you to agree to a 234 | Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us 235 | the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com. 236 | 237 | When you submit a pull request, a CLA bot will automatically determine whether you need to provide a 238 | CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions 239 | provided by the bot. You will only need to do this once across all repos using our CLA. 240 | 241 | This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/). 242 | For more information see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) or 243 | contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments. 
# Legal Notices

Microsoft and any contributors grant you a license to the Microsoft documentation and other content
in this repository under the [Creative Commons Attribution 4.0 International Public License](https://creativecommons.org/licenses/by/4.0/legalcode),
see the [LICENSE](LICENSE) file, and grant you a license to any code in the repository under the [MIT License](https://opensource.org/licenses/MIT), see the
[LICENSE-CODE](LICENSE-CODE) file.

Microsoft, Windows, Microsoft Azure and/or other Microsoft products and services referenced in the
documentation may be either trademarks or registered trademarks of Microsoft in the United States
and/or other countries. The licenses for this project do not grant you rights to use any Microsoft
names, logos, or trademarks. Microsoft's general trademark guidelines can be found at
http://go.microsoft.com/fwlink/?LinkID=254653.

Privacy information can be found at https://privacy.microsoft.com/en-us/

Microsoft and any contributors reserve all other rights, whether under their respective copyrights, patents,
or trademarks, whether by implication, estoppel or otherwise.


## Dataset licenses
MICROSOFT PROVIDES THE DATASETS ON AN "AS IS" BASIS. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, GUARANTEES OR CONDITIONS WITH RESPECT TO YOUR USE OF THE DATASETS. TO THE EXTENT PERMITTED UNDER YOUR LOCAL LAW, MICROSOFT DISCLAIMS ALL LIABILITY FOR ANY DAMAGES OR LOSSES, INCLUDING DIRECT, CONSEQUENTIAL, SPECIAL, INDIRECT, INCIDENTAL OR PUNITIVE, RESULTING FROM YOUR USE OF THE DATASETS.

The datasets are provided under the original terms under which Microsoft received such datasets. See below for more information about each dataset.

The datasets used in this project are licensed as follows:
1.
Clean speech: 271 | * https://librivox.org/; License: https://librivox.org/pages/public-domain/ 272 | * PTDB-TUG: Pitch Tracking Database from Graz University of Technology https://www.spsc.tugraz.at/databases-and-tools/ptdb-tug-pitch-tracking-database-from-graz-university-of-technology.html; License: http://opendatacommons.org/licenses/odbl/1.0/ 273 | * Edinburgh 56 speaker dataset: https://datashare.is.ed.ac.uk/handle/10283/2791; License: https://datashare.is.ed.ac.uk/bitstream/handle/10283/2791/license_text?sequence=11&isAllowed=y 274 | * VocalSet: A Singing Voice Dataset https://zenodo.org/record/1193957#.X1hkxYtlCHs; License: Creative Commons Attribution 4.0 International 275 | * Emotion data corpus: CREMA-D (Crowd-sourced Emotional Multimodal Actors Dataset) 276 | https://github.com/CheyneyComputerScience/CREMA-D; License: http://opendatacommons.org/licenses/dbcl/1.0/ 277 | * The VoxCeleb2 Dataset http://www.robots.ox.ac.uk/~vgg/data/voxceleb/vox2.html; License: http://www.robots.ox.ac.uk/~vgg/data/voxceleb/ 278 | The VoxCeleb dataset is available to download for commercial/research purposes under a Creative Commons Attribution 4.0 International License. The copyright remains with the original owners of the video. A complete version of the license can be found here. 279 | * VCTK Dataset: https://homepages.inf.ed.ac.uk/jyamagis/page3/page58/page58.html; License: This corpus is licensed under Open Data Commons Attribution License (ODC-By) v1.0. 280 | http://opendatacommons.org/licenses/by/1.0/ 281 | 282 | 2. Noise: 283 | * Audioset: https://research.google.com/audioset/index.html; License: https://creativecommons.org/licenses/by/4.0/ 284 | * Freesound: https://freesound.org/ Only files with CC0 licenses were selected; License: https://creativecommons.org/publicdomain/zero/1.0/ 285 | * Demand: https://zenodo.org/record/1227121#.XRKKxYhKiUk; License: https://creativecommons.org/licenses/by-sa/3.0/deed.en_CA 286 | 287 | 3. 
RIR datasets: OpenSLR26 and OpenSLR28: 288 | * http://www.openslr.org/26/ 289 | * http://www.openslr.org/28/ 290 | * License: Apache 2.0 291 | 292 | ## Code license 293 | MIT License 294 | 295 | Copyright (c) Microsoft Corporation. 296 | 297 | Permission is hereby granted, free of charge, to any person obtaining a copy 298 | of this software and associated documentation files (the "Software"), to deal 299 | in the Software without restriction, including without limitation the rights 300 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 301 | copies of the Software, and to permit persons to whom the Software is 302 | furnished to do so, subject to the following conditions: 303 | 304 | The above copyright notice and this permission notice shall be included in all 305 | copies or substantial portions of the Software. 306 | 307 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 308 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 309 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE
--------------------------------------------------------------------------------
/SECURITY.md:
--------------------------------------------------------------------------------

## Security

Microsoft takes the security of our software products and services seriously, which includes all source code repositories managed through our GitHub organizations, which include [Microsoft](https://github.com/Microsoft), [Azure](https://github.com/Azure), [DotNet](https://github.com/dotnet), [AspNet](https://github.com/aspnet), [Xamarin](https://github.com/xamarin), and [our GitHub organizations](https://opensource.microsoft.com/).

If you believe you have found a security vulnerability in any Microsoft-owned repository that meets [Microsoft's definition of a security vulnerability](https://docs.microsoft.com/en-us/previous-versions/tn-archive/cc751383(v=technet.10)), please report it to us as described below.

## Reporting Security Issues

**Please do not report security vulnerabilities through public GitHub issues.**

Instead, please report them to the Microsoft Security Response Center (MSRC) at [https://msrc.microsoft.com/create-report](https://msrc.microsoft.com/create-report).

If you prefer to submit without logging in, send email to [secure@microsoft.com](mailto:secure@microsoft.com). If possible, encrypt your message with our PGP key; please download it from the [Microsoft Security Response Center PGP Key page](https://www.microsoft.com/en-us/msrc/pgp-key-msrc).

You should receive a response within 24 hours.
If for some reason you do not, please follow up via email to ensure we received your original message. Additional information can be found at [microsoft.com/msrc](https://www.microsoft.com/msrc). 18 | 19 | Please include the requested information listed below (as much as you can provide) to help us better understand the nature and scope of the possible issue: 20 | 21 | * Type of issue (e.g. buffer overflow, SQL injection, cross-site scripting, etc.) 22 | * Full paths of source file(s) related to the manifestation of the issue 23 | * The location of the affected source code (tag/branch/commit or direct URL) 24 | * Any special configuration required to reproduce the issue 25 | * Step-by-step instructions to reproduce the issue 26 | * Proof-of-concept or exploit code (if possible) 27 | * Impact of the issue, including how an attacker might exploit the issue 28 | 29 | This information will help us triage your report more quickly. 30 | 31 | If you are reporting for a bug bounty, more complete reports can contribute to a higher bounty award. Please visit our [Microsoft Bug Bounty Program](https://microsoft.com/msrc/bounty) page for more details about our active programs. 32 | 33 | ## Preferred Languages 34 | 35 | We prefer all communications to be in English. 36 | 37 | ## Policy 38 | 39 | Microsoft follows the principle of [Coordinated Vulnerability Disclosure](https://www.microsoft.com/en-us/msrc/cvd). 
40 | 41 | 42 | -------------------------------------------------------------------------------- /V5_DNS_Challenge_FinalResults.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/microsoft/DNS-Challenge/591184a9fcb2cbdec02520fed81a32bbbf9d73ff/V5_DNS_Challenge_FinalResults.pdf -------------------------------------------------------------------------------- /WAcc/WAcc.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import glob 3 | import os 4 | 5 | import librosa 6 | import numpy as np 7 | import pandas as pd 8 | import requests 9 | import soundfile as sf 10 | 11 | WACC_SERVICE_URL = 'https://wacc.azurewebsites.net/api/TriggerEvaluation?code=K2XN7ouruRN/2k1HNyS79ET39rEMZ9jOOCnFtodPDj42WJFjG9LWXg==' 12 | SUPPORTED_SAMPLING_RATE = 16000 13 | TRANSCRIPTIONS_FILE = 'DNSChallenge4_devtest.tsv' 14 | 15 | def main(args): 16 | audio_clips_list = glob.glob(os.path.join(args.testset_dir, "*.wav")) 17 | transcriptions_df = pd.read_csv(TRANSCRIPTIONS_FILE, sep="\t") 18 | scores = [] 19 | for fpath in audio_clips_list: 20 | if os.path.basename(fpath) not in transcriptions_df['filename'].unique(): 21 | continue 22 | original_audio, fs = sf.read(fpath) 23 | if fs != SUPPORTED_SAMPLING_RATE: 24 | print('Only a sampling rate of 16000 Hz is supported for now; resampling audio') 25 | audio = librosa.resample(original_audio, orig_sr=fs, target_sr=SUPPORTED_SAMPLING_RATE) 26 | sf.write(fpath, audio, SUPPORTED_SAMPLING_RATE) 27 | 28 | wacc = None 29 | try: 30 | with open(fpath, 'rb') as f: 31 | resp = requests.post(WACC_SERVICE_URL, files={'audiodata': f}) 32 | wacc = resp.json() 33 | except Exception as e: # request failure or non-JSON response 34 | print('Error occurred during scoring:', e) 35 | sf.write(fpath, original_audio, fs) # restore the original clip 36 | if wacc is None: 37 | continue 38 | score_dict = {'file_name': os.path.basename(fpath), 'wacc': wacc} 39 | scores.append(score_dict) 40 | 41 | df = pd.DataFrame(scores) 42 | print('Mean WAcc for the files is ', np.mean(df['wacc'])) 43 | 44 | if args.score_file: 45 | df.to_csv(args.score_file) 46 | 47 | if __name__=="__main__": 48 | parser = argparse.ArgumentParser() 49 | parser.add_argument("--testset_dir", required=True, 50 | help='Path to the dir containing audio clips to be evaluated') 51 | parser.add_argument('--score_file', help='If you want the scores in a CSV file, provide the full path') 52 | 53 | args = parser.parse_args() 54 | main(args) 55 | -------------------------------------------------------------------------------- /audiolib.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | """ 3 | @author: chkarada 4 | """ 5 | import os 6 | import numpy as np 7 | import soundfile as sf 8 | import subprocess 9 | import glob 10 | import librosa 11 | import random 12 | import tempfile 13 | 14 | EPS = np.finfo(float).eps 15 | np.random.seed(0) 16 | 17 | def is_clipped(audio, clipping_threshold=0.99): 18 | return any(abs(audio) > clipping_threshold) 19 | 20 | def normalize(audio, target_level=-25): 21 | '''Normalize the signal to the target level''' 22 | rms = (audio ** 2).mean() ** 0.5 23 | scalar = 10 ** (target_level / 20) / (rms+EPS) 24 | audio = audio * scalar 25 | return audio 26 | 27 | def normalize_segmental_rms(audio, rms, target_level=-25): 28 | '''Normalize the signal to the target level 29 | based on segmental RMS''' 30 | scalar = 10 ** (target_level / 20) / (rms+EPS) 31 | audio = audio * scalar 32 | return audio 33 | 34 | def audioread(path, norm=False, start=0, stop=None, target_level=-25): 35 | '''Function to read audio''' 36 | 37 | path = os.path.abspath(path) 38 | if not os.path.exists(path): 39 | raise ValueError("[{}] does not exist!".format(path)) 40 | try: 41 | audio, sample_rate = sf.read(path, start=start, stop=stop) 42 | except RuntimeError: # fix for sph pcm-embedded shortened v2 43 | print('WARNING: Audio type not supported') 44 | return (None, None) 45 | 46 | if
len(audio.shape) == 1: # mono 47 | if norm: 48 | rms = (audio ** 2).mean() ** 0.5 49 | scalar = 10 ** (target_level / 20) / (rms+EPS) 50 | audio = audio * scalar 51 | else: # multi-channel 52 | audio = audio.T 53 | audio = audio.sum(axis=0)/audio.shape[0] 54 | if norm: 55 | audio = normalize(audio, target_level) 56 | 57 | return audio, sample_rate 58 | 59 | 60 | def audiowrite(destpath, audio, sample_rate=16000, norm=False, target_level=-25, \ 61 | clipping_threshold=0.99, clip_test=False): 62 | '''Function to write audio''' 63 | 64 | if clip_test: 65 | if is_clipped(audio, clipping_threshold=clipping_threshold): 66 | raise ValueError("Clipping detected in audiowrite()! " + \ 67 | destpath + " file not written to disk.") 68 | 69 | if norm: 70 | audio = normalize(audio, target_level) 71 | max_amp = max(abs(audio)) 72 | if max_amp >= clipping_threshold: 73 | audio = audio/max_amp * (clipping_threshold-EPS) 74 | 75 | destpath = os.path.abspath(destpath) 76 | destdir = os.path.dirname(destpath) 77 | 78 | if not os.path.exists(destdir): 79 | os.makedirs(destdir) 80 | 81 | sf.write(destpath, audio, sample_rate) 82 | return 83 | 84 | 85 | def add_reverb(sasxExe, input_wav, filter_file, output_wav): 86 | ''' Function to add reverb''' 87 | command_sasx_apply_reverb = "{0} -r {1} \ 88 | -f {2} -o {3}".format(sasxExe, input_wav, filter_file, output_wav) 89 | 90 | subprocess.call(command_sasx_apply_reverb) 91 | return output_wav 92 | 93 | 94 | def add_clipping(audio, max_thresh_perc=0.8): 95 | '''Function to add clipping''' 96 | threshold = max(abs(audio))*max_thresh_perc 97 | audioclipped = np.clip(audio, -threshold, threshold) 98 | return audioclipped 99 | 100 | 101 | def adsp_filter(Adspvqe, nearEndInput, nearEndOutput, farEndInput): 102 | 103 | command_adsp_clean = "{0} --breakOnErrors 0 --sampleRate 16000 --useEchoCancellation 0 \ 104 | --operatingMode 2 --useDigitalAgcNearend 0 --useDigitalAgcFarend 0 \ 105 | --useVirtualAGC 0 --useComfortNoiseGenerator 0 
--useAnalogAutomaticGainControl 0 \ 106 | --useNoiseReduction 0 --loopbackInputFile {1} --farEndInputFile {2} \ 107 | --nearEndInputFile {3} --nearEndOutputFile {4}".format(Adspvqe, 108 | farEndInput, farEndInput, nearEndInput, nearEndOutput) 109 | subprocess.call(command_adsp_clean) 110 | 111 | 112 | def snr_mixer(params, clean, noise, snr, target_level=-25, clipping_threshold=0.99): 113 | '''Function to mix clean speech and noise at various SNR levels''' 114 | cfg = params['cfg'] 115 | if len(clean) > len(noise): 116 | noise = np.append(noise, np.zeros(len(clean)-len(noise))) 117 | else: 118 | clean = np.append(clean, np.zeros(len(noise)-len(clean))) 119 | 120 | # Normalizing to -25 dB FS 121 | clean = clean/(max(abs(clean))+EPS) 122 | clean = normalize(clean, target_level) 123 | rmsclean = (clean**2).mean()**0.5 124 | 125 | noise = noise/(max(abs(noise))+EPS) 126 | noise = normalize(noise, target_level) 127 | rmsnoise = (noise**2).mean()**0.5 128 | 129 | # Set the noise level for a given SNR 130 | noisescalar = rmsclean / (10**(snr/20)) / (rmsnoise+EPS) 131 | noisenewlevel = noise * noisescalar 132 | 133 | # Mix noise and clean speech 134 | noisyspeech = clean + noisenewlevel 135 | 136 | # Randomly select an RMS value between -15 dBFS and -35 dBFS and normalize noisyspeech to that level 137 | # Clipping can occur here with very low probability, which is not a major issue. 138 | noisy_rms_level = np.random.randint(params['target_level_lower'], params['target_level_upper']) 139 | rmsnoisy = (noisyspeech**2).mean()**0.5 140 | scalarnoisy = 10 ** (noisy_rms_level / 20) / (rmsnoisy+EPS) 141 | noisyspeech = noisyspeech * scalarnoisy 142 | clean = clean * scalarnoisy 143 | noisenewlevel = noisenewlevel * scalarnoisy 144 | 145 | # Final check to see if there are any amplitudes exceeding +/- 1.
If so, normalize all the signals accordingly 146 | if is_clipped(noisyspeech): 147 | noisyspeech_maxamplevel = max(abs(noisyspeech))/(clipping_threshold-EPS) 148 | noisyspeech = noisyspeech/noisyspeech_maxamplevel 149 | clean = clean/noisyspeech_maxamplevel 150 | noisenewlevel = noisenewlevel/noisyspeech_maxamplevel 151 | noisy_rms_level = int(20*np.log10(scalarnoisy/noisyspeech_maxamplevel*(rmsnoisy+EPS))) 152 | 153 | return clean, noisenewlevel, noisyspeech, noisy_rms_level 154 | 155 | 156 | def segmental_snr_mixer(params, clean, noise, snr, target_level=-25, clipping_threshold=0.99): 157 | '''Function to mix clean speech and noise at various segmental SNR levels''' 158 | cfg = params['cfg'] 159 | if len(clean) > len(noise): 160 | noise = np.append(noise, np.zeros(len(clean)-len(noise))) 161 | else: 162 | clean = np.append(clean, np.zeros(len(noise)-len(clean))) 163 | clean = clean/(max(abs(clean))+EPS) 164 | noise = noise/(max(abs(noise))+EPS) 165 | rmsclean, rmsnoise = active_rms(clean=clean, noise=noise) 166 | clean = normalize_segmental_rms(clean, rms=rmsclean, target_level=target_level) 167 | noise = normalize_segmental_rms(noise, rms=rmsnoise, target_level=target_level) 168 | # Set the noise level for a given SNR 169 | noisescalar = rmsclean / (10**(snr/20)) / (rmsnoise+EPS) 170 | noisenewlevel = noise * noisescalar 171 | 172 | # Mix noise and clean speech 173 | noisyspeech = clean + noisenewlevel 174 | # Randomly select an RMS value between -15 dBFS and -35 dBFS and normalize noisyspeech to that level 175 | # Clipping can occur here with very low probability, which is not a major issue.
176 | noisy_rms_level = np.random.randint(params['target_level_lower'], params['target_level_upper']) 177 | rmsnoisy = (noisyspeech**2).mean()**0.5 178 | scalarnoisy = 10 ** (noisy_rms_level / 20) / (rmsnoisy+EPS) 179 | noisyspeech = noisyspeech * scalarnoisy 180 | clean = clean * scalarnoisy 181 | noisenewlevel = noisenewlevel * scalarnoisy 182 | # Final check to see if there are any amplitudes exceeding +/- 1. If so, normalize all the signals accordingly 183 | if is_clipped(noisyspeech): 184 | noisyspeech_maxamplevel = max(abs(noisyspeech))/(clipping_threshold-EPS) 185 | noisyspeech = noisyspeech/noisyspeech_maxamplevel 186 | clean = clean/noisyspeech_maxamplevel 187 | noisenewlevel = noisenewlevel/noisyspeech_maxamplevel 188 | noisy_rms_level = int(20*np.log10(scalarnoisy/noisyspeech_maxamplevel*(rmsnoisy+EPS))) 189 | 190 | return clean, noisenewlevel, noisyspeech, noisy_rms_level 191 | 192 | 193 | def active_rms(clean, noise, fs=16000, energy_thresh=-50): 194 | '''Returns the RMS of the clean and noise signals, computed only over the active noise frames''' 195 | window_size = 100 # in ms 196 | window_samples = int(fs*window_size/1000) 197 | sample_start = 0 198 | noise_active_segs = [] 199 | clean_active_segs = [] 200 | 201 | while sample_start < len(noise): 202 | sample_end = min(sample_start + window_samples, len(noise)) 203 | noise_win = noise[sample_start:sample_end] 204 | clean_win = clean[sample_start:sample_end] 205 | noise_seg_rms = 20*np.log10((noise_win**2).mean()**0.5 + EPS) # frame RMS in dB 206 | # Consider only frames whose noise level is above the threshold (energy_thresh is in dB) 207 | if noise_seg_rms > energy_thresh: 208 | noise_active_segs = np.append(noise_active_segs, noise_win) 209 | clean_active_segs = np.append(clean_active_segs, clean_win) 210 | sample_start += window_samples 211 | 212 | if len(noise_active_segs)!=0: 213 | noise_rms = (noise_active_segs**2).mean()**0.5 214 | else: 215 | noise_rms = EPS 216 | 217 | if len(clean_active_segs)!=0: 218 | clean_rms = (clean_active_segs**2).mean()**0.5 219 | else: 220 | clean_rms = EPS 221
| 222 | return clean_rms, noise_rms 223 | 224 | 225 | def activitydetector(audio, fs=16000, energy_thresh=0.13, target_level=-25): 226 | '''Return the percentage of the time the audio signal is above an energy threshold''' 227 | 228 | audio = normalize(audio, target_level) 229 | window_size = 50 # in ms 230 | window_samples = int(fs*window_size/1000) 231 | sample_start = 0 232 | cnt = 0 233 | prev_energy_prob = 0 234 | active_frames = 0 235 | 236 | a = -1 237 | b = 0.2 238 | alpha_rel = 0.05 239 | alpha_att = 0.8 240 | 241 | while sample_start < len(audio): 242 | sample_end = min(sample_start + window_samples, len(audio)) 243 | audio_win = audio[sample_start:sample_end] 244 | frame_rms = 20*np.log10(sum(audio_win**2)+EPS) 245 | frame_energy_prob = 1./(1+np.exp(-(a+b*frame_rms))) 246 | 247 | if frame_energy_prob > prev_energy_prob: 248 | smoothed_energy_prob = frame_energy_prob*alpha_att + prev_energy_prob*(1-alpha_att) 249 | else: 250 | smoothed_energy_prob = frame_energy_prob*alpha_rel + prev_energy_prob*(1-alpha_rel) 251 | 252 | if smoothed_energy_prob > energy_thresh: 253 | active_frames += 1 254 | prev_energy_prob = frame_energy_prob 255 | sample_start += window_samples 256 | cnt += 1 257 | 258 | perc_active = active_frames/cnt 259 | return perc_active 260 | 261 | 262 | def resampler(input_dir, target_sr=16000, ext='*.wav'): 263 | '''Resamples the audio files in input_dir to target_sr''' 264 | files = glob.glob(f"{input_dir}/"+ext) 265 | for pathname in files: 266 | print(pathname) 267 | try: 268 | audio, fs = audioread(pathname) 269 | audio_resampled = librosa.resample(audio, orig_sr=fs, target_sr=target_sr) 270 | audiowrite(pathname, audio_resampled, target_sr) 271 | except Exception: # skip files that cannot be read or resampled 272 | continue 273 | 274 | 275 | def audio_segmenter(input_dir, dest_dir, segment_len=10, ext='*.wav'): 276 | '''Segments the audio clips in dir to segment_len in secs''' 277 | files = glob.glob(f"{input_dir}/"+ext) 278 | for i in range(len(files)): 279 | audio, fs = audioread(files[i]) 280 | 281 | if
len(audio) > (segment_len*fs) and len(audio)%(segment_len*fs) != 0: 282 | audio = np.append(audio, audio[0 : segment_len*fs - (len(audio)%(segment_len*fs))]) 283 | if len(audio) < (segment_len*fs): 284 | while len(audio) < (segment_len*fs): 285 | audio = np.append(audio, audio) 286 | audio = audio[:segment_len*fs] 287 | 288 | num_segments = int(len(audio)/(segment_len*fs)) 289 | audio_segments = np.split(audio, num_segments) 290 | 291 | basefilename = os.path.basename(files[i]) 292 | basename, ext = os.path.splitext(basefilename) 293 | 294 | for j in range(len(audio_segments)): 295 | newname = basename+'_'+str(j)+ext 296 | destpath = os.path.join(dest_dir,newname) 297 | audiowrite(destpath, audio_segments[j], fs) 298 | 299 | -------------------------------------------------------------------------------- /docs/CMT Instructions for uploading enhanced clips_ICASSP2022.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/microsoft/DNS-Challenge/591184a9fcb2cbdec02520fed81a32bbbf9d73ff/docs/CMT Instructions for uploading enhanced clips_ICASSP2022.pdf -------------------------------------------------------------------------------- /docs/ICASSP_2021_DNS_challenge.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/microsoft/DNS-Challenge/591184a9fcb2cbdec02520fed81a32bbbf9d73ff/docs/ICASSP_2021_DNS_challenge.pdf -------------------------------------------------------------------------------- /docs/ICASSP_2022_4th_Deep_Noise_Suppression_Challenge.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/microsoft/DNS-Challenge/591184a9fcb2cbdec02520fed81a32bbbf9d73ff/docs/ICASSP_2022_4th_Deep_Noise_Suppression_Challenge.pdf -------------------------------------------------------------------------------- /download-dns-challenge-1.sh: 
-------------------------------------------------------------------------------- 1 | #!/usr/bin/bash 2 | 3 | # ***** Datasets for INTERSPEECH 2020 DNS Challenge 1 ***** 4 | 5 | # NOTE: This data is for the *PAST* challenge! 6 | # Current DNS Challenge is ICASSP 2022 DNS Challenge 4, which 7 | # has its own download script, `download-dns-challenge-4.sh` 8 | 9 | ############################################################### 10 | 11 | AZURE_URL="https://dns3public.blob.core.windows.net/dns3archive" 12 | 13 | mkdir -p ./datasets/ 14 | 15 | BLOB="datasets-interspeech2020.tar.bz2" 16 | URL="$AZURE_URL/$BLOB" 17 | echo "Download: $BLOB" 18 | 19 | # DRY RUN: print HTTP headers WITHOUT downloading the files 20 | curl -s -I "$URL" 21 | 22 | # Actually download the archive - UNCOMMENT it when ready to download 23 | # curl "$URL" -o "$BLOB" 24 | 25 | # Same as above, but using wget 26 | # wget "$URL" -O "$BLOB" 27 | 28 | # Same, + unpack files on the fly 29 | # curl "$URL" | tar -f - -x -j 30 | -------------------------------------------------------------------------------- /download-dns-challenge-2.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/bash 2 | 3 | # ***** Datasets for ICASSP 2021 DNS Challenge 2 ***** 4 | 5 | # NOTE: This data is for the *PAST* challenge! 6 | # Current DNS Challenge is ICASSP 2022 DNS Challenge 4, which 7 | # has its own download script, `download-dns-challenge-4.sh` 8 | 9 | # NOTE: Before downloading, make sure you have enough space 10 | # on your local storage! 11 | 12 | # In all, you will need at least 230GB to store UNPACKED data. 13 | # Archived, the same data takes 155GB total. 14 | 15 | # Please comment out the files you don't need before launching 16 | # the script. 17 | 18 | # NOTE: By default, the script *DOES NOT* DOWNLOAD ANY FILES! 19 | # Please scroll down and edit this script to pick the 20 | # downloading method that works best for you.
21 | 22 | # ------------------------------------------------------------- 23 | # The directory structure of the unpacked data is: 24 | 25 | # datasets 229G 26 | # +-- clean 204G 27 | # | +-- emotional_speech 403M 28 | # | +-- french_data 21G 29 | # | +-- german_speech 66G 30 | # | +-- italian_speech 14G 31 | # | +-- mandarin_speech 21G 32 | # | +-- read_speech 61G 33 | # | +-- russian_speech 5.1G 34 | # | +-- singing_voice 979M 35 | # | \-- spanish_speech 17G 36 | # +-- dev_testset 211M 37 | # +-- impulse_responses 4.3G 38 | # | +-- SLR26 2.1G 39 | # | \-- SLR28 2.3G 40 | # \-- noise 20G 41 | 42 | BLOB_NAMES=( 43 | 44 | # DEMAND dataset 45 | DEMAND.tar.bz2 46 | 47 | # Wideband clean speech 48 | datasets/datasets.clean.read_speech.tar.bz2 49 | 50 | # Wideband emotional speech 51 | datasets/datasets.clean.emotional_speech.tar.bz2 52 | 53 | # Wideband non-English clean speech 54 | datasets/datasets.clean.french_data.tar.bz2 55 | datasets/datasets.clean.german_speech.tar.bz2 56 | datasets/datasets.clean.italian_speech.tar.bz2 57 | datasets/datasets.clean.mandarin_speech.tar.bz2 58 | datasets/datasets.clean.russian_speech.tar.bz2 59 | datasets/datasets.clean.singing_voice.tar.bz2 60 | datasets/datasets.clean.spanish_speech.tar.bz2 61 | 62 | # Wideband noise, IR, and test data 63 | datasets/datasets.impulse_responses.tar.bz2 64 | datasets/datasets.noise.tar.bz2 65 | datasets/datasets.dev_testset.tar.bz2 66 | ) 67 | 68 | ############################################################### 69 | 70 | AZURE_URL="https://dns3public.blob.core.windows.net/dns3archive" 71 | 72 | mkdir -p ./datasets 73 | 74 | for BLOB in ${BLOB_NAMES[@]} 75 | do 76 | URL="$AZURE_URL/$BLOB" 77 | echo "Download: $BLOB" 78 | 79 | # DRY RUN: print HTTP headers WITHOUT downloading the files 80 | curl -s -I "$URL" | head -n 1 81 | 82 | # Actually download the files - UNCOMMENT it when ready to download 83 | # curl "$URL" -o "$BLOB" 84 | 85 | # Same as above, but using wget 86 | # wget "$URL" -O "$BLOB" 87 | 
88 | # Same, + unpack files on the fly 89 | # curl "$URL" | tar -f - -x -j 90 | done 91 | -------------------------------------------------------------------------------- /download-dns-challenge-3.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/bash 2 | 3 | # ***** Datasets for INTERSPEECH 2021 DNS Challenge 3 ***** 4 | 5 | # NOTE: This data is for the *PAST* challenge! 6 | # Current DNS Challenge is ICASSP 2022 DNS Challenge 4, which 7 | # has its own download script, `download-dns-challenge-4.sh` 8 | 9 | # NOTE: Before downloading, make sure you have enough space 10 | # on your local storage! 11 | 12 | # In all, you will need at least 830GB to store UNPACKED data. 13 | # Archived, the same data takes 512GB total. 14 | 15 | # Please comment out the files you don't need before launching 16 | # the script. 17 | 18 | # NOTE: By default, the script *DOES NOT* DOWNLOAD ANY FILES! 19 | # Please scroll down and edit this script to pick the 20 | # downloading method that works best for you.
21 | 22 | # ------------------------------------------------------------- 23 | # The directory structure of the unpacked data is: 24 | 25 | # *** Wideband data: *** 26 | 27 | # datasets 229G 28 | # +-- clean 204G 29 | # | +-- emotional_speech 403M 30 | # | +-- french_data 21G 31 | # | +-- german_speech 66G 32 | # | +-- italian_speech 14G 33 | # | +-- mandarin_speech 21G 34 | # | +-- read_speech 61G 35 | # | +-- russian_speech 5.1G 36 | # | +-- singing_voice 979M 37 | # | \-- spanish_speech 17G 38 | # +-- dev_testset 211M 39 | # +-- impulse_responses 4.3G 40 | # | +-- SLR26 2.1G 41 | # | \-- SLR28 2.3G 42 | # \-- noise 20G 43 | 44 | # *** Fullband data: *** 45 | 46 | # datasets_fullband 600G 47 | # +-- clean_fullband 542G 48 | # | +-- VocalSet_48kHz_mono 974M 49 | # | +-- emotional_speech 1.2G 50 | # | +-- french_data 62G 51 | # | +-- german_speech 194G 52 | # | +-- italian_speech 42G 53 | # | +-- read_speech 182G 54 | # | +-- russian_speech 12G 55 | # | \-- spanish_speech 50G 56 | # +-- dev_testset_fullband 630M 57 | # \-- noise_fullband 58G 58 | 59 | BLOB_NAMES=( 60 | 61 | # DEMAND dataset 62 | DEMAND.tar.bz2 63 | 64 | # Wideband clean speech 65 | datasets/datasets.clean.read_speech.tar.bz2 66 | 67 | # Wideband emotional speech 68 | datasets/datasets.clean.emotional_speech.tar.bz2 69 | 70 | # Wideband non-English clean speech 71 | datasets/datasets.clean.french_data.tar.bz2 72 | datasets/datasets.clean.german_speech.tar.bz2 73 | datasets/datasets.clean.italian_speech.tar.bz2 74 | datasets/datasets.clean.mandarin_speech.tar.bz2 75 | datasets/datasets.clean.russian_speech.tar.bz2 76 | datasets/datasets.clean.singing_voice.tar.bz2 77 | datasets/datasets.clean.spanish_speech.tar.bz2 78 | 79 | # Wideband noise, IR, and test data 80 | datasets/datasets.impulse_responses.tar.bz2 81 | datasets/datasets.noise.tar.bz2 82 | datasets/datasets.dev_testset.tar.bz2 83 | 84 | # --------------------------------------------------------- 85 | 86 | # Fullband clean speech 87 | 
datasets_fullband/datasets_fullband.clean_fullband.read_speech.0.tar.bz2 88 | datasets_fullband/datasets_fullband.clean_fullband.read_speech.1.tar.bz2 89 | datasets_fullband/datasets_fullband.clean_fullband.read_speech.2.tar.bz2 90 | datasets_fullband/datasets_fullband.clean_fullband.read_speech.3.tar.bz2 91 | datasets_fullband/datasets_fullband.clean_fullband.VocalSet_48kHz_mono.tar.bz2 92 | 93 | # Fullband emotional speech 94 | datasets_fullband/datasets_fullband.clean_fullband.emotional_speech.tar.bz2 95 | 96 | # Fullband non-English clean speech 97 | datasets_fullband/datasets_fullband.clean_fullband.french_data.tar.bz2 98 | datasets_fullband/datasets_fullband.clean_fullband.german_speech.0.tar.bz2 99 | datasets_fullband/datasets_fullband.clean_fullband.german_speech.1.tar.bz2 100 | datasets_fullband/datasets_fullband.clean_fullband.german_speech.2.tar.bz2 101 | datasets_fullband/datasets_fullband.clean_fullband.german_speech.3.tar.bz2 102 | datasets_fullband/datasets_fullband.clean_fullband.german_speech.4.tar.bz2 103 | datasets_fullband/datasets_fullband.clean_fullband.german_speech.5.tar.bz2 104 | datasets_fullband/datasets_fullband.clean_fullband.german_speech.6.tar.bz2 105 | datasets_fullband/datasets_fullband.clean_fullband.german_speech.7.tar.bz2 106 | datasets_fullband/datasets_fullband.clean_fullband.german_speech.8.tar.bz2 107 | datasets_fullband/datasets_fullband.clean_fullband.german_speech.9.tar.bz2 108 | datasets_fullband/datasets_fullband.clean_fullband.german_speech.10.tar.bz2 109 | datasets_fullband/datasets_fullband.clean_fullband.german_speech.11.tar.bz2 110 | datasets_fullband/datasets_fullband.clean_fullband.german_speech.12.tar.bz2 111 | datasets_fullband/datasets_fullband.clean_fullband.german_speech.13.tar.bz2 112 | datasets_fullband/datasets_fullband.clean_fullband.german_speech.14.tar.bz2 113 | datasets_fullband/datasets_fullband.clean_fullband.german_speech.15.tar.bz2 114 | 
datasets_fullband/datasets_fullband.clean_fullband.german_speech.16.tar.bz2 115 | datasets_fullband/datasets_fullband.clean_fullband.german_speech.17.tar.bz2 116 | datasets_fullband/datasets_fullband.clean_fullband.german_speech.18.tar.bz2 117 | datasets_fullband/datasets_fullband.clean_fullband.german_speech.19.tar.bz2 118 | datasets_fullband/datasets_fullband.clean_fullband.italian_speech.tar.bz2 119 | datasets_fullband/datasets_fullband.clean_fullband.russian_speech.tar.bz2 120 | datasets_fullband/datasets_fullband.clean_fullband.spanish_speech.tar.bz2 121 | 122 | # Fullband noise and test data 123 | datasets_fullband/datasets_fullband.noise_fullband.tar.bz2 124 | datasets_fullband/datasets_fullband.dev_testset_fullband.tar.bz2 125 | ) 126 | 127 | ############################################################### 128 | 129 | AZURE_URL="https://dns3public.blob.core.windows.net/dns3archive" 130 | 131 | mkdir -p ./datasets ./datasets_fullband 132 | 133 | for BLOB in ${BLOB_NAMES[@]} 134 | do 135 | URL="$AZURE_URL/$BLOB" 136 | echo "Download: $BLOB" 137 | 138 | # DRY RUN: print HTTP headers WITHOUT downloading the files 139 | curl -s -I "$URL" | head -n 1 140 | 141 | # Actually download the files - UNCOMMENT it when ready to download 142 | # curl "$URL" -o "$BLOB" 143 | 144 | # Same as above, but using wget 145 | # wget "$URL" -O "$BLOB" 146 | 147 | # Same, + unpack files on the fly 148 | # curl "$URL" | tar -f - -x -j 149 | done 150 | -------------------------------------------------------------------------------- /download-dns-challenge-4-pdns.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/bash 2 | 3 | # ***** Datasets for ICASSP 2022 DNS Challenge 4 - Personalized DNS Track ***** 4 | 5 | # NOTE: Before downloading, make sure you have enough space 6 | # on your local storage! 7 | 8 | # In all, you will need about 380GB to store the UNPACKED data. 9 | # Archived, the same data takes about 200GB total.
10 | 11 | # Please comment out the files you don't need before launching 12 | # the script. 13 | 14 | # NOTE: By default, the script *DOES NOT* DOWNLOAD ANY FILES! 15 | # Please scroll down and edit this script to pick the 16 | # downloading method that works best for you. 17 | 18 | # ------------------------------------------------------------- 19 | # The directory structure of the unpacked data is: 20 | 21 | # . 362G 22 | # +-- datasets_fullband 64G 23 | # | +-- impulse_responses 5.9G 24 | # | \-- noise_fullband 58G 25 | # +-- pdns_training_set 294G 26 | # | +-- enrollment_embeddings 115M 27 | # | +-- enrollment_wav 42G 28 | # | +-- raw/clean 252G 29 | # | +-- english 168G 30 | # | +-- french 2.1G 31 | # | +-- german 53G 32 | # | +-- italian 17G 33 | # | +-- russian 6.8G 34 | # | \-- spanish 5.4G 35 | # \-- personalized_dev_testset 3.3G 36 | 37 | BLOB_NAMES=( 38 | 39 | pdns_training_set/raw/pdns_training_set.raw.clean.english_000.tar.bz2 40 | pdns_training_set/raw/pdns_training_set.raw.clean.english_001.tar.bz2 41 | pdns_training_set/raw/pdns_training_set.raw.clean.english_002.tar.bz2 42 | pdns_training_set/raw/pdns_training_set.raw.clean.english_003.tar.bz2 43 | pdns_training_set/raw/pdns_training_set.raw.clean.english_004.tar.bz2 44 | pdns_training_set/raw/pdns_training_set.raw.clean.english_005.tar.bz2 45 | pdns_training_set/raw/pdns_training_set.raw.clean.english_006.tar.bz2 46 | pdns_training_set/raw/pdns_training_set.raw.clean.english_007.tar.bz2 47 | pdns_training_set/raw/pdns_training_set.raw.clean.english_008.tar.bz2 48 | pdns_training_set/raw/pdns_training_set.raw.clean.english_009.tar.bz2 49 | pdns_training_set/raw/pdns_training_set.raw.clean.english_010.tar.bz2 50 | pdns_training_set/raw/pdns_training_set.raw.clean.english_011.tar.bz2 51 | pdns_training_set/raw/pdns_training_set.raw.clean.english_012.tar.bz2 52 | pdns_training_set/raw/pdns_training_set.raw.clean.english_013.tar.bz2 53 |
pdns_training_set/raw/pdns_training_set.raw.clean.english_014.tar.bz2 54 | pdns_training_set/raw/pdns_training_set.raw.clean.english_015.tar.bz2 55 | pdns_training_set/raw/pdns_training_set.raw.clean.english_016.tar.bz2 56 | pdns_training_set/raw/pdns_training_set.raw.clean.english_017.tar.bz2 57 | pdns_training_set/raw/pdns_training_set.raw.clean.english_018.tar.bz2 58 | pdns_training_set/raw/pdns_training_set.raw.clean.english_019.tar.bz2 59 | pdns_training_set/raw/pdns_training_set.raw.clean.english_020.tar.bz2 60 | pdns_training_set/raw/pdns_training_set.raw.clean.french_000.tar.bz2 61 | pdns_training_set/raw/pdns_training_set.raw.clean.german_000.tar.bz2 62 | pdns_training_set/raw/pdns_training_set.raw.clean.german_001.tar.bz2 63 | pdns_training_set/raw/pdns_training_set.raw.clean.german_002.tar.bz2 64 | pdns_training_set/raw/pdns_training_set.raw.clean.german_003.tar.bz2 65 | pdns_training_set/raw/pdns_training_set.raw.clean.german_004.tar.bz2 66 | pdns_training_set/raw/pdns_training_set.raw.clean.german_005.tar.bz2 67 | pdns_training_set/raw/pdns_training_set.raw.clean.german_006.tar.bz2 68 | pdns_training_set/raw/pdns_training_set.raw.clean.german_007.tar.bz2 69 | pdns_training_set/raw/pdns_training_set.raw.clean.german_008.tar.bz2 70 | pdns_training_set/raw/pdns_training_set.raw.clean.italian_000.tar.bz2 71 | pdns_training_set/raw/pdns_training_set.raw.clean.italian_001.tar.bz2 72 | pdns_training_set/raw/pdns_training_set.raw.clean.italian_002.tar.bz2 73 | pdns_training_set/raw/pdns_training_set.raw.clean.russian_000.tar.bz2 74 | pdns_training_set/raw/pdns_training_set.raw.clean.spanish_000.tar.bz2 75 | pdns_training_set/raw/pdns_training_set.raw.clean.spanish_001.tar.bz2 76 | pdns_training_set/raw/pdns_training_set.raw.clean.spanish_002.tar.bz2 77 | 78 | pdns_training_set/enrollment_wav/pdns_training_set.enrollment_wav.english_000.tar.bz2 79 | pdns_training_set/enrollment_wav/pdns_training_set.enrollment_wav.english_001.tar.bz2 80 | 
pdns_training_set/enrollment_wav/pdns_training_set.enrollment_wav.english_002.tar.bz2 81 | pdns_training_set/enrollment_wav/pdns_training_set.enrollment_wav.english_003.tar.bz2 82 | pdns_training_set/enrollment_wav/pdns_training_set.enrollment_wav.english_004.tar.bz2 83 | pdns_training_set/enrollment_wav/pdns_training_set.enrollment_wav.french_000.tar.bz2 84 | pdns_training_set/enrollment_wav/pdns_training_set.enrollment_wav.german_000.tar.bz2 85 | pdns_training_set/enrollment_wav/pdns_training_set.enrollment_wav.german_001.tar.bz2 86 | pdns_training_set/enrollment_wav/pdns_training_set.enrollment_wav.italian_000.tar.bz2 87 | pdns_training_set/enrollment_wav/pdns_training_set.enrollment_wav.russian_000.tar.bz2 88 | pdns_training_set/enrollment_wav/pdns_training_set.enrollment_wav.spanish_000.tar.bz2 89 | 90 | pdns_training_set/pdns_training_set.enrollment_embeddings_000.tar.bz2 91 | 92 | datasets_fullband/noise_fullband/datasets_fullband.noise_fullband.audioset_000.tar.bz2 93 | datasets_fullband/noise_fullband/datasets_fullband.noise_fullband.audioset_001.tar.bz2 94 | datasets_fullband/noise_fullband/datasets_fullband.noise_fullband.audioset_002.tar.bz2 95 | datasets_fullband/noise_fullband/datasets_fullband.noise_fullband.audioset_003.tar.bz2 96 | datasets_fullband/noise_fullband/datasets_fullband.noise_fullband.audioset_004.tar.bz2 97 | datasets_fullband/noise_fullband/datasets_fullband.noise_fullband.audioset_005.tar.bz2 98 | datasets_fullband/noise_fullband/datasets_fullband.noise_fullband.audioset_006.tar.bz2 99 | 100 | datasets_fullband/noise_fullband/datasets_fullband.noise_fullband.freesound_000.tar.bz2 101 | datasets_fullband/noise_fullband/datasets_fullband.noise_fullband.freesound_001.tar.bz2 102 | 103 | datasets_fullband/datasets_fullband.impulse_responses_000.tar.bz2 104 | 105 | personalized_dev_testset/personalized_dev_testset.enrollment.tar.bz2 106 | personalized_dev_testset/personalized_dev_testset.noisy_testclips.tar.bz2 107 | ) 108 | 109 | 
############################################################### 110 | 111 | AZURE_URL="https://dns4public.blob.core.windows.net/dns4archive" 112 | 113 | OUTPUT_PATH="." 114 | 115 | mkdir -p "$OUTPUT_PATH"/{pdns_training_set/{raw,enrollment_wav},datasets_fullband/noise_fullband} 116 | 117 | for BLOB in "${BLOB_NAMES[@]}" 118 | do 119 | URL="$AZURE_URL/$BLOB" 120 | echo "Download: $BLOB" 121 | 122 | # DRY RUN: print HTTP response and Content-Length 123 | # WITHOUT downloading the files 124 | curl -s -I "$URL" | head -n 2 125 | 126 | # Actually download the files: UNCOMMENT when ready to download 127 | # curl "$URL" -o "$OUTPUT_PATH/$BLOB" 128 | 129 | # Same as above, but using wget 130 | # wget "$URL" -O "$OUTPUT_PATH/$BLOB" 131 | 132 | # Same, + unpack files on the fly 133 | # curl "$URL" | tar -C "$OUTPUT_PATH" -f - -x -j 134 | done 135 | -------------------------------------------------------------------------------- /download-dns-challenge-4.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/bash 2 | 3 | # ***** Datasets for ICASSP 2022 DNS Challenge 4 - Main (Real-Time) Track ***** 4 | 5 | # NOTE: Before downloading, make sure you have enough space 6 | # on your local storage! 7 | 8 | # In all, you will need about 1TB to store the UNPACKED data. 9 | # Archived, the same data takes about 550GB total. 10 | 11 | # Please comment out the files you don't need before launching 12 | # the script. 13 | 14 | # NOTE: By default, the script *DOES NOT* DOWNLOAD ANY FILES! 15 | # Please scroll down and edit this script to pick the 16 | # downloading method that works best for you. 
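The dry-run loops in these scripts print each blob's HTTP status line and `Content-Length` header via `curl -s -I`. As a minimal sketch of how those headers could be totalled to estimate disk usage before committing to a download — the header text below is an illustrative stand-in for real `curl -s -I` output, not a live request:

```shell
# Sketch: extract Content-Length from a captured HTTP header block,
# as printed by the dry-run loop. The "headers" value is a synthetic
# stand-in for real curl output.
headers="HTTP/1.1 200 OK
Content-Length: 1048576
Content-Type: application/octet-stream"
# tr strips the \r that real HTTP headers carry; awk matches the
# header name case-insensitively and prints the byte count.
len=$(printf '%s\n' "$headers" | tr -d '\r' | awk 'tolower($1) == "content-length:" { print $2 }')
echo "$(( len / 1024 / 1024 )) MB"   # prints: 1 MB
```

Summing `len` across all blobs in a dry run gives a rough total before any archive is fetched.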
17 | 18 | # ------------------------------------------------------------- 19 | # The directory structure of the unpacked data is: 20 | 21 | # datasets_fullband 892G 22 | # +-- dev_testset 1.7G 23 | # +-- impulse_responses 5.9G 24 | # +-- noise_fullband 58G 25 | # \-- clean_fullband 827G 26 | # +-- emotional_speech 2.4G 27 | # +-- french_speech 62G 28 | # +-- german_speech 319G 29 | # +-- italian_speech 42G 30 | # +-- read_speech 299G 31 | # +-- russian_speech 12G 32 | # +-- spanish_speech 65G 33 | # +-- vctk_wav48_silence_trimmed 27G 34 | # \-- VocalSet_48kHz_mono 974M 35 | 36 | BLOB_NAMES=( 37 | 38 | clean_fullband/datasets_fullband.clean_fullband.VocalSet_48kHz_mono_000_NA_NA.tar.bz2 39 | 40 | clean_fullband/datasets_fullband.clean_fullband.emotional_speech_000_NA_NA.tar.bz2 41 | 42 | clean_fullband/datasets_fullband.clean_fullband.french_speech_000_NA_NA.tar.bz2 43 | clean_fullband/datasets_fullband.clean_fullband.french_speech_001_NA_NA.tar.bz2 44 | clean_fullband/datasets_fullband.clean_fullband.french_speech_002_NA_NA.tar.bz2 45 | clean_fullband/datasets_fullband.clean_fullband.french_speech_003_NA_NA.tar.bz2 46 | clean_fullband/datasets_fullband.clean_fullband.french_speech_004_NA_NA.tar.bz2 47 | clean_fullband/datasets_fullband.clean_fullband.french_speech_005_NA_NA.tar.bz2 48 | clean_fullband/datasets_fullband.clean_fullband.french_speech_006_NA_NA.tar.bz2 49 | clean_fullband/datasets_fullband.clean_fullband.french_speech_007_NA_NA.tar.bz2 50 | clean_fullband/datasets_fullband.clean_fullband.french_speech_008_NA_NA.tar.bz2 51 | 52 | clean_fullband/datasets_fullband.clean_fullband.german_speech_000_0.00_3.47.tar.bz2 53 | clean_fullband/datasets_fullband.clean_fullband.german_speech_001_3.47_3.64.tar.bz2 54 | clean_fullband/datasets_fullband.clean_fullband.german_speech_002_3.64_3.74.tar.bz2 55 | clean_fullband/datasets_fullband.clean_fullband.german_speech_003_3.74_3.81.tar.bz2 56 | 
clean_fullband/datasets_fullband.clean_fullband.german_speech_004_3.81_3.86.tar.bz2 57 | clean_fullband/datasets_fullband.clean_fullband.german_speech_005_3.86_3.91.tar.bz2 58 | clean_fullband/datasets_fullband.clean_fullband.german_speech_006_3.91_3.96.tar.bz2 59 | clean_fullband/datasets_fullband.clean_fullband.german_speech_007_3.96_4.00.tar.bz2 60 | clean_fullband/datasets_fullband.clean_fullband.german_speech_008_4.00_4.04.tar.bz2 61 | clean_fullband/datasets_fullband.clean_fullband.german_speech_009_4.04_4.08.tar.bz2 62 | clean_fullband/datasets_fullband.clean_fullband.german_speech_010_4.08_4.12.tar.bz2 63 | clean_fullband/datasets_fullband.clean_fullband.german_speech_011_4.12_4.16.tar.bz2 64 | clean_fullband/datasets_fullband.clean_fullband.german_speech_012_4.16_4.21.tar.bz2 65 | clean_fullband/datasets_fullband.clean_fullband.german_speech_013_4.21_4.26.tar.bz2 66 | clean_fullband/datasets_fullband.clean_fullband.german_speech_014_4.26_4.33.tar.bz2 67 | clean_fullband/datasets_fullband.clean_fullband.german_speech_015_4.33_4.43.tar.bz2 68 | clean_fullband/datasets_fullband.clean_fullband.german_speech_016_4.43_NA.tar.bz2 69 | clean_fullband/datasets_fullband.clean_fullband.german_speech_017_NA_NA.tar.bz2 70 | clean_fullband/datasets_fullband.clean_fullband.german_speech_018_NA_NA.tar.bz2 71 | clean_fullband/datasets_fullband.clean_fullband.german_speech_019_NA_NA.tar.bz2 72 | clean_fullband/datasets_fullband.clean_fullband.german_speech_020_NA_NA.tar.bz2 73 | clean_fullband/datasets_fullband.clean_fullband.german_speech_021_NA_NA.tar.bz2 74 | clean_fullband/datasets_fullband.clean_fullband.german_speech_022_NA_NA.tar.bz2 75 | clean_fullband/datasets_fullband.clean_fullband.german_speech_023_NA_NA.tar.bz2 76 | clean_fullband/datasets_fullband.clean_fullband.german_speech_024_NA_NA.tar.bz2 77 | clean_fullband/datasets_fullband.clean_fullband.german_speech_025_NA_NA.tar.bz2 78 | clean_fullband/datasets_fullband.clean_fullband.german_speech_026_NA_NA.tar.bz2 
79 | clean_fullband/datasets_fullband.clean_fullband.german_speech_027_NA_NA.tar.bz2 80 | clean_fullband/datasets_fullband.clean_fullband.german_speech_028_NA_NA.tar.bz2 81 | clean_fullband/datasets_fullband.clean_fullband.german_speech_029_NA_NA.tar.bz2 82 | clean_fullband/datasets_fullband.clean_fullband.german_speech_030_NA_NA.tar.bz2 83 | clean_fullband/datasets_fullband.clean_fullband.german_speech_031_NA_NA.tar.bz2 84 | clean_fullband/datasets_fullband.clean_fullband.german_speech_032_NA_NA.tar.bz2 85 | clean_fullband/datasets_fullband.clean_fullband.german_speech_033_NA_NA.tar.bz2 86 | clean_fullband/datasets_fullband.clean_fullband.german_speech_034_NA_NA.tar.bz2 87 | clean_fullband/datasets_fullband.clean_fullband.german_speech_035_NA_NA.tar.bz2 88 | clean_fullband/datasets_fullband.clean_fullband.german_speech_036_NA_NA.tar.bz2 89 | clean_fullband/datasets_fullband.clean_fullband.german_speech_037_NA_NA.tar.bz2 90 | clean_fullband/datasets_fullband.clean_fullband.german_speech_038_NA_NA.tar.bz2 91 | clean_fullband/datasets_fullband.clean_fullband.german_speech_039_NA_NA.tar.bz2 92 | clean_fullband/datasets_fullband.clean_fullband.german_speech_040_NA_NA.tar.bz2 93 | clean_fullband/datasets_fullband.clean_fullband.german_speech_041_NA_NA.tar.bz2 94 | clean_fullband/datasets_fullband.clean_fullband.german_speech_042_NA_NA.tar.bz2 95 | 96 | clean_fullband/datasets_fullband.clean_fullband.italian_speech_000_0.00_3.98.tar.bz2 97 | clean_fullband/datasets_fullband.clean_fullband.italian_speech_001_3.98_4.21.tar.bz2 98 | clean_fullband/datasets_fullband.clean_fullband.italian_speech_002_4.21_4.40.tar.bz2 99 | clean_fullband/datasets_fullband.clean_fullband.italian_speech_003_4.40_NA.tar.bz2 100 | clean_fullband/datasets_fullband.clean_fullband.italian_speech_004_NA_NA.tar.bz2 101 | clean_fullband/datasets_fullband.clean_fullband.italian_speech_005_NA_NA.tar.bz2 102 | 103 | clean_fullband/datasets_fullband.clean_fullband.read_speech_000_0.00_3.75.tar.bz2 104 | 
clean_fullband/datasets_fullband.clean_fullband.read_speech_001_3.75_3.88.tar.bz2 105 | clean_fullband/datasets_fullband.clean_fullband.read_speech_002_3.88_3.96.tar.bz2 106 | clean_fullband/datasets_fullband.clean_fullband.read_speech_003_3.96_4.02.tar.bz2 107 | clean_fullband/datasets_fullband.clean_fullband.read_speech_004_4.02_4.06.tar.bz2 108 | clean_fullband/datasets_fullband.clean_fullband.read_speech_005_4.06_4.10.tar.bz2 109 | clean_fullband/datasets_fullband.clean_fullband.read_speech_006_4.10_4.13.tar.bz2 110 | clean_fullband/datasets_fullband.clean_fullband.read_speech_007_4.13_4.16.tar.bz2 111 | clean_fullband/datasets_fullband.clean_fullband.read_speech_008_4.16_4.19.tar.bz2 112 | clean_fullband/datasets_fullband.clean_fullband.read_speech_009_4.19_4.21.tar.bz2 113 | clean_fullband/datasets_fullband.clean_fullband.read_speech_010_4.21_4.24.tar.bz2 114 | clean_fullband/datasets_fullband.clean_fullband.read_speech_011_4.24_4.26.tar.bz2 115 | clean_fullband/datasets_fullband.clean_fullband.read_speech_012_4.26_4.29.tar.bz2 116 | clean_fullband/datasets_fullband.clean_fullband.read_speech_013_4.29_4.31.tar.bz2 117 | clean_fullband/datasets_fullband.clean_fullband.read_speech_014_4.31_4.33.tar.bz2 118 | clean_fullband/datasets_fullband.clean_fullband.read_speech_015_4.33_4.35.tar.bz2 119 | clean_fullband/datasets_fullband.clean_fullband.read_speech_016_4.35_4.38.tar.bz2 120 | clean_fullband/datasets_fullband.clean_fullband.read_speech_017_4.38_4.40.tar.bz2 121 | clean_fullband/datasets_fullband.clean_fullband.read_speech_018_4.40_4.42.tar.bz2 122 | clean_fullband/datasets_fullband.clean_fullband.read_speech_019_4.42_4.45.tar.bz2 123 | clean_fullband/datasets_fullband.clean_fullband.read_speech_020_4.45_4.48.tar.bz2 124 | clean_fullband/datasets_fullband.clean_fullband.read_speech_021_4.48_4.52.tar.bz2 125 | clean_fullband/datasets_fullband.clean_fullband.read_speech_022_4.52_4.57.tar.bz2 126 | 
clean_fullband/datasets_fullband.clean_fullband.read_speech_023_4.57_4.67.tar.bz2 127 | clean_fullband/datasets_fullband.clean_fullband.read_speech_024_4.67_NA.tar.bz2 128 | clean_fullband/datasets_fullband.clean_fullband.read_speech_025_NA_NA.tar.bz2 129 | clean_fullband/datasets_fullband.clean_fullband.read_speech_026_NA_NA.tar.bz2 130 | clean_fullband/datasets_fullband.clean_fullband.read_speech_027_NA_NA.tar.bz2 131 | clean_fullband/datasets_fullband.clean_fullband.read_speech_028_NA_NA.tar.bz2 132 | clean_fullband/datasets_fullband.clean_fullband.read_speech_029_NA_NA.tar.bz2 133 | clean_fullband/datasets_fullband.clean_fullband.read_speech_030_NA_NA.tar.bz2 134 | clean_fullband/datasets_fullband.clean_fullband.read_speech_031_NA_NA.tar.bz2 135 | clean_fullband/datasets_fullband.clean_fullband.read_speech_032_NA_NA.tar.bz2 136 | clean_fullband/datasets_fullband.clean_fullband.read_speech_033_NA_NA.tar.bz2 137 | clean_fullband/datasets_fullband.clean_fullband.read_speech_034_NA_NA.tar.bz2 138 | clean_fullband/datasets_fullband.clean_fullband.read_speech_035_NA_NA.tar.bz2 139 | clean_fullband/datasets_fullband.clean_fullband.read_speech_036_NA_NA.tar.bz2 140 | clean_fullband/datasets_fullband.clean_fullband.read_speech_037_NA_NA.tar.bz2 141 | clean_fullband/datasets_fullband.clean_fullband.read_speech_038_NA_NA.tar.bz2 142 | clean_fullband/datasets_fullband.clean_fullband.read_speech_039_NA_NA.tar.bz2 143 | 144 | clean_fullband/datasets_fullband.clean_fullband.russian_speech_000_0.00_4.31.tar.bz2 145 | clean_fullband/datasets_fullband.clean_fullband.russian_speech_001_4.31_NA.tar.bz2 146 | 147 | clean_fullband/datasets_fullband.clean_fullband.spanish_speech_000_0.00_4.09.tar.bz2 148 | clean_fullband/datasets_fullband.clean_fullband.spanish_speech_001_4.09_NA.tar.bz2 149 | clean_fullband/datasets_fullband.clean_fullband.spanish_speech_002_NA_NA.tar.bz2 150 | clean_fullband/datasets_fullband.clean_fullband.spanish_speech_003_NA_NA.tar.bz2 151 | 
clean_fullband/datasets_fullband.clean_fullband.spanish_speech_004_NA_NA.tar.bz2 152 | clean_fullband/datasets_fullband.clean_fullband.spanish_speech_005_NA_NA.tar.bz2 153 | clean_fullband/datasets_fullband.clean_fullband.spanish_speech_006_NA_NA.tar.bz2 154 | clean_fullband/datasets_fullband.clean_fullband.spanish_speech_007_NA_NA.tar.bz2 155 | clean_fullband/datasets_fullband.clean_fullband.spanish_speech_008_NA_NA.tar.bz2 156 | 157 | clean_fullband/datasets_fullband.clean_fullband.vctk_wav48_silence_trimmed_000.tar.bz2 158 | clean_fullband/datasets_fullband.clean_fullband.vctk_wav48_silence_trimmed_001.tar.bz2 159 | clean_fullband/datasets_fullband.clean_fullband.vctk_wav48_silence_trimmed_002.tar.bz2 160 | clean_fullband/datasets_fullband.clean_fullband.vctk_wav48_silence_trimmed_003.tar.bz2 161 | clean_fullband/datasets_fullband.clean_fullband.vctk_wav48_silence_trimmed_004.tar.bz2 162 | 163 | noise_fullband/datasets_fullband.noise_fullband.audioset_000.tar.bz2 164 | noise_fullband/datasets_fullband.noise_fullband.audioset_001.tar.bz2 165 | noise_fullband/datasets_fullband.noise_fullband.audioset_002.tar.bz2 166 | noise_fullband/datasets_fullband.noise_fullband.audioset_003.tar.bz2 167 | noise_fullband/datasets_fullband.noise_fullband.audioset_004.tar.bz2 168 | noise_fullband/datasets_fullband.noise_fullband.audioset_005.tar.bz2 169 | noise_fullband/datasets_fullband.noise_fullband.audioset_006.tar.bz2 170 | 171 | noise_fullband/datasets_fullband.noise_fullband.freesound_000.tar.bz2 172 | noise_fullband/datasets_fullband.noise_fullband.freesound_001.tar.bz2 173 | 174 | datasets_fullband.dev_testset_000.tar.bz2 175 | 176 | datasets_fullband.impulse_responses_000.tar.bz2 177 | ) 178 | 179 | ############################################################### 180 | 181 | AZURE_URL="https://dns4public.blob.core.windows.net/dns4archive/datasets_fullband" 182 | 183 | OUTPUT_PATH="./datasets_fullband" 184 | 185 | mkdir -p $OUTPUT_PATH/{clean_fullband,noise_fullband} 186 | 
187 | for BLOB in "${BLOB_NAMES[@]}" 188 | do 189 | URL="$AZURE_URL/$BLOB" 190 | echo "Download: $BLOB" 191 | 192 | # DRY RUN: print HTTP response and Content-Length 193 | # WITHOUT downloading the files 194 | curl -s -I "$URL" | head -n 2 195 | 196 | # Actually download the files: UNCOMMENT when ready to download 197 | # curl "$URL" -o "$OUTPUT_PATH/$BLOB" 198 | 199 | # Same as above, but using wget 200 | # wget "$URL" -O "$OUTPUT_PATH/$BLOB" 201 | 202 | # Same, + unpack files on the fly 203 | # curl "$URL" | tar -C "$OUTPUT_PATH" -f - -x -j 204 | done 205 | -------------------------------------------------------------------------------- /download-dns-challenge-5-baseline.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/bash 2 | 3 | # ***** Baseline for the 5th DNS Challenge at ICASSP 2023 ***** 4 | 5 | # NOTE: Before downloading, make sure you have enough space 6 | # on your local storage! 7 | 8 | # Zip file is 1.4 GB. 9 | # ------------------------------------------------------------- 10 | 11 | URL="https://dnschallengepublic.blob.core.windows.net/dns5archive/Baseline.zip" 12 | echo "Download: $URL" 13 | # 14 | # DRY RUN: print HTTP header WITHOUT downloading the files 15 | curl -s -I "$URL" 16 | # 17 | # Download the archive (comment this out if you only want the dry run) 18 | curl "$URL" --output 'Baseline.zip' 19 | #wget --no-check-certificate "$URL" 20 | -------------------------------------------------------------------------------- /download-dns-challenge-5-filelists-headset.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/bash 2 | 3 | # ***** Filelists for the 5th DNS Challenge at ICASSP 2023 ***** 4 | 5 | # NOTE: Before downloading, make sure you have enough space 6 | # on your local storage! 7 | 8 | # Zip file is 1.5MB. 
9 | # It contains speaker ID filelists for headset training clean speech (Track 1) 10 | # ------------------------------------------------------------- 11 | 12 | URL="https://dnschallengepublic.blob.core.windows.net/dns5archive/filelists_headset.zip" 13 | echo "Download: $URL" 14 | # 15 | # DRY RUN: print HTTP header WITHOUT downloading the files 16 | curl -s -I "$URL" 17 | # 18 | # Download the archive (comment this out if you only want the dry run) 19 | curl "$URL" --output 'filelists_headset.zip' 20 | #wget --no-check-certificate "$URL" 21 | -------------------------------------------------------------------------------- /download-dns-challenge-5-filelists-speakerphone.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/bash 2 | 3 | # ***** Filelists for the 5th DNS Challenge at ICASSP 2023 ***** 4 | 5 | # NOTE: Before downloading, make sure you have enough space 6 | # on your local storage! 7 | 8 | # Zip file is 1.5MB. 
9 | # It contains speaker ID filelists for speakerphone training clean speech (Track 2) 10 | # ------------------------------------------------------------- 11 | 12 | URL="https://dnschallengepublic.blob.core.windows.net/dns5archive/filelists_speakerphone.zip" 13 | echo "Download: $URL" 14 | # 15 | # DRY RUN: print HTTP header WITHOUT downloading the files 16 | curl -s -I "$URL" 17 | # 18 | # Download the archive (comment this out if you only want the dry run) 19 | curl "$URL" --output 'filelists_speakerphone.zip' 20 | #wget --no-check-certificate "$URL" 21 | -------------------------------------------------------------------------------- /download-dns-challenge-5-headset-training.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/bash 2 | 3 | # ***** 5th DNS Challenge at ICASSP 2023 ***** 4 | # Track 1 Headset Clean speech: All Languages 5 | # ------------------------------------------------------------- 6 | # In all, you will need about 1TB to store the UNPACKED data. 7 | # Archived, the same data takes about 550GB total. 8 | 9 | # Please comment out the files you don't need before launching 10 | # the script. 11 | 12 | # NOTE: By default, the script *DOES NOT* DOWNLOAD ANY FILES! 13 | # Please scroll down and edit this script to pick the 14 | # downloading method that works best for you. 
15 | 16 | # ------------------------------------------------------------- 17 | # The directory structure of the unpacked data is: 18 | 19 | # datasets_fullband 20 | # \-- clean_fullband 827G 21 | # +-- emotional_speech 2.4G 22 | # +-- french_speech 62G 23 | # +-- german_speech 319G 24 | # +-- italian_speech 42G 25 | # +-- read_speech 299G 26 | # +-- russian_speech 12G 27 | # +-- spanish_speech 65G 28 | # +-- vctk_wav48_silence_trimmed 27G 29 | # \-- VocalSet_48kHz_mono 974M 30 | 31 | BLOB_NAMES=( 32 | 33 | Track1_Headset/VocalSet_48kHz_mono.tgz 34 | Track1_Headset/emotional_speech.tgz 35 | 36 | Track1_Headset/french_speech.tar.gz.partaa 37 | Track1_Headset/french_speech.tar.gz.partab 38 | Track1_Headset/french_speech.tar.gz.partac 39 | Track1_Headset/french_speech.tar.gz.partad 40 | Track1_Headset/french_speech.tar.gz.partae 41 | Track1_Headset/french_speech.tar.gz.partah 42 | 43 | Track1_Headset/german_speech.tgz.partaa 44 | Track1_Headset/german_speech.tgz.partab 45 | Track1_Headset/german_speech.tgz.partac 46 | Track1_Headset/german_speech.tgz.partad 47 | Track1_Headset/german_speech.tgz.partae 48 | Track1_Headset/german_speech.tgz.partaf 49 | Track1_Headset/german_speech.tgz.partag 50 | Track1_Headset/german_speech.tgz.partah 51 | Track1_Headset/german_speech.tgz.partaj 52 | Track1_Headset/german_speech.tgz.partal 53 | Track1_Headset/german_speech.tgz.partam 54 | Track1_Headset/german_speech.tgz.partan 55 | Track1_Headset/german_speech.tgz.partao 56 | Track1_Headset/german_speech.tgz.partap 57 | Track1_Headset/german_speech.tgz.partaq 58 | Track1_Headset/german_speech.tgz.partar 59 | Track1_Headset/german_speech.tgz.partas 60 | Track1_Headset/german_speech.tgz.partat 61 | Track1_Headset/german_speech.tgz.partau 62 | Track1_Headset/german_speech.tgz.partav 63 | Track1_Headset/german_speech.tgz.partaw 64 | 65 | Track1_Headset/italian_speech.tgz.partaa 66 | Track1_Headset/italian_speech.tgz.partab 67 | Track1_Headset/italian_speech.tgz.partac 68 | 
Track1_Headset/italian_speech.tgz.partad 69 | 70 | Track1_Headset/read_speech.tgz.partaa 71 | Track1_Headset/read_speech.tgz.partab 72 | Track1_Headset/read_speech.tgz.partac 73 | Track1_Headset/read_speech.tgz.partad 74 | Track1_Headset/read_speech.tgz.partae 75 | Track1_Headset/read_speech.tgz.partaf 76 | Track1_Headset/read_speech.tgz.partag 77 | Track1_Headset/read_speech.tgz.partah 78 | Track1_Headset/read_speech.tgz.partai 79 | Track1_Headset/read_speech.tgz.partaj 80 | Track1_Headset/read_speech.tgz.partak 81 | Track1_Headset/read_speech.tgz.partal 82 | Track1_Headset/read_speech.tgz.partam 83 | Track1_Headset/read_speech.tgz.partan 84 | Track1_Headset/read_speech.tgz.partao 85 | Track1_Headset/read_speech.tgz.partap 86 | Track1_Headset/read_speech.tgz.partaq 87 | Track1_Headset/read_speech.tgz.partar 88 | Track1_Headset/read_speech.tgz.partas 89 | Track1_Headset/read_speech.tgz.partat 90 | Track1_Headset/read_speech.tgz.partau 91 | 92 | Track1_Headset/russian_speech.tgz 93 | 94 | Track1_Headset/spanish_speech.tgz.partaa 95 | Track1_Headset/spanish_speech.tgz.partab 96 | Track1_Headset/spanish_speech.tgz.partac 97 | Track1_Headset/spanish_speech.tgz.partad 98 | Track1_Headset/spanish_speech.tgz.partae 99 | Track1_Headset/spanish_speech.tgz.partaf 100 | Track1_Headset/spanish_speech.tgz.partag 101 | 102 | Track1_Headset/vctk_wav48_silence_trimmed.tgz.partaa 103 | Track1_Headset/vctk_wav48_silence_trimmed.tgz.partab 104 | Track1_Headset/vctk_wav48_silence_trimmed.tgz.partac 105 | ) 106 | 107 | ############################################################### 108 | # this data is extracted from datasets used in Track 2. 
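The Track 1 archives listed above are split into fixed-size parts (`.tgz.partaa`, `.tgz.partab`, ...), and a single part cannot be unpacked on its own: the parts must be concatenated in order first. A minimal sketch of that reassembly step, demonstrated on a tiny synthetic archive rather than the real Track1_Headset downloads (all file names below are illustrative):

```shell
# Sketch: reassemble and unpack a split .tgz, using a synthetic
# archive as a stand-in for the real multi-part downloads.
mkdir -p demo_src demo_out
echo "hello" > demo_src/sample.txt
tar -czf demo.tgz -C demo_src sample.txt    # make a small .tgz
split -b 64 demo.tgz demo.tgz.part          # -> demo.tgz.partaa, demo.tgz.partab, ...
# The shell glob expands parts in lexicographic (i.e. correct) order;
# tar reads the concatenated stream from stdin (-f -).
cat demo.tgz.part* | tar -C demo_out -f - -x -z
cat demo_out/sample.txt                     # prints: hello
```

The same `cat <name>.tgz.part* | tar -C <dir> -f - -x -z` pattern applies once all parts of a real archive have finished downloading.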
109 | 110 | AZURE_URL="https://dnschallengepublic.blob.core.windows.net/dns5archive/V5_training_dataset" 111 | 112 | OUTPUT_PATH="./datasets_fullband" 113 | 114 | mkdir -p "$OUTPUT_PATH"/clean_fullband 115 | 116 | for BLOB in "${BLOB_NAMES[@]}" 117 | do 118 | URL="$AZURE_URL/$BLOB" 119 | echo "Download: $BLOB" 120 | 121 | # DRY RUN: print HTTP response and Content-Length 122 | # WITHOUT downloading the files 123 | curl -s -I "$URL" | head -n 2 124 | 125 | # Actually download the files: UNCOMMENT when ready to download 126 | # curl "$URL" -o "$OUTPUT_PATH/$BLOB" 127 | 128 | # Same as above, but using wget 129 | # wget "$URL" -O "$OUTPUT_PATH/$BLOB" 130 | 131 | # Same, + unpack single .tgz archives on the fly (these are gzip, hence -z; multi-part .tgz.partXX files must be concatenated first) 132 | # curl "$URL" | tar -C "$OUTPUT_PATH" -f - -x -z 133 | done 134 | -------------------------------------------------------------------------------- /download-dns-challenge-5-noise-ir.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/bash 2 | 3 | # ***** 5th DNS Challenge at ICASSP 2023 ***** 4 | # Noise data which is used in both tracks 5 | # Also download the impulse response data 6 | 7 | # All compressed noise files are ~39 GB 8 | # ------------------------------------------------------------- 9 | # ------------------------------------------------------------- 10 | # The directory structure of the unpacked data is: 11 | # +-- noise_fullband 12 | 13 | BLOB_NAMES=( 14 | noise_fullband/datasets_fullband.noise_fullband.audioset_000.tar.bz2 15 | noise_fullband/datasets_fullband.noise_fullband.audioset_001.tar.bz2 16 | noise_fullband/datasets_fullband.noise_fullband.audioset_002.tar.bz2 17 | noise_fullband/datasets_fullband.noise_fullband.audioset_003.tar.bz2 18 | noise_fullband/datasets_fullband.noise_fullband.audioset_004.tar.bz2 19 | noise_fullband/datasets_fullband.noise_fullband.audioset_005.tar.bz2 20 | noise_fullband/datasets_fullband.noise_fullband.audioset_006.tar.bz2 21 | 22 | 
noise_fullband/datasets_fullband.noise_fullband.freesound_000.tar.bz2 23 | noise_fullband/datasets_fullband.noise_fullband.freesound_001.tar.bz2 24 | 25 | datasets_fullband.impulse_responses_000.tar.bz2 26 | ) 27 | 28 | ############################################################### 29 | 30 | AZURE_URL="https://dnschallengepublic.blob.core.windows.net/dns5archive/V5_training_dataset" 31 | 32 | OUTPUT_PATH="./" 33 | 34 | mkdir -p "$OUTPUT_PATH"/noise_fullband 35 | 36 | for BLOB in "${BLOB_NAMES[@]}" 37 | do 38 | URL="$AZURE_URL/$BLOB" 39 | echo "Download: $BLOB" 40 | 41 | # DRY RUN: print HTTP response and Content-Length 42 | # WITHOUT downloading the files 43 | curl -s -I "$URL" | head -n 2 44 | 45 | # Actually download the files: UNCOMMENT when ready to download 46 | # curl "$URL" -o "$OUTPUT_PATH/$BLOB" 47 | 48 | # Same as above, but using wget 49 | # wget "$URL" -O "$OUTPUT_PATH/$BLOB" 50 | 51 | # Same, + unpack files on the fly 52 | # curl "$URL" | tar -C "$OUTPUT_PATH" -f - -x -j 53 | done 54 | -------------------------------------------------------------------------------- /download-dns-challenge-5-paralinguistic-train.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/bash 2 | 3 | # ***** Paralinguistic training data for the 5th DNS Challenge at ICASSP 2023 ***** 4 | 5 | # NOTE: Before downloading, make sure you have enough space 6 | # on your local storage! 7 | 8 | # Zip file is 181.8 MB. 
9 | # ------------------------------------------------------------- 10 | 11 | URL="https://dnschallengepublic.blob.core.windows.net/dns5archive/V5_training_dataset/paralinguistic_training.zip" 12 | echo "Download: $URL" 13 | # 14 | # DRY RUN: print HTTP header WITHOUT downloading the files 15 | curl -s -I "$URL" 16 | # 17 | # Download the archive (comment this out if you only want the dry run) 18 | curl "$URL" --output 'paralinguistic_training.zip' 19 | #wget --no-check-certificate "$URL" 20 | -------------------------------------------------------------------------------- /download-dns-challenge-5-speakerphone-training.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/bash 2 | 3 | # ***** 5th DNS Challenge at ICASSP 2023 ***** 4 | # Track 2 Speakerphone Clean speech: All Languages 5 | # ------------------------------------------------------------- 6 | # In all, you will need about 1TB to store the UNPACKED data. 7 | # Archived, the same data takes about 550GB total. 8 | 9 | # Please comment out the files you don't need before launching 10 | # the script. 11 | 12 | # NOTE: By default, the script *DOES NOT* DOWNLOAD ANY FILES! 13 | # Please scroll down and edit this script to pick the 14 | # downloading method that works best for you. 
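The Track 2 clean-speech blobs below are large `.tar.bz2` archives, so it may be worth checking an archive's integrity after download and before unpacking — a truncated transfer otherwise surfaces only as a mid-extraction tar error. A minimal sketch on a synthetic archive (file names here are illustrative, not part of the dataset):

```shell
# Sketch: verify a downloaded .tar.bz2 before unpacking. bzip2 -t
# decompresses to nowhere and exits non-zero on a truncated or
# corrupted file. The archive here is a synthetic stand-in.
echo "payload" > f.txt
tar -cjf f.tar.bz2 f.txt
if bzip2 -t f.tar.bz2; then
    echo "archive OK"          # prints: archive OK
else
    echo "re-download needed" >&2
fi
```

Running the same `bzip2 -t` check over each downloaded blob before `tar -x -j` avoids unpacking half an archive from an interrupted transfer.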
15 | 16 | # ------------------------------------------------------------- 17 | # The directory structure of the unpacked data is: 18 | 19 | # datasets_fullband 20 | # \-- clean_fullband 827G 21 | # +-- emotional_speech 2.4G 22 | # +-- french_speech 62G 23 | # +-- german_speech 319G 24 | # +-- italian_speech 42G 25 | # +-- read_speech 299G 26 | # +-- russian_speech 12G 27 | # +-- spanish_speech 65G 28 | # +-- vctk_wav48_silence_trimmed 27G 29 | # \-- VocalSet_48kHz_mono 974M 30 | 31 | BLOB_NAMES=( 32 | 33 | clean_fullband/datasets_fullband.clean_fullband.VocalSet_48kHz_mono_000_NA_NA.tar.bz2 34 | 35 | clean_fullband/datasets_fullband.clean_fullband.emotional_speech_000_NA_NA.tar.bz2 36 | 37 | clean_fullband/datasets_fullband.clean_fullband.french_speech_000_NA_NA.tar.bz2 38 | clean_fullband/datasets_fullband.clean_fullband.french_speech_001_NA_NA.tar.bz2 39 | clean_fullband/datasets_fullband.clean_fullband.french_speech_002_NA_NA.tar.bz2 40 | clean_fullband/datasets_fullband.clean_fullband.french_speech_003_NA_NA.tar.bz2 41 | clean_fullband/datasets_fullband.clean_fullband.french_speech_004_NA_NA.tar.bz2 42 | clean_fullband/datasets_fullband.clean_fullband.french_speech_005_NA_NA.tar.bz2 43 | clean_fullband/datasets_fullband.clean_fullband.french_speech_006_NA_NA.tar.bz2 44 | clean_fullband/datasets_fullband.clean_fullband.french_speech_007_NA_NA.tar.bz2 45 | clean_fullband/datasets_fullband.clean_fullband.french_speech_008_NA_NA.tar.bz2 46 | 47 | clean_fullband/datasets_fullband.clean_fullband.german_speech_000_0.00_3.47.tar.bz2 48 | clean_fullband/datasets_fullband.clean_fullband.german_speech_001_3.47_3.64.tar.bz2 49 | clean_fullband/datasets_fullband.clean_fullband.german_speech_002_3.64_3.74.tar.bz2 50 | clean_fullband/datasets_fullband.clean_fullband.german_speech_003_3.74_3.81.tar.bz2 51 | clean_fullband/datasets_fullband.clean_fullband.german_speech_004_3.81_3.86.tar.bz2 52 | clean_fullband/datasets_fullband.clean_fullband.german_speech_005_3.86_3.91.tar.bz2 
53 | clean_fullband/datasets_fullband.clean_fullband.german_speech_006_3.91_3.96.tar.bz2 54 | clean_fullband/datasets_fullband.clean_fullband.german_speech_007_3.96_4.00.tar.bz2 55 | clean_fullband/datasets_fullband.clean_fullband.german_speech_008_4.00_4.04.tar.bz2 56 | clean_fullband/datasets_fullband.clean_fullband.german_speech_009_4.04_4.08.tar.bz2 57 | clean_fullband/datasets_fullband.clean_fullband.german_speech_010_4.08_4.12.tar.bz2 58 | clean_fullband/datasets_fullband.clean_fullband.german_speech_011_4.12_4.16.tar.bz2 59 | clean_fullband/datasets_fullband.clean_fullband.german_speech_012_4.16_4.21.tar.bz2 60 | clean_fullband/datasets_fullband.clean_fullband.german_speech_013_4.21_4.26.tar.bz2 61 | clean_fullband/datasets_fullband.clean_fullband.german_speech_014_4.26_4.33.tar.bz2 62 | clean_fullband/datasets_fullband.clean_fullband.german_speech_015_4.33_4.43.tar.bz2 63 | clean_fullband/datasets_fullband.clean_fullband.german_speech_016_4.43_NA.tar.bz2 64 | clean_fullband/datasets_fullband.clean_fullband.german_speech_017_NA_NA.tar.bz2 65 | clean_fullband/datasets_fullband.clean_fullband.german_speech_018_NA_NA.tar.bz2 66 | clean_fullband/datasets_fullband.clean_fullband.german_speech_019_NA_NA.tar.bz2 67 | clean_fullband/datasets_fullband.clean_fullband.german_speech_020_NA_NA.tar.bz2 68 | clean_fullband/datasets_fullband.clean_fullband.german_speech_021_NA_NA.tar.bz2 69 | clean_fullband/datasets_fullband.clean_fullband.german_speech_022_NA_NA.tar.bz2 70 | clean_fullband/datasets_fullband.clean_fullband.german_speech_023_NA_NA.tar.bz2 71 | clean_fullband/datasets_fullband.clean_fullband.german_speech_024_NA_NA.tar.bz2 72 | clean_fullband/datasets_fullband.clean_fullband.german_speech_025_NA_NA.tar.bz2 73 | clean_fullband/datasets_fullband.clean_fullband.german_speech_026_NA_NA.tar.bz2 74 | clean_fullband/datasets_fullband.clean_fullband.german_speech_027_NA_NA.tar.bz2 75 | clean_fullband/datasets_fullband.clean_fullband.german_speech_028_NA_NA.tar.bz2 76 
| clean_fullband/datasets_fullband.clean_fullband.german_speech_029_NA_NA.tar.bz2 77 | clean_fullband/datasets_fullband.clean_fullband.german_speech_030_NA_NA.tar.bz2 78 | clean_fullband/datasets_fullband.clean_fullband.german_speech_031_NA_NA.tar.bz2 79 | clean_fullband/datasets_fullband.clean_fullband.german_speech_032_NA_NA.tar.bz2 80 | clean_fullband/datasets_fullband.clean_fullband.german_speech_033_NA_NA.tar.bz2 81 | clean_fullband/datasets_fullband.clean_fullband.german_speech_034_NA_NA.tar.bz2 82 | clean_fullband/datasets_fullband.clean_fullband.german_speech_035_NA_NA.tar.bz2 83 | clean_fullband/datasets_fullband.clean_fullband.german_speech_036_NA_NA.tar.bz2 84 | clean_fullband/datasets_fullband.clean_fullband.german_speech_037_NA_NA.tar.bz2 85 | clean_fullband/datasets_fullband.clean_fullband.german_speech_038_NA_NA.tar.bz2 86 | clean_fullband/datasets_fullband.clean_fullband.german_speech_039_NA_NA.tar.bz2 87 | clean_fullband/datasets_fullband.clean_fullband.german_speech_040_NA_NA.tar.bz2 88 | clean_fullband/datasets_fullband.clean_fullband.german_speech_041_NA_NA.tar.bz2 89 | clean_fullband/datasets_fullband.clean_fullband.german_speech_042_NA_NA.tar.bz2 90 | 91 | clean_fullband/datasets_fullband.clean_fullband.italian_speech_000_0.00_3.98.tar.bz2 92 | clean_fullband/datasets_fullband.clean_fullband.italian_speech_001_3.98_4.21.tar.bz2 93 | clean_fullband/datasets_fullband.clean_fullband.italian_speech_002_4.21_4.40.tar.bz2 94 | clean_fullband/datasets_fullband.clean_fullband.italian_speech_003_4.40_NA.tar.bz2 95 | clean_fullband/datasets_fullband.clean_fullband.italian_speech_004_NA_NA.tar.bz2 96 | clean_fullband/datasets_fullband.clean_fullband.italian_speech_005_NA_NA.tar.bz2 97 | 98 | clean_fullband/datasets_fullband.clean_fullband.read_speech_000_0.00_3.75.tar.bz2 99 | clean_fullband/datasets_fullband.clean_fullband.read_speech_001_3.75_3.88.tar.bz2 100 | clean_fullband/datasets_fullband.clean_fullband.read_speech_002_3.88_3.96.tar.bz2 101 | 
clean_fullband/datasets_fullband.clean_fullband.read_speech_003_3.96_4.02.tar.bz2 102 | clean_fullband/datasets_fullband.clean_fullband.read_speech_004_4.02_4.06.tar.bz2 103 | clean_fullband/datasets_fullband.clean_fullband.read_speech_005_4.06_4.10.tar.bz2 104 | clean_fullband/datasets_fullband.clean_fullband.read_speech_006_4.10_4.13.tar.bz2 105 | clean_fullband/datasets_fullband.clean_fullband.read_speech_007_4.13_4.16.tar.bz2 106 | clean_fullband/datasets_fullband.clean_fullband.read_speech_008_4.16_4.19.tar.bz2 107 | clean_fullband/datasets_fullband.clean_fullband.read_speech_009_4.19_4.21.tar.bz2 108 | clean_fullband/datasets_fullband.clean_fullband.read_speech_010_4.21_4.24.tar.bz2 109 | clean_fullband/datasets_fullband.clean_fullband.read_speech_011_4.24_4.26.tar.bz2 110 | clean_fullband/datasets_fullband.clean_fullband.read_speech_012_4.26_4.29.tar.bz2 111 | clean_fullband/datasets_fullband.clean_fullband.read_speech_013_4.29_4.31.tar.bz2 112 | clean_fullband/datasets_fullband.clean_fullband.read_speech_014_4.31_4.33.tar.bz2 113 | clean_fullband/datasets_fullband.clean_fullband.read_speech_015_4.33_4.35.tar.bz2 114 | clean_fullband/datasets_fullband.clean_fullband.read_speech_016_4.35_4.38.tar.bz2 115 | clean_fullband/datasets_fullband.clean_fullband.read_speech_017_4.38_4.40.tar.bz2 116 | clean_fullband/datasets_fullband.clean_fullband.read_speech_018_4.40_4.42.tar.bz2 117 | clean_fullband/datasets_fullband.clean_fullband.read_speech_019_4.42_4.45.tar.bz2 118 | clean_fullband/datasets_fullband.clean_fullband.read_speech_020_4.45_4.48.tar.bz2 119 | clean_fullband/datasets_fullband.clean_fullband.read_speech_021_4.48_4.52.tar.bz2 120 | clean_fullband/datasets_fullband.clean_fullband.read_speech_022_4.52_4.57.tar.bz2 121 | clean_fullband/datasets_fullband.clean_fullband.read_speech_023_4.57_4.67.tar.bz2 122 | clean_fullband/datasets_fullband.clean_fullband.read_speech_024_4.67_NA.tar.bz2 123 | 
clean_fullband/datasets_fullband.clean_fullband.read_speech_025_NA_NA.tar.bz2 124 | clean_fullband/datasets_fullband.clean_fullband.read_speech_026_NA_NA.tar.bz2 125 | clean_fullband/datasets_fullband.clean_fullband.read_speech_027_NA_NA.tar.bz2 126 | clean_fullband/datasets_fullband.clean_fullband.read_speech_028_NA_NA.tar.bz2 127 | clean_fullband/datasets_fullband.clean_fullband.read_speech_029_NA_NA.tar.bz2 128 | clean_fullband/datasets_fullband.clean_fullband.read_speech_030_NA_NA.tar.bz2 129 | clean_fullband/datasets_fullband.clean_fullband.read_speech_031_NA_NA.tar.bz2 130 | clean_fullband/datasets_fullband.clean_fullband.read_speech_032_NA_NA.tar.bz2 131 | clean_fullband/datasets_fullband.clean_fullband.read_speech_033_NA_NA.tar.bz2 132 | clean_fullband/datasets_fullband.clean_fullband.read_speech_034_NA_NA.tar.bz2 133 | clean_fullband/datasets_fullband.clean_fullband.read_speech_035_NA_NA.tar.bz2 134 | clean_fullband/datasets_fullband.clean_fullband.read_speech_036_NA_NA.tar.bz2 135 | clean_fullband/datasets_fullband.clean_fullband.read_speech_037_NA_NA.tar.bz2 136 | clean_fullband/datasets_fullband.clean_fullband.read_speech_038_NA_NA.tar.bz2 137 | clean_fullband/datasets_fullband.clean_fullband.read_speech_039_NA_NA.tar.bz2 138 | 139 | clean_fullband/datasets_fullband.clean_fullband.russian_speech_000_0.00_4.31.tar.bz2 140 | clean_fullband/datasets_fullband.clean_fullband.russian_speech_001_4.31_NA.tar.bz2 141 | 142 | clean_fullband/datasets_fullband.clean_fullband.spanish_speech_000_0.00_4.09.tar.bz2 143 | clean_fullband/datasets_fullband.clean_fullband.spanish_speech_001_4.09_NA.tar.bz2 144 | clean_fullband/datasets_fullband.clean_fullband.spanish_speech_002_NA_NA.tar.bz2 145 | clean_fullband/datasets_fullband.clean_fullband.spanish_speech_003_NA_NA.tar.bz2 146 | clean_fullband/datasets_fullband.clean_fullband.spanish_speech_004_NA_NA.tar.bz2 147 | clean_fullband/datasets_fullband.clean_fullband.spanish_speech_005_NA_NA.tar.bz2 148 | 
clean_fullband/datasets_fullband.clean_fullband.spanish_speech_006_NA_NA.tar.bz2 149 | clean_fullband/datasets_fullband.clean_fullband.spanish_speech_007_NA_NA.tar.bz2 150 | clean_fullband/datasets_fullband.clean_fullband.spanish_speech_008_NA_NA.tar.bz2 151 | 152 | clean_fullband/datasets_fullband.clean_fullband.vctk_wav48_silence_trimmed_000.tar.bz2 153 | clean_fullband/datasets_fullband.clean_fullband.vctk_wav48_silence_trimmed_001.tar.bz2 154 | clean_fullband/datasets_fullband.clean_fullband.vctk_wav48_silence_trimmed_002.tar.bz2 155 | clean_fullband/datasets_fullband.clean_fullband.vctk_wav48_silence_trimmed_003.tar.bz2 156 | clean_fullband/datasets_fullband.clean_fullband.vctk_wav48_silence_trimmed_004.tar.bz2 157 | 158 | ) 159 | 160 | ############################################################### 161 | # This data is identical to the clean speech of the non-personalized track of the 4th DNS Challenge. 162 | # It is recommended to re-download the data using this script. 163 | 164 | AZURE_URL="https://dns4public.blob.core.windows.net/dns4archive/datasets_fullband" 165 | 166 | OUTPUT_PATH="./datasets_fullband" 167 | 168 | mkdir -p "$OUTPUT_PATH"/{clean_fullband,noise_fullband} 169 | 170 | for BLOB in "${BLOB_NAMES[@]}" 171 | do 172 | URL="$AZURE_URL/$BLOB" 173 | echo "Download: $BLOB" 174 | 175 | # DRY RUN: print HTTP response and Content-Length 176 | # WITHOUT downloading the files 177 | curl -s -I "$URL" | head -n 2 178 | 179 | # Actually download the files: UNCOMMENT when ready to download 180 | # curl "$URL" -o "$OUTPUT_PATH/$BLOB" 181 | 182 | # Same as above, but using wget 183 | # wget "$URL" -O "$OUTPUT_PATH/$BLOB" 184 | 185 | # Same, + unpack files on the fly 186 | # curl "$URL" | tar -C "$OUTPUT_PATH" -f - -x -j 187 | done 188 | -------------------------------------------------------------------------------- /download-dns5-blind-testset.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/bash 2 | 3 | # ***** BLIND Testset for 5th DNS
Challenge at ICASSP 2023 ***** 4 | 5 | # NOTE: Before downloading, make sure you have enough space 6 | # on your local storage! 7 | 8 | # ------------------------------------------------------------- 9 | # The directory structure of the unpacked data is: 10 | 11 | # 12 | # +-- V5_BlindTestSet 13 | # | +-- Track1_Headset ---> (enrol, noisy) 14 | # | +-- Track2_Speakerphone ---> (enrol, noisy) 15 | 16 | URL="https://dnschallengepublic.blob.core.windows.net/dns5archive/V5_BlindTestSet.zip" 17 | 18 | echo "Download: $URL" 19 | # 20 | # DRY RUN: print HTTP header WITHOUT downloading the files 21 | curl -s -I "$URL" 22 | # 23 | # Download the archive (comment out the next line if you only want the dry run) 24 | wget "$URL" 25 | 26 | # Same as above, but using curl 27 | # curl "$URL" -o "V5_BlindTestSet.zip" 28 | 29 | # The archive is a ZIP file, so unpack it with unzip (not tar) after downloading 30 | # unzip V5_BlindTestSet.zip 31 | -------------------------------------------------------------------------------- /download-dns5-dev-testset.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/bash 2 | 3 | # ***** Dev Testset for 5th DNS Challenge at ICASSP 2023 ***** 4 | 5 | # NOTE: Before downloading, make sure you have enough space 6 | # on your local storage! 7 | 8 | # Zip file is 2.9 GB. Unzipped data is 4 GB.
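The dry run below only prints the raw HTTP response headers. As a sketch of how that output could be used (the `parse_content_length` helper is hypothetical and not part of the repository), the Content-Length header can be extracted so the archive size is known in bytes before committing to the real download:

```shell
# Hypothetical helper: read HTTP response headers on stdin and print the
# Content-Length value in bytes. tr strips the trailing CR that HTTP
# header lines carry; awk matches the header name case-insensitively.
parse_content_length() {
    tr -d '\r' | awk 'tolower($1) == "content-length:" { print $2; exit }'
}

# Usage with the dry run in this script (requires network access):
#   SIZE_BYTES=$(curl -s -I "$URL" | parse_content_length)
```

This keeps the existing `curl -s -I` dry run untouched and just post-processes its output.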
9 | 10 | # ------------------------------------------------------------- 11 | # The directory structure of the unpacked data is: 12 | 13 | # 14 | # +-- V5_dev_testset 64G 15 | # | +-- Track1_Headset ---> (enrol, noisy) 16 | # | +-- Track2_Speakerphone ---> (enrol, noisy) 17 | 18 | URL="https://dnschallengepublic.blob.core.windows.net/dns5archive/V5_dev_testset.zip" 19 | echo "Download: $URL" 20 | # 21 | # DRY RUN: print HTTP header WITHOUT downloading the files 22 | curl -s -I "$URL" 23 | # 24 | # Download the archive (comment out the next line if you only want the dry run) 25 | wget "$URL" 26 | 27 | # Same as above, but using curl 28 | # curl "$URL" -o "V5_dev_testset.zip" 29 | 30 | # The archive is a ZIP file, so unpack it with unzip (not tar) after downloading 31 | # unzip V5_dev_testset.zip 32 | -------------------------------------------------------------------------------- /download_dns_v2_v3_blindset.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/bash 2 | 3 | # ***** BLIND Testset for 2nd and 3rd DNS Challenges, combined with additional handpicked clips ***** 4 | 5 | # NOTE: Before downloading, make sure you have enough space 6 | # on your local storage!
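Each of these download scripts warns about local storage but leaves the check to the reader. A minimal sketch of an explicit guard (the `require_space` helper is mine, not part of the repository):

```shell
# Hypothetical helper: fail if the filesystem holding $2 (default ".") has
# fewer than $1 gigabytes available. `df -Pk` prints POSIX portable output
# in 1 KiB blocks; column 4 of the second line is the available space.
require_space() {
    need_gb="$1"
    dest="${2:-.}"
    avail_kb=$(df -Pk "$dest" | awk 'NR == 2 { print $4 }')
    avail_gb=$(( avail_kb / 1024 / 1024 ))
    if [ "$avail_gb" -lt "$need_gb" ]; then
        echo "Need ${need_gb} GB free at ${dest}, only ${avail_gb} GB available" >&2
        return 1
    fi
}

# Example: bail out early instead of failing mid-download
# require_space 10 . || exit 1
```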
7 | 8 | # ------------------------------------------------------------- 9 | # The directory structure of the unpacked data is: 10 | 11 | # 12 | # +-- V2_V3_Challenge_Combined_Blindset 13 | # | +-- handpicked_emotion_testclips_16k_600_withSNR ---> (600 emotional clips) 14 | # | +-- mouseclicks_testclips_withSNR_16k ---> (100 mouse-click clips) 15 | # | +-- noisy_blind_testset_v2_challenge_withSNR_16k ---> (700 blindset clips from V2 challenge) 16 | # | +-- noisy_blind_testset_v3_challenge_withSNR_16k ---> (600 blindset clips from V3 challenge) 17 | 18 | URL="https://dnschallengepublic.blob.core.windows.net/dns3archive/V2_V3_Challenge_Combined_Blindset.zip" 19 | 20 | echo "Download: $URL" 21 | # 22 | # DRY RUN: print HTTP header WITHOUT downloading the files 23 | curl -s -I "$URL" 24 | # 25 | # Download the archive (comment out the next line if you only want the dry run) 26 | wget "$URL" 27 | 28 | # Same as above, but using curl 29 | # curl "$URL" -o "V2_V3_Challenge_Combined_Blindset.zip" 30 | 31 | # The archive is a ZIP file, so unpack it with unzip (not tar) after downloading 32 | # unzip V2_V3_Challenge_Combined_Blindset.zip 33 | -------------------------------------------------------------------------------- /index.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | Model comparison 4 | 5 | 6 | 7 | 8 | 9 | 72 | 88 | 95 |
[The remaining index.html markup was stripped during text extraction. The surviving text indicates an "Audio Clips" section and a per-clip table with Index, Progress (e.g. "25%"), and Clipname columns.]
-------------------------------------------------------------------------------- /noisyspeech_synthesizer.cfg: -------------------------------------------------------------------------------- 1 | # Configuration for generating the Noisy Speech Dataset 2 | 3 | # - sampling_rate: Specify the sampling rate. Default is 16 kHz 4 | # - audioformat: Default is .wav 5 | # - audio_length: Minimum length of each audio clip (noisy and clean speech) in seconds that will be generated by augmenting utterances. 6 | # - silence_length: Duration of silence introduced between clean speech utterances. 7 | # - total_hours: Total number of hours of data required. Units are in hours. 8 | # - snr_lower: Lower bound for the required SNR (default: 0 dB) 9 | # - snr_upper: Upper bound for the required SNR (default: 40 dB) 10 | # - target_level_lower: Lower bound for the target audio level before audiowrite (default: -35 dB) 11 | # - target_level_upper: Upper bound for the target audio level before audiowrite (default: -15 dB) 12 | # - total_snrlevels: Number of SNR levels required (default: 5, which means there are 5 levels between snr_lower and snr_upper) 13 | # - clean_activity_threshold: Activity threshold for clean speech 14 | # - noise_activity_threshold: Activity threshold for noise 15 | # - fileindex_start: First file ID that will be used in filenames 16 | # - fileindex_end: Last file ID that will be used in filenames 17 | # - is_test_set: Set it to True for the test set, else False for the training set 18 | # - noise_dir: Specify the directory path to all noise files 19 | # - speech_dir: Specify the directory path to all clean speech files 20 | # - noisy_destination: Specify the path to the destination directory to store noisy speech 21 | # - clean_destination: Specify the path to the destination directory to store clean speech 22 | # - noise_destination: Specify the path to the destination directory to store noise 23 | # - log_dir: Specify path to the directory
to store all the log files 24 | 25 | # Configuration for unit tests 26 | # - snr_test: Set to True if SNR test is required, else False 27 | # - norm_test: Set to True if Normalization test is required, else False 28 | # - sampling_rate_test: Set to True if Sampling Rate test is required, else False 29 | # - clipping_test: Set to True if Clipping test is required, else False 30 | # - unit_tests_log_dir: Specify path to the directory where you want to store logs 31 | 32 | [noisy_speech] 33 | 34 | sampling_rate: 16000 35 | audioformat: *.wav 36 | audio_length: 30 37 | silence_length: 0.2 38 | total_hours: 500 39 | snr_lower: -5 40 | snr_upper: 20 41 | randomize_snr: True 42 | target_level_lower: -35 43 | target_level_upper: -15 44 | total_snrlevels: 21 45 | clean_activity_threshold: 0.6 46 | noise_activity_threshold: 0.0 47 | fileindex_start: None 48 | fileindex_end: None 49 | is_test_set: False 50 | 51 | noise_dir: datasets\noise 52 | speech_dir: datasets\clean\read_speech 53 | noise_types_excluded: None 54 | 55 | noisy_destination: datasets\training_set_sept12\noisy 56 | clean_destination: datasets\training_set_sept12\clean 57 | noise_destination: datasets\training_set_sept12\noise 58 | log_dir: logs 59 | 60 | # Config: add singing voice to clean speech 61 | use_singing_data=1 62 | # 0 for no, 1 for yes 63 | clean_singing: datasets\clean\singing_voice 64 | #datasets\clean_singing\VocalSet11\FULL 65 | singing_choice: 3 66 | # 1 for only male, 2 for only female, 3 (default) for both male and female 67 | 68 | # Config: add emotional data to clean speech 69 | use_emotion_data=1 70 | # 0 for no, 1 for yes 71 | clean_emotion: datasets\clean\emotional_speech 72 | 73 | # Config: add Chinese (mandarin) data to clean speech 74 | use_mandarin_data=1 75 | # 0 for no, 1 for yes 76 | clean_mandarin: datasets\clean\mandarin_speech 77 | 78 | # Config: add reverb to clean speech 79 | rir_choice: 3 80 | # 1 for only real rir, 2 for only synthetic rir, 3 (default) use both real and 
synthetic 81 | lower_t60: 0.3 82 | # lower bound of t60 range in seconds 83 | upper_t60: 1.3 84 | # upper bound of t60 range in seconds 85 | rir_table_csv: datasets\acoustic_params\RIR_table_simple.csv 86 | clean_speech_t60_csv: datasets\acoustic_params\cleanspeech_table_t60_c50.csv 87 | # percent_for_adding_reverb=0.5 # percentage of clean speech convolved with RIR 88 | 89 | # Unit tests config 90 | snr_test: True 91 | norm_test: True 92 | sampling_rate_test = True 93 | clipping_test = True 94 | 95 | unit_tests_log_dir: unittests_logs 96 | -------------------------------------------------------------------------------- /noisyspeech_synthesizer_singleprocess.py: -------------------------------------------------------------------------------- 1 | """ 2 | @author: chkarada 3 | """ 4 | 5 | # Note: This single process audio synthesizer will attempt to use each clean 6 | # speech sourcefile once, as it does not randomly sample from these files 7 | 8 | import os 9 | import sys 10 | import glob 11 | import argparse 12 | import ast 13 | import configparser as CP 14 | from random import shuffle 15 | import random 16 | 17 | import librosa 18 | import numpy as np 19 | from scipy import signal 20 | from audiolib import audioread, audiowrite, segmental_snr_mixer, activitydetector, is_clipped, add_clipping 21 | import utils 22 | 23 | import pandas as pd 24 | from pathlib import Path 25 | from scipy.io import wavfile 26 | 27 | MAXTRIES = 50 28 | MAXFILELEN = 100 29 | 30 | np.random.seed(5) 31 | random.seed(5) 32 | 33 | def add_pyreverb(clean_speech, rir): 34 | 35 | reverb_speech = signal.fftconvolve(clean_speech, rir, mode="full") 36 | 37 | # make reverb_speech same length as clean_speech 38 | reverb_speech = reverb_speech[0 : clean_speech.shape[0]] 39 | 40 | return reverb_speech 41 | 42 | def build_audio(is_clean, params, index, audio_samples_length=-1): 43 | '''Construct an audio signal from source files''' 44 | 45 | fs_output = params['fs'] 46 | silence_length = 
params['silence_length'] 47 | if audio_samples_length == -1: 48 | audio_samples_length = int(params['audio_length']*params['fs']) 49 | 50 | output_audio = np.zeros(0) 51 | remaining_length = audio_samples_length 52 | files_used = [] 53 | clipped_files = [] 54 | 55 | if is_clean: 56 | source_files = params['cleanfilenames'] 57 | idx = index 58 | else: 59 | if 'noisefilenames' in params.keys(): 60 | source_files = params['noisefilenames'] 61 | idx = index 62 | # if noise files are organized into individual subdirectories, pick a directory randomly 63 | else: 64 | noisedirs = params['noisedirs'] 65 | # pick a noise category randomly 66 | idx_n_dir = np.random.randint(0, np.size(noisedirs)) 67 | source_files = glob.glob(os.path.join(noisedirs[idx_n_dir], 68 | params['audioformat'])) 69 | shuffle(source_files) 70 | # pick a noise source file index randomly 71 | idx = np.random.randint(0, np.size(source_files)) 72 | 73 | # initialize silence 74 | silence = np.zeros(int(fs_output*silence_length)) 75 | 76 | # iterate through multiple clips until we have a long enough signal 77 | tries_left = MAXTRIES 78 | while remaining_length > 0 and tries_left > 0: 79 | 80 | # read next audio file and resample if necessary 81 | idx = (idx + 1) % np.size(source_files) 82 | input_audio, fs_input = audioread(source_files[idx]) 83 | if input_audio is None: 84 | sys.stderr.write("WARNING: Cannot read file: %s\n" % source_files[idx]) 85 | tries_left -= 1  # a failed read also consumes a try, so unreadable files cannot loop forever 86 | continue 87 | if fs_input != fs_output: 88 | input_audio = librosa.resample(input_audio, orig_sr=fs_input, target_sr=fs_output)  # keyword arguments required by librosa >= 0.10 89 | 90 | # if current file is longer than remaining desired length, and this is 91 | # noise generation or this is training set, subsample it randomly 92 | if len(input_audio) > remaining_length and (not is_clean or not params['is_test_set']): 93 | idx_seg = np.random.randint(0, len(input_audio)-remaining_length) 94 | input_audio = input_audio[idx_seg:idx_seg+remaining_length] 95 | 96 | # check for clipping, and if found move onto next file 97
| if is_clipped(input_audio): 98 | clipped_files.append(source_files[idx]) 99 | tries_left -= 1 100 | continue 101 | 102 | # concatenate current input audio to output audio stream 103 | files_used.append(source_files[idx]) 104 | output_audio = np.append(output_audio, input_audio) 105 | remaining_length -= len(input_audio) 106 | 107 | # add some silence if we have not reached desired audio length 108 | if remaining_length > 0: 109 | silence_len = min(remaining_length, len(silence)) 110 | output_audio = np.append(output_audio, silence[:silence_len]) 111 | remaining_length -= silence_len 112 | 113 | if tries_left == 0 and not is_clean and 'noisedirs' in params.keys(): 114 | print("There are not enough non-clipped files in the " + noisedirs[idx_n_dir] + \ 115 | " directory to complete the audio build") 116 | return [], [], clipped_files, idx 117 | 118 | return output_audio, files_used, clipped_files, idx 119 | 120 | 121 | def gen_audio(is_clean, params, index, audio_samples_length=-1): 122 | '''Calls build_audio() to get an audio signal, and verify that it meets the 123 | activity threshold''' 124 | 125 | clipped_files = [] 126 | low_activity_files = [] 127 | if audio_samples_length == -1: 128 | audio_samples_length = int(params['audio_length']*params['fs']) 129 | if is_clean: 130 | activity_threshold = params['clean_activity_threshold'] 131 | else: 132 | activity_threshold = params['noise_activity_threshold'] 133 | 134 | while True: 135 | audio, source_files, new_clipped_files, index = \ 136 | build_audio(is_clean, params, index, audio_samples_length) 137 | 138 | clipped_files += new_clipped_files 139 | if len(audio) < audio_samples_length: 140 | continue 141 | 142 | if activity_threshold == 0.0: 143 | break 144 | 145 | percactive = activitydetector(audio=audio) 146 | if percactive > activity_threshold: 147 | break 148 | else: 149 | low_activity_files += source_files 150 | 151 | return audio, source_files, clipped_files, low_activity_files, index 152 | 153 | 154 | def 
main_gen(params): 155 | '''Calls gen_audio() to generate the audio signals, verifies that they meet 156 | the requirements, and writes the files to storage''' 157 | 158 | clean_source_files = [] 159 | clean_clipped_files = [] 160 | clean_low_activity_files = [] 161 | noise_source_files = [] 162 | noise_clipped_files = [] 163 | noise_low_activity_files = [] 164 | 165 | clean_index = 0 166 | noise_index = 0 167 | file_num = params['fileindex_start'] 168 | 169 | while file_num <= params['fileindex_end']: 170 | # generate clean speech 171 | clean, clean_sf, clean_cf, clean_laf, clean_index = \ 172 | gen_audio(True, params, clean_index) 173 | 174 | # add reverb with a randomly selected RIR 175 | rir_index = random.randint(0, len(params['myrir'])-1) 176 | 177 | my_rir = os.path.normpath(os.path.join('datasets', 'impulse_responses', params['myrir'][rir_index])) 178 | (fs_rir, samples_rir) = wavfile.read(my_rir) 179 | 180 | my_channel = int(params['mychannel'][rir_index]) 181 | 182 | if samples_rir.ndim == 1: 183 | samples_rir_ch = np.array(samples_rir) 184 | else: 185 | # multi-channel RIR: keep the configured channel (1-based in the RIR table) 186 | samples_rir_ch = samples_rir[:, my_channel - 1] 187 | 188 | clean = add_pyreverb(clean, samples_rir_ch) 189 | 190 | # generate noise 191 | noise, noise_sf, noise_cf, noise_laf, noise_index = \ 192 | gen_audio(False, params, noise_index, len(clean)) 193 | 194 | clean_clipped_files += clean_cf 195 | clean_low_activity_files += clean_laf 196 | noise_clipped_files += noise_cf 197 | noise_low_activity_files += noise_laf 198 | 199 | # mix clean speech and noise 200 | # if specified, use the specified SNR value 201 | if not params['randomize_snr']: 202 | snr = params['snr'] 203 | # otherwise use a randomly sampled SNR value between the specified bounds 204 | else: 205 | snr = np.random.randint(params['snr_lower'], params['snr_upper'] + 1)  # +1 makes the upper bound inclusive 206 | 207 | clean_snr,
noise_snr, noisy_snr, target_level = segmental_snr_mixer(params=params, 214 | clean=clean, 215 | noise=noise, 216 | snr=snr) 217 | 222 | # unexpected clipping 223 | if is_clipped(clean_snr) or is_clipped(noise_snr) or is_clipped(noisy_snr): 224 | print("Warning: File #" + str(file_num) + " has unexpected clipping, " + \ 225 | "returning without writing audio to disk") 226 | continue 227 | 228 | clean_source_files += clean_sf 229 | noise_source_files += noise_sf 230 | 231 | # write resultant audio streams to files 232 | hyphen = '-' 233 | clean_source_filenamesonly = [i[:-4].split(os.path.sep)[-1] for i in clean_sf] 234 | clean_files_joined = hyphen.join(clean_source_filenamesonly)[:MAXFILELEN] 235 | noise_source_filenamesonly = [i[:-4].split(os.path.sep)[-1] for i in noise_sf] 236 | noise_files_joined = hyphen.join(noise_source_filenamesonly)[:MAXFILELEN] 237 | 238 | noisyfilename = clean_files_joined + '_' + noise_files_joined + '_snr' + \ 239 | str(snr) + '_tl' + str(target_level) + '_fileid_' + str(file_num) + '.wav' 240 | cleanfilename = 'clean_fileid_' + str(file_num) + '.wav' 241 | noisefilename = 'noise_fileid_' + str(file_num) + '.wav' 242 | 243 | noisypath = os.path.join(params['noisyspeech_dir'], noisyfilename) 244 | cleanpath = os.path.join(params['clean_proc_dir'], cleanfilename) 245 | noisepath = os.path.join(params['noise_proc_dir'], noisefilename) 246 | 247 | audio_signals = [noisy_snr, clean_snr, noise_snr] 248 | file_paths = [noisypath, cleanpath, noisepath] 249 | 250 | file_num += 1 251 | for i in range(len(audio_signals)): 252 | try: 253 | audiowrite(file_paths[i], audio_signals[i], params['fs']) 254 | except Exception as e: 255 | print(str(e)) 256 | 257 | 258 | return clean_source_files, clean_clipped_files,
clean_low_activity_files, \ 259 | noise_source_files, noise_clipped_files, noise_low_activity_files 260 | 261 | 262 | def main_body(): 263 | '''Main body of this file''' 264 | 265 | parser = argparse.ArgumentParser() 266 | 267 | # Configurations: read noisyspeech_synthesizer.cfg and gather inputs 268 | parser.add_argument('--cfg', default='noisyspeech_synthesizer.cfg', 269 | help='Read noisyspeech_synthesizer.cfg for all the details') 270 | parser.add_argument('--cfg_str', type=str, default='noisy_speech') 271 | args = parser.parse_args() 272 | 273 | params = dict() 274 | params['args'] = args 275 | cfgpath = os.path.join(os.path.dirname(__file__), args.cfg) 276 | assert os.path.exists(cfgpath), f'No configuration file found at [{cfgpath}]' 277 | 278 | cfg = CP.ConfigParser(interpolation=CP.ExtendedInterpolation()) 279 | 280 | cfg.read(cfgpath) 281 | params['cfg'] = cfg._sections[args.cfg_str] 282 | cfg = params['cfg'] 283 | 284 | clean_dir = os.path.join(os.path.dirname(__file__), 'datasets/clean') 285 | 286 | if cfg['speech_dir'] != 'None': 287 | clean_dir = cfg['speech_dir'] 288 | if not os.path.exists(clean_dir): 289 | assert False, 'Clean speech data is required' 290 | 291 | noise_dir = os.path.join(os.path.dirname(__file__), 'datasets/noise') 292 | 293 | if cfg['noise_dir'] != 'None': 294 | noise_dir = cfg['noise_dir'] 295 | if not os.path.exists(noise_dir): 296 | assert False, 'Noise data is required' 297 | 298 | params['fs'] = int(cfg['sampling_rate']) 299 | params['audioformat'] = cfg['audioformat'] 300 | params['audio_length'] = float(cfg['audio_length']) 301 | params['silence_length'] = float(cfg['silence_length']) 302 | params['total_hours'] = float(cfg['total_hours']) 303 | 304 | # clean singing speech 305 | params['use_singing_data'] = int(cfg['use_singing_data']) 306 | params['clean_singing'] = str(cfg['clean_singing']) 307 | params['singing_choice'] = int(cfg['singing_choice']) 308 | 309 | # clean emotional speech 310 | params['use_emotion_data'] =
int(cfg['use_emotion_data']) 311 | params['clean_emotion'] = str(cfg['clean_emotion']) 312 | 313 | # clean mandarin speech 314 | params['use_mandarin_data'] = int(cfg['use_mandarin_data']) 315 | params['clean_mandarin'] = str(cfg['clean_mandarin']) 316 | 317 | # rir 318 | params['rir_choice'] = int(cfg['rir_choice']) 319 | params['lower_t60'] = float(cfg['lower_t60']) 320 | params['upper_t60'] = float(cfg['upper_t60']) 321 | params['rir_table_csv'] = str(cfg['rir_table_csv']) 322 | params['clean_speech_t60_csv'] = str(cfg['clean_speech_t60_csv']) 323 | 324 | if cfg['fileindex_start'] != 'None' and cfg['fileindex_end'] != 'None': 325 | params['num_files'] = int(cfg['fileindex_end'])-int(cfg['fileindex_start']) 326 | params['fileindex_start'] = int(cfg['fileindex_start']) 327 | params['fileindex_end'] = int(cfg['fileindex_end']) 328 | else: 329 | params['num_files'] = int((params['total_hours']*60*60)/params['audio_length']) 330 | params['fileindex_start'] = 0 331 | params['fileindex_end'] = params['num_files'] 332 | 333 | print('Number of files to be synthesized:', params['num_files']) 334 | 335 | params['is_test_set'] = utils.str2bool(cfg['is_test_set']) 336 | params['clean_activity_threshold'] = float(cfg['clean_activity_threshold']) 337 | params['noise_activity_threshold'] = float(cfg['noise_activity_threshold']) 338 | params['snr_lower'] = int(cfg['snr_lower']) 339 | params['snr_upper'] = int(cfg['snr_upper']) 340 | 341 | params['randomize_snr'] = utils.str2bool(cfg['randomize_snr']) 342 | params['target_level_lower'] = int(cfg['target_level_lower']) 343 | params['target_level_upper'] = int(cfg['target_level_upper']) 344 | 345 | if 'snr' in cfg.keys(): 346 | params['snr'] = int(cfg['snr']) 347 | else: 348 | params['snr'] = int((params['snr_lower'] + params['snr_upper'])/2) 349 | 350 | params['noisyspeech_dir'] = utils.get_dir(cfg, 'noisy_destination', 'noisy') 351 | params['clean_proc_dir'] = utils.get_dir(cfg, 'clean_destination', 'clean') 352 | 
params['noise_proc_dir'] = utils.get_dir(cfg, 'noise_destination', 'noise') 353 | 354 | if 'speech_csv' in cfg.keys() and cfg['speech_csv'] != 'None': 355 | cleanfilenames = pd.read_csv(cfg['speech_csv']) 356 | cleanfilenames = cleanfilenames['filename'] 357 | else: 358 | cleanfilenames = [] 359 | for path in Path(clean_dir).rglob('*.wav'): 360 | cleanfilenames.append(str(path.resolve())) 361 | 362 | shuffle(cleanfilenames) 363 | # start from the clean speech list so all_cleanfiles is defined even when the optional corpora below are disabled 364 | all_cleanfiles = cleanfilenames 365 | 366 | # add singing voice to clean speech 367 | if params['use_singing_data'] == 1: 368 | all_singing = [] 369 | for path in Path(params['clean_singing']).rglob('*.wav'): 370 | all_singing.append(str(path.resolve())) 371 | 372 | if params['singing_choice'] == 1: # only male singers 373 | mysinging = [s for s in all_singing if ("male" in s and "female" not in s)] 374 | elif params['singing_choice'] == 2: # only female singers 375 | mysinging = [s for s in all_singing if "female" in s] 376 | else: # default (3): both male and female 377 | mysinging = all_singing 378 | 379 | shuffle(mysinging) 380 | all_cleanfiles = all_cleanfiles + mysinging 381 | 382 | # add emotion data to clean speech 383 | if params['use_emotion_data'] == 1: 384 | all_emotion = [] 385 | for path in Path(params['clean_emotion']).rglob('*.wav'): 386 | all_emotion.append(str(path.resolve())) 387 | 388 | shuffle(all_emotion) 389 | all_cleanfiles = all_cleanfiles + all_emotion 390 | else: 391 | print('NOT using emotion data for training!') 392 | 399 | # add mandarin data to clean speech 400 | if params['use_mandarin_data'] == 1: 401 | all_mandarin = [] 402 | for path in Path(params['clean_mandarin']).rglob('*.wav'): 403 | all_mandarin.append(str(path.resolve())) 404 | 405 | shuffle(all_mandarin) 406 | if
all_mandarin: 407 | all_cleanfiles = all_cleanfiles + all_mandarin 408 | else: 409 | print('NOT using non-english (Mandarin) data for training!') 410 | 411 | 412 | params['cleanfilenames'] = all_cleanfiles 413 | params['num_cleanfiles'] = len(params['cleanfilenames']) 414 | # If there are .wav files in the noise_dir directory, use those. 415 | # If not, the noise files are organized into subdirectories by type, 416 | # so get the names of the non-excluded subdirectories 417 | if 'noise_csv' in cfg.keys() and cfg['noise_csv'] != 'None': 418 | noisefilenames = pd.read_csv(cfg['noise_csv']) 419 | noisefilenames = noisefilenames['filename'] 420 | else: 421 | noisefilenames = glob.glob(os.path.join(noise_dir, params['audioformat'])) 422 | 423 | if len(noisefilenames) != 0: 424 | shuffle(noisefilenames) 425 | params['noisefilenames'] = noisefilenames 426 | else: 427 | noisedirs = glob.glob(os.path.join(noise_dir, '*')) 428 | if cfg['noise_types_excluded'] != 'None': 429 | dirstoexclude = cfg['noise_types_excluded'].split(',') 430 | for dirs in dirstoexclude: 431 | noisedirs.remove(dirs) 432 | shuffle(noisedirs) 433 | params['noisedirs'] = noisedirs 434 | 435 | # rir 436 | temp = pd.read_csv(params['rir_table_csv'], skiprows=[1], sep=',', header=None, names=['wavfile','channel','T60_WB','C50_WB','isRealRIR']) 437 | 440 | rir_wav = temp['wavfile'][1:] # 115413 441 | rir_channel = temp['channel'][1:] 442 | rir_t60 = temp['T60_WB'][1:] 443 | rir_isreal = temp['isRealRIR'][1:] 444 | 445 | rir_wav2 = [w.replace('\\', '/') for w in rir_wav] 446 | rir_channel2 = [w for w in rir_channel] 447 | rir_t60_2 = [w for w in rir_t60] 448 | rir_isreal2 = [w for w in rir_isreal] 449 | 450 | myrir = [] 451 | mychannel = [] 452 | myt60 = [] 453 | 454 | lower_t60 = params['lower_t60'] 455 | upper_t60 = params['upper_t60'] 456 | 457 | if params['rir_choice'] == 1: # real 3076 IRs 458 | real_indices = [i for i, x in enumerate(rir_isreal2) if x ==
"1"] 459 | 460 | chosen_i = [] 461 | for i in real_indices: 462 | if (float(rir_t60_2[i]) >= lower_t60) and (float(rir_t60_2[i]) <= upper_t60): 463 | chosen_i.append(i) 464 | 465 | myrir= [rir_wav2[i] for i in chosen_i] 466 | mychannel = [rir_channel2[i] for i in chosen_i] 467 | myt60 = [rir_t60_2[i] for i in chosen_i] 468 | 469 | 470 | elif params['rir_choice']==2: # synthetic 112337 IRs 471 | synthetic_indices= [i for i, x in enumerate(rir_isreal2) if x == "0"] 472 | 473 | chosen_i = [] 474 | for i in synthetic_indices: 475 | if (float(rir_t60_2[i]) >= lower_t60) and (float(rir_t60_2[i]) <= upper_t60): 476 | chosen_i.append(i) 477 | 478 | myrir= [rir_wav2[i] for i in chosen_i] 479 | mychannel = [rir_channel2[i] for i in chosen_i] 480 | myt60 = [rir_t60_2[i] for i in chosen_i] 481 | 482 | elif params['rir_choice']==3: # both real and synthetic 483 | all_indices= [i for i, x in enumerate(rir_isreal2)] 484 | 485 | chosen_i = [] 486 | for i in all_indices: 487 | if (float(rir_t60_2[i]) >= lower_t60) and (float(rir_t60_2[i]) <= upper_t60): 488 | chosen_i.append(i) 489 | 490 | myrir= [rir_wav2[i] for i in chosen_i] 491 | mychannel = [rir_channel2[i] for i in chosen_i] 492 | myt60 = [rir_t60_2[i] for i in chosen_i] 493 | 494 | else: # default both real and synthetic 495 | all_indices= [i for i, x in enumerate(rir_isreal2)] 496 | 497 | chosen_i = [] 498 | for i in all_indices: 499 | if (float(rir_t60_2[i]) >= lower_t60) and (float(rir_t60_2[i]) <= upper_t60): 500 | chosen_i.append(i) 501 | 502 | myrir= [rir_wav2[i] for i in chosen_i] 503 | mychannel = [rir_channel2[i] for i in chosen_i] 504 | myt60 = [rir_t60_2[i] for i in chosen_i] 505 | 506 | params['myrir'] = myrir 507 | params['mychannel'] = mychannel 508 | params['myt60'] = myt60 509 | 510 | # Call main_gen() to generate audio 511 | clean_source_files, clean_clipped_files, clean_low_activity_files, \ 512 | noise_source_files, noise_clipped_files, noise_low_activity_files = main_gen(params) 513 | 514 | # Create log 
directory if needed, and write log files of clipped and low activity files
515 |     log_dir = utils.get_dir(cfg, 'log_dir', 'Logs')
516 | 
517 |     utils.write_log_file(log_dir, 'source_files.csv', clean_source_files + noise_source_files)
518 |     utils.write_log_file(log_dir, 'clipped_files.csv', clean_clipped_files + noise_clipped_files)
519 |     utils.write_log_file(log_dir, 'low_activity_files.csv', \
520 |                          clean_low_activity_files + noise_low_activity_files)
521 | 
522 |     # Compute and print stats about the percentage of clipped and low activity files
523 |     total_clean = len(clean_source_files) + len(clean_clipped_files) + len(clean_low_activity_files)
524 |     total_noise = len(noise_source_files) + len(noise_clipped_files) + len(noise_low_activity_files)
525 |     pct_clean_clipped = round(len(clean_clipped_files)/total_clean*100, 1)
526 |     pct_noise_clipped = round(len(noise_clipped_files)/total_noise*100, 1)
527 |     pct_clean_low_activity = round(len(clean_low_activity_files)/total_clean*100, 1)
528 |     pct_noise_low_activity = round(len(noise_low_activity_files)/total_noise*100, 1)
529 | 
530 |     print("Of the " + str(total_clean) + " clean speech files analyzed, " + \
531 |           str(pct_clean_clipped) + "% had clipping, and " + str(pct_clean_low_activity) + \
532 |           "% had low activity " + "(below " + str(params['clean_activity_threshold']*100) + \
533 |           "% active percentage)")
534 |     print("Of the " + str(total_noise) + " noise files analyzed, " + str(pct_noise_clipped) + \
535 |           "% had clipping, and " + str(pct_noise_low_activity) + "% had low activity " + \
536 |           "(below " + str(params['noise_activity_threshold']*100) + "% active percentage)")
537 | 
538 | 
539 | if __name__ == '__main__':
540 | 
541 |     main_body()
542 | 
--------------------------------------------------------------------------------
/pdns_synthesizer_icassp2023.cfg:
--------------------------------------------------------------------------------
1 | # Configuration for generating the Noisy Speech Dataset
2 | 
3 | # - sampling_rate: 
Specify the sampling rate. Default is 16 kHz
4 | # - audioformat: Audio format of the files. Default is .wav
5 | # - audio_length: Minimum length, in seconds, of each generated clip (noisy and clean speech); utterances are concatenated until this length is reached
6 | # - silence_length: Duration, in seconds, of the silence inserted between clean speech utterances
7 | # - total_hours: Total number of hours of data required
8 | # - snr_lower: Lower bound for the SNR (default: 0 dB)
9 | # - snr_upper: Upper bound for the SNR (default: 40 dB)
10 | # - target_level_lower: Lower bound for the target audio level before audiowrite (default: -35 dB)
11 | # - target_level_upper: Upper bound for the target audio level before audiowrite (default: -15 dB)
12 | # - total_snrlevels: Number of SNR levels required (default: 5, i.e. 5 levels between snr_lower and snr_upper)
13 | # - clean_activity_threshold: Activity threshold for clean speech
14 | # - noise_activity_threshold: Activity threshold for noise
15 | # - fileindex_start: First file ID used in the generated filenames
16 | # - fileindex_end: Last file ID used in the generated filenames
17 | # - is_test_set: Set to True for the test set, False for the training set
18 | # - noise_dir: Directory containing all noise files
19 | # - speech_dir: Directory containing all clean speech files
20 | # - noisy_destination: Destination directory for the noisy speech
21 | # - clean_destination: Destination directory for the clean speech
22 | # - noise_destination: Destination directory for the noise
23 | # - log_dir: Directory for all the log files
24 | 
25 | # Configuration for unit tests
26 | # - snr_test: Set to True if the SNR test is required, else False
27 | # - norm_test: Set to True if the normalization test is required, else False
28 | # - sampling_rate_test: Set to True if the sampling rate test is required, else False
29 | 
# - clipping_test: Set to True if Clipping test is required, else False 30 | # - unit_tests_log_dir: Specify path to the directory where you want to store logs 31 | 32 | [noisy_speech] 33 | 34 | sampling_rate: 48000 35 | audioformat: *.wav 36 | audio_length: 30 37 | # 15, 12, 30 38 | silence_length: 0.2 39 | total_hours: 1000 40 | # 1000 41 | #200 42 | # 2.5, 500, 100 43 | snr_lower: -5 44 | #-5, 0 45 | snr_upper: 20 46 | # 25, 40 47 | randomize_snr: True 48 | target_level_lower: -35 49 | target_level_upper: -15 50 | total_snrlevels: 31 51 | # 5 52 | clean_activity_threshold: 0.0 53 | noise_activity_threshold: 0.2 54 | fileindex_start: None 55 | fileindex_end: None 56 | is_test_set: False 57 | # True, False 58 | 59 | noise_dir: /mnt/f/4th_DNSChallenge/ICASSP_2022/DNS-Challenge/datasets/noise 60 | #/mnt/f/4th_DNSChallenge/INTERSPEECH_2021/DNS-Challenge/datasets_fullband/noise 61 | #F:\4th_DNSChallenge\INTERSPEECH_2021\DNS-Challenge\datasets_fullband\noise 62 | #datasets\pdns_training_set\noise 63 | #\test_set2\Test_Noise 64 | # datasets\noise 65 | # \datasets\noise 66 | 67 | speech_dir: /mnt/f/4th_DNSChallenge/ICASSP_2022/DNS-Challenge/datasets/clean 68 | # D:\kanhawin_git\primary_speakers_VCTK_16k_for_synthesizer 69 | # datasets\test_set2\Singing_Voice\wav_16k 70 | # dir with secondary speaker clean speech 71 | speech_dir2: /mnt/f/4th_DNSChallenge/ICASSP_2022/DNS-Challenge/datasets/clean 72 | #D:\kanhawin_git\secondary_speakers_voxCeleb2_16k 73 | # datasets\test_set2\Singing_Voice\wav_16k 74 | 75 | spkid_csv: /mnt/f/4th_DNSChallenge/ICASSP_2022/DNS-Challenge/datasets/filelists/complete_ps_split.csv 76 | #/mnt/f/4th_DNSChallenge/ICASSP_2022/DNS-Challenge/datasets/filelists/vctk_spkid.csv 77 | # datasets\clean 78 | noise_types_excluded: None 79 | 80 | rir_dir: /mnt/f/4th_DNSChallenge/ICASSP_2022/DNS-Challenge/datasets/pdns_training_set/impulse_responses 81 | #/mnt/f/4th_DNSChallenge/ICASSP_2022/DNS-Challenge/datasets/impulse_responses 82 | # 
F:\4th_DNSChallenge\ICASSP_2022\DNS-Challenge\datasets\impulse_responses 83 | 84 | # \datasets\clean 85 | noisy_destination: /mnt/f/4th_DNSChallenge/ICASSP_2022/DNS-Challenge/datasets/pdns_training_set/mixed/noisy 86 | # datasets/training_data/noisy 87 | # datasets\test_set2\synthetic_personalizeddns\noisy 88 | #training_set2_onlyrealrir\noisy 89 | #\noisy 90 | clean_destination: /mnt/f/4th_DNSChallenge/ICASSP_2022/DNS-Challenge/datasets/pdns_training_set/mixed/clean 91 | #datasets\test_set2\synthetic_personalizeddns\clean 92 | # training_set2_onlyrealrir\clean 93 | # \clean 94 | noise_destination: /mnt/f/4th_DNSChallenge/ICASSP_2022/DNS-Challenge/datasets/pdns_training_set/mixed/noise 95 | # datasets/training_data/noise 96 | #datasets\test_set2\synthetic_personalizeddns\noise 97 | #training_set2_onlyrealrir\noise 98 | # \noise 99 | log_dir: logs 100 | # \logs 101 | 102 | # Config: add singing voice to clean speech 103 | clean_singing: datasets\clean_singing\VocalSet11\FULL 104 | singing_choice: 3 105 | # 1 for only male, 2 for only female, 3 (default) for both male and female 106 | 107 | # Config: add reverb to clean speech 108 | rir_choice: 1 109 | # 1 for only real rir, 2 for only synthetic rir, 3 (default) use both real and synthetic 110 | lower_t60: 0.3 111 | # lower bound of t60 range in seconds 112 | upper_t60: 1.3 113 | # upper bound of t60 range in seconds 114 | rir_table_csv: datasets\acoustic_params\RIR_table_simple.csv 115 | clean_speech_t60_csv: datasets\acoustic_params\cleanspeech_table_t60_c50.csv 116 | # percent_for_adding_reverb=0.5 # percentage of clean speech convolved with RIR 117 | 118 | # pdns testsets 119 | # primary_data: D:\kanhawin_git\primary_speakers_VCTK_16k 120 | #'D:\PersonalizedDNS_dataset\synthetic_primary' 121 | # secondary_data='D:\kanhawin_git\secondary_speakers_voxCeleb2_16k' 122 | #'D:\PersonalizedDNS_dataset\synthetic_secondary' 123 | # noise_data= datasets\test_set2\synthetic\noise 124 | # pdns_testset_clean= 
datasets\test_set2\pdns\clean
125 | # pdns_testset_noisy= datasets\test_set2\pdns\noisy
126 | 
127 | # adaptation_data_seconds=120
128 | # num_primary_spk=100
129 | # num_clips=600
130 | 
131 | # Unit tests config
132 | snr_test: True
133 | norm_test: True
134 | sampling_rate_test: True
135 | clipping_test: True
136 | 
137 | unit_tests_log_dir: unittests_logs
138 | 
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | numpy==1.22.4
2 | soundfile==0.9.0
3 | librosa==0.8.1
4 | configparser==5.3.0
5 | pandas==1.2.4
6 | onnxruntime==1.13.1
7 | torch==1.10.0
8 | torchvision==0.11.1
9 | torchaudio==0.10.0
10 | 
--------------------------------------------------------------------------------
/unit_tests_synthesizer.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | import soundfile as sf
3 | import glob
4 | import argparse
5 | import os
6 | import utils
7 | import configparser as CP
8 | 
9 | LOW_ENERGY_THRESH = -60
10 | 
11 | def test_snr(clean, noise, expected_snr, snrtolerance=2):
12 |     '''Test for SNR.
13 |     Note: not applicable for segmental SNR'''
14 |     rmsclean = (clean**2).mean()**0.5
15 |     rmsnoise = (noise**2).mean()**0.5
16 |     actual_snr = 20*np.log10(rmsclean/rmsnoise)
17 |     return abs(actual_snr - expected_snr) < snrtolerance
18 | 
19 | def test_normalization(audio, expected_rms=-25, normtolerance=2):
20 |     '''Test for normalization.
21 |     Note: set norm_test to False in the config if different target levels are used'''
22 |     rmsaudio = (audio**2).mean()**0.5
23 |     rmsaudiodb = 20*np.log10(rmsaudio)
24 |     return abs(rmsaudiodb - expected_rms) < normtolerance
25 | 
26 | def test_samplingrate(sr, expected_sr=16000):
27 |     '''Test to ensure all clips have the same sampling rate'''
28 |     return 
expected_sr == sr
29 | 
30 | def test_clipping(audio, num_consecutive_samples=3, clipping_threshold=0.01):
31 |     '''Detect clipping: a run of near-identical consecutive samples (flat top), or any sample at/above full scale'''
32 |     clipping = False
33 |     for i in range(0, len(audio)-num_consecutive_samples-1):
34 |         audioseg = audio[i:i+num_consecutive_samples]
35 |         if abs(max(audioseg)-min(audioseg)) < clipping_threshold or abs(max(audioseg)) >= 1:
36 |             clipping = True
37 |             break
38 |     return clipping
39 | 
40 | def test_zeros_beg_end(audio, num_zeros=16000, low_energy_thresh=LOW_ENERGY_THRESH):
41 |     '''Test whether the beginning and the end of the signal are (near-)silent'''
42 |     beg_segment_energy = 20*np.log10((audio[:num_zeros]**2).mean()**0.5)
43 |     end_segment_energy = 20*np.log10((audio[-num_zeros:]**2).mean()**0.5)
44 |     return beg_segment_energy < low_energy_thresh or end_segment_energy < low_energy_thresh
45 | 
46 | def adsp_filtering_test(adsp, without_adsp):
47 |     '''Test whether device (ADSP) processing changed the signal'''
48 |     diff = adsp - without_adsp
49 |     return any(abs(val) > 0.0001 for val in diff)
50 | 
51 | if __name__ == '__main__':
52 |     parser = argparse.ArgumentParser()
53 |     parser.add_argument('--cfg', default='noisyspeech_synthesizer.cfg')
54 |     parser.add_argument('--cfg_str', type=str, default='noisy_speech')
55 | 
56 |     args = parser.parse_args()
57 | 
58 |     cfgpath = os.path.join(os.path.dirname(__file__), args.cfg)
59 |     assert os.path.exists(cfgpath), f'No configuration file found at [{cfgpath}]'
60 | 
61 |     cfg = CP.ConfigParser()
62 |     cfg._interpolation = CP.ExtendedInterpolation()
63 |     cfg.read(cfgpath)
64 |     cfg = cfg._sections[args.cfg_str]
65 | 
66 |     noisydir = cfg['noisy_train']
67 |     cleandir = cfg['clean_train']
68 |     noisedir = cfg['noise_train']
69 |     audioformat = cfg['audioformat']
70 | 
71 |     # List of noisy speech files (only the first 10 clips are spot-checked)
72 |     noisy_speech_filenames_big = glob.glob(os.path.join(noisydir, audioformat))
73 |     noisy_speech_filenames = noisy_speech_filenames_big[0:10]
74 |     # Initialize the lists
75 |     noisy_filenames_list = []
76 |     clean_filenames_list = []
77 |     noise_filenames_list = []
78 |     snr_results_list = []
79 | 
clean_norm_results_list = [] 80 | noise_norm_results_list = [] 81 | noisy_norm_results_list = [] 82 | clean_sr_results_list = [] 83 | noise_sr_results_list = [] 84 | noisy_sr_results_list = [] 85 | clean_clipping_results_list = [] 86 | noise_clipping_results_list = [] 87 | noisy_clipping_results_list = [] 88 | 89 | skipped_string = 'Skipped' 90 | # Initialize the counters for stats 91 | total_clips = len(noisy_speech_filenames) 92 | 93 | 94 | for noisypath in noisy_speech_filenames: 95 | # To do: add right paths to clean filename and noise filename 96 | noisy_filename = os.path.basename(noisypath) 97 | clean_filename = 'clean_fileid_'+os.path.splitext(noisy_filename)[0].split('fileid_')[1]+'.wav' 98 | cleanpath = os.path.join(cleandir, clean_filename) 99 | noise_filename = 'noise_fileid_'+os.path.splitext(noisy_filename)[0].split('fileid_')[1]+'.wav' 100 | noisepath = os.path.join(noisedir, noise_filename) 101 | 102 | noisy_filenames_list.append(noisy_filename) 103 | clean_filenames_list.append(clean_filename) 104 | noise_filenames_list.append(noise_filename) 105 | 106 | # Read clean, noise and noisy signals 107 | clean_signal, fs_clean = sf.read(cleanpath) 108 | noise_signal, fs_noise = sf.read(noisepath) 109 | noisy_signal, fs_noisy = sf.read(noisypath) 110 | 111 | # SNR Test 112 | # To do: add right path split to extract SNR 113 | if utils.str2bool(cfg['snr_test']): 114 | snr = int(noisy_filename.split('_snr')[1].split('_')[0]) 115 | snr_results_list.append(str(test_snr(clean=clean_signal, \ 116 | noise=noise_signal, expected_snr=snr))) 117 | else: 118 | snr_results_list.append(skipped_string) 119 | 120 | # Normalization test 121 | if utils.str2bool(cfg['norm_test']): 122 | tl = int(noisy_filename.split('_tl')[1].split('_')[0]) 123 | clean_norm_results_list.append(str(test_normalization(clean_signal))) 124 | noise_norm_results_list.append(str(test_normalization(noise_signal))) 125 | noisy_norm_results_list.append(str(test_normalization(noisy_signal, 
expected_rms=tl))) 126 | else: 127 | clean_norm_results_list.append(skipped_string) 128 | noise_norm_results_list.append(skipped_string) 129 | noisy_norm_results_list.append(skipped_string) 130 | 131 | # Sampling rate test 132 | if utils.str2bool(cfg['sampling_rate_test']): 133 | clean_sr_results_list.append(str(test_samplingrate(sr=fs_clean))) 134 | noise_sr_results_list.append(str(test_samplingrate(sr=fs_noise))) 135 | noisy_sr_results_list.append(str(test_samplingrate(sr=fs_noisy))) 136 | else: 137 | clean_sr_results_list.append(skipped_string) 138 | noise_sr_results_list.append(skipped_string) 139 | noisy_sr_results_list.append(skipped_string) 140 | 141 | # Clipping test 142 | if utils.str2bool(cfg['clipping_test']): 143 | clean_clipping_results_list.append(str(test_clipping(audio=clean_signal))) 144 | noise_clipping_results_list.append(str(test_clipping(audio=noise_signal))) 145 | noisy_clipping_results_list.append(str(test_clipping(audio=noisy_signal))) 146 | else: 147 | clean_clipping_results_list.append(skipped_string) 148 | noise_clipping_results_list.append(skipped_string) 149 | noisy_clipping_results_list.append(skipped_string) 150 | 151 | # Stats 152 | pc_snr_passed = round(snr_results_list.count('True')/total_clips*100, 1) 153 | pc_clean_norm_passed = round(clean_norm_results_list.count('True')/total_clips*100, 1) 154 | pc_noise_norm_passed = round(noise_norm_results_list.count('True')/total_clips*100, 1) 155 | pc_noisy_norm_passed = round(noisy_norm_results_list.count('True')/total_clips*100, 1) 156 | pc_clean_sr_passed = round(clean_sr_results_list.count('True')/total_clips*100, 1) 157 | pc_noise_sr_passed = round(noise_sr_results_list.count('True')/total_clips*100, 1) 158 | pc_noisy_sr_passed = round(noisy_sr_results_list.count('True')/total_clips*100, 1) 159 | pc_clean_clipping_passed = round(clean_clipping_results_list.count('True')/total_clips*100, 1) 160 | pc_noise_clipping_passed = 
round(noise_clipping_results_list.count('True')/total_clips*100, 1)
161 |     pc_noisy_clipping_passed = round(noisy_clipping_results_list.count('True')/total_clips*100, 1)
162 | 
163 |     print('% clips that passed SNR test:', pc_snr_passed)
164 | 
165 |     print('% clean clips that passed Normalization tests:', pc_clean_norm_passed)
166 |     print('% noise clips that passed Normalization tests:', pc_noise_norm_passed)
167 |     print('% noisy clips that passed Normalization tests:', pc_noisy_norm_passed)
168 | 
169 |     print('% clean clips that passed Sampling Rate tests:', pc_clean_sr_passed)
170 |     print('% noise clips that passed Sampling Rate tests:', pc_noise_sr_passed)
171 |     print('% noisy clips that passed Sampling Rate tests:', pc_noisy_sr_passed)
172 | 
173 |     print('% clean clips that passed Clipping tests:', pc_clean_clipping_passed)
174 |     print('% noise clips that passed Clipping tests:', pc_noise_clipping_passed)
175 |     print('% noisy clips that passed Clipping tests:', pc_noisy_clipping_passed)
176 | 
177 |     log_dir = utils.get_dir(cfg, 'unit_tests_log_dir', 'Unit_tests_logs')
178 | 
179 |     # get_dir() already creates the directory when it does not exist,
180 |     # so no extra existence check is needed here.
181 | 
182 | 
183 |     utils.write_log_file(log_dir, 'unit_test_results.csv', [noisy_filenames_list, clean_filenames_list, \
184 |         noise_filenames_list, snr_results_list, clean_norm_results_list, noise_norm_results_list, \
185 |         noisy_norm_results_list, clean_sr_results_list, noise_sr_results_list, noisy_sr_results_list, \
186 |         clean_clipping_results_list, noise_clipping_results_list, noisy_clipping_results_list])
--------------------------------------------------------------------------------
/utils.py:
--------------------------------------------------------------------------------
1 | # -*- coding: utf-8 -*-
2 | """
3 | Created on Fri Nov  1 10:28:41 2019
4 | 
5 | @author: rocheng
6 | """
7 | import os
8 | import csv
9 | from shutil import copyfile
10 | import glob
11 | 
12 | def get_dir(cfg, param_name, new_dir_name):
13 |     '''Return the directory configured under param_name, falling back to
14 |     new_dir_name next to this script; create the directory if it is missing'''
15 | 
16 |     if param_name in cfg:
17 |         dir_name = cfg[param_name]
18 |     else:
19 |         dir_name = os.path.join(os.path.dirname(__file__), new_dir_name)
20 |     if not os.path.exists(dir_name):
21 |         os.makedirs(dir_name)
22 |     return dir_name
23 | 
24 | 
25 | def write_log_file(log_dir, log_filename, data):
26 |     '''Write the columns in data to a space-delimited log file, one row per entry'''
27 |     data = zip(*data)
28 |     with open(os.path.join(log_dir, log_filename), mode='w', newline='') as csvfile:
29 |         csvwriter = csv.writer(csvfile, delimiter=' ',
30 |                                quotechar='|', quoting=csv.QUOTE_MINIMAL)
31 |         for row in data:
32 |             csvwriter.writerow(row)
33 | 
34 | 
35 | def str2bool(string):
36 |     return string.lower() in ("yes", "true", "t", "1")
37 | 
38 | 
39 | def rename_copyfile(src_path, dest_dir, prefix='', ext='*.wav'):
40 |     srcfiles = glob.glob(f"{src_path}/"+ext)
41 |     for srcfile in srcfiles:
42 |         dest_path = os.path.join(dest_dir, prefix+'_'+os.path.basename(srcfile))
43 |         copyfile(srcfile, dest_path)
44 | 
45 | 
46 | 
47 | 
--------------------------------------------------------------------------------
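The SNR check in `unit_tests_synthesizer.py` boils down to comparing the clean-to-noise RMS ratio, in dB, against the SNR the synthesizer was asked to produce. The following is a minimal, self-contained sketch of that check in pure Python (no numpy); the `rms` and `snr_within_tolerance` helper names are illustrative and not part of the repo:

```python
import math
import random

def rms(samples):
    """Root-mean-square level of a sample sequence."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def snr_within_tolerance(clean, noise, expected_snr_db, tol_db=2):
    """Same idea as test_snr(): measure the clean/noise RMS ratio in dB
    and accept it if it lies within tol_db of the expected SNR."""
    actual_snr_db = 20 * math.log10(rms(clean) / rms(noise))
    return abs(actual_snr_db - expected_snr_db) < tol_db

# Build a toy mixture at an exactly known SNR of 10 dB:
sr = 16000
clean = [math.sin(2 * math.pi * 440 * n / sr) for n in range(sr)]  # 1 s of a 440 Hz tone
rng = random.Random(0)
noise = [rng.uniform(-1, 1) for _ in range(sr)]
# Scale the noise so that 20*log10(rms(clean)/rms(noise)) == 10 dB
scale = rms(clean) / (rms(noise) * 10 ** (10 / 20))
noise = [s * scale for s in noise]

print(snr_within_tolerance(clean, noise, 10))  # True: measured SNR is exactly 10 dB
print(snr_within_tolerance(clean, noise, 20))  # False: off by 10 dB, beyond the 2 dB tolerance
```

The same tolerance-band pattern underlies the normalization test, except that a single clip's RMS level in dB is compared against the target level instead of a clean/noise ratio.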