├── .gitignore
├── CODE_OF_CONDUCT.md
├── DNSMOS
│   ├── DNSMOS
│   │   ├── bak_ovr.onnx
│   │   ├── model_v8.onnx
│   │   ├── sig.onnx
│   │   └── sig_bak_ovr.onnx
│   ├── README.md
│   ├── dnsmos_local.py
│   └── pDNSMOS
│       └── sig_bak_ovr.onnx
├── LICENSE
├── LICENSE-CODE
├── README-DNS3.md
├── README.md
├── SECURITY.md
├── V5_DNS_Challenge_FinalResults.pdf
├── WAcc
│   └── WAcc.py
├── audiolib.py
├── docs
│   ├── CMT Instructions for uploading enhanced clips_ICASSP2022.pdf
│   ├── ICASSP_2021_DNS_challenge.pdf
│   └── ICASSP_2022_4th_Deep_Noise_Suppression_Challenge.pdf
├── download-dns-challenge-1.sh
├── download-dns-challenge-2.sh
├── download-dns-challenge-3.sh
├── download-dns-challenge-4-pdns.sh
├── download-dns-challenge-4.sh
├── download-dns-challenge-5-baseline.sh
├── download-dns-challenge-5-filelists-headset.sh
├── download-dns-challenge-5-filelists-speakerphone.sh
├── download-dns-challenge-5-headset-training.sh
├── download-dns-challenge-5-noise-ir.sh
├── download-dns-challenge-5-paralinguistic-train.sh
├── download-dns-challenge-5-speakerphone-training.sh
├── download-dns5-blind-testset.sh
├── download-dns5-dev-testset.sh
├── download_dns_v2_v3_blindset.sh
├── index.html
├── noisyspeech_synthesizer.cfg
├── noisyspeech_synthesizer_singleprocess.py
├── pdns_noisyspeech_synthesizer_singleprocess.py
├── pdns_synthesizer_icassp2023.cfg
├── requirements.txt
├── unit_tests_synthesizer.py
└── utils.py
/.gitignore:
--------------------------------------------------------------------------------
1 | datasets/
2 | datasets_fullband/
3 | training_set/
4 | training_set2/
5 | training_set2_onlyrealrir/
6 | training_set4/
7 | training_set5/
8 | logs/
9 | test_set2/
10 | training_set_sept11/
11 | training_set_sept12/
12 | __pycache__/
13 | *.pyc
14 | *~
15 | /.vs/
16 | /.vscode/
17 | *.wav
18 | *.tar.bz2
19 | *.zip
20 |
--------------------------------------------------------------------------------
/CODE_OF_CONDUCT.md:
--------------------------------------------------------------------------------
1 | # Microsoft Open Source Code of Conduct
2 |
3 | This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/).
4 |
5 | Resources:
6 |
7 | - [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/)
8 | - [Microsoft Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/)
9 | - Contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with questions or concerns
10 |
--------------------------------------------------------------------------------
/DNSMOS/DNSMOS/bak_ovr.onnx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/DNS-Challenge/591184a9fcb2cbdec02520fed81a32bbbf9d73ff/DNSMOS/DNSMOS/bak_ovr.onnx
--------------------------------------------------------------------------------
/DNSMOS/DNSMOS/model_v8.onnx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/DNS-Challenge/591184a9fcb2cbdec02520fed81a32bbbf9d73ff/DNSMOS/DNSMOS/model_v8.onnx
--------------------------------------------------------------------------------
/DNSMOS/DNSMOS/sig.onnx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/DNS-Challenge/591184a9fcb2cbdec02520fed81a32bbbf9d73ff/DNSMOS/DNSMOS/sig.onnx
--------------------------------------------------------------------------------
/DNSMOS/DNSMOS/sig_bak_ovr.onnx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/DNS-Challenge/591184a9fcb2cbdec02520fed81a32bbbf9d73ff/DNSMOS/DNSMOS/sig_bak_ovr.onnx
--------------------------------------------------------------------------------
/DNSMOS/README.md:
--------------------------------------------------------------------------------
1 | # DNSMOS: A non-intrusive perceptual objective speech quality metric to evaluate noise suppressors
2 |
3 | Human subjective evaluation is the "gold standard" for evaluating speech quality optimized for human perception. Perceptual objective metrics serve as a proxy for subjective scores. The conventional and widely used metrics require a reference clean speech signal, which is unavailable in real recordings. The no-reference approaches correlate poorly with human ratings and are not widely adopted in the research community. One of the biggest use cases of these perceptual objective metrics is to evaluate noise suppression algorithms. DNSMOS generalizes well in challenging test conditions, with a high correlation to human ratings when stack-ranking noise suppression methods. More details can be found in the [DNSMOS paper](https://arxiv.org/pdf/2010.15258.pdf).
4 |
5 | ## Evaluation methodology:
6 | Use the **dnsmos_local.py** script.
7 | 1. To compute a personalized MOS score (where an interfering speaker is penalized), provide the `-p` argument:
8 |    `python dnsmos_local.py -t C:\temp\SampleClips -o sample.csv -p`
9 | 2. To compute a regular MOS score, omit the `-p` argument:
10 |    `python dnsmos_local.py -t C:\temp\SampleClips -o sample.csv`
11 |
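For reference, the script does not report the raw model outputs directly: it maps them to calibrated MOS values with fixed polynomial fits. The sketch below reproduces that mapping for the regular (non-personalized) case, using the coefficients from `dnsmos_local.py`; the raw scores shown are illustrative placeholders, not real model outputs.

```python
import numpy as np

# Polynomial calibration applied to raw DNSMOS outputs
# (coefficients copied from get_polyfit_val in dnsmos_local.py,
# non-personalized case).
p_sig = np.poly1d([-0.08397278, 1.22083953, 0.0052439])
p_bak = np.poly1d([-0.13166888, 1.60915514, -0.39604546])
p_ovr = np.poly1d([-0.06766283, 1.11546468, 0.04602535])

# Hypothetical raw scores, only to illustrate the mapping.
raw_sig, raw_bak, raw_ovr = 3.2, 3.8, 3.0
print(float(p_sig(raw_sig)), float(p_bak(raw_bak)), float(p_ovr(raw_ovr)))
```

The same step runs per audio segment inside the script; only the polynomial coefficients differ when `-p` is given.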
12 | ## Citation:
13 | If you have used DNSMOS for research and development purposes, please cite the [DNSMOS paper](https://arxiv.org/pdf/2010.15258.pdf):
14 | ```BibTex
15 | @inproceedings{reddy2021dnsmos,
16 | title={Dnsmos: A non-intrusive perceptual objective speech quality metric to evaluate noise suppressors},
17 | author={Reddy, Chandan KA and Gopal, Vishak and Cutler, Ross},
18 | booktitle={ICASSP 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
19 | pages={6493--6497},
20 | year={2021},
21 | organization={IEEE}
22 | }
23 | ```
24 |
25 | If you used DNSMOS P.835, please cite the [DNSMOS P.835](https://arxiv.org/pdf/2110.01763.pdf) paper:
26 |
27 | ```BibTex
28 | @inproceedings{reddy2022dnsmos,
29 | title={DNSMOS P.835: A non-intrusive perceptual objective speech quality metric to evaluate noise suppressors},
30 | author={Reddy, Chandan KA and Gopal, Vishak and Cutler, Ross},
31 | booktitle={ICASSP 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
32 | year={2022},
33 | organization={IEEE}
34 | }
35 | ```
36 |
--------------------------------------------------------------------------------
/DNSMOS/dnsmos_local.py:
--------------------------------------------------------------------------------
1 | # Usage:
2 | # python dnsmos_local.py -t c:\temp\DNSChallenge4_Blindset -o DNSCh4_Blind.csv -p
3 | #
4 |
5 | import argparse
6 | import concurrent.futures
7 | import glob
8 | import os
9 |
10 | import librosa
11 | import numpy as np
12 | import numpy.polynomial.polynomial as poly
13 | import onnxruntime as ort
14 | import pandas as pd
15 | import soundfile as sf
16 |
17 | from tqdm import tqdm
18 |
19 | SAMPLING_RATE = 16000
20 | INPUT_LENGTH = 9.01
21 |
22 | class ComputeScore:
23 | def __init__(self, primary_model_path, p808_model_path) -> None:
24 | self.onnx_sess = ort.InferenceSession(primary_model_path)
25 | self.p808_onnx_sess = ort.InferenceSession(p808_model_path)
26 |
27 | def audio_melspec(self, audio, n_mels=120, frame_size=320, hop_length=160, sr=16000, to_db=True):
28 | mel_spec = librosa.feature.melspectrogram(y=audio, sr=sr, n_fft=frame_size+1, hop_length=hop_length, n_mels=n_mels)
29 | if to_db:
30 | mel_spec = (librosa.power_to_db(mel_spec, ref=np.max)+40)/40
31 | return mel_spec.T
32 |
33 | def get_polyfit_val(self, sig, bak, ovr, is_personalized_MOS):
34 | if is_personalized_MOS:
35 | p_ovr = np.poly1d([-0.00533021, 0.005101 , 1.18058466, -0.11236046])
36 | p_sig = np.poly1d([-0.01019296, 0.02751166, 1.19576786, -0.24348726])
37 | p_bak = np.poly1d([-0.04976499, 0.44276479, -0.1644611 , 0.96883132])
38 | else:
39 | p_ovr = np.poly1d([-0.06766283, 1.11546468, 0.04602535])
40 | p_sig = np.poly1d([-0.08397278, 1.22083953, 0.0052439 ])
41 | p_bak = np.poly1d([-0.13166888, 1.60915514, -0.39604546])
42 |
43 | sig_poly = p_sig(sig)
44 | bak_poly = p_bak(bak)
45 | ovr_poly = p_ovr(ovr)
46 |
47 | return sig_poly, bak_poly, ovr_poly
48 |
49 | def __call__(self, fpath, sampling_rate, is_personalized_MOS):
50 | aud, input_fs = sf.read(fpath)
51 | fs = sampling_rate
52 | if input_fs != fs:
53 | audio = librosa.resample(aud, orig_sr=input_fs, target_sr=fs)
54 | else:
55 | audio = aud
56 | actual_audio_len = len(audio)
57 | len_samples = int(INPUT_LENGTH*fs)
58 | while len(audio) < len_samples:
59 | audio = np.append(audio, audio)
60 |
61 | num_hops = int(np.floor(len(audio)/fs) - INPUT_LENGTH)+1
62 | hop_len_samples = fs
63 | predicted_mos_sig_seg_raw = []
64 | predicted_mos_bak_seg_raw = []
65 | predicted_mos_ovr_seg_raw = []
66 | predicted_mos_sig_seg = []
67 | predicted_mos_bak_seg = []
68 | predicted_mos_ovr_seg = []
69 | predicted_p808_mos = []
70 |
71 | for idx in range(num_hops):
72 | audio_seg = audio[int(idx*hop_len_samples) : int((idx+INPUT_LENGTH)*hop_len_samples)]
73 | if len(audio_seg) < len_samples:
74 | continue
75 |
76 | input_features = np.array(audio_seg).astype('float32')[np.newaxis,:]
77 | p808_input_features = np.array(self.audio_melspec(audio=audio_seg[:-160])).astype('float32')[np.newaxis, :, :]
78 | oi = {'input_1': input_features}
79 | p808_oi = {'input_1': p808_input_features}
80 | p808_mos = self.p808_onnx_sess.run(None, p808_oi)[0][0][0]
81 | mos_sig_raw,mos_bak_raw,mos_ovr_raw = self.onnx_sess.run(None, oi)[0][0]
82 | mos_sig,mos_bak,mos_ovr = self.get_polyfit_val(mos_sig_raw,mos_bak_raw,mos_ovr_raw,is_personalized_MOS)
83 | predicted_mos_sig_seg_raw.append(mos_sig_raw)
84 | predicted_mos_bak_seg_raw.append(mos_bak_raw)
85 | predicted_mos_ovr_seg_raw.append(mos_ovr_raw)
86 | predicted_mos_sig_seg.append(mos_sig)
87 | predicted_mos_bak_seg.append(mos_bak)
88 | predicted_mos_ovr_seg.append(mos_ovr)
89 | predicted_p808_mos.append(p808_mos)
90 |
91 | clip_dict = {'filename': fpath, 'len_in_sec': actual_audio_len/fs, 'sr':fs}
92 | clip_dict['num_hops'] = num_hops
93 | clip_dict['OVRL_raw'] = np.mean(predicted_mos_ovr_seg_raw)
94 | clip_dict['SIG_raw'] = np.mean(predicted_mos_sig_seg_raw)
95 | clip_dict['BAK_raw'] = np.mean(predicted_mos_bak_seg_raw)
96 | clip_dict['OVRL'] = np.mean(predicted_mos_ovr_seg)
97 | clip_dict['SIG'] = np.mean(predicted_mos_sig_seg)
98 | clip_dict['BAK'] = np.mean(predicted_mos_bak_seg)
99 | clip_dict['P808_MOS'] = np.mean(predicted_p808_mos)
100 | return clip_dict
101 |
102 | def main(args):
103 | models = glob.glob(os.path.join(args.testset_dir, "*"))
104 | audio_clips_list = []
105 | p808_model_path = os.path.join('DNSMOS', 'model_v8.onnx')
106 |
107 | if args.personalized_MOS:
108 | primary_model_path = os.path.join('pDNSMOS', 'sig_bak_ovr.onnx')
109 | else:
110 | primary_model_path = os.path.join('DNSMOS', 'sig_bak_ovr.onnx')
111 |
112 | compute_score = ComputeScore(primary_model_path, p808_model_path)
113 |
114 | rows = []
115 | clips = []
116 | clips = glob.glob(os.path.join(args.testset_dir, "*.wav"))
117 | is_personalized_eval = args.personalized_MOS
118 | desired_fs = SAMPLING_RATE
119 | for m in tqdm(models):
120 | max_recursion_depth = 10
121 | audio_path = os.path.join(args.testset_dir, m)
122 | audio_clips_list = glob.glob(os.path.join(audio_path, "*.wav"))
123 | while len(audio_clips_list) == 0 and max_recursion_depth > 0:
124 | audio_path = os.path.join(audio_path, "**")
125 | audio_clips_list = glob.glob(os.path.join(audio_path, "*.wav"))
126 | max_recursion_depth -= 1
127 | clips.extend(audio_clips_list)
128 |
129 | with concurrent.futures.ThreadPoolExecutor() as executor:
130 | future_to_url = {executor.submit(compute_score, clip, desired_fs, is_personalized_eval): clip for clip in clips}
131 | for future in tqdm(concurrent.futures.as_completed(future_to_url)):
132 | clip = future_to_url[future]
133 | try:
134 | data = future.result()
135 | except Exception as exc:
136 | print('%r generated an exception: %s' % (clip, exc))
137 | else:
138 | rows.append(data)
139 |
140 | df = pd.DataFrame(rows)
141 | if args.csv_path:
142 | csv_path = args.csv_path
143 | df.to_csv(csv_path)
144 | else:
145 | print(df.describe())
146 |
147 | if __name__=="__main__":
148 | parser = argparse.ArgumentParser()
149 | parser.add_argument('-t', "--testset_dir", default='.',
150 | help='Path to the dir containing audio clips in .wav to be evaluated')
151 | parser.add_argument('-o', "--csv_path", default=None, help='Path to the output csv file; if omitted, summary statistics are printed')
152 | parser.add_argument('-p', "--personalized_MOS", action='store_true',
153 | help='Flag to indicate if personalized MOS score is needed or regular')
154 |
155 | args = parser.parse_args()
156 |
157 | main(args)
158 |
--------------------------------------------------------------------------------
/DNSMOS/pDNSMOS/sig_bak_ovr.onnx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/DNS-Challenge/591184a9fcb2cbdec02520fed81a32bbbf9d73ff/DNSMOS/pDNSMOS/sig_bak_ovr.onnx
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | Attribution 4.0 International
2 |
3 | =======================================================================
4 |
5 | Creative Commons Corporation ("Creative Commons") is not a law firm and
6 | does not provide legal services or legal advice. Distribution of
7 | Creative Commons public licenses does not create a lawyer-client or
8 | other relationship. Creative Commons makes its licenses and related
9 | information available on an "as-is" basis. Creative Commons gives no
10 | warranties regarding its licenses, any material licensed under their
11 | terms and conditions, or any related information. Creative Commons
12 | disclaims all liability for damages resulting from their use to the
13 | fullest extent possible.
14 |
15 | Using Creative Commons Public Licenses
16 |
17 | Creative Commons public licenses provide a standard set of terms and
18 | conditions that creators and other rights holders may use to share
19 | original works of authorship and other material subject to copyright
20 | and certain other rights specified in the public license below. The
21 | following considerations are for informational purposes only, are not
22 | exhaustive, and do not form part of our licenses.
23 |
24 | Considerations for licensors: Our public licenses are
25 | intended for use by those authorized to give the public
26 | permission to use material in ways otherwise restricted by
27 | copyright and certain other rights. Our licenses are
28 | irrevocable. Licensors should read and understand the terms
29 | and conditions of the license they choose before applying it.
30 | Licensors should also secure all rights necessary before
31 | applying our licenses so that the public can reuse the
32 | material as expected. Licensors should clearly mark any
33 | material not subject to the license. This includes other CC-
34 | licensed material, or material used under an exception or
35 | limitation to copyright. More considerations for licensors:
36 | wiki.creativecommons.org/Considerations_for_licensors
37 |
38 | Considerations for the public: By using one of our public
39 | licenses, a licensor grants the public permission to use the
40 | licensed material under specified terms and conditions. If
41 | the licensor's permission is not necessary for any reason--for
42 | example, because of any applicable exception or limitation to
43 | copyright--then that use is not regulated by the license. Our
44 | licenses grant only permissions under copyright and certain
45 | other rights that a licensor has authority to grant. Use of
46 | the licensed material may still be restricted for other
47 | reasons, including because others have copyright or other
48 | rights in the material. A licensor may make special requests,
49 | such as asking that all changes be marked or described.
50 | Although not required by our licenses, you are encouraged to
51 | respect those requests where reasonable. More_considerations
52 | for the public:
53 | wiki.creativecommons.org/Considerations_for_licensees
54 |
55 | =======================================================================
56 |
57 | Creative Commons Attribution 4.0 International Public License
58 |
59 | By exercising the Licensed Rights (defined below), You accept and agree
60 | to be bound by the terms and conditions of this Creative Commons
61 | Attribution 4.0 International Public License ("Public License"). To the
62 | extent this Public License may be interpreted as a contract, You are
63 | granted the Licensed Rights in consideration of Your acceptance of
64 | these terms and conditions, and the Licensor grants You such rights in
65 | consideration of benefits the Licensor receives from making the
66 | Licensed Material available under these terms and conditions.
67 |
68 |
69 | Section 1 -- Definitions.
70 |
71 | a. Adapted Material means material subject to Copyright and Similar
72 | Rights that is derived from or based upon the Licensed Material
73 | and in which the Licensed Material is translated, altered,
74 | arranged, transformed, or otherwise modified in a manner requiring
75 | permission under the Copyright and Similar Rights held by the
76 | Licensor. For purposes of this Public License, where the Licensed
77 | Material is a musical work, performance, or sound recording,
78 | Adapted Material is always produced where the Licensed Material is
79 | synched in timed relation with a moving image.
80 |
81 | b. Adapter's License means the license You apply to Your Copyright
82 | and Similar Rights in Your contributions to Adapted Material in
83 | accordance with the terms and conditions of this Public License.
84 |
85 | c. Copyright and Similar Rights means copyright and/or similar rights
86 | closely related to copyright including, without limitation,
87 | performance, broadcast, sound recording, and Sui Generis Database
88 | Rights, without regard to how the rights are labeled or
89 | categorized. For purposes of this Public License, the rights
90 | specified in Section 2(b)(1)-(2) are not Copyright and Similar
91 | Rights.
92 |
93 | d. Effective Technological Measures means those measures that, in the
94 | absence of proper authority, may not be circumvented under laws
95 | fulfilling obligations under Article 11 of the WIPO Copyright
96 | Treaty adopted on December 20, 1996, and/or similar international
97 | agreements.
98 |
99 | e. Exceptions and Limitations means fair use, fair dealing, and/or
100 | any other exception or limitation to Copyright and Similar Rights
101 | that applies to Your use of the Licensed Material.
102 |
103 | f. Licensed Material means the artistic or literary work, database,
104 | or other material to which the Licensor applied this Public
105 | License.
106 |
107 | g. Licensed Rights means the rights granted to You subject to the
108 | terms and conditions of this Public License, which are limited to
109 | all Copyright and Similar Rights that apply to Your use of the
110 | Licensed Material and that the Licensor has authority to license.
111 |
112 | h. Licensor means the individual(s) or entity(ies) granting rights
113 | under this Public License.
114 |
115 | i. Share means to provide material to the public by any means or
116 | process that requires permission under the Licensed Rights, such
117 | as reproduction, public display, public performance, distribution,
118 | dissemination, communication, or importation, and to make material
119 | available to the public including in ways that members of the
120 | public may access the material from a place and at a time
121 | individually chosen by them.
122 |
123 | j. Sui Generis Database Rights means rights other than copyright
124 | resulting from Directive 96/9/EC of the European Parliament and of
125 | the Council of 11 March 1996 on the legal protection of databases,
126 | as amended and/or succeeded, as well as other essentially
127 | equivalent rights anywhere in the world.
128 |
129 | k. You means the individual or entity exercising the Licensed Rights
130 | under this Public License. Your has a corresponding meaning.
131 |
132 |
133 | Section 2 -- Scope.
134 |
135 | a. License grant.
136 |
137 | 1. Subject to the terms and conditions of this Public License,
138 | the Licensor hereby grants You a worldwide, royalty-free,
139 | non-sublicensable, non-exclusive, irrevocable license to
140 | exercise the Licensed Rights in the Licensed Material to:
141 |
142 | a. reproduce and Share the Licensed Material, in whole or
143 | in part; and
144 |
145 | b. produce, reproduce, and Share Adapted Material.
146 |
147 | 2. Exceptions and Limitations. For the avoidance of doubt, where
148 | Exceptions and Limitations apply to Your use, this Public
149 | License does not apply, and You do not need to comply with
150 | its terms and conditions.
151 |
152 | 3. Term. The term of this Public License is specified in Section
153 | 6(a).
154 |
155 | 4. Media and formats; technical modifications allowed. The
156 | Licensor authorizes You to exercise the Licensed Rights in
157 | all media and formats whether now known or hereafter created,
158 | and to make technical modifications necessary to do so. The
159 | Licensor waives and/or agrees not to assert any right or
160 | authority to forbid You from making technical modifications
161 | necessary to exercise the Licensed Rights, including
162 | technical modifications necessary to circumvent Effective
163 | Technological Measures. For purposes of this Public License,
164 | simply making modifications authorized by this Section 2(a)
165 | (4) never produces Adapted Material.
166 |
167 | 5. Downstream recipients.
168 |
169 | a. Offer from the Licensor -- Licensed Material. Every
170 | recipient of the Licensed Material automatically
171 | receives an offer from the Licensor to exercise the
172 | Licensed Rights under the terms and conditions of this
173 | Public License.
174 |
175 | b. No downstream restrictions. You may not offer or impose
176 | any additional or different terms or conditions on, or
177 | apply any Effective Technological Measures to, the
178 | Licensed Material if doing so restricts exercise of the
179 | Licensed Rights by any recipient of the Licensed
180 | Material.
181 |
182 | 6. No endorsement. Nothing in this Public License constitutes or
183 | may be construed as permission to assert or imply that You
184 | are, or that Your use of the Licensed Material is, connected
185 | with, or sponsored, endorsed, or granted official status by,
186 | the Licensor or others designated to receive attribution as
187 | provided in Section 3(a)(1)(A)(i).
188 |
189 | b. Other rights.
190 |
191 | 1. Moral rights, such as the right of integrity, are not
192 | licensed under this Public License, nor are publicity,
193 | privacy, and/or other similar personality rights; however, to
194 | the extent possible, the Licensor waives and/or agrees not to
195 | assert any such rights held by the Licensor to the limited
196 | extent necessary to allow You to exercise the Licensed
197 | Rights, but not otherwise.
198 |
199 | 2. Patent and trademark rights are not licensed under this
200 | Public License.
201 |
202 | 3. To the extent possible, the Licensor waives any right to
203 | collect royalties from You for the exercise of the Licensed
204 | Rights, whether directly or through a collecting society
205 | under any voluntary or waivable statutory or compulsory
206 | licensing scheme. In all other cases the Licensor expressly
207 | reserves any right to collect such royalties.
208 |
209 |
210 | Section 3 -- License Conditions.
211 |
212 | Your exercise of the Licensed Rights is expressly made subject to the
213 | following conditions.
214 |
215 | a. Attribution.
216 |
217 | 1. If You Share the Licensed Material (including in modified
218 | form), You must:
219 |
220 | a. retain the following if it is supplied by the Licensor
221 | with the Licensed Material:
222 |
223 | i. identification of the creator(s) of the Licensed
224 | Material and any others designated to receive
225 | attribution, in any reasonable manner requested by
226 | the Licensor (including by pseudonym if
227 | designated);
228 |
229 | ii. a copyright notice;
230 |
231 | iii. a notice that refers to this Public License;
232 |
233 | iv. a notice that refers to the disclaimer of
234 | warranties;
235 |
236 | v. a URI or hyperlink to the Licensed Material to the
237 | extent reasonably practicable;
238 |
239 | b. indicate if You modified the Licensed Material and
240 | retain an indication of any previous modifications; and
241 |
242 | c. indicate the Licensed Material is licensed under this
243 | Public License, and include the text of, or the URI or
244 | hyperlink to, this Public License.
245 |
246 | 2. You may satisfy the conditions in Section 3(a)(1) in any
247 | reasonable manner based on the medium, means, and context in
248 | which You Share the Licensed Material. For example, it may be
249 | reasonable to satisfy the conditions by providing a URI or
250 | hyperlink to a resource that includes the required
251 | information.
252 |
253 | 3. If requested by the Licensor, You must remove any of the
254 | information required by Section 3(a)(1)(A) to the extent
255 | reasonably practicable.
256 |
257 | 4. If You Share Adapted Material You produce, the Adapter's
258 | License You apply must not prevent recipients of the Adapted
259 | Material from complying with this Public License.
260 |
261 |
262 | Section 4 -- Sui Generis Database Rights.
263 |
264 | Where the Licensed Rights include Sui Generis Database Rights that
265 | apply to Your use of the Licensed Material:
266 |
267 | a. for the avoidance of doubt, Section 2(a)(1) grants You the right
268 | to extract, reuse, reproduce, and Share all or a substantial
269 | portion of the contents of the database;
270 |
271 | b. if You include all or a substantial portion of the database
272 | contents in a database in which You have Sui Generis Database
273 | Rights, then the database in which You have Sui Generis Database
274 | Rights (but not its individual contents) is Adapted Material; and
275 |
276 | c. You must comply with the conditions in Section 3(a) if You Share
277 | all or a substantial portion of the contents of the database.
278 |
279 | For the avoidance of doubt, this Section 4 supplements and does not
280 | replace Your obligations under this Public License where the Licensed
281 | Rights include other Copyright and Similar Rights.
282 |
283 |
284 | Section 5 -- Disclaimer of Warranties and Limitation of Liability.
285 |
286 | a. UNLESS OTHERWISE SEPARATELY UNDERTAKEN BY THE LICENSOR, TO THE
287 | EXTENT POSSIBLE, THE LICENSOR OFFERS THE LICENSED MATERIAL AS-IS
288 | AND AS-AVAILABLE, AND MAKES NO REPRESENTATIONS OR WARRANTIES OF
289 | ANY KIND CONCERNING THE LICENSED MATERIAL, WHETHER EXPRESS,
290 | IMPLIED, STATUTORY, OR OTHER. THIS INCLUDES, WITHOUT LIMITATION,
291 | WARRANTIES OF TITLE, MERCHANTABILITY, FITNESS FOR A PARTICULAR
292 | PURPOSE, NON-INFRINGEMENT, ABSENCE OF LATENT OR OTHER DEFECTS,
293 | ACCURACY, OR THE PRESENCE OR ABSENCE OF ERRORS, WHETHER OR NOT
294 | KNOWN OR DISCOVERABLE. WHERE DISCLAIMERS OF WARRANTIES ARE NOT
295 | ALLOWED IN FULL OR IN PART, THIS DISCLAIMER MAY NOT APPLY TO YOU.
296 |
297 | b. TO THE EXTENT POSSIBLE, IN NO EVENT WILL THE LICENSOR BE LIABLE
298 | TO YOU ON ANY LEGAL THEORY (INCLUDING, WITHOUT LIMITATION,
299 | NEGLIGENCE) OR OTHERWISE FOR ANY DIRECT, SPECIAL, INDIRECT,
300 | INCIDENTAL, CONSEQUENTIAL, PUNITIVE, EXEMPLARY, OR OTHER LOSSES,
301 | COSTS, EXPENSES, OR DAMAGES ARISING OUT OF THIS PUBLIC LICENSE OR
302 | USE OF THE LICENSED MATERIAL, EVEN IF THE LICENSOR HAS BEEN
303 | ADVISED OF THE POSSIBILITY OF SUCH LOSSES, COSTS, EXPENSES, OR
304 | DAMAGES. WHERE A LIMITATION OF LIABILITY IS NOT ALLOWED IN FULL OR
305 | IN PART, THIS LIMITATION MAY NOT APPLY TO YOU.
306 |
307 | c. The disclaimer of warranties and limitation of liability provided
308 | above shall be interpreted in a manner that, to the extent
309 | possible, most closely approximates an absolute disclaimer and
310 | waiver of all liability.
311 |
312 |
313 | Section 6 -- Term and Termination.
314 |
315 | a. This Public License applies for the term of the Copyright and
316 | Similar Rights licensed here. However, if You fail to comply with
317 | this Public License, then Your rights under this Public License
318 | terminate automatically.
319 |
320 | b. Where Your right to use the Licensed Material has terminated under
321 | Section 6(a), it reinstates:
322 |
323 | 1. automatically as of the date the violation is cured, provided
324 | it is cured within 30 days of Your discovery of the
325 | violation; or
326 |
327 | 2. upon express reinstatement by the Licensor.
328 |
329 | For the avoidance of doubt, this Section 6(b) does not affect any
330 | right the Licensor may have to seek remedies for Your violations
331 | of this Public License.
332 |
333 | c. For the avoidance of doubt, the Licensor may also offer the
334 | Licensed Material under separate terms or conditions or stop
335 | distributing the Licensed Material at any time; however, doing so
336 | will not terminate this Public License.
337 |
338 | d. Sections 1, 5, 6, 7, and 8 survive termination of this Public
339 | License.
340 |
341 |
342 | Section 7 -- Other Terms and Conditions.
343 |
344 | a. The Licensor shall not be bound by any additional or different
345 | terms or conditions communicated by You unless expressly agreed.
346 |
347 | b. Any arrangements, understandings, or agreements regarding the
348 | Licensed Material not stated herein are separate from and
349 | independent of the terms and conditions of this Public License.
350 |
351 |
352 | Section 8 -- Interpretation.
353 |
354 | a. For the avoidance of doubt, this Public License does not, and
355 | shall not be interpreted to, reduce, limit, restrict, or impose
356 | conditions on any use of the Licensed Material that could lawfully
357 | be made without permission under this Public License.
358 |
359 | b. To the extent possible, if any provision of this Public License is
360 | deemed unenforceable, it shall be automatically reformed to the
361 | minimum extent necessary to make it enforceable. If the provision
362 | cannot be reformed, it shall be severed from this Public License
363 | without affecting the enforceability of the remaining terms and
364 | conditions.
365 |
366 | c. No term or condition of this Public License will be waived and no
367 | failure to comply consented to unless expressly agreed to by the
368 | Licensor.
369 |
370 | d. Nothing in this Public License constitutes or may be interpreted
371 | as a limitation upon, or waiver of, any privileges and immunities
372 | that apply to the Licensor or You, including from the legal
373 | processes of any jurisdiction or authority.
374 |
375 |
376 | =======================================================================
377 |
378 | Creative Commons is not a party to its public
379 | licenses. Notwithstanding, Creative Commons may elect to apply one of
380 | its public licenses to material it publishes and in those instances
381 | will be considered the “Licensor.” The text of the Creative Commons
382 | public licenses is dedicated to the public domain under the CC0 Public
383 | Domain Dedication. Except for the limited purpose of indicating that
384 | material is shared under a Creative Commons public license or as
385 | otherwise permitted by the Creative Commons policies published at
386 | creativecommons.org/policies, Creative Commons does not authorize the
387 | use of the trademark "Creative Commons" or any other trademark or logo
388 | of Creative Commons without its prior written consent including,
389 | without limitation, in connection with any unauthorized modifications
390 | to any of its public licenses or any other arrangements,
391 | understandings, or agreements concerning use of licensed material. For
392 | the avoidance of doubt, this paragraph does not form part of the
393 | public licenses.
394 |
395 | Creative Commons may be contacted at creativecommons.org.
--------------------------------------------------------------------------------
/LICENSE-CODE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) Microsoft Corporation.
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/README-DNS3.md:
--------------------------------------------------------------------------------
1 |
2 | # Deep Noise Suppression (DNS) Challenge 3 - INTERSPEECH 2021
3 |
4 | **NOTE:** This README describes the **PAST** DNS Challenge!
5 |
6 | The data for it is still available, and is described below. If you are interested in the latest DNS
7 | Challenge, please refer to the main [README.md](README.md) file.
8 |
9 | ## In this repository
10 |
11 | This repository contains the datasets and scripts required for the INTERSPEECH 2021 DNS Challenge, also known as
12 | DNS Challenge 3, or DNS3. For more details about the challenge, please see our
13 | [paper](https://arxiv.org/pdf/2101.01902.pdf) and the challenge
14 | [website](https://www.microsoft.com/en-us/research/academic-program/deep-noise-suppression-challenge-interspeech-2021/).
15 | For more details on the testing framework, please visit [P.835](https://github.com/microsoft/P.808).
16 |
17 | ## Details
18 |
19 | * The **datasets** directory is a placeholder for the wideband datasets. That is, our data
20 |   downloader script by default will place the downloaded audio data here. After the download, this
21 |   directory will contain the clean speech, noise, and room impulse responses required for creating the
22 |   training data for the wideband scenario. The script will also download the test set that
23 |   participants can use during the development stage.
24 | * The **datasets_fullband** directory is a placeholder for the fullband audio data. The downloader
25 |   script will download the datasets containing the clean speech and noise audio clips required
26 |   for creating the training data for the fullband scenario.
27 | * The **NSNet2-baseline** directory contains the inference scripts and the ONNX model for the
28 | baseline Speech Enhancement method for wideband.
29 | * **download-dns-challenge-3.sh** - this is the script to download the data. By default, the data
30 | will be placed into `datasets/` and `datasets_fullband/` directories. Please take a look at the
31 |   script and uncomment the preferred download method. Unmodified, the script performs a dry
32 | run and retrieves only the HTTP headers for each archive.
33 | * **noisyspeech_synthesizer_singleprocess.py** - is used to synthesize noisy-clean speech pairs for
34 | training purposes.
35 | * **noisyspeech_synthesizer.cfg** - is the configuration file used to synthesize the data. Users are
36 | required to accurately specify different parameters and provide the right paths to the datasets
37 | required to synthesize noisy speech.
38 | * **audiolib.py** - contains modules required to synthesize datasets.
39 | * **utils.py** - contains some utility functions required to synthesize the data.
40 | * **unit_tests_synthesizer.py** - contains the unit tests to ensure sanity of the data.
41 | * **requirements.txt** - contains all the libraries required for synthesizing the data.
42 |
43 | ## Datasets
44 |
45 | The default directory structure and the sizes of the datasets available for DNS Challenge are:
46 |
47 | ```
48 | datasets 229G
49 | ├── clean 204G
50 | │ ├── emotional_speech 403M
51 | │ ├── french_data 21G
52 | │ ├── german_speech 66G
53 | │ ├── italian_speech 14G
54 | │ ├── mandarin_speech 21G
55 | │ ├── read_speech 61G
56 | │ ├── russian_speech 5.1G
57 | │ ├── singing_voice 979M
58 | │ └── spanish_speech 17G
59 | ├── dev_testset 211M
60 | ├── impulse_responses 4.3G
61 | │ ├── SLR26 2.1G
62 | │ └── SLR28 2.3G
63 | └── noise 20G
64 | ```
65 |
66 | And, for the fullband data,
67 | ```
68 | datasets_fullband 600G
69 | ├── clean_fullband 542G
70 | │ ├── VocalSet_48kHz_mono 974M
71 | │ ├── emotional_speech 1.2G
72 | │ ├── french_data 62G
73 | │ ├── german_speech 194G
74 | │ ├── italian_speech 42G
75 | │ ├── read_speech 182G
76 | │ ├── russian_speech 12G
77 | │ └── spanish_speech 50G
78 | ├── dev_testset_fullband 630M
79 | └── noise_fullband 58G
80 | ```
81 |
82 | ## Code prerequisites
83 | - Python 3.6 and above
84 | - Python libraries: soundfile, librosa
85 |
86 | **NOTE:** git LFS is *no longer required* for DNS Challenge. Please use the
87 | `download-dns-challenge-3.sh` script in this repo to download the data.
88 |
89 | ## Usage:
90 |
91 | 1. Install Python libraries
92 | ```bash
93 | pip3 install soundfile librosa
94 | ```
95 | 2. Clone the repository.
96 | ```bash
97 | git clone https://github.com/microsoft/DNS-Challenge
98 | ```
99 |
100 | 3. Edit **noisyspeech_synthesizer.cfg** to specify the required parameters described in the file and
101 |    include the paths to the clean speech, noise, and impulse response CSV files. Also specify
102 |    the destination directories for the synthesized data and the logs.
103 |
104 | 4. Create dataset
105 | ```bash
106 | python3 noisyspeech_synthesizer_singleprocess.py
107 | ```
108 |
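At its core, the synthesizer mixes a clean clip with a noise clip at a target SNR. The repository's actual mixing logic lives in `audiolib.py`; the sketch below only illustrates the idea (the function name and the `1e-12` guard are my own, not the repo's implementation):

```python
import numpy as np

def mix_at_snr(clean, noise, snr_db):
    """Illustrative sketch: scale `noise` so the clean-to-noise power
    ratio equals `snr_db`, then return the noisy mixture."""
    clean = np.asarray(clean, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)
    # Tile or trim the noise to match the clean clip's length.
    reps = int(np.ceil(len(clean) / len(noise)))
    noise = np.tile(noise, reps)[: len(clean)]
    clean_rms = np.sqrt(np.mean(clean ** 2))
    noise_rms = np.sqrt(np.mean(noise ** 2)) + 1e-12  # avoid divide-by-zero
    gain = (clean_rms / 10 ** (snr_db / 20)) / noise_rms
    return clean + gain * noise
```

The real pipeline additionally applies room impulse responses and level normalization as configured in the `.cfg` file.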
109 | ## Citation:
110 | If you use this dataset in a publication please cite the following paper:
111 |
112 | ```BibTex
113 | @inproceedings{reddy2021interspeech,
114 | title={INTERSPEECH 2021 Deep Noise Suppression Challenge},
115 | author={Reddy, Chandan KA and Dubey, Harishchandra and Koishida, Kazuhito and Nair, Arun and Gopal, Vishak and Cutler, Ross and Braun, Sebastian and Gamper, Hannes and Aichner, Robert and Srinivasan, Sriram},
116 | booktitle={INTERSPEECH},
117 | year={2021}
118 | }
119 | ```
120 |
121 | The baseline NSNet noise suppression:
122 | ```BibTex
123 | @inproceedings{9054254,
124 | author={Y. {Xia} and S. {Braun} and C. K. A. {Reddy} and H. {Dubey} and R. {Cutler} and I. {Tashev}},
125 | booktitle={ICASSP 2020 - 2020 IEEE International Conference on Acoustics,
126 | Speech and Signal Processing (ICASSP)},
127 | title={Weighted Speech Distortion Losses for Neural-Network-Based Real-Time Speech Enhancement},
128 | year={2020}, volume={}, number={}, pages={871-875},}
129 | ```
130 |
131 | ```BibTex
132 | @misc{braun2020data,
133 | title={Data augmentation and loss normalization for deep noise suppression},
134 | author={Sebastian Braun and Ivan Tashev},
135 | year={2020},
136 | eprint={2008.06412},
137 | archivePrefix={arXiv},
138 | primaryClass={eess.AS}
139 | }
140 | ```
141 |
142 | The P.835 test framework:
143 | ```BibTex
144 | @inproceedings{naderi2021crowdsourcing,
145 | title={Subjective Evaluation of Noise Suppression Algorithms in Crowdsourcing},
146 | author={Naderi, Babak and Cutler, Ross},
147 | booktitle={INTERSPEECH},
148 | year={2021}
149 | }
150 | ```
151 |
152 | DNSMOS API:
153 | ```BibTex
154 | @inproceedings{reddy2021dnsmos,
155 | title={DNSMOS: A Non-Intrusive Perceptual Objective Speech Quality metric to evaluate Noise Suppressors},
156 | author={Reddy, Chandan KA and Gopal, Vishak and Cutler, Ross},
157 | booktitle={ICASSP},
158 | year={2021}
159 | }
160 | ```
161 |
162 | # Contributing
163 |
164 | This project welcomes contributions and suggestions. Most contributions require you to agree to a
165 | Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us
166 | the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.
167 |
168 | When you submit a pull request, a CLA bot will automatically determine whether you need to provide a
169 | CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions
170 | provided by the bot. You will only need to do this once across all repos using our CLA.
171 |
172 | This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/).
173 | For more information see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) or
174 | contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments.
175 |
176 | # Legal Notices
177 |
178 | Microsoft and any contributors grant you a license to the Microsoft documentation and other content
179 | in this repository under the [Creative Commons Attribution 4.0 International Public License](https://creativecommons.org/licenses/by/4.0/legalcode),
180 | see the [LICENSE](LICENSE) file, and grant you a license to any code in the repository under the [MIT License](https://opensource.org/licenses/MIT), see the
181 | [LICENSE-CODE](LICENSE-CODE) file.
182 |
183 | Microsoft, Windows, Microsoft Azure and/or other Microsoft products and services referenced in the
184 | documentation may be either trademarks or registered trademarks of Microsoft in the United States
185 | and/or other countries. The licenses for this project do not grant you rights to use any Microsoft
186 | names, logos, or trademarks. Microsoft's general trademark guidelines can be found at
187 | http://go.microsoft.com/fwlink/?LinkID=254653.
188 |
189 | Privacy information can be found at https://privacy.microsoft.com/en-us/
190 |
191 | Microsoft and any contributors reserve all other rights, whether under their respective copyrights, patents,
192 | or trademarks, whether by implication, estoppel or otherwise.
193 |
194 |
195 | ## Dataset licenses
196 | MICROSOFT PROVIDES THE DATASETS ON AN "AS IS" BASIS. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, GUARANTEES OR CONDITIONS WITH RESPECT TO YOUR USE OF THE DATASETS. TO THE EXTENT PERMITTED UNDER YOUR LOCAL LAW, MICROSOFT DISCLAIMS ALL LIABILITY FOR ANY DAMAGES OR LOSSES, INCLUDING DIRECT, CONSEQUENTIAL, SPECIAL, INDIRECT, INCIDENTAL OR PUNITIVE, RESULTING FROM YOUR USE OF THE DATASETS.
197 |
198 | The datasets are provided under the original terms that Microsoft received such datasets. See below for more information about each dataset.
199 |
200 | The datasets used in this project are licensed as follows:
201 | 1. Clean speech:
202 | * https://librivox.org/; License: https://librivox.org/pages/public-domain/
203 | * PTDB-TUG: Pitch Tracking Database from Graz University of Technology https://www.spsc.tugraz.at/databases-and-tools/ptdb-tug-pitch-tracking-database-from-graz-university-of-technology.html; License: http://opendatacommons.org/licenses/odbl/1.0/
204 | * Edinburgh 56 speaker dataset: https://datashare.is.ed.ac.uk/handle/10283/2791; License: https://datashare.is.ed.ac.uk/bitstream/handle/10283/2791/license_text?sequence=11&isAllowed=y
205 | * VocalSet: A Singing Voice Dataset https://zenodo.org/record/1193957#.X1hkxYtlCHs; License: Creative Commons Attribution 4.0 International
206 | * Emotion data corpus: CREMA-D (Crowd-sourced Emotional Multimodal Actors Dataset)
207 | https://github.com/CheyneyComputerScience/CREMA-D; License: http://opendatacommons.org/licenses/dbcl/1.0/
208 | * The VoxCeleb2 Dataset http://www.robots.ox.ac.uk/~vgg/data/voxceleb/vox2.html; License: http://www.robots.ox.ac.uk/~vgg/data/voxceleb/
209 | The VoxCeleb dataset is available to download for commercial/research purposes under a Creative Commons Attribution 4.0 International License. The copyright remains with the original owners of the video. A complete version of the license can be found here.
210 | * VCTK Dataset: https://homepages.inf.ed.ac.uk/jyamagis/page3/page58/page58.html; License: This corpus is licensed under Open Data Commons Attribution License (ODC-By) v1.0.
211 | http://opendatacommons.org/licenses/by/1.0/
212 |
213 | 2. Noise:
214 | * Audioset: https://research.google.com/audioset/index.html; License: https://creativecommons.org/licenses/by/4.0/
215 | * Freesound: https://freesound.org/ Only files with CC0 licenses were selected; License: https://creativecommons.org/publicdomain/zero/1.0/
216 | * Demand: https://zenodo.org/record/1227121#.XRKKxYhKiUk; License: https://creativecommons.org/licenses/by-sa/3.0/deed.en_CA
217 |
218 | 3. RIR datasets: OpenSLR26 and OpenSLR28:
219 | * http://www.openslr.org/26/
220 | * http://www.openslr.org/28/
221 | * License: Apache 2.0
222 |
223 | ## Code license
224 | MIT License
225 |
226 | Copyright (c) Microsoft Corporation.
227 |
228 | Permission is hereby granted, free of charge, to any person obtaining a copy
229 | of this software and associated documentation files (the "Software"), to deal
230 | in the Software without restriction, including without limitation the rights
231 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
232 | copies of the Software, and to permit persons to whom the Software is
233 | furnished to do so, subject to the following conditions:
234 |
235 | The above copyright notice and this permission notice shall be included in all
236 | copies or substantial portions of the Software.
237 |
238 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
239 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
240 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
241 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
242 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
243 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
244 | SOFTWARE.
245 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # ICASSP 2023 Deep Noise Suppression Challenge
2 | Website: https://aka.ms/dns-challenge
3 | Git Repo: https://github.com/microsoft/DNS-Challenge
4 | Challenge Paper:
5 |
6 | ## Important features of this challenge
7 | 1. Along with noise suppression, it includes de-reverberation and suppression of interfering talkers for headset and speakerphone scenarios.
8 | 2. The challenge has two tracks: (i) Headset (wired/wireless headphone, earbuds such as airpods etc.) speech enhancement; (ii) Non-headset (speakerphone, built-in mic in laptop/desktop/mobile phone/other meeting devices etc.) speech enhancement.
9 | 3. This challenge adopts the ITU-T P.835 subjective test framework to measure speech quality (SIG), background noise quality (BAK), and overall audio quality (OVRL). We modified the ITU-T P.835 to make it reliable for test clips with interfering (undesired neighboring) talkers. Along with P.835 scores, Word Accuracy (WAcc) is used to measure the performance of models.
10 | 4. Please NOTE that the intellectual property (IP) is not transferred to the challenge organizers, i.e., if code is shared/submitted, the participants remain the owners of their code (when the code is made publicly available, an appropriate license should be added).
11 | 5. There are new requirements on model-related latency. Please check all requirements listed at https://www.microsoft.com/en-us/research/academic-program/deep-noise-suppression-challenge-icassp-2023/
12 |
13 | ## Baseline Speaker Embeddings
14 | This challenge adopted the pretrained ECAPA-TDNN model available in SpeechBrain as the baseline speaker embedding model, available at https://huggingface.co/speechbrain/spkrec-ecapa-voxceleb. Participants can use any other publicly available speaker embedding model or develop their own speaker embedding extractor. Participants are encouraged to explore the RawNet3 models available at https://github.com/jungjee/RawNet
15 |
16 | The previous DNS Challenge used RawNet2 speaker embeddings. So far, the impact of different speaker embeddings on personalized speech enhancement has not been studied in sufficient depth.
17 |
18 | Install SpeechBrain with the command below:
19 | ```bash
20 | pip install speechbrain
21 | ```
22 |
23 | Compute speaker embeddings for your wav file with the commands below:
24 | ```python
25 | import torchaudio
26 | from speechbrain.pretrained import EncoderClassifier
27 |
28 | classifier = EncoderClassifier.from_hparams(source="speechbrain/spkrec-ecapa-voxceleb")
29 | signal, fs = torchaudio.load('tests/samples/ASR/spk1_snt1.wav')
30 | embeddings = classifier.encode_batch(signal)
31 | ```
32 |
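For the personalized tracks, a common sanity check is the cosine similarity between an enrollment clip's embedding and a test clip's embedding for the same speaker. A minimal NumPy sketch (the helper below is illustrative, not part of SpeechBrain; real inputs would be the `embeddings` tensors computed above, flattened to vectors):

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two 1-D embedding vectors."""
    a = np.asarray(a, dtype=np.float64).ravel()
    b = np.asarray(b, dtype=np.float64).ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
```

Embeddings of clips from the same speaker should score close to 1; different speakers should score noticeably lower.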
29 | ## In this repository
30 |
31 | This repository contains the datasets and scripts required for the 5th DNS Challenge at ICASSP 2023, aka DNS
32 | Challenge 5, or simply **DNS5**. For more details about the challenge, please see our
33 | [website](https://www.microsoft.com/en-us/research/academic-program/deep-noise-suppression-challenge-icassp-2023/) and [paper](docs/ICASSP2023_5th_DNS_Challenge.pdf). For more details on the testing framework, please visit [P.835](https://github.com/microsoft/P.808).
34 |
35 | ## Details
36 |
37 | * The **datasets_fullband** folder is a placeholder for the datasets. That is, our data downloader
38 | script by default will place the downloaded audio data there. After the download, it will contain
39 | clean speech, noise, and room impulse responses required for creating the training data.
40 |
41 | * The **Baseline** directory contains the enhanced clips from dev testset for both tracks.
42 |
43 | * **download-dns-challenge-5-headset-training.sh** - this is the script to download the data for headset (Track 1). By default, the data will be placed into the `./datasets_fullband/` folder. Please take a look at the script and **uncomment** the preferred download method. Unmodified, the script performs a dry run and retrieves only the HTTP headers for each archive.
44 |
45 | * **download-dns-challenge-5-speakerphone-training.sh** - this is the script to download the data for speakerphone (Track 2).
46 |
47 | * **noisyspeech_synthesizer_singleprocess.py** - is used to synthesize noisy-clean speech pairs for
48 | training purposes.
49 |
50 | * **noisyspeech_synthesizer.cfg** - is the configuration file used to synthesize the data. Users are
51 | required to accurately specify different parameters and provide the right paths to the datasets required to synthesize noisy speech.
52 |
53 | * **audiolib.py** - contains modules required to synthesize datasets.
54 | * **utils.py** - contains some utility functions required to synthesize the data.
55 | * **unit_tests_synthesizer.py** - contains the unit tests to ensure sanity of the data.
56 | * **requirements.txt** - contains all the libraries required for synthesizing the data.
57 |
58 | ## Datasets
59 | **V5_dev_testset**: directory containing the dev testsets for both tracks. Each test clip is 10 s long, and the corresponding enrollment clips are 30 s long.
60 |
61 | **BLIND testset**:
62 |
63 | ## WAcc script
64 | https://github.com/microsoft/DNS-Challenge/tree/master/WAcc
65 |
66 | ## WAcc ground-truth transcripts
67 | Dev testset: available only for the speakerphone track; see the v5_dev_testset directory. For the headset track, we are providing the ASR output and the list of prompts read during recording of the test clips. Participants can help correct the ASR output to generate the ground-truth transcripts.
68 | Blind testset:
69 |
70 | ### Data info
71 |
72 | The default directory structure and the sizes of the datasets of the 5th DNS
73 | Challenge are:
74 |
75 | ```
76 | datasets_fullband
77 | +-- dev_testset
78 | +-- impulse_responses 5.9G
79 | +-- noise_fullband 58G
80 | \-- clean_fullband 827G
81 | +-- emotional_speech 2.4G
82 | +-- french_speech 62G
83 | +-- german_speech 319G
84 | +-- italian_speech 42G
85 | +-- read_speech 299G
86 | +-- russian_speech 12G
87 | +-- spanish_speech 65G
88 | +-- vctk_wav48_silence_trimmed 27G
89 | \-- VocalSet_48kHz_mono 974M
90 | ```
91 |
92 | In all, you will need about 1TB to store the _unpacked_ data. Archived, the same data takes about
93 | 550GB total.
94 |
95 | ### Headset DNS track
96 | ### Data checksums
97 |
98 | A CSV file containing file sizes and SHA1 checksums for audio clips in both Real-time *and*
99 | Personalized DNS datasets is available at:
100 | [dns5-datasets-files-sha1.csv.bz2](https://dns4public.blob.core.windows.net/dns4archive/dns5-datasets-files-sha1.csv.bz2).
101 | The archive is 41.3MB in size and can be read in Python like this:
102 | ```python
103 | import pandas as pd
104 |
105 | sha1sums = pd.read_csv("dns5-datasets-files-sha1.csv.bz2", names=["size", "sha1", "path"])
106 | ```
107 |
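To verify a downloaded archive against that CSV, hash the local file with SHA-1 and compare the digest to the recorded one. A small helper sketch (the function name and chunk size are my choices, not part of the repo):

```python
import hashlib

def sha1_of_file(path, chunk_size=1 << 20):
    """Stream a file through SHA-1 so large archives need not fit in RAM."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

Compare `sha1_of_file(local_path)` against the `sha1` column of the `sha1sums` frame for the row whose `path` matches the file.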
108 | ## Code prerequisites
109 | - Python 3.6 and above
110 | - Python libraries: soundfile, librosa
111 |
112 | **NOTE:** git LFS is *no longer required* for DNS Challenge. Please use the
113 | `download-dns-challenge-5*.sh` scripts in this repo to download the data.
114 |
115 | ## Usage:
116 |
117 | 1. Install Python libraries
118 | ```bash
119 | pip3 install soundfile librosa
120 | ```
121 | 2. Clone the repository.
122 | ```bash
123 | git clone https://github.com/microsoft/DNS-Challenge
124 | ```
125 |
126 | 3. Edit **noisyspeech_synthesizer.cfg** to specify the required parameters described in the file and
127 |    include the paths to the clean speech, noise, and impulse response CSV files. Also specify
128 |    the destination directories for the synthesized data and the logs.
129 |
130 | 4. Create dataset
131 | ```bash
132 | python3 noisyspeech_synthesizer_singleprocess.py
133 | ```
134 |
135 | ## Citation:
136 | If you use this dataset in a publication please cite the following paper:
137 |
138 | ```BibTex
139 | @inproceedings{dubey2023icassp,
140 | title={ICASSP 2023 Deep Noise Suppression Challenge},
141 | author={
142 | Dubey, Harishchandra and Aazami, Ashkan and Gopal, Vishak and Naderi, Babak and Braun, Sebastian and Cutler, Ross and Gamper, Hannes and Golestaneh, Mehrsa and Aichner, Robert},
143 | booktitle={ICASSP},
144 | year={2023}
145 | }
146 | ```
147 |
148 | The previous challenges were:
149 | ```BibTex
150 | @inproceedings{dubey2022icassp,
151 | title={ICASSP 2022 Deep Noise Suppression Challenge},
152 | author={Dubey, Harishchandra and Gopal, Vishak and Cutler, Ross and Matusevych, Sergiy and Braun, Sebastian and Eskimez, Emre Sefik and Thakker, Manthan and Yoshioka, Takuya and Gamper, Hannes and Aichner, Robert},
153 | booktitle={ICASSP},
154 | year={2022}
155 | }
156 |
157 | @inproceedings{reddy2021interspeech,
158 | title={INTERSPEECH 2021 Deep Noise Suppression Challenge},
159 | author={Reddy, Chandan KA and Dubey, Harishchandra and Koishida, Kazuhito and Nair, Arun and Gopal, Vishak and Cutler, Ross and Braun, Sebastian and Gamper, Hannes and Aichner, Robert and Srinivasan, Sriram},
160 | booktitle={INTERSPEECH},
161 | year={2021}
162 | }
163 | ```
164 | ```BibTex
165 | @inproceedings{reddy2021icassp,
166 | title={ICASSP 2021 deep noise suppression challenge},
167 | author={Reddy, Chandan KA and Dubey, Harishchandra and Gopal, Vishak and Cutler, Ross and Braun, Sebastian and Gamper, Hannes and Aichner, Robert and Srinivasan, Sriram},
168 | booktitle={ICASSP},
169 | year={2021},
170 | }
171 | ```
172 | ```BibTex
173 | @inproceedings{reddy2020interspeech,
174 | title={The INTERSPEECH 2020 deep noise suppression challenge: Datasets, subjective testing framework, and challenge results},
175 | author={Reddy, Chandan KA and Gopal, Vishak and Cutler, Ross and Beyrami, Ebrahim and Cheng, Roger and Dubey, Harishchandra and Matusevych, Sergiy and Aichner, Robert and Aazami, Ashkan and Braun, Sebastian and others},
176 | booktitle={INTERSPEECH},
177 | year={2020}
178 | }
179 | ```
180 |
181 | The baseline NSNet noise suppression:
182 | ```BibTex
183 | @inproceedings{9054254,
184 | author={Y. {Xia} and S. {Braun} and C. K. A. {Reddy} and H. {Dubey} and R. {Cutler} and I. {Tashev}},
185 | booktitle={ICASSP 2020 - 2020 IEEE International Conference on Acoustics,
186 | Speech and Signal Processing (ICASSP)},
187 | title={Weighted Speech Distortion Losses for Neural-Network-Based Real-Time Speech Enhancement},
188 | year={2020}, volume={}, number={}, pages={871-875},}
189 | ```
190 |
191 | ```BibTex
192 | @misc{braun2020data,
193 | title={Data augmentation and loss normalization for deep noise suppression},
194 | author={Sebastian Braun and Ivan Tashev},
195 | year={2020},
196 | eprint={2008.06412},
197 | archivePrefix={arXiv},
198 | primaryClass={eess.AS}
199 | }
200 | ```
201 |
202 | The P.835 test framework:
203 | ```BibTex
204 | @inproceedings{naderi2021crowdsourcing,
205 | title={Subjective Evaluation of Noise Suppression Algorithms in Crowdsourcing},
206 | author={Naderi, Babak and Cutler, Ross},
207 | booktitle={INTERSPEECH},
208 | year={2021}
209 | }
210 | ```
211 |
212 | DNSMOS API:
213 | ```BibTex
214 | @inproceedings{reddy2021dnsmos,
215 | title={DNSMOS: A Non-Intrusive Perceptual Objective Speech Quality metric to evaluate Noise Suppressors},
216 | author={Reddy, Chandan KA and Gopal, Vishak and Cutler, Ross},
217 | booktitle={ICASSP},
218 | year={2021}
219 | }
220 | ```
221 |
222 | ```BibTex
223 | @inproceedings{reddy2022dnsmos,
224 | title={DNSMOS P.835: A non-intrusive perceptual objective speech quality metric to evaluate noise suppressors},
225 | author={Reddy, Chandan KA and Gopal, Vishak and Cutler, Ross},
226 | booktitle={ICASSP},
227 | year={2022}
228 | }
229 | ```
230 |
231 | # Contributing
232 |
233 | This project welcomes contributions and suggestions. Most contributions require you to agree to a
234 | Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us
235 | the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.
236 |
237 | When you submit a pull request, a CLA bot will automatically determine whether you need to provide a
238 | CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions
239 | provided by the bot. You will only need to do this once across all repos using our CLA.
240 |
241 | This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/).
242 | For more information see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) or
243 | contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments.
244 |
245 | # Legal Notices
246 |
247 | Microsoft and any contributors grant you a license to the Microsoft documentation and other content
248 | in this repository under the [Creative Commons Attribution 4.0 International Public License](https://creativecommons.org/licenses/by/4.0/legalcode),
249 | see the [LICENSE](LICENSE) file, and grant you a license to any code in the repository under the [MIT License](https://opensource.org/licenses/MIT), see the
250 | [LICENSE-CODE](LICENSE-CODE) file.
251 |
252 | Microsoft, Windows, Microsoft Azure and/or other Microsoft products and services referenced in the
253 | documentation may be either trademarks or registered trademarks of Microsoft in the United States
254 | and/or other countries. The licenses for this project do not grant you rights to use any Microsoft
255 | names, logos, or trademarks. Microsoft's general trademark guidelines can be found at
256 | http://go.microsoft.com/fwlink/?LinkID=254653.
257 |
258 | Privacy information can be found at https://privacy.microsoft.com/en-us/
259 |
260 | Microsoft and any contributors reserve all other rights, whether under their respective copyrights, patents,
261 | or trademarks, whether by implication, estoppel or otherwise.
262 |
263 |
264 | ## Dataset licenses
265 | MICROSOFT PROVIDES THE DATASETS ON AN "AS IS" BASIS. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, GUARANTEES OR CONDITIONS WITH RESPECT TO YOUR USE OF THE DATASETS. TO THE EXTENT PERMITTED UNDER YOUR LOCAL LAW, MICROSOFT DISCLAIMS ALL LIABILITY FOR ANY DAMAGES OR LOSSES, INCLUDING DIRECT, CONSEQUENTIAL, SPECIAL, INDIRECT, INCIDENTAL OR PUNITIVE, RESULTING FROM YOUR USE OF THE DATASETS.
266 |
267 | The datasets are provided under the original terms that Microsoft received such datasets. See below for more information about each dataset.
268 |
269 | The datasets used in this project are licensed as follows:
270 | 1. Clean speech:
271 | * https://librivox.org/; License: https://librivox.org/pages/public-domain/
272 | * PTDB-TUG: Pitch Tracking Database from Graz University of Technology https://www.spsc.tugraz.at/databases-and-tools/ptdb-tug-pitch-tracking-database-from-graz-university-of-technology.html; License: http://opendatacommons.org/licenses/odbl/1.0/
273 | * Edinburgh 56 speaker dataset: https://datashare.is.ed.ac.uk/handle/10283/2791; License: https://datashare.is.ed.ac.uk/bitstream/handle/10283/2791/license_text?sequence=11&isAllowed=y
274 | * VocalSet: A Singing Voice Dataset https://zenodo.org/record/1193957#.X1hkxYtlCHs; License: Creative Commons Attribution 4.0 International
275 | * Emotion data corpus: CREMA-D (Crowd-sourced Emotional Multimodal Actors Dataset)
276 | https://github.com/CheyneyComputerScience/CREMA-D; License: http://opendatacommons.org/licenses/dbcl/1.0/
277 | * The VoxCeleb2 Dataset http://www.robots.ox.ac.uk/~vgg/data/voxceleb/vox2.html; License: http://www.robots.ox.ac.uk/~vgg/data/voxceleb/
278 | The VoxCeleb dataset is available to download for commercial/research purposes under a Creative Commons Attribution 4.0 International License. The copyright remains with the original owners of the video. A complete version of the license can be found here.
279 | * VCTK Dataset: https://homepages.inf.ed.ac.uk/jyamagis/page3/page58/page58.html; License: This corpus is licensed under Open Data Commons Attribution License (ODC-By) v1.0.
280 | http://opendatacommons.org/licenses/by/1.0/
281 |
282 | 2. Noise:
283 | * Audioset: https://research.google.com/audioset/index.html; License: https://creativecommons.org/licenses/by/4.0/
284 | * Freesound: https://freesound.org/ Only files with CC0 licenses were selected; License: https://creativecommons.org/publicdomain/zero/1.0/
285 | * Demand: https://zenodo.org/record/1227121#.XRKKxYhKiUk; License: https://creativecommons.org/licenses/by-sa/3.0/deed.en_CA
286 |
287 | 3. RIR datasets: OpenSLR26 and OpenSLR28:
288 | * http://www.openslr.org/26/
289 | * http://www.openslr.org/28/
290 | * License: Apache 2.0
291 |
292 | ## Code license
293 | MIT License
294 |
295 | Copyright (c) Microsoft Corporation.
296 |
297 | Permission is hereby granted, free of charge, to any person obtaining a copy
298 | of this software and associated documentation files (the "Software"), to deal
299 | in the Software without restriction, including without limitation the rights
300 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
301 | copies of the Software, and to permit persons to whom the Software is
302 | furnished to do so, subject to the following conditions:
303 |
304 | The above copyright notice and this permission notice shall be included in all
305 | copies or substantial portions of the Software.
306 |
307 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
308 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
309 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
310 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
311 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
312 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
313 | SOFTWARE.
314 |
--------------------------------------------------------------------------------
/SECURITY.md:
--------------------------------------------------------------------------------
1 |
2 |
3 | ## Security
4 |
5 | Microsoft takes the security of our software products and services seriously, which includes all source code repositories managed through our GitHub organizations, which include [Microsoft](https://github.com/Microsoft), [Azure](https://github.com/Azure), [DotNet](https://github.com/dotnet), [AspNet](https://github.com/aspnet), [Xamarin](https://github.com/xamarin), and [our GitHub organizations](https://opensource.microsoft.com/).
6 |
7 | If you believe you have found a security vulnerability in any Microsoft-owned repository that meets [Microsoft's definition of a security vulnerability](https://docs.microsoft.com/en-us/previous-versions/tn-archive/cc751383(v=technet.10)), please report it to us as described below.
8 |
9 | ## Reporting Security Issues
10 |
11 | **Please do not report security vulnerabilities through public GitHub issues.**
12 |
13 | Instead, please report them to the Microsoft Security Response Center (MSRC) at [https://msrc.microsoft.com/create-report](https://msrc.microsoft.com/create-report).
14 |
15 | If you prefer to submit without logging in, send email to [secure@microsoft.com](mailto:secure@microsoft.com). If possible, encrypt your message with our PGP key; please download it from the [Microsoft Security Response Center PGP Key page](https://www.microsoft.com/en-us/msrc/pgp-key-msrc).
16 |
17 | You should receive a response within 24 hours. If for some reason you do not, please follow up via email to ensure we received your original message. Additional information can be found at [microsoft.com/msrc](https://www.microsoft.com/msrc).
18 |
19 | Please include the requested information listed below (as much as you can provide) to help us better understand the nature and scope of the possible issue:
20 |
21 | * Type of issue (e.g. buffer overflow, SQL injection, cross-site scripting, etc.)
22 | * Full paths of source file(s) related to the manifestation of the issue
23 | * The location of the affected source code (tag/branch/commit or direct URL)
24 | * Any special configuration required to reproduce the issue
25 | * Step-by-step instructions to reproduce the issue
26 | * Proof-of-concept or exploit code (if possible)
27 | * Impact of the issue, including how an attacker might exploit the issue
28 |
29 | This information will help us triage your report more quickly.
30 |
31 | If you are reporting for a bug bounty, more complete reports can contribute to a higher bounty award. Please visit our [Microsoft Bug Bounty Program](https://microsoft.com/msrc/bounty) page for more details about our active programs.
32 |
33 | ## Preferred Languages
34 |
35 | We prefer all communications to be in English.
36 |
37 | ## Policy
38 |
39 | Microsoft follows the principle of [Coordinated Vulnerability Disclosure](https://www.microsoft.com/en-us/msrc/cvd).
40 |
41 |
42 |
--------------------------------------------------------------------------------
/V5_DNS_Challenge_FinalResults.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/DNS-Challenge/591184a9fcb2cbdec02520fed81a32bbbf9d73ff/V5_DNS_Challenge_FinalResults.pdf
--------------------------------------------------------------------------------
/WAcc/WAcc.py:
--------------------------------------------------------------------------------
1 | import argparse
2 | import glob
3 | import os
4 |
5 | import librosa
6 | import numpy as np
8 | import pandas as pd
9 | import requests
10 | import soundfile as sf
11 |
12 | WACC_SERVICE_URL = 'https://wacc.azurewebsites.net/api/TriggerEvaluation?code=K2XN7ouruRN/2k1HNyS79ET39rEMZ9jOOCnFtodPDj42WJFjG9LWXg=='
13 | SUPPORTED_SAMPLING_RATE = 16000
14 | TRANSCRIPTIONS_FILE = 'DNSChallenge4_devtest.tsv'
15 |
16 | def main(args):
17 | audio_clips_list = glob.glob(os.path.join(args.testset_dir, "*.wav"))
18 | transcriptions_df = pd.read_csv(TRANSCRIPTIONS_FILE, sep="\t")
19 | scores = []
20 | for fpath in audio_clips_list:
21 | if os.path.basename(fpath) not in transcriptions_df['filename'].unique():
22 | continue
23 | original_audio, fs = sf.read(fpath)
24 |         if fs != SUPPORTED_SAMPLING_RATE:
25 |             print('Only a sampling rate of 16000 Hz is supported for now, so resampling the audio')
26 |             audio = librosa.resample(original_audio, orig_sr=fs, target_sr=SUPPORTED_SAMPLING_RATE)
27 |             sf.write(fpath, audio, SUPPORTED_SAMPLING_RATE)
28 |
29 |         try:
30 |             with open(fpath, 'rb') as f:
31 |                 resp = requests.post(WACC_SERVICE_URL, files={'audiodata': f})
32 |             wacc = resp.json()
33 |         except Exception as e:
34 |             print('Error occurred during scoring:', e)
35 |             sf.write(fpath, original_audio, fs)  # restore the original file before skipping
36 |             continue
37 |         sf.write(fpath, original_audio, fs)  # restore the original (pre-resampling) file
37 | score_dict = {'file_name': os.path.basename(fpath), 'wacc': wacc}
38 | scores.append(score_dict)
39 |
40 | df = pd.DataFrame(scores)
41 | print('Mean WAcc for the files is ', np.mean(df['wacc']))
42 |
43 | if args.score_file:
44 | df.to_csv(args.score_file)
45 |
46 | if __name__=="__main__":
47 | parser = argparse.ArgumentParser()
48 | parser.add_argument("--testset_dir", required=True,
49 | help='Path to the dir containing audio clips to be evaluated')
50 | parser.add_argument('--score_file', help='If you want the scores in a CSV file provide the full path')
51 |
52 | args = parser.parse_args()
53 | main(args)
54 |
--------------------------------------------------------------------------------
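WAcc.py resamples any clip that is not already at 16 kHz before posting it to the scoring service. That preprocessing step can be sketched without the librosa dependency; the `naive_resample` helper below is a hypothetical linear-interpolation stand-in for `librosa.resample`, used only to keep the example self-contained:

```python
import numpy as np

SUPPORTED_SAMPLING_RATE = 16000

def naive_resample(audio, orig_sr, target_sr):
    # Linear-interpolation resampler; a stand-in for librosa.resample
    # so this sketch needs only numpy.
    n_out = int(round(len(audio) * target_sr / orig_sr))
    x_old = np.linspace(0.0, 1.0, num=len(audio), endpoint=False)
    x_new = np.linspace(0.0, 1.0, num=n_out, endpoint=False)
    return np.interp(x_new, x_old, audio)

# 1 second of a 440 Hz tone recorded at 48 kHz
audio = np.sin(2 * np.pi * 440 * np.arange(48000) / 48000)
resampled = naive_resample(audio, 48000, SUPPORTED_SAMPLING_RATE)
```

A real pipeline would use a band-limited resampler (as the script does via librosa) to avoid aliasing; linear interpolation is adequate only for illustration.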
/audiolib.py:
--------------------------------------------------------------------------------
1 | # -*- coding: utf-8 -*-
2 | """
3 | @author: chkarada
4 | """
5 | import os
6 | import numpy as np
7 | import soundfile as sf
8 | import subprocess
9 | import glob
10 | import librosa
11 | import random
12 | import tempfile
13 |
14 | EPS = np.finfo(float).eps
15 | np.random.seed(0)
16 |
17 | def is_clipped(audio, clipping_threshold=0.99):
18 | return any(abs(audio) > clipping_threshold)
19 |
20 | def normalize(audio, target_level=-25):
21 | '''Normalize the signal to the target level'''
22 | rms = (audio ** 2).mean() ** 0.5
23 | scalar = 10 ** (target_level / 20) / (rms+EPS)
24 | audio = audio * scalar
25 | return audio
26 |
27 | def normalize_segmental_rms(audio, rms, target_level=-25):
28 | '''Normalize the signal to the target level
29 | based on segmental RMS'''
30 | scalar = 10 ** (target_level / 20) / (rms+EPS)
31 | audio = audio * scalar
32 | return audio
33 |
34 | def audioread(path, norm=False, start=0, stop=None, target_level=-25):
35 | '''Function to read audio'''
36 |
37 | path = os.path.abspath(path)
38 | if not os.path.exists(path):
39 | raise ValueError("[{}] does not exist!".format(path))
40 | try:
41 | audio, sample_rate = sf.read(path, start=start, stop=stop)
42 | except RuntimeError: # fix for sph pcm-embedded shortened v2
43 | print('WARNING: Audio type not supported')
44 | return (None, None)
45 |
46 | if len(audio.shape) == 1: # mono
47 | if norm:
48 | rms = (audio ** 2).mean() ** 0.5
49 | scalar = 10 ** (target_level / 20) / (rms+EPS)
50 | audio = audio * scalar
51 | else: # multi-channel
52 | audio = audio.T
53 | audio = audio.sum(axis=0)/audio.shape[0]
54 | if norm:
55 | audio = normalize(audio, target_level)
56 |
57 | return audio, sample_rate
58 |
59 |
60 | def audiowrite(destpath, audio, sample_rate=16000, norm=False, target_level=-25, \
61 | clipping_threshold=0.99, clip_test=False):
62 | '''Function to write audio'''
63 |
64 | if clip_test:
65 | if is_clipped(audio, clipping_threshold=clipping_threshold):
66 | raise ValueError("Clipping detected in audiowrite()! " + \
67 | destpath + " file not written to disk.")
68 |
69 | if norm:
70 | audio = normalize(audio, target_level)
71 | max_amp = max(abs(audio))
72 | if max_amp >= clipping_threshold:
73 | audio = audio/max_amp * (clipping_threshold-EPS)
74 |
75 | destpath = os.path.abspath(destpath)
76 | destdir = os.path.dirname(destpath)
77 |
78 | if not os.path.exists(destdir):
79 | os.makedirs(destdir)
80 |
81 | sf.write(destpath, audio, sample_rate)
82 | return
83 |
84 |
85 | def add_reverb(sasxExe, input_wav, filter_file, output_wav):
86 | ''' Function to add reverb'''
87 | command_sasx_apply_reverb = "{0} -r {1} \
88 | -f {2} -o {3}".format(sasxExe, input_wav, filter_file, output_wav)
89 |
90 | subprocess.call(command_sasx_apply_reverb)
91 | return output_wav
92 |
93 |
94 | def add_clipping(audio, max_thresh_perc=0.8):
95 | '''Function to add clipping'''
96 | threshold = max(abs(audio))*max_thresh_perc
97 | audioclipped = np.clip(audio, -threshold, threshold)
98 | return audioclipped
99 |
100 |
101 | def adsp_filter(Adspvqe, nearEndInput, nearEndOutput, farEndInput):
102 |
103 | command_adsp_clean = "{0} --breakOnErrors 0 --sampleRate 16000 --useEchoCancellation 0 \
104 | --operatingMode 2 --useDigitalAgcNearend 0 --useDigitalAgcFarend 0 \
105 | --useVirtualAGC 0 --useComfortNoiseGenerator 0 --useAnalogAutomaticGainControl 0 \
106 | --useNoiseReduction 0 --loopbackInputFile {1} --farEndInputFile {2} \
107 | --nearEndInputFile {3} --nearEndOutputFile {4}".format(Adspvqe,
108 | farEndInput, farEndInput, nearEndInput, nearEndOutput)
109 | subprocess.call(command_adsp_clean)
110 |
111 |
112 | def snr_mixer(params, clean, noise, snr, target_level=-25, clipping_threshold=0.99):
113 | '''Function to mix clean speech and noise at various SNR levels'''
114 | cfg = params['cfg']
115 | if len(clean) > len(noise):
116 | noise = np.append(noise, np.zeros(len(clean)-len(noise)))
117 | else:
118 | clean = np.append(clean, np.zeros(len(noise)-len(clean)))
119 |
120 | # Normalizing to -25 dB FS
121 | clean = clean/(max(abs(clean))+EPS)
122 | clean = normalize(clean, target_level)
123 | rmsclean = (clean**2).mean()**0.5
124 |
125 | noise = noise/(max(abs(noise))+EPS)
126 | noise = normalize(noise, target_level)
127 | rmsnoise = (noise**2).mean()**0.5
128 |
129 | # Set the noise level for a given SNR
130 | noisescalar = rmsclean / (10**(snr/20)) / (rmsnoise+EPS)
131 | noisenewlevel = noise * noisescalar
132 |
133 | # Mix noise and clean speech
134 | noisyspeech = clean + noisenewlevel
135 |
136 | # Randomly select RMS value between -15 dBFS and -35 dBFS and normalize noisyspeech with that value
137 | # There is a chance of clipping that might happen with very less probability, which is not a major issue.
138 | noisy_rms_level = np.random.randint(params['target_level_lower'], params['target_level_upper'])
139 | rmsnoisy = (noisyspeech**2).mean()**0.5
140 | scalarnoisy = 10 ** (noisy_rms_level / 20) / (rmsnoisy+EPS)
141 | noisyspeech = noisyspeech * scalarnoisy
142 | clean = clean * scalarnoisy
143 | noisenewlevel = noisenewlevel * scalarnoisy
144 |
145 | # Final check to see if there are any amplitudes exceeding +/- 1. If so, normalize all the signals accordingly
146 | if is_clipped(noisyspeech):
147 | noisyspeech_maxamplevel = max(abs(noisyspeech))/(clipping_threshold-EPS)
148 | noisyspeech = noisyspeech/noisyspeech_maxamplevel
149 | clean = clean/noisyspeech_maxamplevel
150 | noisenewlevel = noisenewlevel/noisyspeech_maxamplevel
151 | noisy_rms_level = int(20*np.log10(scalarnoisy/noisyspeech_maxamplevel*(rmsnoisy+EPS)))
152 |
153 | return clean, noisenewlevel, noisyspeech, noisy_rms_level
154 |
155 |
156 | def segmental_snr_mixer(params, clean, noise, snr, target_level=-25, clipping_threshold=0.99):
157 | '''Function to mix clean speech and noise at various segmental SNR levels'''
158 | cfg = params['cfg']
159 | if len(clean) > len(noise):
160 | noise = np.append(noise, np.zeros(len(clean)-len(noise)))
161 | else:
162 | clean = np.append(clean, np.zeros(len(noise)-len(clean)))
163 | clean = clean/(max(abs(clean))+EPS)
164 | noise = noise/(max(abs(noise))+EPS)
165 | rmsclean, rmsnoise = active_rms(clean=clean, noise=noise)
166 | clean = normalize_segmental_rms(clean, rms=rmsclean, target_level=target_level)
167 | noise = normalize_segmental_rms(noise, rms=rmsnoise, target_level=target_level)
168 | # Set the noise level for a given SNR
169 | noisescalar = rmsclean / (10**(snr/20)) / (rmsnoise+EPS)
170 | noisenewlevel = noise * noisescalar
171 |
172 | # Mix noise and clean speech
173 | noisyspeech = clean + noisenewlevel
174 | # Randomly select RMS value between -15 dBFS and -35 dBFS and normalize noisyspeech with that value
175 | # There is a chance of clipping that might happen with very less probability, which is not a major issue.
176 | noisy_rms_level = np.random.randint(params['target_level_lower'], params['target_level_upper'])
177 | rmsnoisy = (noisyspeech**2).mean()**0.5
178 | scalarnoisy = 10 ** (noisy_rms_level / 20) / (rmsnoisy+EPS)
179 | noisyspeech = noisyspeech * scalarnoisy
180 | clean = clean * scalarnoisy
181 | noisenewlevel = noisenewlevel * scalarnoisy
182 | # Final check to see if there are any amplitudes exceeding +/- 1. If so, normalize all the signals accordingly
183 | if is_clipped(noisyspeech):
184 | noisyspeech_maxamplevel = max(abs(noisyspeech))/(clipping_threshold-EPS)
185 | noisyspeech = noisyspeech/noisyspeech_maxamplevel
186 | clean = clean/noisyspeech_maxamplevel
187 | noisenewlevel = noisenewlevel/noisyspeech_maxamplevel
188 | noisy_rms_level = int(20*np.log10(scalarnoisy/noisyspeech_maxamplevel*(rmsnoisy+EPS)))
189 |
190 | return clean, noisenewlevel, noisyspeech, noisy_rms_level
191 |
192 |
193 | def active_rms(clean, noise, fs=16000, energy_thresh=-50):
194 | '''Returns the clean and noise RMS of the noise calculated only in the active portions'''
195 | window_size = 100 # in ms
196 | window_samples = int(fs*window_size/1000)
197 | sample_start = 0
198 | noise_active_segs = []
199 | clean_active_segs = []
200 |
201 | while sample_start < len(noise):
202 | sample_end = min(sample_start + window_samples, len(noise))
203 | noise_win = noise[sample_start:sample_end]
204 | clean_win = clean[sample_start:sample_end]
205 |         noise_seg_rms = 20 * np.log10((noise_win**2).mean()**0.5 + EPS)
206 |         # Consider only frames whose level (in dB) exceeds the energy threshold
207 |         if noise_seg_rms > energy_thresh:
208 | noise_active_segs = np.append(noise_active_segs, noise_win)
209 | clean_active_segs = np.append(clean_active_segs, clean_win)
210 | sample_start += window_samples
211 |
212 | if len(noise_active_segs)!=0:
213 | noise_rms = (noise_active_segs**2).mean()**0.5
214 | else:
215 | noise_rms = EPS
216 |
217 | if len(clean_active_segs)!=0:
218 | clean_rms = (clean_active_segs**2).mean()**0.5
219 | else:
220 | clean_rms = EPS
221 |
222 | return clean_rms, noise_rms
223 |
224 |
225 | def activitydetector(audio, fs=16000, energy_thresh=0.13, target_level=-25):
226 | '''Return the percentage of the time the audio signal is above an energy threshold'''
227 |
228 | audio = normalize(audio, target_level)
229 | window_size = 50 # in ms
230 | window_samples = int(fs*window_size/1000)
231 | sample_start = 0
232 | cnt = 0
233 | prev_energy_prob = 0
234 | active_frames = 0
235 |
236 | a = -1
237 | b = 0.2
238 | alpha_rel = 0.05
239 | alpha_att = 0.8
240 |
241 | while sample_start < len(audio):
242 | sample_end = min(sample_start + window_samples, len(audio))
243 | audio_win = audio[sample_start:sample_end]
244 | frame_rms = 20*np.log10(sum(audio_win**2)+EPS)
245 | frame_energy_prob = 1./(1+np.exp(-(a+b*frame_rms)))
246 |
247 | if frame_energy_prob > prev_energy_prob:
248 | smoothed_energy_prob = frame_energy_prob*alpha_att + prev_energy_prob*(1-alpha_att)
249 | else:
250 | smoothed_energy_prob = frame_energy_prob*alpha_rel + prev_energy_prob*(1-alpha_rel)
251 |
252 | if smoothed_energy_prob > energy_thresh:
253 | active_frames += 1
254 | prev_energy_prob = frame_energy_prob
255 | sample_start += window_samples
256 | cnt += 1
257 |
258 | perc_active = active_frames/cnt
259 | return perc_active
260 |
261 |
262 | def resampler(input_dir, target_sr=16000, ext='*.wav'):
263 | '''Resamples the audio files in input_dir to target_sr'''
264 | files = glob.glob(f"{input_dir}/"+ext)
265 | for pathname in files:
266 | print(pathname)
267 |         try:
268 |             audio, fs = audioread(pathname)
269 |             audio_resampled = librosa.resample(audio, orig_sr=fs, target_sr=target_sr)
270 |             audiowrite(pathname, audio_resampled, target_sr)
271 |         except Exception:
272 |             continue
273 |
274 |
275 | def audio_segmenter(input_dir, dest_dir, segment_len=10, ext='*.wav'):
276 | '''Segments the audio clips in dir to segment_len in secs'''
277 | files = glob.glob(f"{input_dir}/"+ext)
278 | for i in range(len(files)):
279 | audio, fs = audioread(files[i])
280 |
281 | if len(audio) > (segment_len*fs) and len(audio)%(segment_len*fs) != 0:
282 | audio = np.append(audio, audio[0 : segment_len*fs - (len(audio)%(segment_len*fs))])
283 | if len(audio) < (segment_len*fs):
284 | while len(audio) < (segment_len*fs):
285 | audio = np.append(audio, audio)
286 | audio = audio[:segment_len*fs]
287 |
288 | num_segments = int(len(audio)/(segment_len*fs))
289 | audio_segments = np.split(audio, num_segments)
290 |
291 | basefilename = os.path.basename(files[i])
292 | basename, ext = os.path.splitext(basefilename)
293 |
294 | for j in range(len(audio_segments)):
295 | newname = basename+'_'+str(j)+ext
296 | destpath = os.path.join(dest_dir,newname)
297 | audiowrite(destpath, audio_segments[j], fs)
298 |
299 |
--------------------------------------------------------------------------------
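The core arithmetic of `snr_mixer` above — peak-normalize, set both signals to a target RMS level in dBFS, then scale the noise so the clean-to-noise RMS ratio matches the requested SNR — can be exercised standalone. The sketch below is a minimal re-implementation of that arithmetic for illustration, not a call into audiolib itself:

```python
import numpy as np

EPS = np.finfo(float).eps

def mix_at_snr(clean, noise, snr_db, target_level=-25):
    # Match lengths by zero-padding the shorter signal
    n = max(len(clean), len(noise))
    clean = np.pad(clean, (0, n - len(clean)))
    noise = np.pad(noise, (0, n - len(noise)))

    # Peak-normalize, then scale each signal to the target RMS level (dBFS)
    def norm(x):
        x = x / (np.max(np.abs(x)) + EPS)
        rms = np.sqrt((x ** 2).mean())
        return x * (10 ** (target_level / 20) / (rms + EPS))

    clean, noise = norm(clean), norm(noise)
    rms_c = np.sqrt((clean ** 2).mean())
    rms_n = np.sqrt((noise ** 2).mean())
    # Scale the noise so that 20*log10(rms_clean / rms_noise) == snr_db
    noise_scaled = noise * (rms_c / (10 ** (snr_db / 20)) / (rms_n + EPS))
    return clean, noise_scaled, clean + noise_scaled

rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
noise = rng.standard_normal(16000)
c, n, noisy = mix_at_snr(clean, noise, snr_db=10)
achieved = 20 * np.log10(np.sqrt((c ** 2).mean()) / np.sqrt((n ** 2).mean()))
```

The achieved SNR matches the requested 10 dB to within floating-point error; `snr_mixer` additionally re-normalizes the mixture to a random level and rescales everything if the mix clips.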
/docs/CMT Instructions for uploading enhanced clips_ICASSP2022.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/DNS-Challenge/591184a9fcb2cbdec02520fed81a32bbbf9d73ff/docs/CMT Instructions for uploading enhanced clips_ICASSP2022.pdf
--------------------------------------------------------------------------------
/docs/ICASSP_2021_DNS_challenge.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/DNS-Challenge/591184a9fcb2cbdec02520fed81a32bbbf9d73ff/docs/ICASSP_2021_DNS_challenge.pdf
--------------------------------------------------------------------------------
/docs/ICASSP_2022_4th_Deep_Noise_Suppression_Challenge.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/microsoft/DNS-Challenge/591184a9fcb2cbdec02520fed81a32bbbf9d73ff/docs/ICASSP_2022_4th_Deep_Noise_Suppression_Challenge.pdf
--------------------------------------------------------------------------------
/download-dns-challenge-1.sh:
--------------------------------------------------------------------------------
1 | #!/usr/bin/bash
2 |
3 | # ***** Datasets for INTERSPEECH 2020 DNS Challenge 1 *****
4 |
5 | # NOTE: This data is for the *PAST* challenge!
6 | # Current DNS Challenge is ICASSP 2022 DNS Challenge 4, which
7 | # has its own download script, `download-dns-challenge-4.sh`
8 |
9 | ###############################################################
10 |
11 | AZURE_URL="https://dns3public.blob.core.windows.net/dns3archive"
12 |
13 | mkdir -p ./datasets/
14 |
15 | BLOB="datasets-interspeech2020.tar.bz2"
16 | URL="$AZURE_URL/$BLOB"
17 | echo "Download: $BLOB"
17 |
18 | # DRY RUN: print HTTP header WITHOUT downloading the files
19 | curl -s -I "$URL"
20 |
21 | # Actually download the archive - UNCOMMENT it when ready to download
22 | # curl "$URL" -o "$BLOB"
23 |
24 | # Same as above, but using wget
25 | # wget "$URL" -O "$BLOB"
26 |
27 | # Same, + unpack files on the fly
28 | # curl "$URL" | tar -f - -x -j
29 |
--------------------------------------------------------------------------------
/download-dns-challenge-2.sh:
--------------------------------------------------------------------------------
1 | #!/usr/bin/bash
2 |
3 | # ***** Datasets for ICASSP 2021 DNS Challenge 2 *****
4 |
5 | # NOTE: This data is for the *PAST* challenge!
6 | # Current DNS Challenge is ICASSP 2022 DNS Challenge 4, which
7 | # has its own download script, `download-dns-challenge-4.sh`
8 |
9 | # NOTE: Before downloading, make sure you have enough space
10 | # on your local storage!
11 |
12 | # In all, you will need at least 230GB to store UNPACKED data.
13 | # Archived, the same data takes 155GB total.
14 |
15 | # Please comment out the files you don't need before launching
16 | # the script.
17 |
18 | # NOTE: By default, the script *DOES NOT* DOWNLOAD ANY FILES!
19 | # Please scroll down and edit this script to pick the
20 | # downloading method that works best for you.
21 |
22 | # -------------------------------------------------------------
23 | # The directory structure of the unpacked data is:
24 |
25 | # datasets 229G
26 | # +-- clean 204G
27 | # | +-- emotional_speech 403M
28 | # | +-- french_data 21G
29 | # | +-- german_speech 66G
30 | # | +-- italian_speech 14G
31 | # | +-- mandarin_speech 21G
32 | # | +-- read_speech 61G
33 | # | +-- russian_speech 5.1G
34 | # | +-- singing_voice 979M
35 | # | \-- spanish_speech 17G
36 | # +-- dev_testset 211M
37 | # +-- impulse_responses 4.3G
38 | # | +-- SLR26 2.1G
39 | # | \-- SLR28 2.3G
40 | # \-- noise 20G
41 |
42 | BLOB_NAMES=(
43 |
44 | # DEMAND dataset
45 | DEMAND.tar.bz2
46 |
47 | # Wideband clean speech
48 | datasets/datasets.clean.read_speech.tar.bz2
49 |
50 | # Wideband emotional speech
51 | datasets/datasets.clean.emotional_speech.tar.bz2
52 |
53 | # Wideband non-English clean speech
54 | datasets/datasets.clean.french_data.tar.bz2
55 | datasets/datasets.clean.german_speech.tar.bz2
56 | datasets/datasets.clean.italian_speech.tar.bz2
57 | datasets/datasets.clean.mandarin_speech.tar.bz2
58 | datasets/datasets.clean.russian_speech.tar.bz2
59 | datasets/datasets.clean.singing_voice.tar.bz2
60 | datasets/datasets.clean.spanish_speech.tar.bz2
61 |
62 | # Wideband noise, IR, and test data
63 | datasets/datasets.impulse_responses.tar.bz2
64 | datasets/datasets.noise.tar.bz2
65 | datasets/datasets.dev_testset.tar.bz2
66 | )
67 |
68 | ###############################################################
69 |
70 | AZURE_URL="https://dns3public.blob.core.windows.net/dns3archive"
71 |
72 | mkdir -p ./datasets
73 |
74 | for BLOB in ${BLOB_NAMES[@]}
75 | do
76 | URL="$AZURE_URL/$BLOB"
77 | echo "Download: $BLOB"
78 |
79 | # DRY RUN: print HTTP headers WITHOUT downloading the files
80 | curl -s -I "$URL" | head -n 1
81 |
82 | # Actually download the files - UNCOMMENT it when ready to download
83 | # curl "$URL" -o "$BLOB"
84 |
85 | # Same as above, but using wget
86 | # wget "$URL" -O "$BLOB"
87 |
88 | # Same, + unpack files on the fly
89 | # curl "$URL" | tar -f - -x -j
90 | done
91 |
--------------------------------------------------------------------------------
/download-dns-challenge-3.sh:
--------------------------------------------------------------------------------
1 | #!/usr/bin/bash
2 |
3 | # ***** Datasets for INTERSPEECH 2021 DNS Challenge 3 *****
4 |
5 | # NOTE: This data is for the *PAST* challenge!
6 | # Current DNS Challenge is ICASSP 2022 DNS Challenge 4, which
7 | # has its own download script, `download-dns-challenge-4.sh`
8 |
9 | # NOTE: Before downloading, make sure you have enough space
10 | # on your local storage!
11 |
12 | # In all, you will need at least 830GB to store UNPACKED data.
13 | # Archived, the same data takes 512GB total.
14 |
15 | # Please comment out the files you don't need before launching
16 | # the script.
17 |
18 | # NOTE: By default, the script *DOES NOT* DOWNLOAD ANY FILES!
19 | # Please scroll down and edit this script to pick the
20 | # downloading method that works best for you.
21 |
22 | # -------------------------------------------------------------
23 | # The directory structure of the unpacked data is:
24 |
25 | # *** Wideband data: ***
26 |
27 | # datasets 229G
28 | # +-- clean 204G
29 | # | +-- emotional_speech 403M
30 | # | +-- french_data 21G
31 | # | +-- german_speech 66G
32 | # | +-- italian_speech 14G
33 | # | +-- mandarin_speech 21G
34 | # | +-- read_speech 61G
35 | # | +-- russian_speech 5.1G
36 | # | +-- singing_voice 979M
37 | # | \-- spanish_speech 17G
38 | # +-- dev_testset 211M
39 | # +-- impulse_responses 4.3G
40 | # | +-- SLR26 2.1G
41 | # | \-- SLR28 2.3G
42 | # \-- noise 20G
43 |
44 | # *** Fullband data: ***
45 |
46 | # datasets_fullband 600G
47 | # +-- clean_fullband 542G
48 | # | +-- VocalSet_48kHz_mono 974M
49 | # | +-- emotional_speech 1.2G
50 | # | +-- french_data 62G
51 | # | +-- german_speech 194G
52 | # | +-- italian_speech 42G
53 | # | +-- read_speech 182G
54 | # | +-- russian_speech 12G
55 | # | \-- spanish_speech 50G
56 | # +-- dev_testset_fullband 630M
57 | # \-- noise_fullband 58G
58 |
59 | BLOB_NAMES=(
60 |
61 | # DEMAND dataset
62 | DEMAND.tar.bz2
63 |
64 | # Wideband clean speech
65 | datasets/datasets.clean.read_speech.tar.bz2
66 |
67 | # Wideband emotional speech
68 | datasets/datasets.clean.emotional_speech.tar.bz2
69 |
70 | # Wideband non-English clean speech
71 | datasets/datasets.clean.french_data.tar.bz2
72 | datasets/datasets.clean.german_speech.tar.bz2
73 | datasets/datasets.clean.italian_speech.tar.bz2
74 | datasets/datasets.clean.mandarin_speech.tar.bz2
75 | datasets/datasets.clean.russian_speech.tar.bz2
76 | datasets/datasets.clean.singing_voice.tar.bz2
77 | datasets/datasets.clean.spanish_speech.tar.bz2
78 |
79 | # Wideband noise, IR, and test data
80 | datasets/datasets.impulse_responses.tar.bz2
81 | datasets/datasets.noise.tar.bz2
82 | datasets/datasets.dev_testset.tar.bz2
83 |
84 | # ---------------------------------------------------------
85 |
86 | # Fullband clean speech
87 | datasets_fullband/datasets_fullband.clean_fullband.read_speech.0.tar.bz2
88 | datasets_fullband/datasets_fullband.clean_fullband.read_speech.1.tar.bz2
89 | datasets_fullband/datasets_fullband.clean_fullband.read_speech.2.tar.bz2
90 | datasets_fullband/datasets_fullband.clean_fullband.read_speech.3.tar.bz2
91 | datasets_fullband/datasets_fullband.clean_fullband.VocalSet_48kHz_mono.tar.bz2
92 |
93 | # Fullband emotional speech
94 | datasets_fullband/datasets_fullband.clean_fullband.emotional_speech.tar.bz2
95 |
96 | # Fullband non-English clean speech
97 | datasets_fullband/datasets_fullband.clean_fullband.french_data.tar.bz2
98 | datasets_fullband/datasets_fullband.clean_fullband.german_speech.0.tar.bz2
99 | datasets_fullband/datasets_fullband.clean_fullband.german_speech.1.tar.bz2
100 | datasets_fullband/datasets_fullband.clean_fullband.german_speech.2.tar.bz2
101 | datasets_fullband/datasets_fullband.clean_fullband.german_speech.3.tar.bz2
102 | datasets_fullband/datasets_fullband.clean_fullband.german_speech.4.tar.bz2
103 | datasets_fullband/datasets_fullband.clean_fullband.german_speech.5.tar.bz2
104 | datasets_fullband/datasets_fullband.clean_fullband.german_speech.6.tar.bz2
105 | datasets_fullband/datasets_fullband.clean_fullband.german_speech.7.tar.bz2
106 | datasets_fullband/datasets_fullband.clean_fullband.german_speech.8.tar.bz2
107 | datasets_fullband/datasets_fullband.clean_fullband.german_speech.9.tar.bz2
108 | datasets_fullband/datasets_fullband.clean_fullband.german_speech.10.tar.bz2
109 | datasets_fullband/datasets_fullband.clean_fullband.german_speech.11.tar.bz2
110 | datasets_fullband/datasets_fullband.clean_fullband.german_speech.12.tar.bz2
111 | datasets_fullband/datasets_fullband.clean_fullband.german_speech.13.tar.bz2
112 | datasets_fullband/datasets_fullband.clean_fullband.german_speech.14.tar.bz2
113 | datasets_fullband/datasets_fullband.clean_fullband.german_speech.15.tar.bz2
114 | datasets_fullband/datasets_fullband.clean_fullband.german_speech.16.tar.bz2
115 | datasets_fullband/datasets_fullband.clean_fullband.german_speech.17.tar.bz2
116 | datasets_fullband/datasets_fullband.clean_fullband.german_speech.18.tar.bz2
117 | datasets_fullband/datasets_fullband.clean_fullband.german_speech.19.tar.bz2
118 | datasets_fullband/datasets_fullband.clean_fullband.italian_speech.tar.bz2
119 | datasets_fullband/datasets_fullband.clean_fullband.russian_speech.tar.bz2
120 | datasets_fullband/datasets_fullband.clean_fullband.spanish_speech.tar.bz2
121 |
122 | # Fullband noise and test data
123 | datasets_fullband/datasets_fullband.noise_fullband.tar.bz2
124 | datasets_fullband/datasets_fullband.dev_testset_fullband.tar.bz2
125 | )
126 |
127 | ###############################################################
128 |
129 | AZURE_URL="https://dns3public.blob.core.windows.net/dns3archive"
130 |
131 | mkdir -p ./datasets ./datasets_fullband
132 |
133 | for BLOB in ${BLOB_NAMES[@]}
134 | do
135 | URL="$AZURE_URL/$BLOB"
136 | echo "Download: $BLOB"
137 |
138 | # DRY RUN: print HTTP headers WITHOUT downloading the files
139 | curl -s -I "$URL" | head -n 1
140 |
141 | # Actually download the files - UNCOMMENT it when ready to download
142 | # curl "$URL" -o "$BLOB"
143 |
144 | # Same as above, but using wget
145 | # wget "$URL" -O "$BLOB"
146 |
147 | # Same, + unpack files on the fly
148 | # curl "$URL" | tar -f - -x -j
149 | done
150 |
--------------------------------------------------------------------------------
/download-dns-challenge-4-pdns.sh:
--------------------------------------------------------------------------------
1 | #!/usr/bin/bash
2 |
3 | # ***** Datasets for ICASSP 2022 DNS Challenge 4 - Personalized DNS Track *****
4 |
5 | # NOTE: Before downloading, make sure you have enough space
6 | # on your local storage!
7 |
8 | # In all, you will need about 380GB to store the UNPACKED data.
9 | # Archived, the same data takes about 200GB total.
10 |
11 | # Please comment out the files you don't need before launching
12 | # the script.
13 |
14 | # NOTE: By default, the script *DOES NOT* DOWNLOAD ANY FILES!
15 | # Please scroll down and edit this script to pick the
16 | # downloading method that works best for you.
17 |
18 | # -------------------------------------------------------------
19 | # The directory structure of the unpacked data is:
20 |
21 | # . 362G
22 | # +-- datasets_fullband 64G
23 | # | +-- impulse_responses 5.9G
24 | # | \-- noise_fullband 58G
25 | # +-- pdns_training_set 294G
26 | # | +-- enrollment_embeddings 115M
27 | # | +-- enrollment_wav 42G
28 | # | +-- raw/clean 252G
29 | # | +-- english 168G
30 | # | +-- french 2.1G
31 | # | +-- german 53G
32 | # | +-- italian 17G
33 | # | +-- russian 6.8G
34 | # | \-- spanish 5.4G
35 | # \-- personalized_dev_testset 3.3G
36 |
37 | BLOB_NAMES=(
38 |
39 | pdns_training_set/raw/pdns_training_set.raw.clean.english_000.tar.bz2
40 | pdns_training_set/raw/pdns_training_set.raw.clean.english_001.tar.bz2
41 | pdns_training_set/raw/pdns_training_set.raw.clean.english_002.tar.bz2
42 | pdns_training_set/raw/pdns_training_set.raw.clean.english_003.tar.bz2
43 | pdns_training_set/raw/pdns_training_set.raw.clean.english_004.tar.bz2
44 | pdns_training_set/raw/pdns_training_set.raw.clean.english_005.tar.bz2
45 | pdns_training_set/raw/pdns_training_set.raw.clean.english_006.tar.bz2
46 | pdns_training_set/raw/pdns_training_set.raw.clean.english_007.tar.bz2
47 | pdns_training_set/raw/pdns_training_set.raw.clean.english_008.tar.bz2
48 | pdns_training_set/raw/pdns_training_set.raw.clean.english_009.tar.bz2
49 | pdns_training_set/raw/pdns_training_set.raw.clean.english_010.tar.bz2
50 | pdns_training_set/raw/pdns_training_set.raw.clean.english_011.tar.bz2
51 | pdns_training_set/raw/pdns_training_set.raw.clean.english_012.tar.bz2
52 | pdns_training_set/raw/pdns_training_set.raw.clean.english_013.tar.bz2
53 | pdns_training_set/raw/pdns_training_set.raw.clean.english_014.tar.bz2
54 | pdns_training_set/raw/pdns_training_set.raw.clean.english_015.tar.bz2
55 | pdns_training_set/raw/pdns_training_set.raw.clean.english_016.tar.bz2
56 | pdns_training_set/raw/pdns_training_set.raw.clean.english_017.tar.bz2
57 | pdns_training_set/raw/pdns_training_set.raw.clean.english_018.tar.bz2
58 | pdns_training_set/raw/pdns_training_set.raw.clean.english_019.tar.bz2
59 | pdns_training_set/raw/pdns_training_set.raw.clean.english_020.tar.bz2
60 | pdns_training_set/raw/pdns_training_set.raw.clean.french_000.tar.bz2
61 | pdns_training_set/raw/pdns_training_set.raw.clean.german_000.tar.bz2
62 | pdns_training_set/raw/pdns_training_set.raw.clean.german_001.tar.bz2
63 | pdns_training_set/raw/pdns_training_set.raw.clean.german_002.tar.bz2
64 | pdns_training_set/raw/pdns_training_set.raw.clean.german_003.tar.bz2
65 | pdns_training_set/raw/pdns_training_set.raw.clean.german_004.tar.bz2
66 | pdns_training_set/raw/pdns_training_set.raw.clean.german_005.tar.bz2
67 | pdns_training_set/raw/pdns_training_set.raw.clean.german_006.tar.bz2
68 | pdns_training_set/raw/pdns_training_set.raw.clean.german_007.tar.bz2
69 | pdns_training_set/raw/pdns_training_set.raw.clean.german_008.tar.bz2
70 | pdns_training_set/raw/pdns_training_set.raw.clean.italian_000.tar.bz2
71 | pdns_training_set/raw/pdns_training_set.raw.clean.italian_001.tar.bz2
72 | pdns_training_set/raw/pdns_training_set.raw.clean.italian_002.tar.bz2
73 | pdns_training_set/raw/pdns_training_set.raw.clean.russian_000.tar.bz2
74 | pdns_training_set/raw/pdns_training_set.raw.clean.spanish_000.tar.bz2
75 | pdns_training_set/raw/pdns_training_set.raw.clean.spanish_001.tar.bz2
76 | pdns_training_set/raw/pdns_training_set.raw.clean.spanish_002.tar.bz2
77 |
78 | pdns_training_set/enrollment_wav/pdns_training_set.enrollment_wav.english_000.tar.bz2
79 | pdns_training_set/enrollment_wav/pdns_training_set.enrollment_wav.english_001.tar.bz2
80 | pdns_training_set/enrollment_wav/pdns_training_set.enrollment_wav.english_002.tar.bz2
81 | pdns_training_set/enrollment_wav/pdns_training_set.enrollment_wav.english_003.tar.bz2
82 | pdns_training_set/enrollment_wav/pdns_training_set.enrollment_wav.english_004.tar.bz2
83 | pdns_training_set/enrollment_wav/pdns_training_set.enrollment_wav.french_000.tar.bz2
84 | pdns_training_set/enrollment_wav/pdns_training_set.enrollment_wav.german_000.tar.bz2
85 | pdns_training_set/enrollment_wav/pdns_training_set.enrollment_wav.german_001.tar.bz2
86 | pdns_training_set/enrollment_wav/pdns_training_set.enrollment_wav.italian_000.tar.bz2
87 | pdns_training_set/enrollment_wav/pdns_training_set.enrollment_wav.russian_000.tar.bz2
88 | pdns_training_set/enrollment_wav/pdns_training_set.enrollment_wav.spanish_000.tar.bz2
89 |
90 | pdns_training_set/pdns_training_set.enrollment_embeddings_000.tar.bz2
91 |
92 | datasets_fullband/noise_fullband/datasets_fullband.noise_fullband.audioset_000.tar.bz2
93 | datasets_fullband/noise_fullband/datasets_fullband.noise_fullband.audioset_001.tar.bz2
94 | datasets_fullband/noise_fullband/datasets_fullband.noise_fullband.audioset_002.tar.bz2
95 | datasets_fullband/noise_fullband/datasets_fullband.noise_fullband.audioset_003.tar.bz2
96 | datasets_fullband/noise_fullband/datasets_fullband.noise_fullband.audioset_004.tar.bz2
97 | datasets_fullband/noise_fullband/datasets_fullband.noise_fullband.audioset_005.tar.bz2
98 | datasets_fullband/noise_fullband/datasets_fullband.noise_fullband.audioset_006.tar.bz2
99 |
100 | datasets_fullband/noise_fullband/datasets_fullband.noise_fullband.freesound_000.tar.bz2
101 | datasets_fullband/noise_fullband/datasets_fullband.noise_fullband.freesound_001.tar.bz2
102 |
103 | datasets_fullband/datasets_fullband.impulse_responses_000.tar.bz2
104 |
105 | personalized_dev_testset/personalized_dev_testset.enrollment.tar.bz2
106 | personalized_dev_testset/personalized_dev_testset.noisy_testclips.tar.bz2
107 | )
108 |
109 | ###############################################################
110 |
111 | AZURE_URL="https://dns4public.blob.core.windows.net/dns4archive"
112 |
113 | OUTPUT_PATH="."
114 |
115 | mkdir -p $OUTPUT_PATH/{pdns_training_set/{raw,enrollment_wav},datasets_fullband/noise_fullband}
116 |
117 | for BLOB in "${BLOB_NAMES[@]}"
118 | do
119 | URL="$AZURE_URL/$BLOB"
120 | echo "Download: $BLOB"
121 |
122 | # DRY RUN: print HTTP response and Content-Length
123 | # WITHOUT downloading the files
124 | curl -s -I "$URL" | head -n 2
125 |
126 | # Actually download the files: UNCOMMENT when ready to download
127 | # curl "$URL" -o "$OUTPUT_PATH/$BLOB"
128 |
129 | # Same as above, but using wget
130 | # wget "$URL" -O "$OUTPUT_PATH/$BLOB"
131 |
132 | # Same, + unpack files on the fly
133 | # curl "$URL" | tar -C "$OUTPUT_PATH" -f - -x -j
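# Optional, not part of the original script: curl's standard "-C -"
# flag resumes a partially completed download, which can help with
# these multi-GB archives. (The on-the-fly unpack variant above
# cannot be resumed.) UNCOMMENT when ready to download.
# curl -C - "$URL" -o "$OUTPUT_PATH/$BLOB"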
134 | done
135 |
--------------------------------------------------------------------------------
/download-dns-challenge-4.sh:
--------------------------------------------------------------------------------
1 | #!/usr/bin/bash
2 |
3 | # ***** Datasets for ICASSP 2022 DNS Challenge 4 - Main (Real-Time) Track *****
4 |
5 | # NOTE: Before downloading, make sure you have enough space
6 | # on your local storage!
7 |
8 | # In all, you will need about 1TB to store the UNPACKED data.
9 | # Archived, the same data takes about 550GB total.
10 |
11 | # Please comment out the files you don't need before launching
12 | # the script.
13 |
14 | # NOTE: By default, the script *DOES NOT* DOWNLOAD ANY FILES!
15 | # Please scroll down and edit this script to pick the
16 | # downloading method that works best for you.
17 |
18 | # -------------------------------------------------------------
19 | # The directory structure of the unpacked data is:
20 |
21 | # datasets_fullband 892G
22 | # +-- dev_testset 1.7G
23 | # +-- impulse_responses 5.9G
24 | # +-- noise_fullband 58G
25 | # \-- clean_fullband 827G
26 | # +-- emotional_speech 2.4G
27 | # +-- french_speech 62G
28 | # +-- german_speech 319G
29 | # +-- italian_speech 42G
30 | # +-- read_speech 299G
31 | # +-- russian_speech 12G
32 | # +-- spanish_speech 65G
33 | # +-- vctk_wav48_silence_trimmed 27G
34 | # \-- VocalSet_48kHz_mono 974M
35 |
36 | BLOB_NAMES=(
37 |
38 | clean_fullband/datasets_fullband.clean_fullband.VocalSet_48kHz_mono_000_NA_NA.tar.bz2
39 |
40 | clean_fullband/datasets_fullband.clean_fullband.emotional_speech_000_NA_NA.tar.bz2
41 |
42 | clean_fullband/datasets_fullband.clean_fullband.french_speech_000_NA_NA.tar.bz2
43 | clean_fullband/datasets_fullband.clean_fullband.french_speech_001_NA_NA.tar.bz2
44 | clean_fullband/datasets_fullband.clean_fullband.french_speech_002_NA_NA.tar.bz2
45 | clean_fullband/datasets_fullband.clean_fullband.french_speech_003_NA_NA.tar.bz2
46 | clean_fullband/datasets_fullband.clean_fullband.french_speech_004_NA_NA.tar.bz2
47 | clean_fullband/datasets_fullband.clean_fullband.french_speech_005_NA_NA.tar.bz2
48 | clean_fullband/datasets_fullband.clean_fullband.french_speech_006_NA_NA.tar.bz2
49 | clean_fullband/datasets_fullband.clean_fullband.french_speech_007_NA_NA.tar.bz2
50 | clean_fullband/datasets_fullband.clean_fullband.french_speech_008_NA_NA.tar.bz2
51 |
52 | clean_fullband/datasets_fullband.clean_fullband.german_speech_000_0.00_3.47.tar.bz2
53 | clean_fullband/datasets_fullband.clean_fullband.german_speech_001_3.47_3.64.tar.bz2
54 | clean_fullband/datasets_fullband.clean_fullband.german_speech_002_3.64_3.74.tar.bz2
55 | clean_fullband/datasets_fullband.clean_fullband.german_speech_003_3.74_3.81.tar.bz2
56 | clean_fullband/datasets_fullband.clean_fullband.german_speech_004_3.81_3.86.tar.bz2
57 | clean_fullband/datasets_fullband.clean_fullband.german_speech_005_3.86_3.91.tar.bz2
58 | clean_fullband/datasets_fullband.clean_fullband.german_speech_006_3.91_3.96.tar.bz2
59 | clean_fullband/datasets_fullband.clean_fullband.german_speech_007_3.96_4.00.tar.bz2
60 | clean_fullband/datasets_fullband.clean_fullband.german_speech_008_4.00_4.04.tar.bz2
61 | clean_fullband/datasets_fullband.clean_fullband.german_speech_009_4.04_4.08.tar.bz2
62 | clean_fullband/datasets_fullband.clean_fullband.german_speech_010_4.08_4.12.tar.bz2
63 | clean_fullband/datasets_fullband.clean_fullband.german_speech_011_4.12_4.16.tar.bz2
64 | clean_fullband/datasets_fullband.clean_fullband.german_speech_012_4.16_4.21.tar.bz2
65 | clean_fullband/datasets_fullband.clean_fullband.german_speech_013_4.21_4.26.tar.bz2
66 | clean_fullband/datasets_fullband.clean_fullband.german_speech_014_4.26_4.33.tar.bz2
67 | clean_fullband/datasets_fullband.clean_fullband.german_speech_015_4.33_4.43.tar.bz2
68 | clean_fullband/datasets_fullband.clean_fullband.german_speech_016_4.43_NA.tar.bz2
69 | clean_fullband/datasets_fullband.clean_fullband.german_speech_017_NA_NA.tar.bz2
70 | clean_fullband/datasets_fullband.clean_fullband.german_speech_018_NA_NA.tar.bz2
71 | clean_fullband/datasets_fullband.clean_fullband.german_speech_019_NA_NA.tar.bz2
72 | clean_fullband/datasets_fullband.clean_fullband.german_speech_020_NA_NA.tar.bz2
73 | clean_fullband/datasets_fullband.clean_fullband.german_speech_021_NA_NA.tar.bz2
74 | clean_fullband/datasets_fullband.clean_fullband.german_speech_022_NA_NA.tar.bz2
75 | clean_fullband/datasets_fullband.clean_fullband.german_speech_023_NA_NA.tar.bz2
76 | clean_fullband/datasets_fullband.clean_fullband.german_speech_024_NA_NA.tar.bz2
77 | clean_fullband/datasets_fullband.clean_fullband.german_speech_025_NA_NA.tar.bz2
78 | clean_fullband/datasets_fullband.clean_fullband.german_speech_026_NA_NA.tar.bz2
79 | clean_fullband/datasets_fullband.clean_fullband.german_speech_027_NA_NA.tar.bz2
80 | clean_fullband/datasets_fullband.clean_fullband.german_speech_028_NA_NA.tar.bz2
81 | clean_fullband/datasets_fullband.clean_fullband.german_speech_029_NA_NA.tar.bz2
82 | clean_fullband/datasets_fullband.clean_fullband.german_speech_030_NA_NA.tar.bz2
83 | clean_fullband/datasets_fullband.clean_fullband.german_speech_031_NA_NA.tar.bz2
84 | clean_fullband/datasets_fullband.clean_fullband.german_speech_032_NA_NA.tar.bz2
85 | clean_fullband/datasets_fullband.clean_fullband.german_speech_033_NA_NA.tar.bz2
86 | clean_fullband/datasets_fullband.clean_fullband.german_speech_034_NA_NA.tar.bz2
87 | clean_fullband/datasets_fullband.clean_fullband.german_speech_035_NA_NA.tar.bz2
88 | clean_fullband/datasets_fullband.clean_fullband.german_speech_036_NA_NA.tar.bz2
89 | clean_fullband/datasets_fullband.clean_fullband.german_speech_037_NA_NA.tar.bz2
90 | clean_fullband/datasets_fullband.clean_fullband.german_speech_038_NA_NA.tar.bz2
91 | clean_fullband/datasets_fullband.clean_fullband.german_speech_039_NA_NA.tar.bz2
92 | clean_fullband/datasets_fullband.clean_fullband.german_speech_040_NA_NA.tar.bz2
93 | clean_fullband/datasets_fullband.clean_fullband.german_speech_041_NA_NA.tar.bz2
94 | clean_fullband/datasets_fullband.clean_fullband.german_speech_042_NA_NA.tar.bz2
95 |
96 | clean_fullband/datasets_fullband.clean_fullband.italian_speech_000_0.00_3.98.tar.bz2
97 | clean_fullband/datasets_fullband.clean_fullband.italian_speech_001_3.98_4.21.tar.bz2
98 | clean_fullband/datasets_fullband.clean_fullband.italian_speech_002_4.21_4.40.tar.bz2
99 | clean_fullband/datasets_fullband.clean_fullband.italian_speech_003_4.40_NA.tar.bz2
100 | clean_fullband/datasets_fullband.clean_fullband.italian_speech_004_NA_NA.tar.bz2
101 | clean_fullband/datasets_fullband.clean_fullband.italian_speech_005_NA_NA.tar.bz2
102 |
103 | clean_fullband/datasets_fullband.clean_fullband.read_speech_000_0.00_3.75.tar.bz2
104 | clean_fullband/datasets_fullband.clean_fullband.read_speech_001_3.75_3.88.tar.bz2
105 | clean_fullband/datasets_fullband.clean_fullband.read_speech_002_3.88_3.96.tar.bz2
106 | clean_fullband/datasets_fullband.clean_fullband.read_speech_003_3.96_4.02.tar.bz2
107 | clean_fullband/datasets_fullband.clean_fullband.read_speech_004_4.02_4.06.tar.bz2
108 | clean_fullband/datasets_fullband.clean_fullband.read_speech_005_4.06_4.10.tar.bz2
109 | clean_fullband/datasets_fullband.clean_fullband.read_speech_006_4.10_4.13.tar.bz2
110 | clean_fullband/datasets_fullband.clean_fullband.read_speech_007_4.13_4.16.tar.bz2
111 | clean_fullband/datasets_fullband.clean_fullband.read_speech_008_4.16_4.19.tar.bz2
112 | clean_fullband/datasets_fullband.clean_fullband.read_speech_009_4.19_4.21.tar.bz2
113 | clean_fullband/datasets_fullband.clean_fullband.read_speech_010_4.21_4.24.tar.bz2
114 | clean_fullband/datasets_fullband.clean_fullband.read_speech_011_4.24_4.26.tar.bz2
115 | clean_fullband/datasets_fullband.clean_fullband.read_speech_012_4.26_4.29.tar.bz2
116 | clean_fullband/datasets_fullband.clean_fullband.read_speech_013_4.29_4.31.tar.bz2
117 | clean_fullband/datasets_fullband.clean_fullband.read_speech_014_4.31_4.33.tar.bz2
118 | clean_fullband/datasets_fullband.clean_fullband.read_speech_015_4.33_4.35.tar.bz2
119 | clean_fullband/datasets_fullband.clean_fullband.read_speech_016_4.35_4.38.tar.bz2
120 | clean_fullband/datasets_fullband.clean_fullband.read_speech_017_4.38_4.40.tar.bz2
121 | clean_fullband/datasets_fullband.clean_fullband.read_speech_018_4.40_4.42.tar.bz2
122 | clean_fullband/datasets_fullband.clean_fullband.read_speech_019_4.42_4.45.tar.bz2
123 | clean_fullband/datasets_fullband.clean_fullband.read_speech_020_4.45_4.48.tar.bz2
124 | clean_fullband/datasets_fullband.clean_fullband.read_speech_021_4.48_4.52.tar.bz2
125 | clean_fullband/datasets_fullband.clean_fullband.read_speech_022_4.52_4.57.tar.bz2
126 | clean_fullband/datasets_fullband.clean_fullband.read_speech_023_4.57_4.67.tar.bz2
127 | clean_fullband/datasets_fullband.clean_fullband.read_speech_024_4.67_NA.tar.bz2
128 | clean_fullband/datasets_fullband.clean_fullband.read_speech_025_NA_NA.tar.bz2
129 | clean_fullband/datasets_fullband.clean_fullband.read_speech_026_NA_NA.tar.bz2
130 | clean_fullband/datasets_fullband.clean_fullband.read_speech_027_NA_NA.tar.bz2
131 | clean_fullband/datasets_fullband.clean_fullband.read_speech_028_NA_NA.tar.bz2
132 | clean_fullband/datasets_fullband.clean_fullband.read_speech_029_NA_NA.tar.bz2
133 | clean_fullband/datasets_fullband.clean_fullband.read_speech_030_NA_NA.tar.bz2
134 | clean_fullband/datasets_fullband.clean_fullband.read_speech_031_NA_NA.tar.bz2
135 | clean_fullband/datasets_fullband.clean_fullband.read_speech_032_NA_NA.tar.bz2
136 | clean_fullband/datasets_fullband.clean_fullband.read_speech_033_NA_NA.tar.bz2
137 | clean_fullband/datasets_fullband.clean_fullband.read_speech_034_NA_NA.tar.bz2
138 | clean_fullband/datasets_fullband.clean_fullband.read_speech_035_NA_NA.tar.bz2
139 | clean_fullband/datasets_fullband.clean_fullband.read_speech_036_NA_NA.tar.bz2
140 | clean_fullband/datasets_fullband.clean_fullband.read_speech_037_NA_NA.tar.bz2
141 | clean_fullband/datasets_fullband.clean_fullband.read_speech_038_NA_NA.tar.bz2
142 | clean_fullband/datasets_fullband.clean_fullband.read_speech_039_NA_NA.tar.bz2
143 |
144 | clean_fullband/datasets_fullband.clean_fullband.russian_speech_000_0.00_4.31.tar.bz2
145 | clean_fullband/datasets_fullband.clean_fullband.russian_speech_001_4.31_NA.tar.bz2
146 |
147 | clean_fullband/datasets_fullband.clean_fullband.spanish_speech_000_0.00_4.09.tar.bz2
148 | clean_fullband/datasets_fullband.clean_fullband.spanish_speech_001_4.09_NA.tar.bz2
149 | clean_fullband/datasets_fullband.clean_fullband.spanish_speech_002_NA_NA.tar.bz2
150 | clean_fullband/datasets_fullband.clean_fullband.spanish_speech_003_NA_NA.tar.bz2
151 | clean_fullband/datasets_fullband.clean_fullband.spanish_speech_004_NA_NA.tar.bz2
152 | clean_fullband/datasets_fullband.clean_fullband.spanish_speech_005_NA_NA.tar.bz2
153 | clean_fullband/datasets_fullband.clean_fullband.spanish_speech_006_NA_NA.tar.bz2
154 | clean_fullband/datasets_fullband.clean_fullband.spanish_speech_007_NA_NA.tar.bz2
155 | clean_fullband/datasets_fullband.clean_fullband.spanish_speech_008_NA_NA.tar.bz2
156 |
157 | clean_fullband/datasets_fullband.clean_fullband.vctk_wav48_silence_trimmed_000.tar.bz2
158 | clean_fullband/datasets_fullband.clean_fullband.vctk_wav48_silence_trimmed_001.tar.bz2
159 | clean_fullband/datasets_fullband.clean_fullband.vctk_wav48_silence_trimmed_002.tar.bz2
160 | clean_fullband/datasets_fullband.clean_fullband.vctk_wav48_silence_trimmed_003.tar.bz2
161 | clean_fullband/datasets_fullband.clean_fullband.vctk_wav48_silence_trimmed_004.tar.bz2
162 |
163 | noise_fullband/datasets_fullband.noise_fullband.audioset_000.tar.bz2
164 | noise_fullband/datasets_fullband.noise_fullband.audioset_001.tar.bz2
165 | noise_fullband/datasets_fullband.noise_fullband.audioset_002.tar.bz2
166 | noise_fullband/datasets_fullband.noise_fullband.audioset_003.tar.bz2
167 | noise_fullband/datasets_fullband.noise_fullband.audioset_004.tar.bz2
168 | noise_fullband/datasets_fullband.noise_fullband.audioset_005.tar.bz2
169 | noise_fullband/datasets_fullband.noise_fullband.audioset_006.tar.bz2
170 |
171 | noise_fullband/datasets_fullband.noise_fullband.freesound_000.tar.bz2
172 | noise_fullband/datasets_fullband.noise_fullband.freesound_001.tar.bz2
173 |
174 | datasets_fullband.dev_testset_000.tar.bz2
175 |
176 | datasets_fullband.impulse_responses_000.tar.bz2
177 | )
178 |
179 | ###############################################################
180 |
181 | AZURE_URL="https://dns4public.blob.core.windows.net/dns4archive/datasets_fullband"
182 |
183 | OUTPUT_PATH="./datasets_fullband"
184 |
185 | mkdir -p $OUTPUT_PATH/{clean_fullband,noise_fullband}
186 |
187 | for BLOB in "${BLOB_NAMES[@]}"
188 | do
189 | URL="$AZURE_URL/$BLOB"
190 | echo "Download: $BLOB"
191 |
192 | # DRY RUN: print HTTP response and Content-Length
193 | # WITHOUT downloading the files
194 | curl -s -I "$URL" | head -n 2
195 |
196 | # Actually download the files: UNCOMMENT when ready to download
197 | # curl "$URL" -o "$OUTPUT_PATH/$BLOB"
198 |
199 | # Same as above, but using wget
200 | # wget "$URL" -O "$OUTPUT_PATH/$BLOB"
201 |
202 | # Same, + unpack files on the fly
203 | # curl "$URL" | tar -C "$OUTPUT_PATH" -f - -x -j
204 | done
205 |
--------------------------------------------------------------------------------
/download-dns-challenge-5-baseline.sh:
--------------------------------------------------------------------------------
1 | #!/usr/bin/bash
2 |
3 | # ***** Baseline for 5th DNS Challenge at ICASSP 2023 *****
4 |
5 | # NOTE: Before downloading, make sure you have enough space
6 | # on your local storage!
7 |
8 | # Zip file is 1.4 GB.
9 | # -------------------------------------------------------------
10 |
11 | URL="https://dnschallengepublic.blob.core.windows.net/dns5archive/Baseline.zip"
12 | echo "Download: $URL"
13 | #
14 | # DRY RUN: print HTTP header WITHOUT downloading the files
15 | curl -s -I "$URL"
16 | #
17 | # Download the archive (enabled by default; comment out to skip)
18 | curl "$URL" --output 'Baseline.zip'
19 | #wget --no-check-certificate "$URL"
20 |
--------------------------------------------------------------------------------
/download-dns-challenge-5-filelists-headset.sh:
--------------------------------------------------------------------------------
1 | #!/usr/bin/bash
2 |
3 | # ***** Speaker ID filelists (Track 1, headset) for 5th DNS Challenge at ICASSP 2023 *****
4 |
5 | # NOTE: Before downloading, make sure you have enough space
6 | # on your local storage!
7 |
8 | # Zip file is 1.5MB.
9 | # It contains speaker ID filelists for the headset training clean speech (Track 1)
10 | # -------------------------------------------------------------
11 |
12 | URL="https://dnschallengepublic.blob.core.windows.net/dns5archive/filelists_headset.zip"
13 | echo "Download: $URL"
14 | #
15 | # DRY RUN: print HTTP header WITHOUT downloading the files
16 | curl -s -I "$URL"
17 | #
18 | # Download the archive (enabled by default; comment out to skip)
19 | curl "$URL" --output 'filelists_headset.zip'
20 | #wget --no-check-certificate "$URL"
21 |
--------------------------------------------------------------------------------
/download-dns-challenge-5-filelists-speakerphone.sh:
--------------------------------------------------------------------------------
1 | #!/usr/bin/bash
2 |
3 | # ***** Speaker ID filelists (Track 2, speakerphone) for 5th DNS Challenge at ICASSP 2023 *****
4 |
5 | # NOTE: Before downloading, make sure you have enough space
6 | # on your local storage!
7 |
8 | # Zip file is 1.5MB.
9 | # It contains speaker ID filelists for the speakerphone training clean speech (Track 2)
10 | # -------------------------------------------------------------
11 |
12 | URL="https://dnschallengepublic.blob.core.windows.net/dns5archive/filelists_speakerphone.zip"
13 | echo "Download: $URL"
14 | #
15 | # DRY RUN: print HTTP header WITHOUT downloading the files
16 | curl -s -I "$URL"
17 | #
18 | # Download the archive (enabled by default; comment out to skip)
19 | curl "$URL" --output 'filelists_speakerphone.zip'
20 | #wget --no-check-certificate "$URL"
21 |
--------------------------------------------------------------------------------
/download-dns-challenge-5-headset-training.sh:
--------------------------------------------------------------------------------
1 | #!/usr/bin/bash
2 |
3 | # ***** 5th DNS Challenge at ICASSP 2023 *****
4 | # Track 1 Headset Clean speech: All Languages
5 | # -------------------------------------------------------------
6 | # In all, you will need about 1TB to store the UNPACKED data.
7 | # Archived, the same data takes about 550GB total.
8 |
9 | # Please comment out the files you don't need before launching
10 | # the script.
11 |
12 | # NOTE: By default, the script *DOES NOT* DOWNLOAD ANY FILES!
13 | # Please scroll down and edit this script to pick the
14 | # downloading method that works best for you.
15 |
16 | # -------------------------------------------------------------
17 | # The directory structure of the unpacked data is:
18 |
19 | # datasets_fullband
20 | # \-- clean_fullband 827G
21 | # +-- emotional_speech 2.4G
22 | # +-- french_speech 62G
23 | # +-- german_speech 319G
24 | # +-- italian_speech 42G
25 | # +-- read_speech 299G
26 | # +-- russian_speech 12G
27 | # +-- spanish_speech 65G
28 | # +-- vctk_wav48_silence_trimmed 27G
29 | # \-- VocalSet_48kHz_mono 974M
30 |
31 | BLOB_NAMES=(
32 |
33 | Track1_Headset/VocalSet_48kHz_mono.tgz
34 | Track1_Headset/emotional_speech.tgz
35 |
36 | Track1_Headset/french_speech.tar.gz.partaa
37 | Track1_Headset/french_speech.tar.gz.partab
38 | Track1_Headset/french_speech.tar.gz.partac
39 | Track1_Headset/french_speech.tar.gz.partad
40 | Track1_Headset/french_speech.tar.gz.partae
41 | Track1_Headset/french_speech.tar.gz.partah
42 |
43 | Track1_Headset/german_speech.tgz.partaa
44 | Track1_Headset/german_speech.tgz.partab
45 | Track1_Headset/german_speech.tgz.partac
46 | Track1_Headset/german_speech.tgz.partad
47 | Track1_Headset/german_speech.tgz.partae
48 | Track1_Headset/german_speech.tgz.partaf
49 | Track1_Headset/german_speech.tgz.partag
50 | Track1_Headset/german_speech.tgz.partah
51 | Track1_Headset/german_speech.tgz.partaj
52 | Track1_Headset/german_speech.tgz.partal
53 | Track1_Headset/german_speech.tgz.partam
54 | Track1_Headset/german_speech.tgz.partan
55 | Track1_Headset/german_speech.tgz.partao
56 | Track1_Headset/german_speech.tgz.partap
57 | Track1_Headset/german_speech.tgz.partaq
58 | Track1_Headset/german_speech.tgz.partar
59 | Track1_Headset/german_speech.tgz.partas
60 | Track1_Headset/german_speech.tgz.partat
61 | Track1_Headset/german_speech.tgz.partau
62 | Track1_Headset/german_speech.tgz.partav
63 | Track1_Headset/german_speech.tgz.partaw
64 |
65 | Track1_Headset/italian_speech.tgz.partaa
66 | Track1_Headset/italian_speech.tgz.partab
67 | Track1_Headset/italian_speech.tgz.partac
68 | Track1_Headset/italian_speech.tgz.partad
69 |
70 | Track1_Headset/read_speech.tgz.partaa
71 | Track1_Headset/read_speech.tgz.partab
72 | Track1_Headset/read_speech.tgz.partac
73 | Track1_Headset/read_speech.tgz.partad
74 | Track1_Headset/read_speech.tgz.partae
75 | Track1_Headset/read_speech.tgz.partaf
76 | Track1_Headset/read_speech.tgz.partag
77 | Track1_Headset/read_speech.tgz.partah
78 | Track1_Headset/read_speech.tgz.partai
79 | Track1_Headset/read_speech.tgz.partaj
80 | Track1_Headset/read_speech.tgz.partak
81 | Track1_Headset/read_speech.tgz.partal
82 | Track1_Headset/read_speech.tgz.partam
83 | Track1_Headset/read_speech.tgz.partan
84 | Track1_Headset/read_speech.tgz.partao
85 | Track1_Headset/read_speech.tgz.partap
86 | Track1_Headset/read_speech.tgz.partaq
87 | Track1_Headset/read_speech.tgz.partar
88 | Track1_Headset/read_speech.tgz.partas
89 | Track1_Headset/read_speech.tgz.partat
90 | Track1_Headset/read_speech.tgz.partau
91 |
92 | Track1_Headset/russian_speech.tgz
93 |
94 | Track1_Headset/spanish_speech.tgz.partaa
95 | Track1_Headset/spanish_speech.tgz.partab
96 | Track1_Headset/spanish_speech.tgz.partac
97 | Track1_Headset/spanish_speech.tgz.partad
98 | Track1_Headset/spanish_speech.tgz.partae
99 | Track1_Headset/spanish_speech.tgz.partaf
100 | Track1_Headset/spanish_speech.tgz.partag
101 |
102 | Track1_Headset/vctk_wav48_silence_trimmed.tgz.partaa
103 | Track1_Headset/vctk_wav48_silence_trimmed.tgz.partab
104 | Track1_Headset/vctk_wav48_silence_trimmed.tgz.partac
105 | )
106 |
107 | ###############################################################
108 | # This data is extracted from the datasets used in Track 2.
109 |
110 | AZURE_URL="https://dnschallengepublic.blob.core.windows.net/dns5archive/V5_training_dataset"
111 |
112 | OUTPUT_PATH="./datasets_fullband"
113 |
114 | mkdir -p "$OUTPUT_PATH/clean_fullband"
115 |
116 | for BLOB in "${BLOB_NAMES[@]}"
117 | do
118 | URL="$AZURE_URL/$BLOB"
119 | echo "Download: $BLOB"
120 |
121 | # DRY RUN: print HTTP response and Content-Length
122 | # WITHOUT downloading the files
123 | curl -s -I "$URL" | head -n 2
124 |
125 | # Actually download the files: UNCOMMENT when ready to download
126 | # curl "$URL" -o "$OUTPUT_PATH/$BLOB"
127 |
128 | # Same as above, but using wget
129 | # wget "$URL" -O "$OUTPUT_PATH/$BLOB"
130 |
131 | # Same, + unpack files on the fly
132 | # curl "$URL" | tar -C "$OUTPUT_PATH" -f - -x -j
133 | done
134 |
--------------------------------------------------------------------------------
/download-dns-challenge-5-noise-ir.sh:
--------------------------------------------------------------------------------
1 | #!/usr/bin/bash
2 |
3 | # ***** 5th DNS Challenge at ICASSP 2023 *****
4 | # Noise data which is used in both tracks
5 | # Also download the impulse response data
6 |
7 | # All compressed noise files together are ~39 GB
8 | # -------------------------------------------------------------
10 | # The directory structure of the unpacked data is:
11 | # +-- noise_fullband
12 |
13 | BLOB_NAMES=(
14 | noise_fullband/datasets_fullband.noise_fullband.audioset_000.tar.bz2
15 | noise_fullband/datasets_fullband.noise_fullband.audioset_001.tar.bz2
16 | noise_fullband/datasets_fullband.noise_fullband.audioset_002.tar.bz2
17 | noise_fullband/datasets_fullband.noise_fullband.audioset_003.tar.bz2
18 | noise_fullband/datasets_fullband.noise_fullband.audioset_004.tar.bz2
19 | noise_fullband/datasets_fullband.noise_fullband.audioset_005.tar.bz2
20 | noise_fullband/datasets_fullband.noise_fullband.audioset_006.tar.bz2
21 |
22 | noise_fullband/datasets_fullband.noise_fullband.freesound_000.tar.bz2
23 | noise_fullband/datasets_fullband.noise_fullband.freesound_001.tar.bz2
24 |
25 | datasets_fullband.impulse_responses_000.tar.bz2
26 | )
27 |
28 | ###############################################################
29 |
30 | AZURE_URL="https://dnschallengepublic.blob.core.windows.net/dns5archive/V5_training_dataset"
31 |
32 | OUTPUT_PATH="./"
33 |
34 | mkdir -p "$OUTPUT_PATH/noise_fullband"
35 |
36 | for BLOB in "${BLOB_NAMES[@]}"
37 | do
38 | URL="$AZURE_URL/$BLOB"
39 | echo "Download: $BLOB"
40 |
41 | # DRY RUN: print HTTP response and Content-Length
42 | # WITHOUT downloading the files
43 | curl -s -I "$URL" | head -n 2
44 |
45 | # Actually download the files: UNCOMMENT when ready to download
46 | # curl "$URL" -o "$OUTPUT_PATH/$BLOB"
47 |
48 | # Same as above, but using wget
49 | # wget "$URL" -O "$OUTPUT_PATH/$BLOB"
50 |
51 | # Same, + unpack files on the fly
52 | # curl "$URL" | tar -C "$OUTPUT_PATH" -f - -x -j
53 | done
54 |
--------------------------------------------------------------------------------
/download-dns-challenge-5-paralinguistic-train.sh:
--------------------------------------------------------------------------------
1 | #!/usr/bin/bash
2 |
3 | # ***** Paralinguistic training set for 5th DNS Challenge at ICASSP 2023 *****
4 |
5 | # NOTE: Before downloading, make sure you have enough space
6 | # on your local storage!
7 |
8 | # Zip file is 181.8 MB.
9 | # -------------------------------------------------------------
10 |
11 | URL="https://dnschallengepublic.blob.core.windows.net/dns5archive/V5_training_dataset/paralinguistic_training.zip"
12 | echo "Download: $URL"
13 | #
14 | # DRY RUN: print HTTP header WITHOUT downloading the files
15 | curl -s -I "$URL"
16 | #
17 | # Download the archive (enabled by default; comment out to skip)
18 | curl "$URL" --output 'paralinguistic_training.zip'
19 | #wget --no-check-certificate "$URL"
20 |
--------------------------------------------------------------------------------
/download-dns-challenge-5-speakerphone-training.sh:
--------------------------------------------------------------------------------
1 | #!/usr/bin/bash
2 |
3 | # ***** 5th DNS Challenge at ICASSP 2023 *****
4 | # Track 2 Speakerphone Clean speech: All Languages
5 | # -------------------------------------------------------------
6 | # In all, you will need about 1TB to store the UNPACKED data.
7 | # Archived, the same data takes about 550GB total.
8 |
9 | # Please comment out the files you don't need before launching
10 | # the script.
11 |
12 | # NOTE: By default, the script *DOES NOT* DOWNLOAD ANY FILES!
13 | # Please scroll down and edit this script to pick the
14 | # downloading method that works best for you.
15 |
16 | # -------------------------------------------------------------
17 | # The directory structure of the unpacked data is:
18 |
19 | # datasets_fullband
20 | # \-- clean_fullband 827G
21 | # +-- emotional_speech 2.4G
22 | # +-- french_speech 62G
23 | # +-- german_speech 319G
24 | # +-- italian_speech 42G
25 | # +-- read_speech 299G
26 | # +-- russian_speech 12G
27 | # +-- spanish_speech 65G
28 | # +-- vctk_wav48_silence_trimmed 27G
29 | # \-- VocalSet_48kHz_mono 974M
30 |
31 | BLOB_NAMES=(
32 |
33 | clean_fullband/datasets_fullband.clean_fullband.VocalSet_48kHz_mono_000_NA_NA.tar.bz2
34 |
35 | clean_fullband/datasets_fullband.clean_fullband.emotional_speech_000_NA_NA.tar.bz2
36 |
37 | clean_fullband/datasets_fullband.clean_fullband.french_speech_000_NA_NA.tar.bz2
38 | clean_fullband/datasets_fullband.clean_fullband.french_speech_001_NA_NA.tar.bz2
39 | clean_fullband/datasets_fullband.clean_fullband.french_speech_002_NA_NA.tar.bz2
40 | clean_fullband/datasets_fullband.clean_fullband.french_speech_003_NA_NA.tar.bz2
41 | clean_fullband/datasets_fullband.clean_fullband.french_speech_004_NA_NA.tar.bz2
42 | clean_fullband/datasets_fullband.clean_fullband.french_speech_005_NA_NA.tar.bz2
43 | clean_fullband/datasets_fullband.clean_fullband.french_speech_006_NA_NA.tar.bz2
44 | clean_fullband/datasets_fullband.clean_fullband.french_speech_007_NA_NA.tar.bz2
45 | clean_fullband/datasets_fullband.clean_fullband.french_speech_008_NA_NA.tar.bz2
46 |
47 | clean_fullband/datasets_fullband.clean_fullband.german_speech_000_0.00_3.47.tar.bz2
48 | clean_fullband/datasets_fullband.clean_fullband.german_speech_001_3.47_3.64.tar.bz2
49 | clean_fullband/datasets_fullband.clean_fullband.german_speech_002_3.64_3.74.tar.bz2
50 | clean_fullband/datasets_fullband.clean_fullband.german_speech_003_3.74_3.81.tar.bz2
51 | clean_fullband/datasets_fullband.clean_fullband.german_speech_004_3.81_3.86.tar.bz2
52 | clean_fullband/datasets_fullband.clean_fullband.german_speech_005_3.86_3.91.tar.bz2
53 | clean_fullband/datasets_fullband.clean_fullband.german_speech_006_3.91_3.96.tar.bz2
54 | clean_fullband/datasets_fullband.clean_fullband.german_speech_007_3.96_4.00.tar.bz2
55 | clean_fullband/datasets_fullband.clean_fullband.german_speech_008_4.00_4.04.tar.bz2
56 | clean_fullband/datasets_fullband.clean_fullband.german_speech_009_4.04_4.08.tar.bz2
57 | clean_fullband/datasets_fullband.clean_fullband.german_speech_010_4.08_4.12.tar.bz2
58 | clean_fullband/datasets_fullband.clean_fullband.german_speech_011_4.12_4.16.tar.bz2
59 | clean_fullband/datasets_fullband.clean_fullband.german_speech_012_4.16_4.21.tar.bz2
60 | clean_fullband/datasets_fullband.clean_fullband.german_speech_013_4.21_4.26.tar.bz2
61 | clean_fullband/datasets_fullband.clean_fullband.german_speech_014_4.26_4.33.tar.bz2
62 | clean_fullband/datasets_fullband.clean_fullband.german_speech_015_4.33_4.43.tar.bz2
63 | clean_fullband/datasets_fullband.clean_fullband.german_speech_016_4.43_NA.tar.bz2
64 | clean_fullband/datasets_fullband.clean_fullband.german_speech_017_NA_NA.tar.bz2
65 | clean_fullband/datasets_fullband.clean_fullband.german_speech_018_NA_NA.tar.bz2
66 | clean_fullband/datasets_fullband.clean_fullband.german_speech_019_NA_NA.tar.bz2
67 | clean_fullband/datasets_fullband.clean_fullband.german_speech_020_NA_NA.tar.bz2
68 | clean_fullband/datasets_fullband.clean_fullband.german_speech_021_NA_NA.tar.bz2
69 | clean_fullband/datasets_fullband.clean_fullband.german_speech_022_NA_NA.tar.bz2
70 | clean_fullband/datasets_fullband.clean_fullband.german_speech_023_NA_NA.tar.bz2
71 | clean_fullband/datasets_fullband.clean_fullband.german_speech_024_NA_NA.tar.bz2
72 | clean_fullband/datasets_fullband.clean_fullband.german_speech_025_NA_NA.tar.bz2
73 | clean_fullband/datasets_fullband.clean_fullband.german_speech_026_NA_NA.tar.bz2
74 | clean_fullband/datasets_fullband.clean_fullband.german_speech_027_NA_NA.tar.bz2
75 | clean_fullband/datasets_fullband.clean_fullband.german_speech_028_NA_NA.tar.bz2
76 | clean_fullband/datasets_fullband.clean_fullband.german_speech_029_NA_NA.tar.bz2
77 | clean_fullband/datasets_fullband.clean_fullband.german_speech_030_NA_NA.tar.bz2
78 | clean_fullband/datasets_fullband.clean_fullband.german_speech_031_NA_NA.tar.bz2
79 | clean_fullband/datasets_fullband.clean_fullband.german_speech_032_NA_NA.tar.bz2
80 | clean_fullband/datasets_fullband.clean_fullband.german_speech_033_NA_NA.tar.bz2
81 | clean_fullband/datasets_fullband.clean_fullband.german_speech_034_NA_NA.tar.bz2
82 | clean_fullband/datasets_fullband.clean_fullband.german_speech_035_NA_NA.tar.bz2
83 | clean_fullband/datasets_fullband.clean_fullband.german_speech_036_NA_NA.tar.bz2
84 | clean_fullband/datasets_fullband.clean_fullband.german_speech_037_NA_NA.tar.bz2
85 | clean_fullband/datasets_fullband.clean_fullband.german_speech_038_NA_NA.tar.bz2
86 | clean_fullband/datasets_fullband.clean_fullband.german_speech_039_NA_NA.tar.bz2
87 | clean_fullband/datasets_fullband.clean_fullband.german_speech_040_NA_NA.tar.bz2
88 | clean_fullband/datasets_fullband.clean_fullband.german_speech_041_NA_NA.tar.bz2
89 | clean_fullband/datasets_fullband.clean_fullband.german_speech_042_NA_NA.tar.bz2
90 |
91 | clean_fullband/datasets_fullband.clean_fullband.italian_speech_000_0.00_3.98.tar.bz2
92 | clean_fullband/datasets_fullband.clean_fullband.italian_speech_001_3.98_4.21.tar.bz2
93 | clean_fullband/datasets_fullband.clean_fullband.italian_speech_002_4.21_4.40.tar.bz2
94 | clean_fullband/datasets_fullband.clean_fullband.italian_speech_003_4.40_NA.tar.bz2
95 | clean_fullband/datasets_fullband.clean_fullband.italian_speech_004_NA_NA.tar.bz2
96 | clean_fullband/datasets_fullband.clean_fullband.italian_speech_005_NA_NA.tar.bz2
97 |
98 | clean_fullband/datasets_fullband.clean_fullband.read_speech_000_0.00_3.75.tar.bz2
99 | clean_fullband/datasets_fullband.clean_fullband.read_speech_001_3.75_3.88.tar.bz2
100 | clean_fullband/datasets_fullband.clean_fullband.read_speech_002_3.88_3.96.tar.bz2
101 | clean_fullband/datasets_fullband.clean_fullband.read_speech_003_3.96_4.02.tar.bz2
102 | clean_fullband/datasets_fullband.clean_fullband.read_speech_004_4.02_4.06.tar.bz2
103 | clean_fullband/datasets_fullband.clean_fullband.read_speech_005_4.06_4.10.tar.bz2
104 | clean_fullband/datasets_fullband.clean_fullband.read_speech_006_4.10_4.13.tar.bz2
105 | clean_fullband/datasets_fullband.clean_fullband.read_speech_007_4.13_4.16.tar.bz2
106 | clean_fullband/datasets_fullband.clean_fullband.read_speech_008_4.16_4.19.tar.bz2
107 | clean_fullband/datasets_fullband.clean_fullband.read_speech_009_4.19_4.21.tar.bz2
108 | clean_fullband/datasets_fullband.clean_fullband.read_speech_010_4.21_4.24.tar.bz2
109 | clean_fullband/datasets_fullband.clean_fullband.read_speech_011_4.24_4.26.tar.bz2
110 | clean_fullband/datasets_fullband.clean_fullband.read_speech_012_4.26_4.29.tar.bz2
111 | clean_fullband/datasets_fullband.clean_fullband.read_speech_013_4.29_4.31.tar.bz2
112 | clean_fullband/datasets_fullband.clean_fullband.read_speech_014_4.31_4.33.tar.bz2
113 | clean_fullband/datasets_fullband.clean_fullband.read_speech_015_4.33_4.35.tar.bz2
114 | clean_fullband/datasets_fullband.clean_fullband.read_speech_016_4.35_4.38.tar.bz2
115 | clean_fullband/datasets_fullband.clean_fullband.read_speech_017_4.38_4.40.tar.bz2
116 | clean_fullband/datasets_fullband.clean_fullband.read_speech_018_4.40_4.42.tar.bz2
117 | clean_fullband/datasets_fullband.clean_fullband.read_speech_019_4.42_4.45.tar.bz2
118 | clean_fullband/datasets_fullband.clean_fullband.read_speech_020_4.45_4.48.tar.bz2
119 | clean_fullband/datasets_fullband.clean_fullband.read_speech_021_4.48_4.52.tar.bz2
120 | clean_fullband/datasets_fullband.clean_fullband.read_speech_022_4.52_4.57.tar.bz2
121 | clean_fullband/datasets_fullband.clean_fullband.read_speech_023_4.57_4.67.tar.bz2
122 | clean_fullband/datasets_fullband.clean_fullband.read_speech_024_4.67_NA.tar.bz2
123 | clean_fullband/datasets_fullband.clean_fullband.read_speech_025_NA_NA.tar.bz2
124 | clean_fullband/datasets_fullband.clean_fullband.read_speech_026_NA_NA.tar.bz2
125 | clean_fullband/datasets_fullband.clean_fullband.read_speech_027_NA_NA.tar.bz2
126 | clean_fullband/datasets_fullband.clean_fullband.read_speech_028_NA_NA.tar.bz2
127 | clean_fullband/datasets_fullband.clean_fullband.read_speech_029_NA_NA.tar.bz2
128 | clean_fullband/datasets_fullband.clean_fullband.read_speech_030_NA_NA.tar.bz2
129 | clean_fullband/datasets_fullband.clean_fullband.read_speech_031_NA_NA.tar.bz2
130 | clean_fullband/datasets_fullband.clean_fullband.read_speech_032_NA_NA.tar.bz2
131 | clean_fullband/datasets_fullband.clean_fullband.read_speech_033_NA_NA.tar.bz2
132 | clean_fullband/datasets_fullband.clean_fullband.read_speech_034_NA_NA.tar.bz2
133 | clean_fullband/datasets_fullband.clean_fullband.read_speech_035_NA_NA.tar.bz2
134 | clean_fullband/datasets_fullband.clean_fullband.read_speech_036_NA_NA.tar.bz2
135 | clean_fullband/datasets_fullband.clean_fullband.read_speech_037_NA_NA.tar.bz2
136 | clean_fullband/datasets_fullband.clean_fullband.read_speech_038_NA_NA.tar.bz2
137 | clean_fullband/datasets_fullband.clean_fullband.read_speech_039_NA_NA.tar.bz2
138 |
139 | clean_fullband/datasets_fullband.clean_fullband.russian_speech_000_0.00_4.31.tar.bz2
140 | clean_fullband/datasets_fullband.clean_fullband.russian_speech_001_4.31_NA.tar.bz2
141 |
142 | clean_fullband/datasets_fullband.clean_fullband.spanish_speech_000_0.00_4.09.tar.bz2
143 | clean_fullband/datasets_fullband.clean_fullband.spanish_speech_001_4.09_NA.tar.bz2
144 | clean_fullband/datasets_fullband.clean_fullband.spanish_speech_002_NA_NA.tar.bz2
145 | clean_fullband/datasets_fullband.clean_fullband.spanish_speech_003_NA_NA.tar.bz2
146 | clean_fullband/datasets_fullband.clean_fullband.spanish_speech_004_NA_NA.tar.bz2
147 | clean_fullband/datasets_fullband.clean_fullband.spanish_speech_005_NA_NA.tar.bz2
148 | clean_fullband/datasets_fullband.clean_fullband.spanish_speech_006_NA_NA.tar.bz2
149 | clean_fullband/datasets_fullband.clean_fullband.spanish_speech_007_NA_NA.tar.bz2
150 | clean_fullband/datasets_fullband.clean_fullband.spanish_speech_008_NA_NA.tar.bz2
151 |
152 | clean_fullband/datasets_fullband.clean_fullband.vctk_wav48_silence_trimmed_000.tar.bz2
153 | clean_fullband/datasets_fullband.clean_fullband.vctk_wav48_silence_trimmed_001.tar.bz2
154 | clean_fullband/datasets_fullband.clean_fullband.vctk_wav48_silence_trimmed_002.tar.bz2
155 | clean_fullband/datasets_fullband.clean_fullband.vctk_wav48_silence_trimmed_003.tar.bz2
156 | clean_fullband/datasets_fullband.clean_fullband.vctk_wav48_silence_trimmed_004.tar.bz2
157 |
158 | )
159 |
160 | ###############################################################
161 | # this data is identical to non-personalized track 4th DNS Challenge clean speech
162 | # recommend to re-download the data using this script
163 |
164 | AZURE_URL="https://dns4public.blob.core.windows.net/dns4archive/datasets_fullband"
165 |
166 | OUTPUT_PATH="./datasets_fullband"
167 |
168 | mkdir -p "$OUTPUT_PATH"/{clean_fullband,noise_fullband}
169 |
170 | for BLOB in "${BLOB_NAMES[@]}"
171 | do
172 | URL="$AZURE_URL/$BLOB"
173 | echo "Download: $BLOB"
174 |
175 | # DRY RUN: print HTTP response and Content-Length
176 | # WITHOUT downloading the files
177 | curl -s -I "$URL" | head -n 2
178 |
179 | # Actually download the files: UNCOMMENT when ready to download
180 | # curl "$URL" -o "$OUTPUT_PATH/$BLOB"
181 |
182 | # Same as above, but using wget
183 | # wget "$URL" -O "$OUTPUT_PATH/$BLOB"
184 |
185 | # Same, + unpack files on the fly
186 | # curl "$URL" | tar -C "$OUTPUT_PATH" -f - -x -j
187 | done
188 |
--------------------------------------------------------------------------------
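The dry-run/download pattern in the script above can also be driven from Python when shell tooling is unavailable. A minimal standard-library sketch — the base URL and blob paths come from the script, but the helper names `blob_url` and `head_content_length` are illustrative, not part of the repo:

```python
from typing import Optional
from urllib.parse import quote
from urllib.request import Request, urlopen

# Container URL taken from the script above.
AZURE_URL = "https://dns4public.blob.core.windows.net/dns4archive/datasets_fullband"

def blob_url(blob_name: str) -> str:
    """Join the container URL with a blob path ('/' is kept,
    other reserved characters are percent-encoded)."""
    return f"{AZURE_URL}/{quote(blob_name)}"

def head_content_length(url: str) -> Optional[int]:
    """Dry run: issue an HTTP HEAD request and report the blob size
    in bytes WITHOUT downloading it (mirrors `curl -s -I`)."""
    with urlopen(Request(url, method="HEAD")) as resp:
        length = resp.headers.get("Content-Length")
        return int(length) if length else None

if __name__ == "__main__":
    blob = "clean_fullband/datasets_fullband.clean_fullband.russian_speech_000_0.00_4.31.tar.bz2"
    print(blob_url(blob))
```

`head_content_length` is useful for summing blob sizes before committing the ~827 GB of disk the directory listing above implies.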
/download-dns5-blind-testset.sh:
--------------------------------------------------------------------------------
1 | #!/usr/bin/bash
2 |
3 | # ***** BLIND Testset for 5th DNS Challenge at ICASSP 2023*****
4 |
5 | # NOTE: Before downloading, make sure you have enough space
6 | # on your local storage!
7 |
8 | # -------------------------------------------------------------
9 | # The directory structure of the unpacked data is:
10 |
11 | #
12 | # +-- V5_BlindTestSet
13 | # | +-- Track1_Headset ---> (enrol, noisy)
14 | # | +-- Track2_Speakerphone ---> (enrol, noisy)
15 |
16 | URL="https://dnschallengepublic.blob.core.windows.net/dns5archive/V5_BlindTestSet.zip"
17 |
18 | echo "Download: $URL"
19 | #
20 | # DRY RUN: print HTTP header WITHOUT downloading the files
21 | curl -s -I "$URL"
22 | #
23 | # Actually download the archive
24 | wget "$URL"
25 |
26 | # Same as above, but using curl
27 | # curl -O "$URL"
28 |
29 | # Unpack after downloading; the archive is a zip, not a tar.bz2,
30 | # so use unzip rather than tar:
31 | # unzip V5_BlindTestSet.zip
32 |
33 |
--------------------------------------------------------------------------------
/download-dns5-dev-testset.sh:
--------------------------------------------------------------------------------
1 | #!/usr/bin/bash
2 |
3 | # ***** Dev Testset for 5th DNS Challenge at ICASSP 2023*****
4 |
5 | # NOTE: Before downloading, make sure you have enough space
6 | # on your local storage!
7 |
8 | # Zip file is 2.9 GB. Unzipped data is 4 GB.
9 |
10 | # -------------------------------------------------------------
11 | # The directory structure of the unpacked data is:
12 |
13 | #
14 | # +-- V5_dev_testset 64G
15 | # | +-- Track1_Headset ---> (enrol, noisy)
16 | # | +-- Track2_Speakerphone ---> (enrol, noisy)
17 |
18 | URL="https://dnschallengepublic.blob.core.windows.net/dns5archive/V5_dev_testset.zip"
19 | echo "Download: $URL"
20 | #
21 | # DRY RUN: print HTTP header WITHOUT downloading the files
22 | curl -s -I "$URL"
23 | #
24 | # Actually download the archive
25 | wget "$URL"
26 |
27 | # Same as above, but using curl
28 | # curl -O "$URL"
29 |
30 | # Unpack after downloading; the archive is a zip, not a tar.bz2,
31 | # so use unzip rather than tar:
32 | # unzip V5_dev_testset.zip
33 |
34 |
--------------------------------------------------------------------------------
/download_dns_v2_v3_blindset.sh:
--------------------------------------------------------------------------------
1 | #!/usr/bin/bash
2 |
3 | # ***** BLIND Testset for 2nd and 3rd DNS Challenges combined with additional handpicked clips*****
4 |
5 | # NOTE: Before downloading, make sure you have enough space
6 | # on your local storage!
7 |
8 | # -------------------------------------------------------------
9 | # The directory structure of the unpacked data is:
10 |
11 | #
12 | # +-- V2_V3_Challenge_Combined_Blindset
13 | # | +-- handpicked_emotion_testclips_16k_600_withSNR ---> (600 emotional clips)
14 | # | +-- mouseclicks_testclips_withSNR_16k ---> (100 mouseclicks clips)
15 | # | +-- noisy_blind_testset_v2_challenge_withSNR_16k ---> (700 blindset clips from V2 challenge)
16 | # | +-- noisy_blind_testset_v3_challenge_withSNR_16k ---> (600 blindset clips from V3 challenge)
17 |
18 | URL="https://dnschallengepublic.blob.core.windows.net/dns3archive/V2_V3_Challenge_Combined_Blindset.zip"
19 |
20 | echo "Download: $URL"
21 | #
22 | # DRY RUN: print HTTP header WITHOUT downloading the files
23 | curl -s -I "$URL"
24 | #
25 | # Actually download the archive
26 | wget "$URL"
27 |
28 | # Same as above, but using curl
29 | # curl -O "$URL"
30 |
31 | # Unpack after downloading; the archive is a zip, not a tar.bz2,
32 | # so use unzip rather than tar:
33 | # unzip V2_V3_Challenge_Combined_Blindset.zip
34 |
35 |
--------------------------------------------------------------------------------
/index.html:
--------------------------------------------------------------------------------
[index.html: markup lost during extraction. The page is a "Model comparison"
results viewer with a field to filter clips by noise type and an audio-clip
table with Index, Progress, and Clipname columns; its inline scripts and
styles did not survive.]
--------------------------------------------------------------------------------
/noisyspeech_synthesizer.cfg:
--------------------------------------------------------------------------------
1 | # Configuration for generating Noisy Speech Dataset
2 |
3 | # - sampling_rate: Specify the sampling rate. Default is 16 kHz
4 | # - audioformat: default is .wav
5 | # - audio_length: Minimum length (in seconds) of each generated audio clip (noisy and clean speech), built by concatenating utterances
6 | # - silence_length: Duration of silence introduced between clean speech utterances.
7 | # - total_hours: Total number of hours of data required. Units are in hours.
8 | # - snr_lower: Lower bound for SNR required (default: 0 dB)
9 | # - snr_upper: Upper bound for SNR required (default: 40 dB)
10 | # - target_level_lower: Lower bound for the target audio level before audiowrite (default: -35 dB)
11 | # - target_level_upper: Upper bound for the target audio level before audiowrite (default: -15 dB)
12 | # - total_snrlevels: Number of SNR levels required (default: 5, which means there are 5 levels between snr_lower and snr_upper)
13 | # - clean_activity_threshold: Activity threshold for clean speech
14 | # - noise_activity_threshold: Activity threshold for noise
15 | # - fileindex_start: Starting file ID that will be used in filenames
16 | # - fileindex_end: Last file ID that will be used in filenames
17 | # - is_test_set: Set it to True if it is the test set, else False for the training set
18 | # - noise_dir: Specify the directory path to all noise files
19 | # - Speech_dir: Specify the directory path to all clean speech files
20 | # - noisy_destination: Specify path to the destination directory to store noisy speech
21 | # - clean_destination: Specify path to the destination directory to store clean speech
22 | # - noise_destination: Specify path to the destination directory to store noise
23 | # - log_dir: Specify path to the directory to store all the log files
24 |
25 | # Configuration for unit tests
26 | # - snr_test: Set to True if SNR test is required, else False
27 | # - norm_test: Set to True if Normalization test is required, else False
28 | # - sampling_rate_test: Set to True if Sampling Rate test is required, else False
29 | # - clipping_test: Set to True if Clipping test is required, else False
30 | # - unit_tests_log_dir: Specify path to the directory where you want to store logs
31 |
32 | [noisy_speech]
33 |
34 | sampling_rate: 16000
35 | audioformat: *.wav
36 | audio_length: 30
37 | silence_length: 0.2
38 | total_hours: 500
39 | snr_lower: -5
40 | snr_upper: 20
41 | randomize_snr: True
42 | target_level_lower: -35
43 | target_level_upper: -15
44 | total_snrlevels: 21
45 | clean_activity_threshold: 0.6
46 | noise_activity_threshold: 0.0
47 | fileindex_start: None
48 | fileindex_end: None
49 | is_test_set: False
50 |
51 | noise_dir: datasets\noise
52 | speech_dir: datasets\clean\read_speech
53 | noise_types_excluded: None
54 |
55 | noisy_destination: datasets\training_set_sept12\noisy
56 | clean_destination: datasets\training_set_sept12\clean
57 | noise_destination: datasets\training_set_sept12\noise
58 | log_dir: logs
59 |
60 | # Config: add singing voice to clean speech
61 | use_singing_data: 1
62 | # 0 for no, 1 for yes
63 | clean_singing: datasets\clean\singing_voice
64 | #datasets\clean_singing\VocalSet11\FULL
65 | singing_choice: 3
66 | # 1 for only male, 2 for only female, 3 (default) for both male and female
67 |
68 | # Config: add emotional data to clean speech
69 | use_emotion_data: 1
70 | # 0 for no, 1 for yes
71 | clean_emotion: datasets\clean\emotional_speech
72 |
73 | # Config: add Chinese (mandarin) data to clean speech
74 | use_mandarin_data: 1
75 | # 0 for no, 1 for yes
76 | clean_mandarin: datasets\clean\mandarin_speech
77 |
78 | # Config: add reverb to clean speech
79 | rir_choice: 3
80 | # 1 for only real rir, 2 for only synthetic rir, 3 (default) use both real and synthetic
81 | lower_t60: 0.3
82 | # lower bound of t60 range in seconds
83 | upper_t60: 1.3
84 | # upper bound of t60 range in seconds
85 | rir_table_csv: datasets\acoustic_params\RIR_table_simple.csv
86 | clean_speech_t60_csv: datasets\acoustic_params\cleanspeech_table_t60_c50.csv
87 | # percent_for_adding_reverb=0.5 # percentage of clean speech convolved with RIR
88 |
89 | # Unit tests config
90 | snr_test: True
91 | norm_test: True
92 | sampling_rate_test: True
93 | clipping_test: True
94 |
95 | unit_tests_log_dir: unittests_logs
96 |
--------------------------------------------------------------------------------
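The `[noisy_speech]` section above is consumed by `noisyspeech_synthesizer_singleprocess.py` through `configparser` (which accepts both `:` and `=` as key separators). A reduced sketch of two derivations the synthesizer makes from these values — the file count when `fileindex_start`/`fileindex_end` are left as `None`, and the SNR step implied by `total_snrlevels`. The inline config is a trimmed copy of the file above, not the full file:

```python
import configparser

# Trimmed copy of the [noisy_speech] values above.
CFG_TEXT = """
[noisy_speech]
audio_length: 30
total_hours: 500
snr_lower: -5
snr_upper: 20
total_snrlevels: 21
fileindex_start: None
fileindex_end: None
"""

cfg = configparser.ConfigParser(interpolation=configparser.ExtendedInterpolation())
cfg.read_string(CFG_TEXT)
sec = cfg["noisy_speech"]

# With no explicit file-index range, the synthesizer derives the number of
# files to generate from the requested hours and the per-clip length.
num_files = int(float(sec["total_hours"]) * 3600 / float(sec["audio_length"]))

# total_snrlevels evenly spaced SNR levels between snr_lower and snr_upper.
snr_step = (int(sec["snr_upper"]) - int(sec["snr_lower"])) / (int(sec["total_snrlevels"]) - 1)

print(num_files, snr_step)
```

With the values above this prints `60000 1.25`: 500 hours of 30-second clips is 60,000 files, and 21 levels spanning -5 to 20 dB sit 1.25 dB apart.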
/noisyspeech_synthesizer_singleprocess.py:
--------------------------------------------------------------------------------
1 | """
2 | @author: chkarada
3 | """
4 |
5 | # Note: This single process audio synthesizer will attempt to use each clean
6 | # speech sourcefile once, as it does not randomly sample from these files
7 |
8 | import os
9 | import sys
10 | import glob
11 | import argparse
12 | import ast
13 | import configparser as CP
14 | from random import shuffle
15 | import random
16 |
17 | import librosa
18 | import numpy as np
19 | from scipy import signal
20 | from audiolib import audioread, audiowrite, segmental_snr_mixer, activitydetector, is_clipped, add_clipping
21 | import utils
22 |
23 | import pandas as pd
24 | from pathlib import Path
25 | from scipy.io import wavfile
26 |
27 | MAXTRIES = 50
28 | MAXFILELEN = 100
29 |
30 | np.random.seed(5)
31 | random.seed(5)
32 |
33 | def add_pyreverb(clean_speech, rir):
34 |
35 | reverb_speech = signal.fftconvolve(clean_speech, rir, mode="full")
36 |
37 | # make reverb_speech same length as clean_speech
38 | reverb_speech = reverb_speech[0 : clean_speech.shape[0]]
39 |
40 | return reverb_speech
41 |
42 | def build_audio(is_clean, params, index, audio_samples_length=-1):
43 | '''Construct an audio signal from source files'''
44 |
45 | fs_output = params['fs']
46 | silence_length = params['silence_length']
47 | if audio_samples_length == -1:
48 | audio_samples_length = int(params['audio_length']*params['fs'])
49 |
50 | output_audio = np.zeros(0)
51 | remaining_length = audio_samples_length
52 | files_used = []
53 | clipped_files = []
54 |
55 | if is_clean:
56 | source_files = params['cleanfilenames']
57 | idx = index
58 | else:
59 | if 'noisefilenames' in params.keys():
60 | source_files = params['noisefilenames']
61 | idx = index
62 | # if noise files are organized into individual subdirectories, pick a directory randomly
63 | else:
64 | noisedirs = params['noisedirs']
65 | # pick a noise category randomly
66 | idx_n_dir = np.random.randint(0, np.size(noisedirs))
67 | source_files = glob.glob(os.path.join(noisedirs[idx_n_dir],
68 | params['audioformat']))
69 | shuffle(source_files)
70 | # pick a noise source file index randomly
71 | idx = np.random.randint(0, np.size(source_files))
72 |
73 | # initialize silence
74 | silence = np.zeros(int(fs_output*silence_length))
75 |
76 | # iterate through multiple clips until we have a long enough signal
77 | tries_left = MAXTRIES
78 | while remaining_length > 0 and tries_left > 0:
79 |
80 | # read next audio file and resample if necessary
81 |
82 | idx = (idx + 1) % np.size(source_files)
83 | input_audio, fs_input = audioread(source_files[idx])
84 |         if input_audio is None:
85 |             sys.stderr.write("WARNING: Cannot read file: %s\n" % source_files[idx])
86 |             tries_left -= 1; continue  # count the failed read so the loop can terminate
87 | if fs_input != fs_output:
88 |             input_audio = librosa.resample(input_audio, orig_sr=fs_input, target_sr=fs_output)
89 |
90 | # if current file is longer than remaining desired length, and this is
91 | # noise generation or this is training set, subsample it randomly
92 | if len(input_audio) > remaining_length and (not is_clean or not params['is_test_set']):
93 | idx_seg = np.random.randint(0, len(input_audio)-remaining_length)
94 | input_audio = input_audio[idx_seg:idx_seg+remaining_length]
95 |
96 | # check for clipping, and if found move onto next file
97 | if is_clipped(input_audio):
98 | clipped_files.append(source_files[idx])
99 | tries_left -= 1
100 | continue
101 |
102 | # concatenate current input audio to output audio stream
103 | files_used.append(source_files[idx])
104 | output_audio = np.append(output_audio, input_audio)
105 | remaining_length -= len(input_audio)
106 |
107 | # add some silence if we have not reached desired audio length
108 | if remaining_length > 0:
109 | silence_len = min(remaining_length, len(silence))
110 | output_audio = np.append(output_audio, silence[:silence_len])
111 | remaining_length -= silence_len
112 |
113 | if tries_left == 0 and not is_clean and 'noisedirs' in params.keys():
114 | print("There are not enough non-clipped files in the " + noisedirs[idx_n_dir] + \
115 | " directory to complete the audio build")
116 | return [], [], clipped_files, idx
117 |
118 | return output_audio, files_used, clipped_files, idx
119 |
120 |
121 | def gen_audio(is_clean, params, index, audio_samples_length=-1):
122 |     '''Calls build_audio() to get an audio signal, and verifies that it meets
123 |     the activity threshold'''
124 |
125 | clipped_files = []
126 | low_activity_files = []
127 | if audio_samples_length == -1:
128 | audio_samples_length = int(params['audio_length']*params['fs'])
129 | if is_clean:
130 | activity_threshold = params['clean_activity_threshold']
131 | else:
132 | activity_threshold = params['noise_activity_threshold']
133 |
134 | while True:
135 | audio, source_files, new_clipped_files, index = \
136 | build_audio(is_clean, params, index, audio_samples_length)
137 |
138 | clipped_files += new_clipped_files
139 | if len(audio) < audio_samples_length:
140 | continue
141 |
142 | if activity_threshold == 0.0:
143 | break
144 |
145 | percactive = activitydetector(audio=audio)
146 | if percactive > activity_threshold:
147 | break
148 | else:
149 | low_activity_files += source_files
150 |
151 | return audio, source_files, clipped_files, low_activity_files, index
152 |
153 |
154 | def main_gen(params):
155 | '''Calls gen_audio() to generate the audio signals, verifies that they meet
156 | the requirements, and writes the files to storage'''
157 |
158 | clean_source_files = []
159 | clean_clipped_files = []
160 | clean_low_activity_files = []
161 | noise_source_files = []
162 | noise_clipped_files = []
163 | noise_low_activity_files = []
164 |
165 | clean_index = 0
166 | noise_index = 0
167 | file_num = params['fileindex_start']
168 |
169 | while file_num <= params['fileindex_end']:
170 | # generate clean speech
171 | clean, clean_sf, clean_cf, clean_laf, clean_index = \
172 | gen_audio(True, params, clean_index)
173 |
174 | # add reverb with selected RIR
175 | rir_index = random.randint(0,len(params['myrir'])-1)
176 |
177 | my_rir = os.path.normpath(os.path.join('datasets', 'impulse_responses', params['myrir'][rir_index]))
178 | (fs_rir,samples_rir) = wavfile.read(my_rir)
179 |
180 | my_channel = int(params['mychannel'][rir_index])
181 |
182 |         if samples_rir.ndim == 1:
183 |             samples_rir_ch = np.array(samples_rir)
184 |         else:
185 |             # multi-channel RIR: select the configured channel (1-based index)
186 |             samples_rir_ch = samples_rir[:, my_channel - 1]
191 |
192 | clean = add_pyreverb(clean, samples_rir_ch)
193 |
194 | # generate noise
195 | noise, noise_sf, noise_cf, noise_laf, noise_index = \
196 | gen_audio(False, params, noise_index, len(clean))
197 |
198 | clean_clipped_files += clean_cf
199 | clean_low_activity_files += clean_laf
200 | noise_clipped_files += noise_cf
201 | noise_low_activity_files += noise_laf
202 |
203 | # get rir files and config
204 |
205 | # mix clean speech and noise
206 | # if specified, use specified SNR value
207 | if not params['randomize_snr']:
208 | snr = params['snr']
209 | # use a randomly sampled SNR value between the specified bounds
210 | else:
211 |             snr = np.random.randint(params['snr_lower'], params['snr_upper'] + 1)  # randint's upper bound is exclusive
212 |
213 | clean_snr, noise_snr, noisy_snr, target_level = segmental_snr_mixer(params=params,
214 | clean=clean,
215 | noise=noise,
216 | snr=snr)
217 |         # Note: segmental_snr_mixer (imported from audiolib) is used above;
218 |         # to mix at a single global SNR instead, swap in audiolib's snr_mixer
219 |         # with the same arguments.
222 | # unexpected clipping
223 | if is_clipped(clean_snr) or is_clipped(noise_snr) or is_clipped(noisy_snr):
224 | print("Warning: File #" + str(file_num) + " has unexpected clipping, " + \
225 | "returning without writing audio to disk")
226 | continue
227 |
228 | clean_source_files += clean_sf
229 | noise_source_files += noise_sf
230 |
231 | # write resultant audio streams to files
232 | hyphen = '-'
233 | clean_source_filenamesonly = [i[:-4].split(os.path.sep)[-1] for i in clean_sf]
234 | clean_files_joined = hyphen.join(clean_source_filenamesonly)[:MAXFILELEN]
235 | noise_source_filenamesonly = [i[:-4].split(os.path.sep)[-1] for i in noise_sf]
236 | noise_files_joined = hyphen.join(noise_source_filenamesonly)[:MAXFILELEN]
237 |
238 | noisyfilename = clean_files_joined + '_' + noise_files_joined + '_snr' + \
239 | str(snr) + '_tl' + str(target_level) + '_fileid_' + str(file_num) + '.wav'
240 | cleanfilename = 'clean_fileid_'+str(file_num)+'.wav'
241 | noisefilename = 'noise_fileid_'+str(file_num)+'.wav'
242 |
243 | noisypath = os.path.join(params['noisyspeech_dir'], noisyfilename)
244 | cleanpath = os.path.join(params['clean_proc_dir'], cleanfilename)
245 | noisepath = os.path.join(params['noise_proc_dir'], noisefilename)
246 |
247 | audio_signals = [noisy_snr, clean_snr, noise_snr]
248 | file_paths = [noisypath, cleanpath, noisepath]
249 |
250 | file_num += 1
251 |         for path, signal_out in zip(file_paths, audio_signals):
252 |             try:
253 |                 audiowrite(path, signal_out, params['fs'])
254 |             except Exception as e:
255 |                 print(str(e))
256 |
257 |
258 | return clean_source_files, clean_clipped_files, clean_low_activity_files, \
259 | noise_source_files, noise_clipped_files, noise_low_activity_files
260 |
261 |
262 | def main_body():
263 | '''Main body of this file'''
264 |
265 | parser = argparse.ArgumentParser()
266 |
267 | # Configurations: read noisyspeech_synthesizer.cfg and gather inputs
268 | parser.add_argument('--cfg', default='noisyspeech_synthesizer.cfg',
269 | help='Read noisyspeech_synthesizer.cfg for all the details')
270 | parser.add_argument('--cfg_str', type=str, default='noisy_speech')
271 | args = parser.parse_args()
272 |
273 | params = dict()
274 | params['args'] = args
275 | cfgpath = os.path.join(os.path.dirname(__file__), args.cfg)
276 |     assert os.path.exists(cfgpath), f'No configuration file found at [{cfgpath}]'
277 |
278 |     # enable ExtendedInterpolation via the public constructor argument
279 |     cfg = CP.ConfigParser(interpolation=CP.ExtendedInterpolation())
280 | cfg.read(cfgpath)
281 | params['cfg'] = cfg._sections[args.cfg_str]
282 | cfg = params['cfg']
283 |
284 | clean_dir = os.path.join(os.path.dirname(__file__), 'datasets/clean')
285 |
286 | if cfg['speech_dir'] != 'None':
287 | clean_dir = cfg['speech_dir']
288 | if not os.path.exists(clean_dir):
289 | assert False, ('Clean speech data is required')
290 |
291 | noise_dir = os.path.join(os.path.dirname(__file__), 'datasets/noise')
292 |
293 | if cfg['noise_dir'] != 'None':
294 | noise_dir = cfg['noise_dir']
295 |     if not os.path.exists(noise_dir):
296 | assert False, ('Noise data is required')
297 |
298 | params['fs'] = int(cfg['sampling_rate'])
299 | params['audioformat'] = cfg['audioformat']
300 | params['audio_length'] = float(cfg['audio_length'])
301 | params['silence_length'] = float(cfg['silence_length'])
302 | params['total_hours'] = float(cfg['total_hours'])
303 |
304 | # clean singing speech
305 | params['use_singing_data'] = int(cfg['use_singing_data'])
306 | params['clean_singing'] = str(cfg['clean_singing'])
307 | params['singing_choice'] = int(cfg['singing_choice'])
308 |
309 | # clean emotional speech
310 | params['use_emotion_data'] = int(cfg['use_emotion_data'])
311 | params['clean_emotion'] = str(cfg['clean_emotion'])
312 |
313 | # clean mandarin speech
314 | params['use_mandarin_data'] = int(cfg['use_mandarin_data'])
315 | params['clean_mandarin'] = str(cfg['clean_mandarin'])
316 |
317 | # rir
318 | params['rir_choice'] = int(cfg['rir_choice'])
319 | params['lower_t60'] = float(cfg['lower_t60'])
320 | params['upper_t60'] = float(cfg['upper_t60'])
321 | params['rir_table_csv'] = str(cfg['rir_table_csv'])
322 | params['clean_speech_t60_csv'] = str(cfg['clean_speech_t60_csv'])
323 |
324 | if cfg['fileindex_start'] != 'None' and cfg['fileindex_end'] != 'None':
325 | params['num_files'] = int(cfg['fileindex_end'])-int(cfg['fileindex_start'])
326 | params['fileindex_start'] = int(cfg['fileindex_start'])
327 | params['fileindex_end'] = int(cfg['fileindex_end'])
328 | else:
329 | params['num_files'] = int((params['total_hours']*60*60)/params['audio_length'])
330 | params['fileindex_start'] = 0
331 | params['fileindex_end'] = params['num_files']
332 |
333 | print('Number of files to be synthesized:', params['num_files'])
334 |
335 | params['is_test_set'] = utils.str2bool(cfg['is_test_set'])
336 | params['clean_activity_threshold'] = float(cfg['clean_activity_threshold'])
337 | params['noise_activity_threshold'] = float(cfg['noise_activity_threshold'])
338 | params['snr_lower'] = int(cfg['snr_lower'])
339 | params['snr_upper'] = int(cfg['snr_upper'])
340 |
341 | params['randomize_snr'] = utils.str2bool(cfg['randomize_snr'])
342 | params['target_level_lower'] = int(cfg['target_level_lower'])
343 | params['target_level_upper'] = int(cfg['target_level_upper'])
344 |
345 | if 'snr' in cfg.keys():
346 | params['snr'] = int(cfg['snr'])
347 | else:
348 | params['snr'] = int((params['snr_lower'] + params['snr_upper'])/2)
349 |
350 | params['noisyspeech_dir'] = utils.get_dir(cfg, 'noisy_destination', 'noisy')
351 | params['clean_proc_dir'] = utils.get_dir(cfg, 'clean_destination', 'clean')
352 | params['noise_proc_dir'] = utils.get_dir(cfg, 'noise_destination', 'noise')
353 |
354 | if 'speech_csv' in cfg.keys() and cfg['speech_csv'] != 'None':
355 | cleanfilenames = pd.read_csv(cfg['speech_csv'])
356 | cleanfilenames = cleanfilenames['filename']
357 | else:
358 |     cleanfilenames = []
359 |     for path in Path(clean_dir).rglob('*.wav'):
360 |         cleanfilenames.append(str(path.resolve()))
362 |
363 | shuffle(cleanfilenames)
364 | # add singing voice to clean speech
365 | if params['use_singing_data'] == 1:
366 |     all_singing = []
367 |     for path in Path(params['clean_singing']).rglob('*.wav'):
368 |         all_singing.append(str(path.resolve()))
369 | 
370 |     if params['singing_choice'] == 1:  # male singers only
371 |         mysinging = [s for s in all_singing if "male" in s and "female" not in s]
372 |     elif params['singing_choice'] == 2:  # female singers only
373 |         mysinging = [s for s in all_singing if "female" in s]
374 |     else:  # default: both male and female
375 |         mysinging = all_singing
376 | 
377 |     shuffle(mysinging)
378 |     all_cleanfiles = cleanfilenames + mysinging
379 | else:
380 |     all_cleanfiles = cleanfilenames
386 |
387 | # add emotion data to clean speech
388 | if params['use_emotion_data'] == 1:
389 |     all_emotion = []
390 |     for path in Path(params['clean_emotion']).rglob('*.wav'):
391 |         all_emotion.append(str(path.resolve()))
392 | 
393 |     shuffle(all_emotion)
394 |     all_cleanfiles = all_cleanfiles + all_emotion
395 | else:
396 |     print('NOT using emotion data for training!')
398 |
399 | # add mandarin data to clean speech
400 | if params['use_mandarin_data'] == 1:
401 |     all_mandarin = []
402 |     for path in Path(params['clean_mandarin']).rglob('*.wav'):
403 |         all_mandarin.append(str(path.resolve()))
404 | 
405 |     shuffle(all_mandarin)
406 |     all_cleanfiles = all_cleanfiles + all_mandarin
407 | else:
408 |     print('NOT using non-English (Mandarin) data for training!')
410 |
411 |
412 | params['cleanfilenames'] = all_cleanfiles
413 | params['num_cleanfiles'] = len(params['cleanfilenames'])
414 | # If there are .wav files in noise_dir directory, use those
415 | # If not, that implies that the noise files are organized into subdirectories by type,
416 | # so get the names of the non-excluded subdirectories
417 | if 'noise_csv' in cfg.keys() and cfg['noise_csv'] != 'None':
418 | noisefilenames = pd.read_csv(cfg['noise_csv'])
419 | noisefilenames = noisefilenames['filename']
420 | else:
421 | noisefilenames = glob.glob(os.path.join(noise_dir, params['audioformat']))
422 |
423 | if len(noisefilenames)!=0:
424 | shuffle(noisefilenames)
425 | params['noisefilenames'] = noisefilenames
426 | else:
427 | noisedirs = glob.glob(os.path.join(noise_dir, '*'))
428 |     if cfg['noise_types_excluded'] != 'None':
429 |         dirstoexclude = cfg['noise_types_excluded'].split(',')
430 |         # Drop excluded noise-type directories; ignore names that are not present
431 |         noisedirs = [d for d in noisedirs if d not in dirstoexclude]
432 | shuffle(noisedirs)
433 | params['noisedirs'] = noisedirs
434 |
435 | # rir
436 | temp = pd.read_csv(params['rir_table_csv'], skiprows=[1], sep=',', header=None,
437 |                    names=['wavfile', 'channel', 'T60_WB', 'C50_WB', 'isRealRIR'])
438 | 
439 | rir_wav = temp['wavfile'][1:]  # skip header row read as data; 115413 RIR entries
440 | rir_channel = temp['channel'][1:]
441 | rir_t60 = temp['T60_WB'][1:]
442 | rir_isreal = temp['isRealRIR'][1:]
443 | 
444 | # Normalize Windows path separators and convert Series to plain lists
445 | rir_wav2 = [w.replace('\\', '/') for w in rir_wav]
446 | rir_channel2 = list(rir_channel)
447 | rir_t60_2 = list(rir_t60)
448 | rir_isreal2 = list(rir_isreal)
449 |
450 | myrir = []
451 | mychannel = []
452 | myt60 = []
453 | 
454 | lower_t60 = params['lower_t60']
455 | upper_t60 = params['upper_t60']
456 |
457 | if params['rir_choice'] == 1:  # real RIRs only (3076 IRs)
458 |     real_indices = [i for i, x in enumerate(rir_isreal2) if x == "1"]
459 | 
460 |     chosen_i = []
461 |     for i in real_indices:
462 |         if lower_t60 <= float(rir_t60_2[i]) <= upper_t60:
463 |             chosen_i.append(i)
464 | 
465 |     myrir = [rir_wav2[i] for i in chosen_i]
466 |     mychannel = [rir_channel2[i] for i in chosen_i]
467 |     myt60 = [rir_t60_2[i] for i in chosen_i]
468 | 
469 | 
470 | elif params['rir_choice'] == 2:  # synthetic RIRs only (112337 IRs)
471 |     synthetic_indices = [i for i, x in enumerate(rir_isreal2) if x == "0"]
472 | 
473 |     chosen_i = []
474 |     for i in synthetic_indices:
475 |         if lower_t60 <= float(rir_t60_2[i]) <= upper_t60:
476 |             chosen_i.append(i)
477 | 
478 |     myrir = [rir_wav2[i] for i in chosen_i]
479 |     mychannel = [rir_channel2[i] for i in chosen_i]
480 |     myt60 = [rir_t60_2[i] for i in chosen_i]
481 |
482 | else:  # rir_choice 3 or any other value (default): both real and synthetic
483 |     chosen_i = []
484 |     for i in range(len(rir_isreal2)):
485 |         if lower_t60 <= float(rir_t60_2[i]) <= upper_t60:
486 |             chosen_i.append(i)
487 | 
488 |     myrir = [rir_wav2[i] for i in chosen_i]
489 |     mychannel = [rir_channel2[i] for i in chosen_i]
490 |     myt60 = [rir_t60_2[i] for i in chosen_i]
505 |
506 | params['myrir'] = myrir
507 | params['mychannel'] = mychannel
508 | params['myt60'] = myt60
509 |
510 | # Call main_gen() to generate audio
511 | clean_source_files, clean_clipped_files, clean_low_activity_files, \
512 | noise_source_files, noise_clipped_files, noise_low_activity_files = main_gen(params)
513 |
514 | # Create log directory if needed, and write log files of clipped and low activity files
515 | log_dir = utils.get_dir(cfg, 'log_dir', 'Logs')
516 |
517 | utils.write_log_file(log_dir, 'source_files.csv', clean_source_files + noise_source_files)
518 | utils.write_log_file(log_dir, 'clipped_files.csv', clean_clipped_files + noise_clipped_files)
519 | utils.write_log_file(log_dir, 'low_activity_files.csv', \
520 | clean_low_activity_files + noise_low_activity_files)
521 |
522 | # Compute and print stats about percentage of clipped and low activity files
523 | total_clean = len(clean_source_files) + len(clean_clipped_files) + len(clean_low_activity_files)
524 | total_noise = len(noise_source_files) + len(noise_clipped_files) + len(noise_low_activity_files)
525 | pct_clean_clipped = round(len(clean_clipped_files)/total_clean*100, 1)
526 | pct_noise_clipped = round(len(noise_clipped_files)/total_noise*100, 1)
527 | pct_clean_low_activity = round(len(clean_low_activity_files)/total_clean*100, 1)
528 | pct_noise_low_activity = round(len(noise_low_activity_files)/total_noise*100, 1)
529 |
530 | print(f"Of the {total_clean} clean speech files analyzed, {pct_clean_clipped}% had clipping, "
531 |       f"and {pct_clean_low_activity}% had low activity "
532 |       f"(below {params['clean_activity_threshold']*100}% active percentage)")
533 | print(f"Of the {total_noise} noise files analyzed, {pct_noise_clipped}% had clipping, "
534 |       f"and {pct_noise_low_activity}% had low activity "
535 |       f"(below {params['noise_activity_threshold']*100}% active percentage)")
537 |
538 |
539 | if __name__ == '__main__':
540 |
541 | main_body()
542 |
--------------------------------------------------------------------------------
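A quick cross-check of the sizing logic in `main_body()` above: `num_files` is derived from `total_hours` and `audio_length`, and the fallback `snr` is the midpoint of the configured bounds. The sketch below recomputes that arithmetic standalone; the input values mirror the defaults in the config file that follows, and the function names are illustrative, not from the repository.

```python
# Standalone recomputation of the clip-count and default-SNR arithmetic
# used in main_body(); input values mirror pdns_synthesizer_icassp2023.cfg.

def num_files_for(total_hours: float, audio_length_s: float) -> int:
    """Clips needed to cover total_hours of audio at audio_length_s per clip."""
    return int((total_hours * 60 * 60) / audio_length_s)

def default_snr(snr_lower: int, snr_upper: int) -> int:
    """Midpoint SNR used when the cfg has no explicit 'snr' key."""
    return int((snr_lower + snr_upper) / 2)

print(num_files_for(1000, 30))  # 120000 clips for 1000 h of 30 s audio
print(default_snr(-5, 20))      # 7 dB
```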
/pdns_synthesizer_icassp2023.cfg:
--------------------------------------------------------------------------------
1 | # Configuration for generating Noisy Speech Dataset
2 |
3 | # - sampling_rate: Specify the sampling rate. Default is 16 kHz
4 | # - audioformat: default is .wav
5 | # - audio_length: Minimum length in seconds of each generated clip (noisy and clean speech); clips are built by concatenating utterances.
6 | # - silence_length: Duration of silence introduced between clean speech utterances.
7 | # - total_hours: Total number of hours of data required. Units are in hours.
8 | # - snr_lower: Lower bound for SNR required (default: 0 dB)
9 | # - snr_upper: Upper bound for SNR required (default: 40 dB)
10 | # - target_level_lower: Lower bound for the target audio level before audiowrite (default: -35 dB)
11 | # - target_level_upper: Upper bound for the target audio level before audiowrite (default: -15 dB)
12 | # - total_snrlevels: Number of SNR levels required (default: 5, which means there are 5 levels between snr_lower and snr_upper)
13 | # - clean_activity_threshold: Activity threshold for clean speech
14 | # - noise_activity_threshold: Activity threshold for noise
15 | # - fileindex_start: Starting file ID that will be used in filenames
16 | # - fileindex_end: Last file ID that will be used in filenames
17 | # - is_test_set: Set it to True if it is the test set, else False for the training set
18 | # - noise_dir: Specify the directory path to all noise files
19 | # - Speech_dir: Specify the directory path to all clean speech files
20 | # - noisy_destination: Specify path to the destination directory to store noisy speech
21 | # - clean_destination: Specify path to the destination directory to store clean speech
22 | # - noise_destination: Specify path to the destination directory to store noise
23 | # - log_dir: Specify path to the directory to store all the log files
24 |
25 | # Configuration for unit tests
26 | # - snr_test: Set to True if SNR test is required, else False
27 | # - norm_test: Set to True if Normalization test is required, else False
28 | # - sampling_rate_test: Set to True if Sampling Rate test is required, else False
29 | # - clipping_test: Set to True if Clipping test is required, else False
30 | # - unit_tests_log_dir: Specify path to the directory where you want to store logs
31 |
32 | [noisy_speech]
33 |
34 | sampling_rate: 48000
35 | audioformat: *.wav
36 | audio_length: 30
37 | # 15, 12, 30
38 | silence_length: 0.2
39 | total_hours: 1000
40 | # 1000
41 | #200
42 | # 2.5, 500, 100
43 | snr_lower: -5
44 | #-5, 0
45 | snr_upper: 20
46 | # 25, 40
47 | randomize_snr: True
48 | target_level_lower: -35
49 | target_level_upper: -15
50 | total_snrlevels: 31
51 | # 5
52 | clean_activity_threshold: 0.0
53 | noise_activity_threshold: 0.2
54 | fileindex_start: None
55 | fileindex_end: None
56 | is_test_set: False
57 | # True, False
58 |
59 | noise_dir: /mnt/f/4th_DNSChallenge/ICASSP_2022/DNS-Challenge/datasets/noise
60 | #/mnt/f/4th_DNSChallenge/INTERSPEECH_2021/DNS-Challenge/datasets_fullband/noise
61 | #F:\4th_DNSChallenge\INTERSPEECH_2021\DNS-Challenge\datasets_fullband\noise
62 | #datasets\pdns_training_set\noise
63 | #\test_set2\Test_Noise
64 | # datasets\noise
65 | # \datasets\noise
66 |
67 | speech_dir: /mnt/f/4th_DNSChallenge/ICASSP_2022/DNS-Challenge/datasets/clean
68 | # D:\kanhawin_git\primary_speakers_VCTK_16k_for_synthesizer
69 | # datasets\test_set2\Singing_Voice\wav_16k
70 | # dir with secondary speaker clean speech
71 | speech_dir2: /mnt/f/4th_DNSChallenge/ICASSP_2022/DNS-Challenge/datasets/clean
72 | #D:\kanhawin_git\secondary_speakers_voxCeleb2_16k
73 | # datasets\test_set2\Singing_Voice\wav_16k
74 |
75 | spkid_csv: /mnt/f/4th_DNSChallenge/ICASSP_2022/DNS-Challenge/datasets/filelists/complete_ps_split.csv
76 | #/mnt/f/4th_DNSChallenge/ICASSP_2022/DNS-Challenge/datasets/filelists/vctk_spkid.csv
77 | # datasets\clean
78 | noise_types_excluded: None
79 |
80 | rir_dir: /mnt/f/4th_DNSChallenge/ICASSP_2022/DNS-Challenge/datasets/pdns_training_set/impulse_responses
81 | #/mnt/f/4th_DNSChallenge/ICASSP_2022/DNS-Challenge/datasets/impulse_responses
82 | # F:\4th_DNSChallenge\ICASSP_2022\DNS-Challenge\datasets\impulse_responses
83 |
84 | # \datasets\clean
85 | noisy_destination: /mnt/f/4th_DNSChallenge/ICASSP_2022/DNS-Challenge/datasets/pdns_training_set/mixed/noisy
86 | # datasets/training_data/noisy
87 | # datasets\test_set2\synthetic_personalizeddns\noisy
88 | #training_set2_onlyrealrir\noisy
89 | #\noisy
90 | clean_destination: /mnt/f/4th_DNSChallenge/ICASSP_2022/DNS-Challenge/datasets/pdns_training_set/mixed/clean
91 | #datasets\test_set2\synthetic_personalizeddns\clean
92 | # training_set2_onlyrealrir\clean
93 | # \clean
94 | noise_destination: /mnt/f/4th_DNSChallenge/ICASSP_2022/DNS-Challenge/datasets/pdns_training_set/mixed/noise
95 | # datasets/training_data/noise
96 | #datasets\test_set2\synthetic_personalizeddns\noise
97 | #training_set2_onlyrealrir\noise
98 | # \noise
99 | log_dir: logs
100 | # \logs
101 |
102 | # Config: add singing voice to clean speech
103 | clean_singing: datasets\clean_singing\VocalSet11\FULL
104 | singing_choice: 3
105 | # 1 for only male, 2 for only female, 3 (default) for both male and female
106 |
107 | # Config: add reverb to clean speech
108 | rir_choice: 1
109 | # 1 for only real rir, 2 for only synthetic rir, 3 (default) use both real and synthetic
110 | lower_t60: 0.3
111 | # lower bound of t60 range in seconds
112 | upper_t60: 1.3
113 | # upper bound of t60 range in seconds
114 | rir_table_csv: datasets\acoustic_params\RIR_table_simple.csv
115 | clean_speech_t60_csv: datasets\acoustic_params\cleanspeech_table_t60_c50.csv
116 | # percent_for_adding_reverb=0.5 # percentage of clean speech convolved with RIR
117 |
118 | # pdns testsets
119 | # primary_data: D:\kanhawin_git\primary_speakers_VCTK_16k
120 | #'D:\PersonalizedDNS_dataset\synthetic_primary'
121 | # secondary_data='D:\kanhawin_git\secondary_speakers_voxCeleb2_16k'
122 | #'D:\PersonalizedDNS_dataset\synthetic_secondary'
123 | # noise_data= datasets\test_set2\synthetic\noise
124 | # pdns_testset_clean= datasets\test_set2\pdns\clean
125 | # pdns_testset_noisy= datasets\test_set2\pdns\noisy
126 |
127 | # adaptation_data_seconds=120
128 | # num_primary_spk=100
129 | # num_clips=600
130 |
131 | # Unit tests config
132 | snr_test: True
133 | norm_test: True
134 | sampling_rate_test: True
135 | clipping_test: True
136 |
137 | unit_tests_log_dir: unittests_logs
138 |
--------------------------------------------------------------------------------
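All values in the file above reach the scripts as strings: the repo reads the cfg with `configparser` and indexes the raw section dict, so every field is cast explicitly (`int(...)`, `float(...)`, `utils.str2bool`). A minimal sketch of that pattern, using a small hypothetical inline subset of the config rather than the real file:

```python
import configparser

# Inline stand-in for pdns_synthesizer_icassp2023.cfg (hypothetical subset)
raw = """
[noisy_speech]
sampling_rate: 48000
snr_lower: -5
randomize_snr: True
"""

parser = configparser.ConfigParser()
parser.read_string(raw)
cfg = parser._sections['noisy_speech']  # raw dict of strings, as in the repo scripts

sampling_rate = int(cfg['sampling_rate'])
snr_lower = int(cfg['snr_lower'])
randomize_snr = cfg['randomize_snr'].lower() in ("yes", "true", "t", "1")  # utils.str2bool

print(sampling_rate, snr_lower, randomize_snr)  # 48000 -5 True
```

Note that `configparser` accepts both `:` and `=` as key/value delimiters, which is why the mixed delimiters in the unit-test section of the file still parse.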
/requirements.txt:
--------------------------------------------------------------------------------
1 | numpy==1.22.4
2 | soundfile==0.9.0
3 | librosa==0.8.1
4 | configparser==5.3.0
5 | pandas==1.2.4
6 | onnxruntime==1.13.1
7 | torch==1.10.0
8 | torchvision==0.11.1
9 | torchaudio==0.10.0
10 | 
--------------------------------------------------------------------------------
/unit_tests_synthesizer.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | import soundfile as sf
3 | import glob
4 | import argparse
5 | import os
6 | import utils
7 | import configparser as CP
8 |
9 | LOW_ENERGY_THRESH = -60
10 |
11 | def test_snr(clean, noise, expected_snr, snrtolerance=2):
12 | '''Test for SNR
13 | Note: It is not applicable for Segmental SNR'''
14 | rmsclean = (clean**2).mean()**0.5
15 | rmsnoise = (noise**2).mean()**0.5
16 | actual_snr = 20*np.log10(rmsclean/rmsnoise)
17 | return actual_snr > (expected_snr-snrtolerance) and actual_snr < (expected_snr+snrtolerance)
18 |
19 | def test_normalization(audio, expected_rms=-25, normtolerance=2):
20 | '''Test for Normalization
21 | Note: Set it to False if different target levels are used'''
22 | rmsaudio = (audio**2).mean()**0.5
23 | rmsaudiodb = 20*np.log10(rmsaudio)
24 | return rmsaudiodb > (expected_rms-normtolerance) and rmsaudiodb < (expected_rms+normtolerance)
25 |
26 | def test_samplingrate(sr, expected_sr=16000):
27 | '''Test to ensure all clips have same sampling rate'''
28 | return expected_sr == sr
29 |
30 | def test_clipping(audio, num_consecutive_samples=3, clipping_threshold=0.01):
31 | '''Test to detect clipping'''
32 | clipping = False
33 | for i in range(0, len(audio)-num_consecutive_samples-1):
34 | audioseg = audio[i:i+num_consecutive_samples]
35 | if abs(max(audioseg)-min(audioseg)) < clipping_threshold or abs(max(audioseg)) >= 1:
36 | clipping = True
37 | break
38 | return clipping
39 |
40 | def test_zeros_beg_end(audio, num_zeros=16000, low_energy_thresh=LOW_ENERGY_THRESH):
41 |     '''Test if there is near-silence at the beginning and the end of the signal'''
42 |     beg_segment_energy = 20*np.log10((audio[:num_zeros]**2).mean()**0.5)
43 |     end_segment_energy = 20*np.log10((audio[-num_zeros:]**2).mean()**0.5)
44 |     return beg_segment_energy < low_energy_thresh or end_segment_energy < low_energy_thresh
45 |
46 | def adsp_filtering_test(adsp, without_adsp, tolerance=0.0001):
47 |     '''Test whether the ADSP-processed signal differs from the unprocessed one'''
48 |     diff = adsp - without_adsp
49 |     return any(abs(val) > tolerance for val in diff)
50 | 
51 | if __name__=='__main__':
52 | parser = argparse.ArgumentParser()
53 | parser.add_argument('--cfg', default='noisyspeech_synthesizer.cfg')
54 | parser.add_argument('--cfg_str', type=str, default='noisy_speech')
55 |
56 | args = parser.parse_args()
57 |
58 | cfgpath = os.path.join(os.path.dirname(__file__), args.cfg)
59 | assert os.path.exists(cfgpath), f'No configuration file as [{cfgpath}]'
60 |
61 | cfg = CP.ConfigParser()
62 | cfg._interpolation = CP.ExtendedInterpolation()
63 | cfg.read(cfgpath)
64 | cfg = cfg._sections[args.cfg_str]
65 |
66 | noisydir = cfg['noisy_train']
67 | cleandir = cfg['clean_train']
68 | noisedir = cfg['noise_train']
69 | audioformat = cfg['audioformat']
70 |
71 | # List of noisy speech files
72 | noisy_speech_filenames_big = glob.glob(os.path.join(noisydir, audioformat))
73 | noisy_speech_filenames = noisy_speech_filenames_big[0:10]
74 | # Initialize the lists
75 | noisy_filenames_list = []
76 | clean_filenames_list = []
77 | noise_filenames_list = []
78 | snr_results_list =[]
79 | clean_norm_results_list = []
80 | noise_norm_results_list = []
81 | noisy_norm_results_list = []
82 | clean_sr_results_list = []
83 | noise_sr_results_list = []
84 | noisy_sr_results_list = []
85 | clean_clipping_results_list = []
86 | noise_clipping_results_list = []
87 | noisy_clipping_results_list = []
88 |
89 | skipped_string = 'Skipped'
90 | # Initialize the counters for stats
91 | total_clips = len(noisy_speech_filenames)
92 |
93 |
94 | for noisypath in noisy_speech_filenames:
95 | # To do: add right paths to clean filename and noise filename
96 | noisy_filename = os.path.basename(noisypath)
97 | clean_filename = 'clean_fileid_'+os.path.splitext(noisy_filename)[0].split('fileid_')[1]+'.wav'
98 | cleanpath = os.path.join(cleandir, clean_filename)
99 | noise_filename = 'noise_fileid_'+os.path.splitext(noisy_filename)[0].split('fileid_')[1]+'.wav'
100 | noisepath = os.path.join(noisedir, noise_filename)
101 |
102 | noisy_filenames_list.append(noisy_filename)
103 | clean_filenames_list.append(clean_filename)
104 | noise_filenames_list.append(noise_filename)
105 |
106 | # Read clean, noise and noisy signals
107 | clean_signal, fs_clean = sf.read(cleanpath)
108 | noise_signal, fs_noise = sf.read(noisepath)
109 | noisy_signal, fs_noisy = sf.read(noisypath)
110 |
111 | # SNR Test
112 | # To do: add right path split to extract SNR
113 | if utils.str2bool(cfg['snr_test']):
114 | snr = int(noisy_filename.split('_snr')[1].split('_')[0])
115 | snr_results_list.append(str(test_snr(clean=clean_signal, \
116 | noise=noise_signal, expected_snr=snr)))
117 | else:
118 | snr_results_list.append(skipped_string)
119 |
120 | # Normalization test
121 | if utils.str2bool(cfg['norm_test']):
122 | tl = int(noisy_filename.split('_tl')[1].split('_')[0])
123 | clean_norm_results_list.append(str(test_normalization(clean_signal)))
124 | noise_norm_results_list.append(str(test_normalization(noise_signal)))
125 | noisy_norm_results_list.append(str(test_normalization(noisy_signal, expected_rms=tl)))
126 | else:
127 | clean_norm_results_list.append(skipped_string)
128 | noise_norm_results_list.append(skipped_string)
129 | noisy_norm_results_list.append(skipped_string)
130 |
131 | # Sampling rate test
132 | if utils.str2bool(cfg['sampling_rate_test']):
133 | clean_sr_results_list.append(str(test_samplingrate(sr=fs_clean)))
134 | noise_sr_results_list.append(str(test_samplingrate(sr=fs_noise)))
135 | noisy_sr_results_list.append(str(test_samplingrate(sr=fs_noisy)))
136 | else:
137 | clean_sr_results_list.append(skipped_string)
138 | noise_sr_results_list.append(skipped_string)
139 | noisy_sr_results_list.append(skipped_string)
140 |
141 | # Clipping test
142 | if utils.str2bool(cfg['clipping_test']):
143 | clean_clipping_results_list.append(str(test_clipping(audio=clean_signal)))
144 | noise_clipping_results_list.append(str(test_clipping(audio=noise_signal)))
145 | noisy_clipping_results_list.append(str(test_clipping(audio=noisy_signal)))
146 | else:
147 | clean_clipping_results_list.append(skipped_string)
148 | noise_clipping_results_list.append(skipped_string)
149 | noisy_clipping_results_list.append(skipped_string)
150 |
151 | # Stats
152 | pc_snr_passed = round(snr_results_list.count('True')/total_clips*100, 1)
153 | pc_clean_norm_passed = round(clean_norm_results_list.count('True')/total_clips*100, 1)
154 | pc_noise_norm_passed = round(noise_norm_results_list.count('True')/total_clips*100, 1)
155 | pc_noisy_norm_passed = round(noisy_norm_results_list.count('True')/total_clips*100, 1)
156 | pc_clean_sr_passed = round(clean_sr_results_list.count('True')/total_clips*100, 1)
157 | pc_noise_sr_passed = round(noise_sr_results_list.count('True')/total_clips*100, 1)
158 | pc_noisy_sr_passed = round(noisy_sr_results_list.count('True')/total_clips*100, 1)
159 | pc_clean_clipping_passed = round(clean_clipping_results_list.count('True')/total_clips*100, 1)
160 | pc_noise_clipping_passed = round(noise_clipping_results_list.count('True')/total_clips*100, 1)
161 | pc_noisy_clipping_passed = round(noisy_clipping_results_list.count('True')/total_clips*100, 1)
162 |
163 | print('% clips that passed SNR test:', pc_snr_passed)
164 |
165 | print('% clean clips that passed Normalization tests:', pc_clean_norm_passed)
166 | print('% noise clips that passed Normalization tests:', pc_noise_norm_passed)
167 | print('% noisy clips that passed Normalization tests:', pc_noisy_norm_passed)
168 |
169 | print('% clean clips that passed Sampling Rate tests:', pc_clean_sr_passed)
170 | print('% noise clips that passed Sampling Rate tests:', pc_noise_sr_passed)
171 | print('% noisy clips that passed Sampling Rate tests:', pc_noisy_sr_passed)
172 |
173 | print('% clean clips that passed Clipping tests:', pc_clean_clipping_passed)
174 | print('% noise clips that passed Clipping tests:', pc_noise_clipping_passed)
175 | print('% noisy clips that passed Clipping tests:', pc_noisy_clipping_passed)
176 |
177 | log_dir = utils.get_dir(cfg, 'unit_tests_log_dir', 'Unit_tests_logs')
178 |
179 |     if not os.path.exists(log_dir):
180 |         log_dir = os.path.join(os.path.dirname(__file__), 'Unit_tests_logs')
181 |         os.makedirs(log_dir, exist_ok=True)
182 |
183 | utils.write_log_file(log_dir, 'unit_test_results.csv', [noisy_filenames_list, clean_filenames_list, \
184 | noise_filenames_list, snr_results_list, clean_norm_results_list, noise_norm_results_list, \
185 | noisy_norm_results_list, clean_sr_results_list, noise_sr_results_list, noisy_sr_results_list, \
186 | clean_clipping_results_list, noise_clipping_results_list, noisy_clipping_results_list])
--------------------------------------------------------------------------------
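As a sanity check on the SNR test above: if noise is scaled so the clean-to-noise RMS ratio matches a target SNR, the measured SNR lands exactly on target and `test_snr` passes within its ±2 dB tolerance. A self-contained sketch with synthetic signals (random noise stand-ins, not DNS data):

```python
import numpy as np

def rms(x):
    return (x**2).mean()**0.5

rng = np.random.default_rng(0)
clean = rng.standard_normal(16000)
noise = rng.standard_normal(16000)

# Scale noise so it sits exactly 10 dB below the clean signal
target_snr = 10
noise = noise * (rms(clean) / rms(noise)) / (10**(target_snr / 20))

# Same measurement test_snr performs
actual_snr = 20 * np.log10(rms(clean) / rms(noise))
print(round(actual_snr, 6))  # 10.0
```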
/utils.py:
--------------------------------------------------------------------------------
1 | # -*- coding: utf-8 -*-
2 | """
3 | Created on Fri Nov 1 10:28:41 2019
4 |
5 | @author: rocheng
6 | """
7 | import os
8 | import csv
9 | from shutil import copyfile
10 | import glob
11 |
12 | def get_dir(cfg, param_name, new_dir_name):
13 | '''Helper function to retrieve directory name if it exists,
14 | create it if it doesn't exist'''
15 |
16 | if param_name in cfg:
17 | dir_name = cfg[param_name]
18 | else:
19 | dir_name = os.path.join(os.path.dirname(__file__), new_dir_name)
20 | if not os.path.exists(dir_name):
21 | os.makedirs(dir_name)
22 | return dir_name
23 |
24 |
25 | def write_log_file(log_dir, log_filename, data):
26 | '''Helper function to write log file'''
27 | data = zip(*data)
28 | with open(os.path.join(log_dir, log_filename), mode='w', newline='') as csvfile:
29 | csvwriter = csv.writer(csvfile, delimiter=' ',
30 | quotechar='|', quoting=csv.QUOTE_MINIMAL)
31 | for row in data:
32 | csvwriter.writerow([row])
33 |
34 |
35 | def str2bool(string):
36 | return string.lower() in ("yes", "true", "t", "1")
37 |
38 |
39 | def rename_copyfile(src_path, dest_dir, prefix='', ext='*.wav'):
40 |     srcfiles = glob.glob(os.path.join(src_path, ext))
41 |     for srcfile in srcfiles:
42 |         dest_path = os.path.join(dest_dir, prefix + '_' + os.path.basename(srcfile))
43 |         copyfile(srcfile, dest_path)
44 |
45 |
46 |
47 |
--------------------------------------------------------------------------------
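For reference, how the two most-used helpers behave: `str2bool` is a plain membership test, and `get_dir` falls back to a directory next to the script and creates it on demand. The sketch below re-implements the same logic against a temporary directory; the `base_dir` parameter is an assumption added so the example avoids depending on `__file__`.

```python
import os
import tempfile

def str2bool(string):
    # Same logic as utils.str2bool
    return string.lower() in ("yes", "true", "t", "1")

def get_dir(cfg, param_name, new_dir_name, base_dir):
    # Same logic as utils.get_dir, with an explicit base_dir instead of __file__
    if param_name in cfg:
        dir_name = cfg[param_name]
    else:
        dir_name = os.path.join(base_dir, new_dir_name)
    os.makedirs(dir_name, exist_ok=True)
    return dir_name

with tempfile.TemporaryDirectory() as base:
    cfg = {'log_dir': os.path.join(base, 'logs')}
    log_dir = get_dir(cfg, 'log_dir', 'Logs', base)
    print(os.path.isdir(log_dir))           # True: created on demand
    print(str2bool('True'), str2bool('0'))  # True False
```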