├── .gitignore
├── LICENSE
├── README.md
├── example
│   ├── MASG_Microphone_Array_Speech_Generator_in_room_acoustic.ipynb
│   ├── babble.wav
│   ├── male.wav
│   ├── mic_ch0_clean.wav
│   ├── mic_ch0_clean_noise.wav
│   ├── mic_ch0_clean_rever.wav
│   ├── mic_ch0_clean_rever_noise.wav
│   ├── mic_ch0_noise.wav
│   ├── mic_ch1_clean.wav
│   ├── mic_ch1_clean_noise.wav
│   ├── mic_ch1_clean_rever.wav
│   ├── mic_ch1_clean_rever_noise.wav
│   ├── mic_ch1_noise.wav
│   ├── mic_ch2_clean.wav
│   ├── mic_ch2_clean_noise.wav
│   ├── mic_ch2_clean_rever.wav
│   ├── mic_ch2_clean_rever_noise.wav
│   ├── mic_ch2_noise.wav
│   ├── mic_ch3_clean.wav
│   ├── mic_ch3_clean_noise.wav
│   ├── mic_ch3_clean_rever.wav
│   ├── mic_ch3_clean_rever_noise.wav
│   └── mic_ch3_noise.wav
├── img
│   ├── logo.png
│   ├── method.png
│   ├── room.png
│   └── room_model.png
├── microphone_array_speech_generator
│   ├── README.md
│   ├── add_noise_for_multichannel.py
│   ├── microphone_array_speech_generator_for_test_dataset.py
│   ├── microphone_array_speech_generator_for_train_dataset.py
│   └── speech_connection.py
└── pdf
    ├── MASG.pdf
    └── Pyroomacoustics.pdf

--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | # Byte-compiled / optimized / DLL files
2 | __pycache__/
3 | *.py[cod]
4 | *$py.class
5 | 
6 | # C extensions
7 | *.so
8 | 
9 | # Distribution / packaging
10 | .Python
11 | build/
12 | develop-eggs/
13 | dist/
14 | downloads/
15 | eggs/
16 | .eggs/
17 | lib/
18 | lib64/
19 | parts/
20 | sdist/
21 | var/
22 | wheels/
23 | *.egg-info/
24 | .installed.cfg
25 | *.egg
26 | MANIFEST
27 | 
28 | # PyInstaller
29 | # Usually these files are written by a python script from a template
30 | # before PyInstaller builds the exe, so as to inject date/other infos into it.
31 | *.manifest
32 | *.spec
33 | 
34 | # Installer logs
35 | pip-log.txt
36 | pip-delete-this-directory.txt
37 | 
38 | # Unit test / coverage reports
39 | htmlcov/
40 | .tox/
41 | .coverage
42 | .coverage.*
43 | .cache
44 | nosetests.xml
45 | coverage.xml
46 | *.cover
47 | .hypothesis/
48 | .pytest_cache/
49 | 
50 | # Translations
51 | *.mo
52 | *.pot
53 | 
54 | # Django stuff:
55 | *.log
56 | local_settings.py
57 | db.sqlite3
58 | 
59 | # Flask stuff:
60 | instance/
61 | .webassets-cache
62 | 
63 | # Scrapy stuff:
64 | .scrapy
65 | 
66 | # Sphinx documentation
67 | docs/_build/
68 | 
69 | # PyBuilder
70 | target/
71 | 
72 | # Jupyter Notebook
73 | .ipynb_checkpoints
74 | 
75 | # pyenv
76 | .python-version
77 | 
78 | # celery beat schedule file
79 | celerybeat-schedule
80 | 
81 | # SageMath parsed files
82 | *.sage.py
83 | 
84 | # Environments
85 | .env
86 | .venv
87 | env/
88 | venv/
89 | ENV/
90 | env.bak/
91 | venv.bak/
92 | 
93 | # Spyder project settings
94 | .spyderproject
95 | .spyproject
96 | 
97 | # Rope project settings
98 | .ropeproject
99 | 
100 | # mkdocs documentation
101 | /site
102 | 
103 | # mypy
104 | .mypy_cache/
105 | 
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 | 
3 | Copyright (c) 2019 vipchengrui
4 | 
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 | 
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | ![image_logo](https://github.com/vipchengrui/MASG/blob/master/img/logo.png)
2 | 
3 | # MASG
4 | 
5 | [![GitHub release](https://img.shields.io/github/release/vipchengrui/MASG/all.svg?style=flat-square)](https://github.com/vipchengrui/MASG/releases)
6 | [![license](https://img.shields.io/github/license/vipchengrui/MASG.svg?style=flat-square)](https://github.com/vipchengrui/MASG/blob/master/LICENSE)
7 | 
8 | 
9 | A microphone array speech generator (MASG) for room acoustics.
10 | 
11 | ## Abstract
12 | MASG simulates the speech data received by microphone arrays of various shapes in a room acoustic environment, including clean speech (clean), reverberant speech (clean rever), noisy speech (clean noise), noisy reverberant speech (clean rever noise), and the corresponding noise signals (noise).
13 | 
14 | ## Method
15 | 
16 | MASG is built on two tools: *Pyroomacoustics* [1] and *add_noise_for_multichannel*, a multichannel adaptation of the ITU-T P.56 based noise-adding procedure [2]. The schematic diagram of MASG is shown in Fig. 1.
17 | 
18 | ![image_method](https://github.com/vipchengrui/MASG/blob/master/img/method.png)
19 | 
20 | *Fig. 1 The schematic diagram of MASG.*
21 | 
22 | Based on *Pyroomacoustics*, the microphone array clean speech is obtained by setting the wall absorption to 1.0, and the microphone array reverberant speech by setting the absorption to less than 1.0. From the clean speech, a noise signal, and the desired signal-to-noise ratio (SNR), we compute the corresponding microphone array noise signal; adding it to the clean speech and to the reverberant speech yields the microphone array noisy speech and the microphone array noisy reverberant speech.
23 | 
24 | From this, we can generate simulation data for microphone arrays of any geometry in an indoor acoustic environment; a minimal usage sketch is given below.
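The following sketch illustrates this workflow. It is an illustrative example rather than part of the toolkit: it follows the pyroomacoustics calls used by the scripts in this repository (which rely on the older `absorption` keyword of `pra.ShoeBox`), and the input file, source position, and two-microphone array are placeholders.

```python
import numpy as np
import pyroomacoustics as pra
from scipy.io import wavfile

fs, target = wavfile.read('example/male.wav')  # placeholder input file

# absorption = 1.0 -> fully absorbing walls -> "clean" array speech;
# absorption < 1.0 -> surviving reflections -> reverberant array speech.
for absorption, tag in [(1.0, 'clean'), (0.2195, 'rever')]:
    room = pra.ShoeBox([4, 3, 3], fs=fs, absorption=absorption, max_order=17)
    room.add_source([2.0, 2.8, 1.3], signal=target)
    R = np.c_[[1.82, 1.5, 0.75], [1.86, 1.5, 0.75]]  # two mics of a linear array
    room.add_microphone_array(pra.MicrophoneArray(R, room.fs))
    room.simulate()
    signals = room.mic_array.signals  # (n_mics, n_samples), one row per mic
    print(tag, signals.shape)
```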
25 | 
26 | ## Simulation Environment
27 | 
28 | To verify the effect of MASG, we set up a room acoustic environment that is common in daily life: a meeting room scene. This scenario is shown in Fig. 2.
29 | 
30 | ![image_room](https://github.com/vipchengrui/MASG/blob/master/img/room.png)
31 | 
32 | *Fig. 2 Meeting room acoustic environment.*
33 | 
34 | The scene simulates a meeting room 4 m long, 3 m wide and 3 m high. The room contains a 2.2 m x 1.1 m x 0.75 m conference table, 19 chairs marking possible target sound sources, and a screen that can also act as a sound source. Their coordinates and details are shown in Fig. 2.
35 | 
36 | Based on this meeting room, we abstract the room, microphone array, target sources and other information used to build the dataset, giving the simulation environment shown in Fig. 3.
37 | 
38 | ![image_room_model](https://github.com/vipchengrui/MASG/blob/master/img/room_model.png)
39 | 
40 | *Fig. 3 The simulation environment.*
41 | 
42 | ## Program List
43 | 
44 | MASG is implemented in Python. The required packages and the provided functions are as follows.
45 | 
46 | ### Packages
47 | 
48 | [numpy]
49 | https://numpy.org/
50 | https://pypi.org/project/numpy/
51 | 
52 | [matplotlib]
53 | https://matplotlib.org/
54 | https://pypi.org/project/matplotlib/
55 | 
56 | [scipy]
57 | https://www.scipy.org/
58 | https://pypi.org/project/scipy/
59 | 
60 | [pyroomacoustics]
61 | https://github.com/LCAV/pyroomacoustics
62 | https://pypi.org/project/pyroomacoustics/
63 | 
64 | ### Functions
65 | 
66 | [add_noise_for_multichannel.py]
67 | This function adds noise to the microphone array clean speech and the microphone array reverberant speech at a desired SNR.
68 | 
69 | [microphone_array_speech_generator_for_test_dataset.py]
70 | This function generates a microphone array speech test dataset for a room acoustic environment.
71 | 
72 | [microphone_array_speech_generator_for_train_dataset.py]
73 | This function generates a microphone array speech training dataset for a room acoustic environment.
74 | 
75 | [speech_connection.py]
76 | This function concatenates the generated speech segments into long recordings.
77 | 
78 | ## References
79 | 
80 | [1] R. Scheibler, E. Bezzam and I. Dokmanić, "Pyroomacoustics: A Python Package for Audio Room Simulation and Array Processing Algorithms," *2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)*, Calgary, AB, 2018, pp. 351-355.
81 | 
82 | [2] ITU-T (1993). *Objective measurement of active speech level*. ITU-T Recommendation P.56.
--------------------------------------------------------------------------------
/example/babble.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vipchengrui/MASG/2eafc9de3d7d2f25a592f4b1c7cb4cdc66b1b89e/example/babble.wav
--------------------------------------------------------------------------------
/example/male.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vipchengrui/MASG/2eafc9de3d7d2f25a592f4b1c7cb4cdc66b1b89e/example/male.wav
--------------------------------------------------------------------------------
/example/mic_ch0_clean.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vipchengrui/MASG/2eafc9de3d7d2f25a592f4b1c7cb4cdc66b1b89e/example/mic_ch0_clean.wav
--------------------------------------------------------------------------------
/example/mic_ch0_clean_noise.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vipchengrui/MASG/2eafc9de3d7d2f25a592f4b1c7cb4cdc66b1b89e/example/mic_ch0_clean_noise.wav
--------------------------------------------------------------------------------
/example/mic_ch0_clean_rever.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vipchengrui/MASG/2eafc9de3d7d2f25a592f4b1c7cb4cdc66b1b89e/example/mic_ch0_clean_rever.wav
--------------------------------------------------------------------------------
/example/mic_ch0_clean_rever_noise.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vipchengrui/MASG/2eafc9de3d7d2f25a592f4b1c7cb4cdc66b1b89e/example/mic_ch0_clean_rever_noise.wav
--------------------------------------------------------------------------------
/example/mic_ch0_noise.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vipchengrui/MASG/2eafc9de3d7d2f25a592f4b1c7cb4cdc66b1b89e/example/mic_ch0_noise.wav
--------------------------------------------------------------------------------
/example/mic_ch1_clean.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vipchengrui/MASG/2eafc9de3d7d2f25a592f4b1c7cb4cdc66b1b89e/example/mic_ch1_clean.wav
--------------------------------------------------------------------------------
/example/mic_ch1_clean_noise.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vipchengrui/MASG/2eafc9de3d7d2f25a592f4b1c7cb4cdc66b1b89e/example/mic_ch1_clean_noise.wav
--------------------------------------------------------------------------------
/example/mic_ch1_clean_rever.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vipchengrui/MASG/2eafc9de3d7d2f25a592f4b1c7cb4cdc66b1b89e/example/mic_ch1_clean_rever.wav
--------------------------------------------------------------------------------
/example/mic_ch1_clean_rever_noise.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vipchengrui/MASG/2eafc9de3d7d2f25a592f4b1c7cb4cdc66b1b89e/example/mic_ch1_clean_rever_noise.wav
--------------------------------------------------------------------------------
/example/mic_ch1_noise.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vipchengrui/MASG/2eafc9de3d7d2f25a592f4b1c7cb4cdc66b1b89e/example/mic_ch1_noise.wav
--------------------------------------------------------------------------------
/example/mic_ch2_clean.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vipchengrui/MASG/2eafc9de3d7d2f25a592f4b1c7cb4cdc66b1b89e/example/mic_ch2_clean.wav
--------------------------------------------------------------------------------
/example/mic_ch2_clean_noise.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vipchengrui/MASG/2eafc9de3d7d2f25a592f4b1c7cb4cdc66b1b89e/example/mic_ch2_clean_noise.wav
--------------------------------------------------------------------------------
/example/mic_ch2_clean_rever.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vipchengrui/MASG/2eafc9de3d7d2f25a592f4b1c7cb4cdc66b1b89e/example/mic_ch2_clean_rever.wav
--------------------------------------------------------------------------------
/example/mic_ch2_clean_rever_noise.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vipchengrui/MASG/2eafc9de3d7d2f25a592f4b1c7cb4cdc66b1b89e/example/mic_ch2_clean_rever_noise.wav
--------------------------------------------------------------------------------
/example/mic_ch2_noise.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vipchengrui/MASG/2eafc9de3d7d2f25a592f4b1c7cb4cdc66b1b89e/example/mic_ch2_noise.wav
--------------------------------------------------------------------------------
/example/mic_ch3_clean.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vipchengrui/MASG/2eafc9de3d7d2f25a592f4b1c7cb4cdc66b1b89e/example/mic_ch3_clean.wav
--------------------------------------------------------------------------------
/example/mic_ch3_clean_noise.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vipchengrui/MASG/2eafc9de3d7d2f25a592f4b1c7cb4cdc66b1b89e/example/mic_ch3_clean_noise.wav
--------------------------------------------------------------------------------
/example/mic_ch3_clean_rever.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vipchengrui/MASG/2eafc9de3d7d2f25a592f4b1c7cb4cdc66b1b89e/example/mic_ch3_clean_rever.wav
--------------------------------------------------------------------------------
/example/mic_ch3_clean_rever_noise.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vipchengrui/MASG/2eafc9de3d7d2f25a592f4b1c7cb4cdc66b1b89e/example/mic_ch3_clean_rever_noise.wav
--------------------------------------------------------------------------------
/example/mic_ch3_noise.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vipchengrui/MASG/2eafc9de3d7d2f25a592f4b1c7cb4cdc66b1b89e/example/mic_ch3_noise.wav
--------------------------------------------------------------------------------
/img/logo.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vipchengrui/MASG/2eafc9de3d7d2f25a592f4b1c7cb4cdc66b1b89e/img/logo.png
--------------------------------------------------------------------------------
/img/method.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vipchengrui/MASG/2eafc9de3d7d2f25a592f4b1c7cb4cdc66b1b89e/img/method.png
--------------------------------------------------------------------------------
/img/room.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vipchengrui/MASG/2eafc9de3d7d2f25a592f4b1c7cb4cdc66b1b89e/img/room.png
--------------------------------------------------------------------------------
/img/room_model.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vipchengrui/MASG/2eafc9de3d7d2f25a592f4b1c7cb4cdc66b1b89e/img/room_model.png
--------------------------------------------------------------------------------
/microphone_array_speech_generator/README.md:
--------------------------------------------------------------------------------
1 | ![image_logo](https://github.com/vipchengrui/MASG/blob/master/img/logo.png)
2 | 
3 | # MASG
4 | 
5 | [![GitHub release](https://img.shields.io/github/release/vipchengrui/MASG/all.svg?style=flat-square)](https://github.com/vipchengrui/MASG/releases)
6 | [![license](https://img.shields.io/github/license/vipchengrui/MASG.svg?style=flat-square)](https://github.com/vipchengrui/MASG/blob/master/LICENSE)
7 | 
8 | A microphone array speech generator (MASG) for room acoustics.
9 | 
10 | ## Abstract
11 | MASG simulates the speech data received by microphone arrays of various shapes in a room acoustic environment, including clean speech (clean), reverberant speech (clean rever), noisy speech (clean noise), noisy reverberant speech (clean rever noise), and the corresponding noise signals (noise).
12 | 
13 | ## Method
14 | 
15 | MASG is built on two tools: *Pyroomacoustics* [1] and *add_noise_for_multichannel*, a multichannel adaptation of the ITU-T P.56 based noise-adding procedure [2]. The schematic diagram of MASG is shown in Fig. 1.
16 | 
17 | ![image_method](https://github.com/vipchengrui/MASG/blob/master/img/method.png)
18 | 
19 | *Fig. 1 The schematic diagram of MASG.*
20 | 
21 | Based on *Pyroomacoustics*, the microphone array clean speech is obtained by setting the wall absorption to 1.0, and the microphone array reverberant speech by setting the absorption to less than 1.0. From the clean speech, a noise signal, and the desired signal-to-noise ratio (SNR), we compute the corresponding microphone array noise signal; adding it to the clean speech and to the reverberant speech yields the microphone array noisy speech and the microphone array noisy reverberant speech.
22 | 
23 | From this, we can generate simulation data for microphone arrays of any geometry in an indoor acoustic environment.
24 | 
25 | ## Simulation Environment
26 | 
27 | To verify the effect of MASG, we set up a room acoustic environment that is common in daily life: a meeting room scene. This scenario is shown in Fig. 2.
28 | 
29 | ![image_room](https://github.com/vipchengrui/MASG/blob/master/img/room.png)
30 | 
31 | *Fig. 2 Meeting room acoustic environment.*
32 | 
33 | The scene simulates a meeting room 4 m long, 3 m wide and 3 m high. The room contains a 2.2 m x 1.1 m x 0.75 m conference table, 19 chairs marking possible target sound sources, and a screen that can also act as a sound source. Their coordinates and details are shown in Fig. 2.
34 | 
35 | Based on this meeting room, we abstract the room, microphone array, target sources and other information used to build the dataset, giving the simulation environment shown in Fig. 3.
36 | 
37 | ![image_room_model](https://github.com/vipchengrui/MASG/blob/master/img/room_model.png)
38 | 
39 | *Fig. 3 The simulation environment.*
40 | 
41 | ## Program List
42 | 
43 | MASG is implemented in Python. The required packages and the provided functions are as follows.
44 | 
45 | ### Packages
46 | 
47 | [numpy]
48 | https://numpy.org/
49 | https://pypi.org/project/numpy/
50 | 
51 | [matplotlib]
52 | https://matplotlib.org/
53 | https://pypi.org/project/matplotlib/
54 | 
55 | [scipy]
56 | https://www.scipy.org/
57 | https://pypi.org/project/scipy/
58 | 
59 | [pyroomacoustics]
60 | https://github.com/LCAV/pyroomacoustics
61 | https://pypi.org/project/pyroomacoustics/
62 | 
63 | ### Functions
64 | 
65 | [add_noise_for_multichannel.py]
66 | This function adds noise to the microphone array clean speech and the microphone array reverberant speech at a desired SNR.
67 | 
68 | [microphone_array_speech_generator_for_test_dataset.py]
69 | This function generates a microphone array speech test dataset for a room acoustic environment.
70 | 
71 | [microphone_array_speech_generator_for_train_dataset.py]
72 | This function generates a microphone array speech training dataset for a room acoustic environment.
73 | 
74 | [speech_connection.py]
75 | This function concatenates the generated speech segments into long recordings.
76 | 
77 | ## References
78 | 
79 | [1] R. Scheibler, E. Bezzam and I. Dokmanić, "Pyroomacoustics: A Python Package for Audio Room Simulation and Array Processing Algorithms," *2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)*, Calgary, AB, 2018, pp. 351-355.
80 | 
81 | [2] ITU-T (1993). *Objective measurement of active speech level*. ITU-T Recommendation P.56.
82 | 
--------------------------------------------------------------------------------
/microphone_array_speech_generator/add_noise_for_multichannel.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python
2 | # -*- coding: utf-8 -*-
3 | # @Date : 2019-11-12 15:13:27
4 | # @Author : Cheng Rui (chengrui@emails.bjut.edu.cn)
5 | # @Function : add noise to clean and reverberant speech for multichannel data
6 | # @Version : Release 0.2
7 | 
8 | import numpy as np
9 | import numpy.matlib
10 | import matplotlib.pyplot as plt
11 | from scipy.io import wavfile
12 | import wave
13 | from scipy.signal import lfilter
14 | 
15 | # ============= #
16 | #   Functions   #
17 | # ============= #
18 | 
19 | # sub_function - bin_interp
20 | def bin_interp(upcount, lwcount, upthr, lwthr, Margin, tol):
21 |     '''
22 |     This implements the binary interpolation used by asl_P56.
23 | 
24 |     Usage: bin_interp(upcount, lwcount, upthr, lwthr, Margin, tol)
25 | 
26 |     Example call:
27 |       asl_ms_log, cc = bin_interp(upcount, lwcount, upthr, lwthr, Margin, tol)
28 | 
29 |     Python implementation from MATLAB: Rui Cheng
30 |     '''
31 | 
32 |     if tol < 0:
33 |         tol = -tol
34 | 
35 |     # check if the extreme counts are not already the true active value
36 |     iterno = 1
37 |     if abs(upcount - upthr - Margin) < tol:
38 |         asl_ms_log = upcount
39 |         cc = upthr
40 |         return asl_ms_log, cc
41 |     if abs(lwcount - lwthr - Margin) < tol:
42 |         asl_ms_log = lwcount
43 |         cc = lwthr
44 |         return asl_ms_log, cc
45 | 
46 |     # initialize the first middle for the given (initial) bounds
47 |     midcount = (upcount + lwcount) / 2.0
48 |     midthr = (upthr + lwthr) / 2.0
49 |     # repeat the loop until `diff` falls inside the tolerance (-tol <= diff <= tol)
50 |     while 1:
51 |         diff = midcount - midthr - Margin
52 |         if abs(diff) <= tol:
53 |             break
54 |         # if the tolerance is not met within 20 iterations, relax it by 10%
55 |         iterno = iterno + 1
56 |         if iterno > 20:
57 |             tol = tol * 1.1
58 |         if diff > tol:  # then the new bounds are ...
59 |             midcount = (upcount + midcount) / 2.0
60 |             # upper and middle activities
61 |             midthr = (upthr + midthr) / 2.0
62 |             # ... and thresholds
63 |         elif diff < -tol:  # then the new bounds are ...
64 |             midcount = (midcount + lwcount) / 2.0
65 |             # middle and lower activities
66 |             midthr = (midthr + lwthr) / 2.0
67 |             # ... and thresholds
68 | 
69 |     # since the tolerance has been satisfied, midcount is selected
70 |     # as the interpolated value with a tol [dB] tolerance
71 |     asl_ms_log = midcount
72 |     cc = midthr
73 | 
74 |     return asl_ms_log, cc
75 | 
76 | # sub_function - asl_P56
77 | def asl_P56(x, fs, nbits):
78 |     '''
79 |     This implements ITU-T P.56 method B.
80 | 
81 |     Usage: asl_P56(x, fs, nbits)
82 | 
83 |     x - the column vector of floating point speech data
84 |     fs - the sampling frequency
85 |     nbits - the number of bits
86 | 
87 |     Example call:
88 |       asl_ms, asl, c0 = asl_P56(x, fs, nbits)
89 | 
90 |     References:
91 |     [1] ITU-T (1993). Objective measurement of active speech level. ITU-T
92 |     Recommendation P.56.
93 | 
94 |     Python implementation from MATLAB: Rui Cheng
95 |     '''
96 | 
97 |     T = 0.03  # time constant of smoothing, in seconds
98 |     H = 0.2  # hangover time in seconds
99 |     M = 15.9  # margin in dB of the difference between threshold and active speech level
100 |     thres_no = nbits - 1  # number of thresholds; for 16 bit, it's 15
101 |     eps = 2.2204e-16
102 | 
103 |     I = int(np.ceil(fs*H))  # hangover in samples
104 |     g = np.exp(-1/(fs*T))  # smoothing factor in envelope detection
105 |     c = [pow(2,i) for i in range(-15, thres_no-16+1)]
106 |     # vector with thresholds from one quantizing level up to half the maximum code, doubling at each step; for 16-bit samples, from 2^-15 to 0.5
107 |     a = [0 for i in range(thres_no)]  # activity counter for each level threshold
108 |     hang = [I for i in range(thres_no)]  # hangover counter for each level threshold
109 | 
110 |     sq = sum(pow(x,2))  # long-term squared energy of x
111 |     x_len = len(x)  # length of x
112 | 
113 |     # use a 2nd order IIR filter to detect the envelope q
114 |     x_abs = abs(x)
115 |     p = lfilter([1-g], [1,-g], x_abs)
116 |     q = lfilter([1-g], [1,-g], p)
117 | 
118 |     for k in range(x_len):
119 |         for j in range(thres_no):
120 |             if q[k] >= c[j]:
121 |                 a[j] = a[j] + 1
122 |                 hang[j] = 0
123 |             elif hang[j] < I:
124 |                 a[j] = a[j] + 1
125 |                 hang[j] = hang[j] + 1
126 |             else:
127 |                 break
128 | 
129 |     asl = 0
130 |     asl_rms = 0
131 |     if a[0] == 0:
132 |         print('! ! ! ERROR ! ! !')
133 |     else:
134 |         AdB1 = 10*np.log10(sq/a[0]+eps)
135 | 
136 |     CdB1 = 20*np.log10(c[0]+eps)
137 |     if AdB1-CdB1 < M:
138 |         print('! ! ! ERROR ! ! !')
139 | 
140 |     AdB = [0 for i in range(thres_no)]
141 |     CdB = [0 for i in range(thres_no)]
142 |     Delta = [0 for i in range(thres_no)]
143 |     AdB[0] = AdB1
144 |     CdB[0] = CdB1
145 |     Delta[0] = AdB1 - CdB1
146 | 
147 |     for j in range(1, thres_no):
148 |         AdB[j] = 10*np.log10(sq/(a[j]+eps)+eps)
149 |         CdB[j] = 20*np.log10(c[j]+eps)
150 | 
151 |     for j in range(1, thres_no):
152 |         if a[j] != 0:
153 |             Delta[j] = AdB[j] - CdB[j]
154 |             if Delta[j] <= M:  # M = 15.9
155 |                 # interpolate to find the asl
156 |                 asl_ms_log, cl0 = bin_interp(AdB[j], AdB[j-1], CdB[j], CdB[j-1], M, 0.5)
157 |                 asl_ms = pow(10, asl_ms_log/10)
158 |                 asl = (sq/x_len)/asl_ms
159 |                 c0 = pow(10, cl0/20)
160 |                 break
161 | 
162 |     return asl_ms, asl, c0
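# How the SNR is applied in `addnoise` below: with Px the active speech
# energy returned by asl_P56 and Pn the mean energy of the chosen noise
# segment, the desired SNR is defined as
#     snr = 10*log10(Px / (sf**2 * Pn)),
# so the noise segment is scaled by
#     sf = sqrt(Px / (Pn * 10**(snr/10))).
# A hypothetical example call (all names are placeholders): `clean` and
# `clean_rever` are [n_channels x n_samples] int16 arrays, `noise` is a
# longer 1-D int16 array:
#     out_crn, out_cn, out_n = addnoise(clean, clean_rever, noise, 5, 16000)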
163 | 
164 | # main_function - addnoise
165 | def addnoise(clean_data, clean_rever_data, noise_data, snr, fs):
166 |     '''
167 |     This function is used to add noise to clean speech and reverberant speech.
168 |     It uses the active speech level to compute the speech energy.
169 |     The active speech level is computed as per the ITU-T P.56 standard [1].
170 | 
171 |     Usage: addnoise(clean_data, clean_rever_data, noise_data, snr, fs)
172 | 
173 |     clean_data - clean speech data in each channel [nchannel x points]
174 |     clean_rever_data - reverberant speech data in each channel [nchannel x points]
175 |     noise_data - noise data; the noise has to be longer than the speech [1 x points]
176 |     snr - desired SNR in dB
177 |     fs - sampling frequency
178 | 
179 |     Note: the original MATLAB implementation can optionally apply an IRS filter to bandlimit the signal to 300 Hz - 3.2 kHz;
180 |     this Python port does not include the IRS filtering.
181 | 
182 |     Example call:
183 |       out_clean_rever_noise, out_clean_noise, out_noise = addnoise(clean_data, clean_rever_data, noise_data, snr, fs)
184 | 
185 |     References:
186 |     [1] ITU-T (1993). Objective measurement of active speech level. ITU-T
187 |     Recommendation P.56.
188 | 
189 |     Author: Yi Hu and Philipos C. Loizou
190 | 
191 |     Copyright (c) 2006 by Philipos C. Loizou
192 | 
193 |     Python implementation from MATLAB: Rui Cheng
194 |     '''
195 | 
196 |     nbits = 16
197 | 
198 |     # wavread gives floating point column data
199 |     # normalize by 32768 and change the data type to np.double
200 |     clean = (clean_data/32768).astype(np.double)
201 |     clean_rever = (clean_rever_data/32768).astype(np.double)
202 |     noise = (noise_data/32768).astype(np.double)
203 | 
204 |     # create the output matrices
205 |     out_noise = np.zeros((clean.shape[0],clean.shape[1]), dtype=np.double)
206 |     out_clean_rever_noise = np.zeros((clean.shape[0],clean.shape[1]), dtype=np.double)
207 |     out_clean_noise = np.zeros((clean.shape[0],clean.shape[1]), dtype=np.double)
208 | 
209 |     # add noise in each channel
210 |     for i in range(clean.shape[0]):
211 | 
212 |         # asl_P56
213 |         Px, asl, c0 = asl_P56(clean[i,:], fs, nbits)
214 |         # Px is the active speech level ms energy
215 |         # asl is the activity factor
216 |         # c0 is the active speech level threshold
217 | 
218 |         # get the length of speech and noise
219 |         x = clean[i,:]
220 |         x_len = len(x)
221 |         noise_len = len(noise)
222 | 
223 |         # choose a noise segment of matching length
224 |         rand_start_limit = noise_len - x_len
225 |         # the start of the noise segment can vary in [0, rand_start_limit]
226 |         rand_start = int(round(rand_start_limit * np.matlib.rand(1)[0,0]))
227 |         # random start of the noise segment (kept <= rand_start_limit so the segment is full length)
228 |         # rand_start = 10
229 |         noise_segment = noise[rand_start:rand_start+x_len]
230 | 
231 |         # the randomly selected noise segment will be added to the clean and reverberant speech
232 |         # clean speech x
233 |         Pn = sum(pow(noise_segment,2))/x_len
234 |         # scale the noise segment samples to obtain the desired snr = 10*log10[Px/(sf^2 * Pn)]
235 |         sf = np.sqrt(Px/Pn/(pow(10,(snr/10))))  # scale factor for the noise segment data
236 | 
237 |         # out_noise
238 |         out_noise[i,:] = noise_segment * sf
239 |         # out_clean_rever_noise
240 |         out_clean_rever_noise[i,:] = clean_rever[i,:] + out_noise[i,:]
241 |         # out_clean_noise
242 |         out_clean_noise[i,:] = clean[i,:] + out_noise[i,:]
243 | 
244 |     # de-normalize back to the int16 range for wav writing
245 |     out_noise = (out_noise*32768).astype("int16")
246 |     out_clean_rever_noise = (out_clean_rever_noise*32768).astype("int16")
247 |     out_clean_noise = (out_clean_noise*32768).astype("int16")
248 | 
249 |     return out_clean_rever_noise, out_clean_noise, out_noise
250 | 
--------------------------------------------------------------------------------
/microphone_array_speech_generator/microphone_array_speech_generator_for_test_dataset.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python
2 | # -*- coding: utf-8 -*-
3 | # @Date : 2019-11-25 09:33:27
4 | # @Author : Cheng Rui (chengrui@emails.bjut.edu.cn)
5 | # @Function : microphone array speech generator for test dataset
6 | # @Version : Release 0.3
7 | 
8 | '''
9 | Version Information
10 | 
11 | Release 0.1
12 | 
13 | [1] The generator is used for single-sentence tests, verifying the accuracy of the
14 | dataset generator.
15 | 
16 | Release 0.2
17 | 
18 | [1] The generator is encapsulated as a function that is executed by call, which is
19 | convenient for generating large-scale microphone array speech data.
20 | 
21 | [2] The microphone array signals generated in the loop are written out automatically.
22 | 
23 | Release 0.3
24 | 
25 | [1] Using the encapsulated generator function, the speech dataset is read automatically
26 | in a loop to generate the microphone array speech in room acoustics.
27 | [2] Supports simultaneous generation of microphone array signals at four signal-to-noise
28 | ratios (-5dB, 0dB, 5dB, 10dB) and seven reverberation times (200ms, 300ms, 400ms, 500ms,
29 | 600ms, 700ms, 800ms).
30 | '''
31 | 
32 | import numpy as np
33 | import matplotlib.pyplot as plt
34 | from scipy.io import wavfile
35 | import glob
36 | import pyroomacoustics as pra
37 | import add_noise_for_multichannel as an
38 | 
39 | # ================== #
40 | #      Function      #
41 | # ================== #
42 | 
43 | # single source microphone array clean speech generator
44 | def mic_clean_generator(room_size, target_location, target, fs, microphone_array, amplifier):
45 |     '''
46 |     This function implements the single-source microphone array clean speech generator.
47 | 
48 |     Usage: mic_clean_generator(room_size, target_location, target, fs, microphone_array, amplifier)
49 | 
50 |     room_size - the size of the room [length, width, height]
51 |     target_location - the location of the target speech [x, y, z]
52 |     target - the array of the target speech file
53 |     fs - sampling frequency
54 |     microphone_array - the locations of the microphone array
55 |     amplifier - the gain of the microphone's built-in amplifier
56 | 
57 |     Example call:
58 |       clean = mic_clean_generator(room_size, target_location, target, fs, microphone_array, amplifier)
59 | 
60 |     References:
61 |       microphone array speech generator release 0.1
62 | 
63 |     Author: Rui Cheng
64 |     '''
65 | 
66 |     # create the room
67 |     room = pra.ShoeBox(room_size, fs=fs, absorption=1.0, max_order=17)
68 |     '''fig, ax = room.plot()
69 |     ax.set_xlim([0, 4.5])
70 |     ax.set_ylim([0, 6.5])
71 |     ax.set_zlim([0, 4])
72 |     plt.show()
73 |     '''
74 |     # add source
75 |     room.add_source(target_location, signal=target, delay=0)
76 |     #room.add_source([3.5, 3.0, 1.76], signal=interf[:len(target)], delay=0) # for multi-source
77 |     '''fig, ax = room.plot()
78 |     ax.set_xlim([0, 4.5])
79 |     ax.set_ylim([0, 6.5])
80 |     ax.set_zlim([0, 4])
81 |     plt.show()'''
82 | 
83 |     # add microphone array
84 |     R = microphone_array
85 |     fft_len = 512
86 |     Lg_t = 0.100
87 |     Lg = np.ceil(Lg_t*room.fs)
88 |     mic_array = pra.Beamformer(R, room.fs, N=fft_len, Lg=Lg)
89 |     room.add_microphone_array(mic_array)
90 |     '''fig, ax = room.plot()
91 |     ax.set_xlim([0, 4.5])
92 |     ax.set_ylim([0, 6.5])
93 |     ax.set_zlim([0, 4])
94 |     plt.show()'''
95 | 
96 |     # create the room impulse response
97 |     # compute image sources
98 |     room.image_source_model(use_libroom=True)
99 |     # visualize 3D polyhedron room and image sources
100 |     '''fig, ax = room.plot(img_order=3)
101 |     fig.set_size_inches(20, 10)
102 |     plt.show()'''
103 | 
104 |     '''room.plot_rir()
105 |     fig = plt.gcf()
106 |     fig.set_size_inches(20, 10)
107 |     plt.show()'''
108 | 
109 |     # microphone speech
110 |     room.simulate()
111 | 
112 |     # clean speech in each channel
113 |     clean = amplifier*room.mic_array.signals.astype("int16")
114 | 
115 |     return clean
116 | 
117 | # single source microphone array reverberation speech generator
118 | def mic_rever_generator(room_size, target_location, target, fs, microphone_array, amplifier, absorption_value):
119 |     '''
120 |     This function implements the single-source microphone array reverberant speech generator.
121 | 
122 |     Usage: mic_rever_generator(room_size, target_location, target, fs, microphone_array, amplifier, absorption_value)
123 | 
124 |     room_size - the size of the room [length, width, height]
125 |     target_location - the location of the target speech [x, y, z]
126 |     target - the array of the target speech file
127 |     fs - sampling frequency
128 |     microphone_array - the locations of the microphone array
129 |     amplifier - the gain of the microphone's built-in amplifier
130 |     absorption_value - absorption value of the room walls
131 | 
132 |     Example call:
133 |       clean_rever = mic_rever_generator(room_size, target_location, target, fs, microphone_array, amplifier, absorption_value)
134 | 
135 |     References:
136 |       microphone array speech generator release 0.1
137 | 
138 |     Author: Rui Cheng
139 |     '''
140 | 
141 |     # create the room
142 |     room = pra.ShoeBox(room_size, fs=fs, absorption=absorption_value, max_order=17)
143 | 
144 |     room.add_source(target_location, signal=target, delay=0)
145 |     #room.add_source([3.5, 3.0, 1.76], signal=interf[:len(target)], delay=0)
146 | 
147 |     # add microphone array
148 |     R = microphone_array
149 |     fft_len = 512
150 |     Lg_t = 0.100
151 |     Lg = np.ceil(Lg_t*room.fs)
152 |     mic_array = pra.Beamformer(R, room.fs, N=fft_len, Lg=Lg)
153 |     room.add_microphone_array(mic_array)
154 | 
155 |     # create the room impulse response
156 |     # compute image sources
157 |     room.image_source_model(use_libroom=True)
158 | 
159 |     # microphone speech
160 |     room.simulate()
161 | 
162 |     # reverberant speech in each channel
163 |     clean_rever = amplifier*room.mic_array.signals.astype("int16")
164 | 
165 |     # return
166 |     return clean_rever
167 | 
168 | 
169 | 
170 | 
171 | 
172 | # ================== #
173 | #        Main        #
174 | # ================== #
175 | 
176 | print('Test Dataset')
177 | print('\n')
178 | print(
179 |     'Microphone Array Speech Generator in Room Acoustic [Release 0.3]')
180 | print(
181 |     '==================================================================')
182 | print('\n')
183 | 
184 | # ================== #
185 | #     Parameters     #
186 | # ================== #
187 | 
188 | print(
189 |     'Parameters')
190 | print(
191 |     '------------------------------------------------------------------')
192 | 
193 | print('---- Fixed parameter ----')
194 | # room size
195 | room_size = [4, 3, 3]
196 | # microphone array
197 | microphone_array = np.c_[
198 |     [1.82, 1.5, 0.75],  # mic 0
199 |     [1.86, 1.5, 0.75],  # mic 1
200 |     [1.90, 1.5, 0.75],  # mic 2
201 |     [1.94, 1.5, 0.75],  # mic 3
202 |     [1.98, 1.5, 0.75],  # mic 4
203 |     [2.02, 1.5, 0.75],  # mic 5
204 |     [2.06, 1.5, 0.75],  # mic 6
205 |     [2.10, 1.5, 0.75],  # mic 7
206 |     [2.14, 1.5, 0.75],  # mic 8
207 |     [2.18, 1.5, 0.75],  # mic 9
208 |     ]
209 | # microphone amplification
210 | amplifier = 10
211 | print('room_size')
212 | print(room_size)
213 | print('microphone_array')
214 | print(microphone_array)
215 | print('microphone_amplifier')
216 | print(amplifier)
217 | 
218 | print('---- Variation parameters contained in the dataset ----')
219 | # target source locations
220 | target_location = [
221 |     [0.6, 2.8, 1.3],   # tar 01
222 |     [1.3, 2.8, 1.3],   # tar 02
223 |     [2.0, 2.8, 1.3],   # tar 03
224 |     [2.7, 2.8, 1.3],   # tar 04
225 |     [3.4, 2.8, 1.3],   # tar 05
226 |     [0.2, 2.15, 1.3],  # tar 06
227 |     [1.3, 2.25, 1.3],  # tar 07
228 |     [2.0, 2.25, 1.3],  # tar 08
229 |     [2.7, 2.25, 1.3],  # tar 09
230 |     [0.75, 1.5, 1.3],  # tar 10
231 |     [0.2, 0.85, 1.3],  # tar 11
232 |     [1.3, 0.75, 1.3],  # tar 12
233 |     [2.0, 0.75, 1.3],  # tar 13
234 |     [2.7, 0.75, 1.3],  # tar 14
235 |     [0.6, 0.2, 1.3],   # tar 15
236 |     [1.3, 0.2, 1.3],   # tar 16
237 |     [2.0, 0.2, 1.3],   # tar 17
238 |     [2.7, 0.2, 1.3],   # tar 18
239 |     [3.4, 0.2, 1.3],   # tar 19
240 |     [3.8, 1.5, 1.8],   # tar 20
241 |     ]
242 | # SNR: -5dB, 0dB, 5dB, 10dB
243 | SNR = [-5, 0, 5, 10]
244 | # absorption:
245 | absorption_value = [
246 |     [0.4391, 200],  # alpha=0.4391, RT60=200ms
247 |     [0.2927, 300],  # alpha=0.2927, RT60=300ms
248 |     [0.2195, 400],  # alpha=0.2195, RT60=400ms
249 |     [0.1756, 500],  # alpha=0.1756, RT60=500ms
250 |     [0.1464, 600],  # alpha=0.1464, RT60=600ms
251 |     [0.1255, 700],  # alpha=0.1255, RT60=700ms
252 |     [0.1098, 800],  # alpha=0.1098, RT60=800ms
253 |     ]
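# The absorption/RT60 pairs above are consistent with Sabine's formula,
#     RT60 = 0.161 * V / (S * alpha),
# for this 4 m x 3 m x 3 m room: V = 36 m^3 and S = 66 m^2, so, for example,
# alpha = 0.2195 gives RT60 = 0.161*36/(66*0.2195) ~= 0.400 s, the 400 ms row.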
254 | print('target_location')
255 | print(target_location)
256 | print('signal_to_noise')
257 | print(SNR)
258 | print('absorption')
259 | print(absorption_value)
260 | 
261 | # target speech file path
262 | MASG_target_path = 'C:\\Projects\\chengrui\\Multi_Data\\MASG_mic_speech_test\\target\\'
263 | # noise signal path
264 | noise_file = 'C:\\Projects\\chengrui\\Multi_Data\\Data\\Noisex92\\babble.wav'
265 | # microphone speech file paths
266 | MASG_clean_path = 'C:\\Projects\\chengrui\\Multi_Data\\MASG_mic_speech_test\\mic_clean\\'
267 | MASG_clean_noise_path = 'C:\\Projects\\chengrui\\Multi_Data\\MASG_mic_speech_test\\mic_clean_noise\\'
268 | MASG_clean_rever_path = 'C:\\Projects\\chengrui\\Multi_Data\\MASG_mic_speech_test\\mic_clean_rever\\'
269 | MASG_clean_rever_noise_path = 'C:\\Projects\\chengrui\\Multi_Data\\MASG_mic_speech_test\\mic_clean_rever_noise\\'
270 | MASG_noise_path = 'C:\\Projects\\chengrui\\Multi_Data\\MASG_mic_speech_test\\mic_noise\\'
271 | 
272 | # read the target speech file names
273 | target_file_names = np.array([])
274 | for file in glob.glob(MASG_target_path + '*.wav'):
275 |     target_file_names = np.append(target_file_names, file.replace(MASG_target_path, ''))  # replace() removes the path prefix; strip() would remove characters, not a prefix
276 | print('target_file_names')
277 | print(target_file_names, target_file_names.shape)
278 | print('\t')
279 | 
280 | print(
281 |     'Generate Microphone Array Speech')
282 | print(
283 |     '------------------------------------------------------------------')
284 | 
285 | # read noise signal
286 | fs, noise = wavfile.read(noise_file)
287 | 
288 | # count index
289 | count = 1
290 | 
291 | # process the utterances one by one
292 | for file_name in target_file_names:
293 | 
294 |     # read target speech
295 |     fs, target = wavfile.read(MASG_target_path+file_name)
296 | 
297 |     # ================= #
298 |     #     Generator     #
299 |     # ================= #
300 | 
301 |     for snr_index in range(4):
302 |         for rt_index in range(7):
303 |             # clean
304 |             clean = mic_clean_generator(room_size, target_location[int(file_name[1:3])-1], target, fs, microphone_array, amplifier)
305 |             # reverberation
306 |             clean_rever = mic_rever_generator(room_size, target_location[int(file_name[1:3])-1], target, fs, microphone_array, amplifier, absorption_value[rt_index][0])
307 |             # add noise
308 |             out_clean_rever_noise, out_clean_noise, out_noise = an.addnoise(clean, clean_rever, noise, SNR[snr_index], fs)
309 | 
310 |             # ===================== #
311 |             #     Wavfile.write     #
312 |             # ===================== #
313 | 
314 |             # write wavfile
315 |             for i in range(microphone_array.shape[1]):
316 |                 # file path
317 |                 file_path = 'ch'+str(i)+'\\'+str(SNR[snr_index])+'dB\\'+str(int(absorption_value[rt_index][1]))+'ms\\'
318 |                 # clean
319 |                 wavfile.write(
320 |                     MASG_clean_path+file_path+file_name[:7]+'_ch'+str(i)+'_clean_'+str(SNR[snr_index])+'dB_'+str(int(absorption_value[rt_index][1]))+'ms.wav', fs, clean[i,:])
321 |                 # clean_noise
322 |                 wavfile.write(
323 |                     MASG_clean_noise_path+file_path+file_name[:7]+'_ch'+str(i)+'_clean_noise_'+str(SNR[snr_index])+'dB_'+str(int(absorption_value[rt_index][1]))+'ms.wav', fs, out_clean_noise[i,:])
324 |                 # clean_rever
325 |                 wavfile.write(
326 |                     MASG_clean_rever_path+file_path+file_name[:7]+'_ch'+str(i)+'_clean_rever_'+str(SNR[snr_index])+'dB_'+str(int(absorption_value[rt_index][1]))+'ms.wav', fs, clean_rever[i,:])
327 |                 # clean_rever_noise
328 |                 wavfile.write(
329 |                     MASG_clean_rever_noise_path+file_path+file_name[:7]+'_ch'+str(i)+'_clean_rever_noise_'+str(SNR[snr_index])+'dB_'+str(int(absorption_value[rt_index][1]))+'ms.wav', fs, out_clean_rever_noise[i,:])
330 |                 # noise
331 |                 wavfile.write(
332 |                     MASG_noise_path+file_path+file_name[:7]+'_ch'+str(i)+'_noise_'+str(SNR[snr_index])+'dB_'+str(int(absorption_value[rt_index][1]))+'ms.wav', fs, out_noise[i,:])
333 | 
334 |     print(file_name[:7], 'generation completed.', '[', count, '/ 100 ]')
335 |     count = count + 1
336 | 
337 | print('channel number:', microphone_array.shape[1])
338 | print('Microphone array speech and noise have been generated.')
339 | print('\t')
340 | plt.show()
341 | 
342 | 
343 | 
--------------------------------------------------------------------------------
/microphone_array_speech_generator/microphone_array_speech_generator_for_train_dataset.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python
2 | # -*- coding: utf-8 -*-
3 | # @Date : 2019-11-25 09:33:27
4 | # @Author : Cheng Rui (chengrui@emails.bjut.edu.cn)
5 | # @Function : microphone array speech generator for train dataset
6 | # @Version : Release 0.3
7 | 
8 | '''
9 | Version Information
10 | 
11 | Release 0.1
12 | 
13 | [1] The generator is used for single-sentence tests, verifying the accuracy of the
14 | dataset generator.
15 | 
16 | Release 0.2
17 | 
18 | [1] The generator is encapsulated as a function that is executed by call, which is
19 | convenient for generating large-scale microphone array speech data.
20 | 
21 | [2] The microphone array signals generated in the loop are written out automatically.
22 | 
23 | Release 0.3
24 | 
25 | [1] Using the encapsulated generator function, the speech dataset is read automatically
26 | in a loop to generate the microphone array speech in room acoustics.
27 | [2] Supports simultaneous generation of microphone array signals at four signal-to-noise
28 | ratios (-5dB, 0dB, 5dB, 10dB) and seven reverberation times (200ms, 300ms, 400ms, 500ms,
29 | 600ms, 700ms, 800ms).
30 | '''
31 | 
32 | import numpy as np
33 | import matplotlib.pyplot as plt
34 | from scipy.io import wavfile
35 | import glob
36 | import pyroomacoustics as pra
37 | import add_noise_for_multichannel as an
38 | 
39 | # ================== #
40 | #      Function      #
41 | # ================== #
42 | 
43 | # single source microphone array clean speech generator
44 | def mic_clean_generator(room_size, target_location, target, fs, microphone_array, amplifier):
45 |     '''
46 |     This function implements the single-source microphone array clean speech generator.
47 | 
48 |     Usage: mic_clean_generator(room_size, target_location, target, fs, microphone_array, amplifier)
49 | 
50 |     room_size - the size of the room [length, width, height]
51 |     target_location - the location of the target speech [x, y, z]
52 |     target - the array of the target speech file
53 |     fs - sampling frequency
54 |     microphone_array - the locations of the microphone array
55 |     amplifier - the gain of the microphone's built-in amplifier
56 | 
57 |     Example call:
58 |       clean = mic_clean_generator(room_size, target_location, target, fs, microphone_array, amplifier)
59 | 
60 |     References:
61 |       microphone array speech generator release 0.1
62 | 
63 |     Author: Rui Cheng
64 |     '''
65 | 
66 |     # create the room
67 |     room = pra.ShoeBox(room_size, fs=fs, absorption=1.0, max_order=17)
68 |     '''fig, ax = room.plot()
69 |     ax.set_xlim([0, 4.5])
70 |     ax.set_ylim([0, 6.5])
71 |     ax.set_zlim([0, 4])
72 |     plt.show()
73 |     '''
74 |     # add source
75 |     room.add_source(target_location, signal=target, delay=0)
76 |     #room.add_source([3.5, 3.0, 1.76], signal=interf[:len(target)], delay=0) # for multi-source
77 |     '''fig, ax = room.plot()
78 |     ax.set_xlim([0, 4.5])
79 |     ax.set_ylim([0, 6.5])
80 |     ax.set_zlim([0, 4])
81 |     plt.show()'''
82 | 
83 |     # add microphone array
84 |     R = microphone_array
85 |     fft_len = 512
86 |     Lg_t = 0.100
87 |     Lg = np.ceil(Lg_t*room.fs)
88 |     mic_array = pra.Beamformer(R, room.fs, N=fft_len, Lg=Lg)
89 |     room.add_microphone_array(mic_array)
90 |     '''fig, ax = room.plot()
91 |     ax.set_xlim([0, 4.5])
92 |     ax.set_ylim([0, 6.5])
93 |     ax.set_zlim([0, 4])
94 |     plt.show()'''
95 | 
96 |     # create the room impulse response
97 |     # compute image sources
98 |     room.image_source_model(use_libroom=True)
99 |     # visualize 3D polyhedron room and image sources
100 |     '''fig, ax = room.plot(img_order=3)
101 |     fig.set_size_inches(20, 10)
102 |     plt.show()'''
103 | 
104 |     '''room.plot_rir()
105 |     fig = plt.gcf()
106 |     fig.set_size_inches(20, 10)
107 |     plt.show()'''
108 | 
109 |     # microphone speech
110 |     room.simulate()
111 | 
112 |     # clean speech in each channel
113 |     clean = amplifier*room.mic_array.signals.astype("int16")
114 | 
115 |     return clean
116 | 
117 | # single source microphone array reverberation speech generator
118 | def mic_rever_generator(room_size, target_location, target, fs, microphone_array, amplifier, absorption_value):
119 |     '''
120 |     This function implements the single-source microphone array reverberant speech generator.
121 | 
122 |     Usage: mic_rever_generator(room_size, target_location, target, fs, microphone_array, amplifier, absorption_value)
123 | 
124 |     room_size - the size of the room [length, width, height]
125 |     target_location - the location of the target speech [x, y, z]
126 |     target - the array of the target speech file
127 |     fs - sampling frequency
128 |     microphone_array - the locations of the microphone array
129 |     amplifier - the gain of the microphone's built-in amplifier
130 |     absorption_value - absorption value of the room walls
131 | 
132 |     Example call:
133 |       clean_rever = mic_rever_generator(room_size, target_location, target, fs, microphone_array, amplifier, absorption_value)
134 | 
135 |     References:
136 |       microphone array speech generator release 0.1
137 | 
138 |     Author: Rui Cheng
139 |     '''
140 | 
141 |     # create the room
142 |     room = pra.ShoeBox(room_size, fs=fs, absorption=absorption_value, max_order=17)
143 | 
144 |     room.add_source(target_location, signal=target, delay=0)
145 |     #room.add_source([3.5, 3.0, 1.76], signal=interf[:len(target)], delay=0)
146 | 
147 |     # add microphone array
148 |     R = microphone_array
149 |     fft_len = 512
150 |     Lg_t = 0.100
151 |     Lg = np.ceil(Lg_t*room.fs)
152 |     mic_array = pra.Beamformer(R, room.fs, N=fft_len, Lg=Lg)
153 |     room.add_microphone_array(mic_array)
154 | 
155 |     # create the room impulse response
156 |     # compute image sources
157 |     room.image_source_model(use_libroom=True)
158 | 
159 |     # microphone speech
160 |     room.simulate()
161 | 
162 |     # reverberant speech in each channel
163 |     clean_rever = amplifier*room.mic_array.signals.astype("int16")
164 | 
165 |     # return
166 |     return clean_rever
167 | 
168 | 
169 | 
170 | 
171 | 
172 | # ================== #
173 | #        Main        #
174 | # ================== #
175 | 
176 | print('Train Dataset')
177 | print('\n')
178 | print(
179 |     'Microphone Array Speech Generator in Room Acoustic [Release 0.3]')
180 | print(
181 |     '==================================================================')
182 | print('\n')
183 | 
184 | # ================== #
185 | #     Parameters     #
186 | # ================== #
187 | 
188 | print(
189 |     'Parameters')
190 | print(
191 |     '------------------------------------------------------------------')
192 | 
193 | print('---- Fixed parameter ----')
194 | # room size
195 | room_size = [4, 3, 3]
196 | # microphone array
197 | microphone_array = np.c_[
198 |     [1.82, 1.5, 0.75],  # mic 0
199 |     [1.86, 1.5, 0.75],  # mic 1
200 |     [1.90, 1.5, 0.75],  # mic 2
201 |     [1.94, 1.5, 0.75],  # mic 3
202 |     [1.98, 1.5, 0.75],  # mic 4
203 |     [2.02, 1.5, 0.75],  # mic 5
204 |     [2.06, 1.5, 0.75],  # mic 6
205 |     [2.10, 1.5, 0.75],  # mic 7
206 |     [2.14, 1.5, 0.75],  # mic 8
207 |     [2.18, 1.5, 0.75],  # mic 9
208 |     ]
209 | # microphone amplification
210 | amplifier = 10
211 | print('room_size')
212 | print(room_size)
213 | print('microphone_array')
214 | print(microphone_array)
215 | print('microphone_amplifier')
216 | print(amplifier)
217 | 
218 | print('---- Variation parameters contained in the dataset ----')
219 | # target source locations
220 | target_location = [
221 |     [0.6, 2.8, 1.3],   # tar 01
222 |     [1.3, 2.8, 1.3],   # tar 02
223 |     [2.0, 2.8, 1.3],   # tar 03
224 |     [2.7, 2.8, 1.3],   # tar 04
225 |     [3.4, 2.8, 1.3],   # tar 05
226 |     [0.2, 2.15, 1.3],  # tar 06
227 |     [1.3, 2.25, 1.3],  # tar 07
228 |     [2.0, 2.25, 1.3],  # tar 08
229 |     [2.7, 2.25, 1.3],  # tar 09
230 |     [0.75, 1.5, 1.3],  # tar 10
231 |     [0.2, 0.85, 1.3],  # tar 11
232 |     [1.3, 0.75, 1.3],  # tar 12
233 |     [2.0, 0.75, 1.3],  # tar 13
234 |     [2.7, 0.75, 1.3],  # tar 14
235 |     [0.6, 0.2, 1.3],   # tar 15
236 |     [1.3, 0.2, 1.3],   # tar 16
237 |     [2.0, 0.2, 1.3],   # tar 17
238 |     [2.7, 0.2, 1.3],   # tar 18
239 |     [3.4, 0.2, 1.3],   # tar 19
240 |     [3.8, 1.5, 1.8],   # tar 20
241 |     ]
242 | # SNR: -5dB, 0dB, 5dB, 10dB
243 | SNR = [-5, 0, 5, 10]
244 | # absorption:
245 | absorption_value = [
246 |     [0.4391, 200],  # alpha=0.4391, RT60=200ms
247 |     [0.2927, 300],  # alpha=0.2927, RT60=300ms
248 |     [0.2195, 400],  # alpha=0.2195, RT60=400ms
249 |     [0.1756, 500],  # alpha=0.1756, RT60=500ms
250 |     [0.1464, 600],  # alpha=0.1464, RT60=600ms
251 |     [0.1255, 700],  # alpha=0.1255, RT60=700ms
252 |     [0.1098, 800],  # alpha=0.1098, RT60=800ms
253 |     ]
254 | print('target_location')
255 | print(target_location)
256 | print('signal_to_noise')
257 | print(SNR)
258 | print('absorption')
259 | print(absorption_value)
260 | 
261 | # target speech file path
262 | MASG_target_path = 'C:\\Projects\\chengrui\\Multi_Data\\MASG_mic_speech_train\\target\\'
263 | # noise signal path
264 | noise_file = 'C:\\Projects\\chengrui\\Multi_Data\\Data\\Noisex92\\babble.wav'
265 | # microphone speech file paths
266 | MASG_clean_path = 'C:\\Projects\\chengrui\\Multi_Data\\MASG_mic_speech_train\\mic_clean\\'
267 | MASG_clean_noise_path = 'C:\\Projects\\chengrui\\Multi_Data\\MASG_mic_speech_train\\mic_clean_noise\\'
268 | MASG_clean_rever_path = 'C:\\Projects\\chengrui\\Multi_Data\\MASG_mic_speech_train\\mic_clean_rever\\'
269 | MASG_clean_rever_noise_path = 'C:\\Projects\\chengrui\\Multi_Data\\MASG_mic_speech_train\\mic_clean_rever_noise\\'
270 | MASG_noise_path = 'C:\\Projects\\chengrui\\Multi_Data\\MASG_mic_speech_train\\mic_noise\\'
271 | 
272 | # read the target speech file names
273 | target_file_names = np.array([])
274 | for file in glob.glob(MASG_target_path + '*.wav'):
275 |     target_file_names = np.append(target_file_names, file.replace(MASG_target_path, ''))  # replace() removes the path prefix; strip() would remove characters, not a prefix
276 | print('target_file_names')
277 | print(target_file_names, target_file_names.shape)
278 | print('\t')
279 | 
280 | print(
281 |     'Generate Microphone Array Speech')
282 | print(
283 |     '------------------------------------------------------------------')
284 | 
285 | # read noise signal
286 | fs, noise = wavfile.read(noise_file)
287 | 
288 | # count index
289 | count = 1
290 | 
291 | # process the utterances one by one
292 | for file_name in target_file_names:
293 | 
294 |     # read target speech
295 |     fs, target = wavfile.read(MASG_target_path+file_name)
296 | 
297 |     # ================= #
298 |     #     Generator     #
299 |     # ================= #
300 | 
301 |     for snr_index in range(4):
302 |         for rt_index in range(7):
303 |             # clean
304 |             clean = mic_clean_generator(room_size, target_location[int(file_name[1:3])-1], target, fs, microphone_array, amplifier)
305 |             # reverberation
306 |             clean_rever = mic_rever_generator(room_size, target_location[int(file_name[1:3])-1], target, fs, microphone_array, amplifier, absorption_value[rt_index][0])
307 |             # add noise
308 |             out_clean_rever_noise, out_clean_noise, out_noise = an.addnoise(clean, clean_rever, noise, SNR[snr_index], fs)
309 | 
310 |             # ===================== #
311 |             #     Wavfile.write     #
312 |             # ===================== #
313 | 
314 |             # write wavfile
315 |             for i in range(microphone_array.shape[1]):
316 |                 # file path
317 |                 file_path = 'ch'+str(i)+'\\'+str(SNR[snr_index])+'dB\\'+str(int(absorption_value[rt_index][1]))+'ms\\'
318 |                 # clean
319 |                 wavfile.write(
320 |                     MASG_clean_path+file_path+file_name[:7]+'_ch'+str(i)+'_clean_'+str(SNR[snr_index])+'dB_'+str(int(absorption_value[rt_index][1]))+'ms.wav', fs, clean[i,:])
321 |                 # clean_noise
322 |                 wavfile.write(
323 |                     MASG_clean_noise_path+file_path+file_name[:7]+'_ch'+str(i)+'_clean_noise_'+str(SNR[snr_index])+'dB_'+str(int(absorption_value[rt_index][1]))+'ms.wav', fs, out_clean_noise[i,:])
324 |                 # clean_rever
325 |                 wavfile.write(
326 |                     MASG_clean_rever_path+file_path+file_name[:7]+'_ch'+str(i)+'_clean_rever_'+str(SNR[snr_index])+'dB_'+str(int(absorption_value[rt_index][1]))+'ms.wav', fs, clean_rever[i,:])
327 |                 # clean_rever_noise
328 |                 wavfile.write(
329 |                     MASG_clean_rever_noise_path+file_path+file_name[:7]+'_ch'+str(i)+'_clean_rever_noise_'+str(SNR[snr_index])+'dB_'+str(int(absorption_value[rt_index][1]))+'ms.wav', fs, out_clean_rever_noise[i,:])
330 |                 # noise
331 |                 wavfile.write(
332 |                     MASG_noise_path+file_path+file_name[:7]+'_ch'+str(i)+'_noise_'+str(SNR[snr_index])+'dB_'+str(int(absorption_value[rt_index][1]))+'ms.wav', fs, out_noise[i,:])
333 | 
334 |     print(file_name[:7], 'generation completed.', '[', count, '/ 400 ]')
335 |     count = count + 1
336 | 
337 | print('channel number:', microphone_array.shape[1])
338 | print('Microphone array speech and noise have been generated.')
339 | print('\t')
340 | plt.show()
341 | 
342 | 
343 | 
--------------------------------------------------------------------------------
/microphone_array_speech_generator/speech_connection.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python
2 | # -*- coding: utf-8 -*-
3 | # @Date : 2019-11-27 09:33:27
4 | # @Author : Cheng Rui (chengrui@emails.bjut.edu.cn)
5 | # @Function : speech connection
6 | # @Version : Release 0.1
7 | 
8 | import numpy as np
9 | from scipy.io import wavfile
10 | import glob
11 | 
12 | # speech connection
13 | def speech_connection(root_path, mic, channels, SNRs, RTs):
14 |     '''
15 |     This function concatenates the generated speech and noise segments into long recordings.
16 | 
17 |     Usage: speech_connection(root_path, mic, channels, SNRs, RTs)
18 | 
19 |     root_path - root directory of the individual speech and noise files
20 |     mic - the speech/noise type folders
21 |     channels - the channel folders
22 |     SNRs - the SNR folders
23 |     RTs - the RT60 folders
24 | 
25 |     Example call:
26 |       speech_connection(root_path, mic, channels, SNRs, RTs)
27 | 
28 |     Author: Rui Cheng
29 |     '''
30 | 
31 |     for s in mic:
32 |         for ch in channels:
33 |             fileNamesTrain = np.array([])
34 |             for snr in SNRs:
35 |                 for rt in RTs:
36 |                     for file in glob.glob(root_path+s+'\\'+ch+'\\'+snr+'\\'+rt+'\\'+'*.wav'):
37 |                         fileNamesTrain = np.append(fileNamesTrain, file)
38 | 
39 |             print(s, ch, 'have been obtained. There are', fileNamesTrain.shape[0], 'segments in total. Speech connection...')
40 | 
41 |             trimmed_output_train = np.array([])
42 |             for fileName in fileNamesTrain:
43 |                 Fs, newFile = wavfile.read(fileName)
44 |                 outputFile = newFile / np.max(np.abs(newFile))
45 |                 trimmed_output_train = np.append(trimmed_output_train, outputFile)
46 | 
47 |             # scale back to the int16 range before casting, to avoid clipping with wavfile.write (a bare astype("int16") would truncate the normalized floats to zero)
48 |             int_array = (trimmed_output_train * 32767).astype("int16")
49 |             print(int_array, int_array.shape)
50 | 
51 |             wavfile.write(root_path+s+'\\'+s+'_'+ch+'_4snr_7rt.wav', 16000, int_array) #TIMIT_577(3)_TRAIN TIMIT_288(32)_TRAIN
52 |             print("Connected and output.")
53 | 
54 | print('\n')
55 | print('Speech Connection')
56 | print('==========================================================================================================')
57 | print('\n')
58 | 
59 | # paths and indices
60 | train_root_path = 'C:\\Projects\\chengrui\\Multi_Data\\MASG_mic_speech_train\\'
61 | test_root_path = 'C:\\Projects\\chengrui\\Multi_Data\\MASG_mic_speech_test\\'
62 | mic = ['mic_clean', 'mic_clean_noise', 'mic_clean_rever', 'mic_clean_rever_noise', 'mic_noise']
63 | channels = ['ch0', 'ch1', 'ch2', 'ch3', 'ch4', 'ch5', 'ch6', 'ch7', 'ch8', 'ch9']
64 | SNRs = ['-5dB', '0dB', '5dB', '10dB']
65 | RTs = ['200ms', '300ms', '400ms', '500ms', '600ms', '700ms', '800ms']
66 | 
67 | '''# speech connection for the test dataset
68 | print('[test dataset]')
69 | speech_connection(test_root_path, mic, channels, SNRs, RTs)
70 | print('test dataset has been connected.')
71 | print('\n')'''
72 | 
73 | # speech connection for the train dataset
74 | print('[train dataset]')
75 | speech_connection(train_root_path, mic, channels, SNRs, RTs)
76 | print('train dataset has been connected.')
77 | print('\n')
78 | 
--------------------------------------------------------------------------------
/pdf/MASG.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vipchengrui/MASG/2eafc9de3d7d2f25a592f4b1c7cb4cdc66b1b89e/pdf/MASG.pdf
--------------------------------------------------------------------------------
/pdf/Pyroomacoustics.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vipchengrui/MASG/2eafc9de3d7d2f25a592f4b1c7cb4cdc66b1b89e/pdf/Pyroomacoustics.pdf
--------------------------------------------------------------------------------