├── .gitignore
├── LICENSE
├── README.md
├── example
│   ├── MASG_Microphone_Array_Speech_Generator_in_room_acoustic.ipynb
│   ├── babble.wav
│   ├── male.wav
│   ├── mic_ch0_clean.wav
│   ├── mic_ch0_clean_noise.wav
│   ├── mic_ch0_clean_rever.wav
│   ├── mic_ch0_clean_rever_noise.wav
│   ├── mic_ch0_noise.wav
│   ├── mic_ch1_clean.wav
│   ├── mic_ch1_clean_noise.wav
│   ├── mic_ch1_clean_rever.wav
│   ├── mic_ch1_clean_rever_noise.wav
│   ├── mic_ch1_noise.wav
│   ├── mic_ch2_clean.wav
│   ├── mic_ch2_clean_noise.wav
│   ├── mic_ch2_clean_rever.wav
│   ├── mic_ch2_clean_rever_noise.wav
│   ├── mic_ch2_noise.wav
│   ├── mic_ch3_clean.wav
│   ├── mic_ch3_clean_noise.wav
│   ├── mic_ch3_clean_rever.wav
│   ├── mic_ch3_clean_rever_noise.wav
│   └── mic_ch3_noise.wav
├── img
│   ├── logo.png
│   ├── method.png
│   ├── room.png
│   └── room_model.png
├── microphone_array_speech_generator
│   ├── README.md
│   ├── add_noise_for_multichannel.py
│   ├── microphone_array_speech_generator_for_test_dataset.py
│   ├── microphone_array_speech_generator_for_train_dataset.py
│   └── speech_connection.py
└── pdf
    ├── MASG.pdf
    └── Pyroomacoustics.pdf

--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | # Byte-compiled / optimized / DLL files
2 | __pycache__/
3 | *.py[cod]
4 | *$py.class
5 | 
6 | # C extensions
7 | *.so
8 | 
9 | # Distribution / packaging
10 | .Python
11 | build/
12 | develop-eggs/
13 | dist/
14 | downloads/
15 | eggs/
16 | .eggs/
17 | lib/
18 | lib64/
19 | parts/
20 | sdist/
21 | var/
22 | wheels/
23 | *.egg-info/
24 | .installed.cfg
25 | *.egg
26 | MANIFEST
27 | 
28 | # PyInstaller
29 | # Usually these files are written by a python script from a template
30 | # before PyInstaller builds the exe, so as to inject date/other infos into it.
31 | *.manifest
32 | *.spec
33 | 
34 | # Installer logs
35 | pip-log.txt
36 | pip-delete-this-directory.txt
37 | 
38 | # Unit test / coverage reports
39 | htmlcov/
40 | .tox/
41 | .coverage
42 | .coverage.*
43 | .cache
44 | nosetests.xml
45 | coverage.xml
46 | *.cover
47 | .hypothesis/
48 | .pytest_cache/
49 | 
50 | # Translations
51 | *.mo
52 | *.pot
53 | 
54 | # Django stuff:
55 | *.log
56 | local_settings.py
57 | db.sqlite3
58 | 
59 | # Flask stuff:
60 | instance/
61 | .webassets-cache
62 | 
63 | # Scrapy stuff:
64 | .scrapy
65 | 
66 | # Sphinx documentation
67 | docs/_build/
68 | 
69 | # PyBuilder
70 | target/
71 | 
72 | # Jupyter Notebook
73 | .ipynb_checkpoints
74 | 
75 | # pyenv
76 | .python-version
77 | 
78 | # celery beat schedule file
79 | celerybeat-schedule
80 | 
81 | # SageMath parsed files
82 | *.sage.py
83 | 
84 | # Environments
85 | .env
86 | .venv
87 | env/
88 | venv/
89 | ENV/
90 | env.bak/
91 | venv.bak/
92 | 
93 | # Spyder project settings
94 | .spyderproject
95 | .spyproject
96 | 
97 | # Rope project settings
98 | .ropeproject
99 | 
100 | # mkdocs documentation
101 | /site
102 | 
103 | # mypy
104 | .mypy_cache/
105 | 
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 | 
3 | Copyright (c) 2019 vipchengrui
4 | 
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 | 
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | ![image_logo](https://github.com/vipchengrui/MASG/blob/master/img/logo.png)
2 | 
3 | # MASG
4 | 
5 | [![GitHub release](https://img.shields.io/github/release/vipchengrui/MASG/all.svg?style=flat-square)](https://github.com/vipchengrui/MASG/releases)
6 | [![license](https://img.shields.io/github/license/vipchengrui/MASG.svg?style=flat-square)](https://github.com/vipchengrui/MASG/blob/master/LICENSE)
7 | 
8 | 
9 | A microphone array speech generator (MASG) for room acoustics.
10 | 
11 | ## Abstract
12 | MASG simulates the speech data received by microphone arrays of various shapes in a room acoustic environment, including clean speech (clean), reverberant speech (clean rever), noisy speech (clean noise), noisy reverberant speech (clean rever noise), and the corresponding noise signals (noise).
13 | 
14 | ## Method
15 | 
16 | MASG is built on two tools: *Pyroomacoustics* [1] and *add_noise_for_multichannel*, a multichannel adaptation of the ITU-T P.56 based noise-adding procedure [2]. The schematic diagram of MASG is shown in Fig. 1.
17 | 
18 | ![image_method](https://github.com/vipchengrui/MASG/blob/master/img/method.png)
19 | 
20 | *Fig. 1 The schematic diagram of MASG.*
21 | 
22 | Based on *Pyroomacoustics*, the microphone array clean speech is obtained by setting the wall absorption to 1.0, and the microphone array reverberant speech by setting the absorption to less than 1.0. From the clean speech, a noise signal, and the desired signal-to-noise ratio (SNR), we compute the corresponding microphone array noise signal; adding it to the clean speech and to the reverberant speech yields the microphone array noisy speech and the microphone array noisy reverberant speech.
23 | 
24 | From this, we can generate simulation data for microphone arrays of any geometry in an indoor acoustic environment; a minimal usage sketch is given below.
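The following sketch illustrates this workflow. It is an illustrative example rather than part of the toolkit: it follows the pyroomacoustics calls used by the scripts in this repository (which rely on the older `absorption` keyword of `pra.ShoeBox`), and the input file, source position, and two-microphone array are placeholders.

```python
import numpy as np
import pyroomacoustics as pra
from scipy.io import wavfile

fs, target = wavfile.read('example/male.wav')  # placeholder input file

# absorption = 1.0 -> fully absorbing walls -> "clean" array speech;
# absorption < 1.0 -> surviving reflections -> reverberant array speech.
for absorption, tag in [(1.0, 'clean'), (0.2195, 'rever')]:
    room = pra.ShoeBox([4, 3, 3], fs=fs, absorption=absorption, max_order=17)
    room.add_source([2.0, 2.8, 1.3], signal=target)
    R = np.c_[[1.82, 1.5, 0.75], [1.86, 1.5, 0.75]]  # two mics of a linear array
    room.add_microphone_array(pra.MicrophoneArray(R, room.fs))
    room.simulate()
    signals = room.mic_array.signals  # (n_mics, n_samples), one row per mic
    print(tag, signals.shape)
```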
25 | 
26 | ## Simulation Environment
27 | 
28 | To verify the effect of MASG, we set up a room acoustic environment that is common in daily life: a meeting room scene. This scenario is shown in Fig. 2.
29 | 
30 | ![image_room](https://github.com/vipchengrui/MASG/blob/master/img/room.png)
31 | 
32 | *Fig. 2 Meeting room acoustic environment.*
33 | 
34 | The scene simulates a meeting room 4 m long, 3 m wide and 3 m high. The room contains a 2.2 m x 1.1 m x 0.75 m conference table, 19 chairs marking possible target sound sources, and a screen that can also act as a sound source. Their coordinates and details are shown in Fig. 2.
35 | 
36 | Based on this meeting room, we abstract the room, microphone array, target sources and other information used to build the dataset, giving the simulation environment shown in Fig. 3.
37 | 
38 | ![image_room_model](https://github.com/vipchengrui/MASG/blob/master/img/room_model.png)
39 | 
40 | *Fig. 3 The simulation environment.*
41 | 
42 | ## Program List
43 | 
44 | MASG is implemented in Python. The required packages and the provided functions are as follows.
45 | 
46 | ### Packages
47 | 
48 | [numpy]
49 | https://numpy.org/
50 | https://pypi.org/project/numpy/
51 | 
52 | [matplotlib]
53 | https://matplotlib.org/
54 | https://pypi.org/project/matplotlib/
55 | 
56 | [scipy]
57 | https://www.scipy.org/
58 | https://pypi.org/project/scipy/
59 | 
60 | [pyroomacoustics]
61 | https://github.com/LCAV/pyroomacoustics
62 | https://pypi.org/project/pyroomacoustics/
63 | 
64 | ### Functions
65 | 
66 | [add_noise_for_multichannel.py]
67 | This function adds noise to the microphone array clean speech and the microphone array reverberant speech at a desired SNR.
68 | 
69 | [microphone_array_speech_generator_for_test_dataset.py]
70 | This function generates a microphone array speech test dataset for a room acoustic environment.
71 | 
72 | [microphone_array_speech_generator_for_train_dataset.py]
73 | This function generates a microphone array speech training dataset for a room acoustic environment.
74 | 
75 | [speech_connection.py]
76 | This function concatenates the generated speech segments into long recordings.
77 | 
78 | ## References
79 | 
80 | [1] R. Scheibler, E. Bezzam and I. Dokmanić, "Pyroomacoustics: A Python Package for Audio Room Simulation and Array Processing Algorithms," *2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)*, Calgary, AB, 2018, pp. 351-355.
81 | 
82 | [2] ITU-T (1993). *Objective measurement of active speech level*. ITU-T Recommendation P.56.
--------------------------------------------------------------------------------
/example/babble.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vipchengrui/MASG/2eafc9de3d7d2f25a592f4b1c7cb4cdc66b1b89e/example/babble.wav
--------------------------------------------------------------------------------
/example/male.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vipchengrui/MASG/2eafc9de3d7d2f25a592f4b1c7cb4cdc66b1b89e/example/male.wav
--------------------------------------------------------------------------------
/example/mic_ch0_clean.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vipchengrui/MASG/2eafc9de3d7d2f25a592f4b1c7cb4cdc66b1b89e/example/mic_ch0_clean.wav
--------------------------------------------------------------------------------
/example/mic_ch0_clean_noise.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vipchengrui/MASG/2eafc9de3d7d2f25a592f4b1c7cb4cdc66b1b89e/example/mic_ch0_clean_noise.wav
--------------------------------------------------------------------------------
/example/mic_ch0_clean_rever.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vipchengrui/MASG/2eafc9de3d7d2f25a592f4b1c7cb4cdc66b1b89e/example/mic_ch0_clean_rever.wav
--------------------------------------------------------------------------------
/example/mic_ch0_clean_rever_noise.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vipchengrui/MASG/2eafc9de3d7d2f25a592f4b1c7cb4cdc66b1b89e/example/mic_ch0_clean_rever_noise.wav
--------------------------------------------------------------------------------
/example/mic_ch0_noise.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vipchengrui/MASG/2eafc9de3d7d2f25a592f4b1c7cb4cdc66b1b89e/example/mic_ch0_noise.wav
--------------------------------------------------------------------------------
/example/mic_ch1_clean.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vipchengrui/MASG/2eafc9de3d7d2f25a592f4b1c7cb4cdc66b1b89e/example/mic_ch1_clean.wav
--------------------------------------------------------------------------------
/example/mic_ch1_clean_noise.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vipchengrui/MASG/2eafc9de3d7d2f25a592f4b1c7cb4cdc66b1b89e/example/mic_ch1_clean_noise.wav
--------------------------------------------------------------------------------
/example/mic_ch1_clean_rever.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vipchengrui/MASG/2eafc9de3d7d2f25a592f4b1c7cb4cdc66b1b89e/example/mic_ch1_clean_rever.wav
--------------------------------------------------------------------------------
/example/mic_ch1_clean_rever_noise.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vipchengrui/MASG/2eafc9de3d7d2f25a592f4b1c7cb4cdc66b1b89e/example/mic_ch1_clean_rever_noise.wav
--------------------------------------------------------------------------------
/example/mic_ch1_noise.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vipchengrui/MASG/2eafc9de3d7d2f25a592f4b1c7cb4cdc66b1b89e/example/mic_ch1_noise.wav
--------------------------------------------------------------------------------
/example/mic_ch2_clean.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vipchengrui/MASG/2eafc9de3d7d2f25a592f4b1c7cb4cdc66b1b89e/example/mic_ch2_clean.wav
--------------------------------------------------------------------------------
/example/mic_ch2_clean_noise.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vipchengrui/MASG/2eafc9de3d7d2f25a592f4b1c7cb4cdc66b1b89e/example/mic_ch2_clean_noise.wav
--------------------------------------------------------------------------------
/example/mic_ch2_clean_rever.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vipchengrui/MASG/2eafc9de3d7d2f25a592f4b1c7cb4cdc66b1b89e/example/mic_ch2_clean_rever.wav
--------------------------------------------------------------------------------
/example/mic_ch2_clean_rever_noise.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vipchengrui/MASG/2eafc9de3d7d2f25a592f4b1c7cb4cdc66b1b89e/example/mic_ch2_clean_rever_noise.wav
--------------------------------------------------------------------------------
/example/mic_ch2_noise.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vipchengrui/MASG/2eafc9de3d7d2f25a592f4b1c7cb4cdc66b1b89e/example/mic_ch2_noise.wav
--------------------------------------------------------------------------------
/example/mic_ch3_clean.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vipchengrui/MASG/2eafc9de3d7d2f25a592f4b1c7cb4cdc66b1b89e/example/mic_ch3_clean.wav
--------------------------------------------------------------------------------
/example/mic_ch3_clean_noise.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vipchengrui/MASG/2eafc9de3d7d2f25a592f4b1c7cb4cdc66b1b89e/example/mic_ch3_clean_noise.wav
--------------------------------------------------------------------------------
/example/mic_ch3_clean_rever.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vipchengrui/MASG/2eafc9de3d7d2f25a592f4b1c7cb4cdc66b1b89e/example/mic_ch3_clean_rever.wav
--------------------------------------------------------------------------------
/example/mic_ch3_clean_rever_noise.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vipchengrui/MASG/2eafc9de3d7d2f25a592f4b1c7cb4cdc66b1b89e/example/mic_ch3_clean_rever_noise.wav
--------------------------------------------------------------------------------
/example/mic_ch3_noise.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vipchengrui/MASG/2eafc9de3d7d2f25a592f4b1c7cb4cdc66b1b89e/example/mic_ch3_noise.wav
--------------------------------------------------------------------------------
/img/logo.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vipchengrui/MASG/2eafc9de3d7d2f25a592f4b1c7cb4cdc66b1b89e/img/logo.png
--------------------------------------------------------------------------------
/img/method.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vipchengrui/MASG/2eafc9de3d7d2f25a592f4b1c7cb4cdc66b1b89e/img/method.png
--------------------------------------------------------------------------------
/img/room.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vipchengrui/MASG/2eafc9de3d7d2f25a592f4b1c7cb4cdc66b1b89e/img/room.png
--------------------------------------------------------------------------------
/img/room_model.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vipchengrui/MASG/2eafc9de3d7d2f25a592f4b1c7cb4cdc66b1b89e/img/room_model.png
--------------------------------------------------------------------------------
/microphone_array_speech_generator/README.md:
--------------------------------------------------------------------------------
1 | ![image_logo](https://github.com/vipchengrui/MASG/blob/master/img/logo.png)
2 | 
3 | # MASG
4 | 
5 | [![GitHub release](https://img.shields.io/github/release/vipchengrui/MASG/all.svg?style=flat-square)](https://github.com/vipchengrui/MASG/releases)
6 | [![license](https://img.shields.io/github/license/vipchengrui/MASG.svg?style=flat-square)](https://github.com/vipchengrui/MASG/blob/master/LICENSE)
7 | 
8 | A microphone array speech generator (MASG) for room acoustics.
9 | 
10 | ## Abstract
11 | MASG simulates the speech data received by microphone arrays of various shapes in a room acoustic environment, including clean speech (clean), reverberant speech (clean rever), noisy speech (clean noise), noisy reverberant speech (clean rever noise), and the corresponding noise signals (noise).
12 | 
13 | ## Method
14 | 
15 | MASG is built on two tools: *Pyroomacoustics* [1] and *add_noise_for_multichannel*, a multichannel adaptation of the ITU-T P.56 based noise-adding procedure [2]. The schematic diagram of MASG is shown in Fig. 1.
16 | 
17 | ![image_method](https://github.com/vipchengrui/MASG/blob/master/img/method.png)
18 | 
19 | *Fig. 1 The schematic diagram of MASG.*
20 | 
21 | Based on *Pyroomacoustics*, the microphone array clean speech is obtained by setting the wall absorption to 1.0, and the microphone array reverberant speech by setting the absorption to less than 1.0. From the clean speech, a noise signal, and the desired signal-to-noise ratio (SNR), we compute the corresponding microphone array noise signal; adding it to the clean speech and to the reverberant speech yields the microphone array noisy speech and the microphone array noisy reverberant speech.
22 | 
23 | From this, we can generate simulation data for microphone arrays of any geometry in an indoor acoustic environment.
24 | 
25 | ## Simulation Environment
26 | 
27 | To verify the effect of MASG, we set up a room acoustic environment that is common in daily life: a meeting room scene. This scenario is shown in Fig. 2.
28 | 
29 | ![image_room](https://github.com/vipchengrui/MASG/blob/master/img/room.png)
30 | 
31 | *Fig. 2 Meeting room acoustic environment.*
32 | 
33 | The scene simulates a meeting room 4 m long, 3 m wide and 3 m high. The room contains a 2.2 m x 1.1 m x 0.75 m conference table, 19 chairs marking possible target sound sources, and a screen that can also act as a sound source. Their coordinates and details are shown in Fig. 2.
34 | 
35 | Based on this meeting room, we abstract the room, microphone array, target sources and other information used to build the dataset, giving the simulation environment shown in Fig. 3.
36 | 
37 | ![image_room_model](https://github.com/vipchengrui/MASG/blob/master/img/room_model.png)
38 | 
39 | *Fig. 3 The simulation environment.*
40 | 
41 | ## Program List
42 | 
43 | MASG is implemented in Python. The required packages and the provided functions are as follows.
44 | 
45 | ### Packages
46 | 
47 | [numpy]
48 | https://numpy.org/
49 | https://pypi.org/project/numpy/
50 | 
51 | [matplotlib]
52 | https://matplotlib.org/
53 | https://pypi.org/project/matplotlib/
54 | 
55 | [scipy]
56 | https://www.scipy.org/
57 | https://pypi.org/project/scipy/
58 | 
59 | [pyroomacoustics]
60 | https://github.com/LCAV/pyroomacoustics
61 | https://pypi.org/project/pyroomacoustics/
62 | 
63 | ### Functions
64 | 
65 | [add_noise_for_multichannel.py]
66 | This function adds noise to the microphone array clean speech and the microphone array reverberant speech at a desired SNR.
67 | 
68 | [microphone_array_speech_generator_for_test_dataset.py]
69 | This function generates a microphone array speech test dataset for a room acoustic environment.
70 | 
71 | [microphone_array_speech_generator_for_train_dataset.py]
72 | This function generates a microphone array speech training dataset for a room acoustic environment.
73 | 
74 | [speech_connection.py]
75 | This function concatenates the generated speech segments into long recordings.
76 | 
77 | ## References
78 | 
79 | [1] R. Scheibler, E. Bezzam and I. Dokmanić, "Pyroomacoustics: A Python Package for Audio Room Simulation and Array Processing Algorithms," *2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)*, Calgary, AB, 2018, pp. 351-355.
80 | 
81 | [2] ITU-T (1993). *Objective measurement of active speech level*. ITU-T Recommendation P.56.
82 | 
--------------------------------------------------------------------------------
/microphone_array_speech_generator/add_noise_for_multichannel.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python
2 | # -*- coding: utf-8 -*-
3 | # @Date : 2019-11-12 15:13:27
4 | # @Author : Cheng Rui (chengrui@emails.bjut.edu.cn)
5 | # @Function : add noise to clean and reverberant speech for multichannel data
6 | # @Version : Release 0.2
7 | 
8 | import numpy as np
9 | import numpy.matlib
10 | import matplotlib.pyplot as plt
11 | from scipy.io import wavfile
12 | import wave
13 | from scipy.signal import lfilter
14 | 
15 | # ============= #
16 | #   Functions   #
17 | # ============= #
18 | 
19 | # sub_function - bin_interp
20 | def bin_interp(upcount, lwcount, upthr, lwthr, Margin, tol):
21 |     '''
22 |     This implements the binary interpolation used by asl_P56.
23 | 
24 |     Usage: bin_interp(upcount, lwcount, upthr, lwthr, Margin, tol)
25 | 
26 |     Example call:
27 |       asl_ms_log, cc = bin_interp(upcount, lwcount, upthr, lwthr, Margin, tol)
28 | 
29 |     Python implementation from MATLAB: Rui Cheng
30 |     '''
31 | 
32 |     if tol < 0:
33 |         tol = -tol
34 | 
35 |     # check if the extreme counts are not already the true active value
36 |     iterno = 1
37 |     if abs(upcount - upthr - Margin) < tol:
38 |         asl_ms_log = upcount
39 |         cc = upthr
40 |         return asl_ms_log, cc
41 |     if abs(lwcount - lwthr - Margin) < tol:
42 |         asl_ms_log = lwcount
43 |         cc = lwthr
44 |         return asl_ms_log, cc
45 | 
46 |     # initialize the first middle for the given (initial) bounds
47 |     midcount = (upcount + lwcount) / 2.0
48 |     midthr = (upthr + lwthr) / 2.0
49 |     # repeat the loop until `diff` falls inside the tolerance (-tol <= diff <= tol)
50 |     while 1:
51 |         diff = midcount - midthr - Margin
52 |         if abs(diff) <= tol:
53 |             break
54 |         # if the tolerance is not met within 20 iterations, relax it by 10%
55 |         iterno = iterno + 1
56 |         if iterno > 20:
57 |             tol = tol * 1.1
58 |         if diff > tol:  # then the new bounds are ...
59 |             midcount = (upcount + midcount) / 2.0
60 |             # upper and middle activities
61 |             midthr = (upthr + midthr) / 2.0
62 |             # ... and thresholds
63 |         elif diff < -tol:  # then the new bounds are ...
64 |             midcount = (midcount + lwcount) / 2.0
65 |             # middle and lower activities
66 |             midthr = (midthr + lwthr) / 2.0
67 |             # ... and thresholds
68 | 
69 |     # since the tolerance has been satisfied, midcount is selected
70 |     # as the interpolated value with a tol [dB] tolerance
71 |     asl_ms_log = midcount
72 |     cc = midthr
73 | 
74 |     return asl_ms_log, cc
75 | 
76 | # sub_function - asl_P56
77 | def asl_P56(x, fs, nbits):
78 |     '''
79 |     This implements ITU-T P.56 method B.
80 | 
81 |     Usage: asl_P56(x, fs, nbits)
82 | 
83 |     x - the column vector of floating point speech data
84 |     fs - the sampling frequency
85 |     nbits - the number of bits
86 | 
87 |     Example call:
88 |       asl_ms, asl, c0 = asl_P56(x, fs, nbits)
89 | 
90 |     References:
91 |     [1] ITU-T (1993). Objective measurement of active speech level. ITU-T
92 |     Recommendation P.56.
93 | 
94 |     Python implementation from MATLAB: Rui Cheng
95 |     '''
96 | 
97 |     T = 0.03  # time constant of smoothing, in seconds
98 |     H = 0.2  # hangover time in seconds
99 |     M = 15.9  # margin in dB of the difference between threshold and active speech level
100 |     thres_no = nbits - 1  # number of thresholds; for 16 bit, it's 15
101 |     eps = 2.2204e-16
102 | 
103 |     I = int(np.ceil(fs*H))  # hangover in samples
104 |     g = np.exp(-1/(fs*T))  # smoothing factor in envelope detection
105 |     c = [pow(2,i) for i in range(-15, thres_no-16+1)]
106 |     # vector with thresholds from one quantizing level up to half the maximum code, doubling at each step; for 16-bit samples, from 2^-15 to 0.5
107 |     a = [0 for i in range(thres_no)]  # activity counter for each level threshold
108 |     hang = [I for i in range(thres_no)]  # hangover counter for each level threshold
109 | 
110 |     sq = sum(pow(x,2))  # long-term squared energy of x
111 |     x_len = len(x)  # length of x
112 | 
113 |     # use a 2nd order IIR filter to detect the envelope q
114 |     x_abs = abs(x)
115 |     p = lfilter([1-g], [1,-g], x_abs)
116 |     q = lfilter([1-g], [1,-g], p)
117 | 
118 |     for k in range(x_len):
119 |         for j in range(thres_no):
120 |             if q[k] >= c[j]:
121 |                 a[j] = a[j] + 1
122 |                 hang[j] = 0
123 |             elif hang[j] < I:
124 |                 a[j] = a[j] + 1
125 |                 hang[j] = hang[j] + 1
126 |             else:
127 |                 break
128 | 
129 |     asl = 0
130 |     asl_rms = 0
131 |     if a[0] == 0:
132 |         print('! ! ! ERROR ! ! !')
133 |     else:
134 |         AdB1 = 10*np.log10(sq/a[0]+eps)
135 | 
136 |     CdB1 = 20*np.log10(c[0]+eps)
137 |     if AdB1-CdB1 < M:
138 |         print('! ! ! ERROR ! ! !')
139 | 
140 |     AdB = [0 for i in range(thres_no)]
141 |     CdB = [0 for i in range(thres_no)]
142 |     Delta = [0 for i in range(thres_no)]
143 |     AdB[0] = AdB1
144 |     CdB[0] = CdB1
145 |     Delta[0] = AdB1 - CdB1
146 | 
147 |     for j in range(1, thres_no):
148 |         AdB[j] = 10*np.log10(sq/(a[j]+eps)+eps)
149 |         CdB[j] = 20*np.log10(c[j]+eps)
150 | 
151 |     for j in range(1, thres_no):
152 |         if a[j] != 0:
153 |             Delta[j] = AdB[j] - CdB[j]
154 |             if Delta[j] <= M:  # M = 15.9
155 |                 # interpolate to find the asl
156 |                 asl_ms_log, cl0 = bin_interp(AdB[j], AdB[j-1], CdB[j], CdB[j-1], M, 0.5)
157 |                 asl_ms = pow(10, asl_ms_log/10)
158 |                 asl = (sq/x_len)/asl_ms
159 |                 c0 = pow(10, cl0/20)
160 |                 break
161 | 
162 |     return asl_ms, asl, c0
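# How the SNR is applied in `addnoise` below: with Px the active speech
# energy returned by asl_P56 and Pn the mean energy of the chosen noise
# segment, the desired SNR is defined as
#     snr = 10*log10(Px / (sf**2 * Pn)),
# so the noise segment is scaled by
#     sf = sqrt(Px / (Pn * 10**(snr/10))).
# A hypothetical example call (all names are placeholders): `clean` and
# `clean_rever` are [n_channels x n_samples] int16 arrays, `noise` is a
# longer 1-D int16 array:
#     out_crn, out_cn, out_n = addnoise(clean, clean_rever, noise, 5, 16000)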
163 | 
164 | # main_function - addnoise
165 | def addnoise(clean_data, clean_rever_data, noise_data, snr, fs):
166 |     '''
167 |     This function is used to add noise to clean speech and reverberant speech.
168 |     It uses the active speech level to compute the speech energy.
169 |     The active speech level is computed as per the ITU-T P.56 standard [1].
170 | 
171 |     Usage: addnoise(clean_data, clean_rever_data, noise_data, snr, fs)
172 | 
173 |     clean_data - clean speech data in each channel [nchannel x points]
174 |     clean_rever_data - reverberant speech data in each channel [nchannel x points]
175 |     noise_data - noise data; the noise has to be longer than the speech [1 x points]
176 |     snr - desired SNR in dB
177 |     fs - sampling frequency
178 | 
179 |     Note: the original MATLAB implementation can optionally apply an IRS filter to bandlimit the signal to 300 Hz - 3.2 kHz;
180 |     this Python port does not include the IRS filtering.
181 | 
182 |     Example call:
183 |       out_clean_rever_noise, out_clean_noise, out_noise = addnoise(clean_data, clean_rever_data, noise_data, snr, fs)
184 | 
185 |     References:
186 |     [1] ITU-T (1993). Objective measurement of active speech level. ITU-T
187 |     Recommendation P.56.
188 | 
189 |     Author: Yi Hu and Philipos C. Loizou
190 | 
191 |     Copyright (c) 2006 by Philipos C. Loizou
192 | 
193 |     Python implementation from MATLAB: Rui Cheng
194 |     '''
195 | 
196 |     nbits = 16
197 | 
198 |     # wavread gives floating point column data
199 |     # normalize by 32768 and change the data type to np.double
200 |     clean = (clean_data/32768).astype(np.double)
201 |     clean_rever = (clean_rever_data/32768).astype(np.double)
202 |     noise = (noise_data/32768).astype(np.double)
203 | 
204 |     # create the output matrices
205 |     out_noise = np.zeros((clean.shape[0],clean.shape[1]), dtype=np.double)
206 |     out_clean_rever_noise = np.zeros((clean.shape[0],clean.shape[1]), dtype=np.double)
207 |     out_clean_noise = np.zeros((clean.shape[0],clean.shape[1]), dtype=np.double)
208 | 
209 |     # add noise in each channel
210 |     for i in range(clean.shape[0]):
211 | 
212 |         # asl_P56
213 |         Px, asl, c0 = asl_P56(clean[i,:], fs, nbits)
214 |         # Px is the active speech level ms energy
215 |         # asl is the activity factor
216 |         # c0 is the active speech level threshold
217 | 
218 |         # get the length of speech and noise
219 |         x = clean[i,:]
220 |         x_len = len(x)
221 |         noise_len = len(noise)
222 | 
223 |         # choose a noise segment of matching length
224 |         rand_start_limit = noise_len - x_len
225 |         # the start of the noise segment can vary in [0, rand_start_limit]
226 |         rand_start = int(round(rand_start_limit * np.matlib.rand(1)[0,0]))
227 |         # random start of the noise segment (kept <= rand_start_limit so the segment is full length)
228 |         # rand_start = 10
229 |         noise_segment = noise[rand_start:rand_start+x_len]
230 | 
231 |         # the randomly selected noise segment will be added to the clean and reverberant speech
232 |         # clean speech x
233 |         Pn = sum(pow(noise_segment,2))/x_len
234 |         # scale the noise segment samples to obtain the desired snr = 10*log10[Px/(sf^2 * Pn)]
235 |         sf = np.sqrt(Px/Pn/(pow(10,(snr/10))))  # scale factor for the noise segment data
236 | 
237 |         # out_noise
238 |         out_noise[i,:] = noise_segment * sf
239 |         # out_clean_rever_noise
240 |         out_clean_rever_noise[i,:] = clean_rever[i,:] + out_noise[i,:]
241 |         # out_clean_noise
242 |         out_clean_noise[i,:] = clean[i,:] + out_noise[i,:]
243 | 
244 |     # de-normalize back to the int16 range for wav writing
245 |     out_noise = (out_noise*32768).astype("int16")
246 |     out_clean_rever_noise = (out_clean_rever_noise*32768).astype("int16")
247 |     out_clean_noise = (out_clean_noise*32768).astype("int16")
248 | 
249 |     return out_clean_rever_noise, out_clean_noise, out_noise
250 | 
--------------------------------------------------------------------------------
/microphone_array_speech_generator/microphone_array_speech_generator_for_test_dataset.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python
2 | # -*- coding: utf-8 -*-
3 | # @Date : 2019-11-25 09:33:27
4 | # @Author : Cheng Rui (chengrui@emails.bjut.edu.cn)
5 | # @Function : microphone array speech generator for test dataset
6 | # @Version : Release 0.3
7 | 
8 | '''
9 | Version Information
10 | 
11 | Release 0.1
12 | 
13 | [1] The generator is used for single-sentence tests, verifying the accuracy of the
14 | dataset generator.
15 | 
16 | Release 0.2
17 | 
18 | [1] The generator is encapsulated as a function that is executed by call, which is
19 | convenient for generating large-scale microphone array speech data.
20 | 
21 | [2] The microphone array signals generated in the loop are written out automatically.
22 | 
23 | Release 0.3
24 | 
25 | [1] Using the encapsulated generator function, the speech dataset is read automatically
26 | in a loop to generate the microphone array speech in room acoustics.
27 | [2] Supports simultaneous generation of microphone array signals at four signal-to-noise
28 | ratios (-5dB, 0dB, 5dB, 10dB) and seven reverberation times (200ms, 300ms, 400ms, 500ms,
29 | 600ms, 700ms, 800ms).
30 | '''
31 | 
32 | import numpy as np
33 | import matplotlib.pyplot as plt
34 | from scipy.io import wavfile
35 | import glob
36 | import pyroomacoustics as pra
37 | import add_noise_for_multichannel as an
38 | 
39 | # ================== #
40 | #      Function      #
41 | # ================== #
42 | 
43 | # single source microphone array clean speech generator
44 | def mic_clean_generator(room_size, target_location, target, fs, microphone_array, amplifier):
45 |     '''
46 |     This function implements the single-source microphone array clean speech generator.
47 | 
48 |     Usage: mic_clean_generator(room_size, target_location, target, fs, microphone_array, amplifier)
49 | 
50 |     room_size - the size of the room [length, width, height]
51 |     target_location - the location of the target speech [x, y, z]
52 |     target - the array of the target speech file
53 |     fs - sampling frequency
54 |     microphone_array - the locations of the microphone array
55 |     amplifier - the gain of the microphone's built-in amplifier
56 | 
57 |     Example call:
58 |       clean = mic_clean_generator(room_size, target_location, target, fs, microphone_array, amplifier)
59 | 
60 |     References:
61 |       microphone array speech generator release 0.1
62 | 
63 |     Author: Rui Cheng
64 |     '''
65 | 
66 |     # create the room
67 |     room = pra.ShoeBox(room_size, fs=fs, absorption=1.0, max_order=17)
68 |     '''fig, ax = room.plot()
69 |     ax.set_xlim([0, 4.5])
70 |     ax.set_ylim([0, 6.5])
71 |     ax.set_zlim([0, 4])
72 |     plt.show()
73 |     '''
74 |     # add source
75 |     room.add_source(target_location, signal=target, delay=0)
76 |     #room.add_source([3.5, 3.0, 1.76], signal=interf[:len(target)], delay=0) # for multi-source
77 |     '''fig, ax = room.plot()
78 |     ax.set_xlim([0, 4.5])
79 |     ax.set_ylim([0, 6.5])
80 |     ax.set_zlim([0, 4])
81 |     plt.show()'''
82 | 
83 |     # add microphone array
84 |     R = microphone_array
85 |     fft_len = 512
86 |     Lg_t = 0.100
87 |     Lg = np.ceil(Lg_t*room.fs)
88 |     mic_array = pra.Beamformer(R, room.fs, N=fft_len, Lg=Lg)
89 |     room.add_microphone_array(mic_array)
90 |     '''fig, ax = room.plot()
91 |     ax.set_xlim([0, 4.5])
92 |     ax.set_ylim([0, 6.5])
93 |     ax.set_zlim([0, 4])
94 |     plt.show()'''
95 | 
96 |     # create the room impulse response
97 |     # compute image sources
98 |     room.image_source_model(use_libroom=True)
99 |     # visualize 3D polyhedron room and image sources
100 |     '''fig, ax = room.plot(img_order=3)
101 |     fig.set_size_inches(20, 10)
102 |     plt.show()'''
103 | 
104 |     '''room.plot_rir()
105 |     fig = plt.gcf()
106 |     fig.set_size_inches(20, 10)
107 |     plt.show()'''
108 | 
109 |     # microphone speech
110 |     room.simulate()
111 | 
112 |     # clean speech in each channel
113 |     clean = amplifier*room.mic_array.signals.astype("int16")
114 | 
115 |     return clean
116 | 
117 | # single source microphone array reverberation speech generator
118 | def mic_rever_generator(room_size, target_location, target, fs, microphone_array, amplifier, absorption_value):
119 |     '''
120 |     This function implements the single-source microphone array reverberant speech generator.
121 | 
122 |     Usage: mic_rever_generator(room_size, target_location, target, fs, microphone_array, amplifier, absorption_value)
123 | 
124 |     room_size - the size of the room [length, width, height]
125 |     target_location - the location of the target speech [x, y, z]
126 |     target - the array of the target speech file
127 |     fs - sampling frequency
128 |     microphone_array - the locations of the microphone array
129 |     amplifier - the gain of the microphone's built-in amplifier
130 |     absorption_value - absorption value of the room walls
131 | 
132 |     Example call:
133 |       clean_rever = mic_rever_generator(room_size, target_location, target, fs, microphone_array, amplifier, absorption_value)
134 | 
135 |     References:
136 |       microphone array speech generator release 0.1
137 | 
138 |     Author: Rui Cheng
139 |     '''
140 | 
141 |     # create the room
142 |     room = pra.ShoeBox(room_size, fs=fs, absorption=absorption_value, max_order=17)
143 | 
144 |     room.add_source(target_location, signal=target, delay=0)
145 |     #room.add_source([3.5, 3.0, 1.76], signal=interf[:len(target)], delay=0)
146 | 
147 |     # add microphone array
148 |     R = microphone_array
149 |     fft_len = 512
150 |     Lg_t = 0.100
151 |     Lg = np.ceil(Lg_t*room.fs)
152 |     mic_array = pra.Beamformer(R, room.fs, N=fft_len, Lg=Lg)
153 |     room.add_microphone_array(mic_array)
154 | 
155 |     # create the room impulse response
156 |     # compute image sources
157 |     room.image_source_model(use_libroom=True)
158 | 
159 |     # microphone speech
160 |     room.simulate()
161 | 
162 |     # reverberant speech in each channel
163 |     clean_rever = amplifier*room.mic_array.signals.astype("int16")
164 | 
165 |     # return
166 |     return clean_rever
167 | 
168 | 
169 | 
170 | 
171 | 
172 | # ================== #
173 | #        Main        #
174 | # ================== #
175 | 
176 | print('Test Dataset')
177 | print('\n')
178 | print(
179 |     'Microphone Array Speech Generator in Room Acoustic [Release 0.3]')
180 | print(
181 |     '==================================================================')
182 | print('\n')
183 | 
184 | # ================== #
185 | #     Parameters     #
186 | # ================== #
187 | 
188 | print(
189 |     'Parameters')
190 | print(
191 |     '------------------------------------------------------------------')
192 | 
193 | print('---- Fixed parameter ----')
194 | # room size
195 | room_size = [4, 3, 3]
196 | # microphone array
197 | microphone_array = np.c_[
198 |     [1.82, 1.5, 0.75],  # mic 0
199 |     [1.86, 1.5, 0.75],  # mic 1
200 |     [1.90, 1.5, 0.75],  # mic 2
201 |     [1.94, 1.5, 0.75],  # mic 3
202 |     [1.98, 1.5, 0.75],  # mic 4
203 |     [2.02, 1.5, 0.75],  # mic 5
204 |     [2.06, 1.5, 0.75],  # mic 6
205 |     [2.10, 1.5, 0.75],  # mic 7
206 |     [2.14, 1.5, 0.75],  # mic 8
207 |     [2.18, 1.5, 0.75],  # mic 9
208 |     ]
209 | # microphone amplification
210 | amplifier = 10
211 | print('room_size')
212 | print(room_size)
213 | print('microphone_array')
214 | print(microphone_array)
215 | print('microphone_amplifier')
216 | print(amplifier)
217 | 
218 | print('---- Variation parameters contained in the dataset ----')
219 | # target source locations
220 | target_location = [
221 |     [0.6, 2.8, 1.3],   # tar 01
222 |     [1.3, 2.8, 1.3],   # tar 02
223 |     [2.0, 2.8, 1.3],   # tar 03
224 |     [2.7, 2.8, 1.3],   # tar 04
225 |     [3.4, 2.8, 1.3],   # tar 05
226 |     [0.2, 2.15, 1.3],  # tar 06
227 |     [1.3, 2.25, 1.3],  # tar 07
228 |     [2.0, 2.25, 1.3],  # tar 08
229 |     [2.7, 2.25, 1.3],  # tar 09
230 |     [0.75, 1.5, 1.3],  # tar 10
231 |     [0.2, 0.85, 1.3],  # tar 11
232 |     [1.3, 0.75, 1.3],  # tar 12
233 |     [2.0, 0.75, 1.3],  # tar 13
234 |     [2.7, 0.75, 1.3],  # tar 14
235 |     [0.6, 0.2, 1.3],   # tar 15
236 |     [1.3, 0.2, 1.3],   # tar 16
237 |     [2.0, 0.2, 1.3],   # tar 17
238 |     [2.7, 0.2, 1.3],   # tar 18
239 |     [3.4, 0.2, 1.3],   # tar 19
240 |     [3.8, 1.5, 1.8],   # tar 20
241 |     ]
242 | # SNR: -5dB, 0dB, 5dB, 10dB
243 | SNR = [-5, 0, 5, 10]
244 | # absorption:
245 | absorption_value = [
246 |     [0.4391, 200],  # alpha=0.4391, RT60=200ms
247 |     [0.2927, 300],  # alpha=0.2927, RT60=300ms
248 |     [0.2195, 400],  # alpha=0.2195, RT60=400ms
249 |     [0.1756, 500],  # alpha=0.1756, RT60=500ms
250 |     [0.1464, 600],  # alpha=0.1464, RT60=600ms
251 |     [0.1255, 700],  # alpha=0.1255, RT60=700ms
252 |     [0.1098, 800],  # alpha=0.1098, RT60=800ms
253 |     ]
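# The absorption/RT60 pairs above are consistent with Sabine's formula,
#     RT60 = 0.161 * V / (S * alpha),
# for this 4 m x 3 m x 3 m room: V = 36 m^3 and S = 66 m^2, so, for example,
# alpha = 0.2195 gives RT60 = 0.161*36/(66*0.2195) ~= 0.400 s, the 400 ms row.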
254 | print('target_location')
255 | print(target_location)
256 | print('signal_to_noise')
257 | print(SNR)
258 | print('absorption')
259 | print(absorption_value)
260 | 
261 | # target speech file path
262 | MASG_target_path = 'C:\\Projects\\chengrui\\Multi_Data\\MASG_mic_speech_test\\target\\'
263 | # noise signal path
264 | noise_file = 'C:\\Projects\\chengrui\\Multi_Data\\Data\\Noisex92\\babble.wav'
265 | # microphone speech file paths
266 | MASG_clean_path = 'C:\\Projects\\chengrui\\Multi_Data\\MASG_mic_speech_test\\mic_clean\\'
267 | MASG_clean_noise_path = 'C:\\Projects\\chengrui\\Multi_Data\\MASG_mic_speech_test\\mic_clean_noise\\'
268 | MASG_clean_rever_path = 'C:\\Projects\\chengrui\\Multi_Data\\MASG_mic_speech_test\\mic_clean_rever\\'
269 | MASG_clean_rever_noise_path = 'C:\\Projects\\chengrui\\Multi_Data\\MASG_mic_speech_test\\mic_clean_rever_noise\\'
270 | MASG_noise_path = 'C:\\Projects\\chengrui\\Multi_Data\\MASG_mic_speech_test\\mic_noise\\'
271 | 
272 | # read the target speech file names
273 | target_file_names = np.array([])
274 | for file in glob.glob(MASG_target_path + '*.wav'):
275 |     target_file_names = np.append(target_file_names, file.replace(MASG_target_path, ''))  # replace() removes the path prefix; strip() would remove characters, not a prefix
276 | print('target_file_names')
277 | print(target_file_names, target_file_names.shape)
278 | print('\t')
279 | 
280 | print(
281 |     'Generate Microphone Array Speech')
282 | print(
283 |     '------------------------------------------------------------------')
284 | 
285 | # read noise signal
286 | fs, noise = wavfile.read(noise_file)
287 | 
288 | # count index
289 | count = 1
290 | 
291 | # process the utterances one by one
292 | for file_name in target_file_names:
293 | 
294 |     # read target speech
295 |     fs, target = wavfile.read(MASG_target_path+file_name)
296 | 
297 |     # ================= #
298 |     #     Generator     #
299 |     # ================= #
300 | 
301 |     for snr_index in range(4):
302 |         for rt_index in range(7):
303 |             # clean
304 |             clean = mic_clean_generator(room_size, target_location[int(file_name[1:3])-1], target, fs, microphone_array, amplifier)
305 |             # reverberation
306 |             clean_rever = mic_rever_generator(room_size, target_location[int(file_name[1:3])-1], target, fs, microphone_array, amplifier, absorption_value[rt_index][0])
307 |             # add noise
308 |             out_clean_rever_noise, out_clean_noise, out_noise = an.addnoise(clean, clean_rever, noise, SNR[snr_index], fs)
309 | 
310 |             # ===================== #
311 |             #     Wavfile.write     #
312 |             # ===================== #
313 | 
314 |             # write wavfile
315 |             for i in range(microphone_array.shape[1]):
316 |                 # file path
317 |                 file_path = 'ch'+str(i)+'\\'+str(SNR[snr_index])+'dB\\'+str(int(absorption_value[rt_index][1]))+'ms\\'
318 |                 # clean
319 |                 wavfile.write(
320 |                     MASG_clean_path+file_path+file_name[:7]+'_ch'+str(i)+'_clean_'+str(SNR[snr_index])+'dB_'+str(int(absorption_value[rt_index][1]))+'ms.wav', fs, clean[i,:])
321 |                 # clean_noise
322 |                 wavfile.write(
323 |                     MASG_clean_noise_path+file_path+file_name[:7]+'_ch'+str(i)+'_clean_noise_'+str(SNR[snr_index])+'dB_'+str(int(absorption_value[rt_index][1]))+'ms.wav', fs, out_clean_noise[i,:])
324 |                 # clean_rever
325 |                 wavfile.write(
326 |                     MASG_clean_rever_path+file_path+file_name[:7]+'_ch'+str(i)+'_clean_rever_'+str(SNR[snr_index])+'dB_'+str(int(absorption_value[rt_index][1]))+'ms.wav', fs, clean_rever[i,:])
327 |                 # clean_rever_noise
328 |                 wavfile.write(
329 |                     MASG_clean_rever_noise_path+file_path+file_name[:7]+'_ch'+str(i)+'_clean_rever_noise_'+str(SNR[snr_index])+'dB_'+str(int(absorption_value[rt_index][1]))+'ms.wav', fs, out_clean_rever_noise[i,:])
330 |                 # noise
331 |                 wavfile.write(
332 |                     MASG_noise_path+file_path+file_name[:7]+'_ch'+str(i)+'_noise_'+str(SNR[snr_index])+'dB_'+str(int(absorption_value[rt_index][1]))+'ms.wav', fs, out_noise[i,:])
333 | 
334 |     print(file_name[:7], 'generation completed.', '[', count, '/ 100 ]')
335 |     count = count + 1
336 | 
337 | print('channel number:', microphone_array.shape[1])
338 | print('Microphone array speech and noise have been generated.')
339 | print('\t')
340 | plt.show()
341 | 
342 | 
343 | 
--------------------------------------------------------------------------------
/microphone_array_speech_generator/microphone_array_speech_generator_for_train_dataset.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python
2 | # -*- coding: utf-8 -*-
3 | # @Date : 2019-11-25 09:33:27
4 | # @Author : Cheng Rui (chengrui@emails.bjut.edu.cn)
5 | # @Function : microphone array speech generator for train dataset
6 | # @Version : Release 0.3
7 | 
8 | '''
9 | Version Information
10 | 
11 | Release 0.1
12 | 
13 | [1] The generator is used for single-sentence tests, verifying the accuracy of the
14 | dataset generator.
15 | 
16 | Release 0.2
17 | 
18 | [1] The generator is encapsulated as a function that is executed by call, which is
19 | convenient for generating large-scale microphone array speech data.
20 | 
21 | [2] The microphone array signals generated in the loop are written out automatically.
22 | 
23 | Release 0.3
24 | 
25 | [1] Using the encapsulated generator function, the speech dataset is read automatically
26 | in a loop to generate the microphone array speech in room acoustics.
27 | [2] Supports simultaneous generation of microphone array signals at four signal-to-noise
28 | ratios (-5dB, 0dB, 5dB, 10dB) and seven reverberation times (200ms, 300ms, 400ms, 500ms,
29 | 600ms, 700ms, 800ms).
30 | '''
31 | 
32 | import numpy as np
33 | import matplotlib.pyplot as plt
34 | from scipy.io import wavfile
35 | import glob
36 | import pyroomacoustics as pra
37 | import add_noise_for_multichannel as an
38 | 
39 | # ================== #
40 | #      Function      #
41 | # ================== #
42 | 
43 | # single source microphone array clean speech generator
44 | def mic_clean_generator(room_size, target_location, target, fs, microphone_array, amplifier):
45 |     '''
46 |     This function implements the single-source microphone array clean speech generator.
47 | 
48 |     Usage: mic_clean_generator(room_size, target_location, target, fs, microphone_array, amplifier)
49 | 
50 |     room_size - the size of the room [length, width, height]
51 |     target_location - the location of the target speech [x, y, z]
52 |     target - the array of the target speech file
53 |     fs - sampling frequency
54 |     microphone_array - the locations of the microphone array
55 |     amplifier - the gain of the microphone's built-in amplifier
56 | 
57 |     Example call:
58 |       clean = mic_clean_generator(room_size, target_location, target, fs, microphone_array, amplifier)
59 | 
60 |     References:
61 |       microphone array speech generator release 0.1
62 | 
63 |     Author: Rui Cheng
64 |     '''
65 | 
66 |     # create the room
67 |     room = pra.ShoeBox(room_size, fs=fs, absorption=1.0, max_order=17)
68 |     '''fig, ax = room.plot()
69 |     ax.set_xlim([0, 4.5])
70 |     ax.set_ylim([0, 6.5])
71 |     ax.set_zlim([0, 4])
72 |     plt.show()
73 |     '''
74 |     # add source
75 |     room.add_source(target_location, signal=target, delay=0)
76 |     #room.add_source([3.5, 3.0, 1.76], signal=interf[:len(target)], delay=0) # for multi-source
77 |     '''fig, ax = room.plot()
78 |     ax.set_xlim([0, 4.5])
79 |     ax.set_ylim([0, 6.5])
80 |     ax.set_zlim([0, 4])
81 |     plt.show()'''
82 | 
83 |     # add microphone array
84 |     R = microphone_array
85 |     fft_len = 512
86 |     Lg_t = 0.100
87 |     Lg = np.ceil(Lg_t*room.fs)
88 |     mic_array = pra.Beamformer(R, room.fs, N=fft_len, Lg=Lg)
89 |     room.add_microphone_array(mic_array)
90 |     '''fig, ax = room.plot()
91 |     ax.set_xlim([0, 4.5])
92 |     ax.set_ylim([0, 6.5])
93 |     ax.set_zlim([0, 4])
94 |     plt.show()'''
95 | 
96 |     # create the room impulse response
97 |     # compute image sources
98 |     room.image_source_model(use_libroom=True)
99 |     # visualize 3D polyhedron room and image sources
100 |     '''fig, ax = room.plot(img_order=3)
101 |     fig.set_size_inches(20, 10)
102 |     plt.show()'''
103 | 
104 |     '''room.plot_rir()
105 |     fig = plt.gcf()
106 |     fig.set_size_inches(20, 10)
107 |     plt.show()'''
108 | 
109 |     # microphone speech
110 |     room.simulate()
111 | 
112 |     # clean speech in each channel
113 |     clean = amplifier*room.mic_array.signals.astype("int16")
114 | 
115 |     return clean
116 | 
117 | # single source microphone array reverberation speech generator
118 | def mic_rever_generator(room_size, target_location, target, fs, microphone_array, amplifier, absorption_value):
119 |     '''
120 |     This function implements the single-source microphone array reverberant speech generator.
121 | 
122 |     Usage: mic_rever_generator(room_size, target_location, target, fs, microphone_array, amplifier, absorption_value)
123 | 
124 |     room_size - the size of the room [length, width, height]
125 |     target_location - the location of the target speech [x, y, z]
126 |     target - the array of the target speech file
127 |     fs - sampling frequency
128 |     microphone_array - the locations of the microphone array
129 |     amplifier - the gain of the microphone's built-in amplifier
130 |     absorption_value - absorption value of the room walls
131 | 
132 |     Example call:
133 |       clean_rever = mic_rever_generator(room_size, target_location, target, fs, microphone_array, amplifier, absorption_value)
134 | 
135 |     References:
136 |       microphone array speech generator release 0.1
137 | 
138 |     Author: Rui Cheng
139 |     '''
140 | 
141 |     # create the room
142 |     room = pra.ShoeBox(room_size, fs=fs, absorption=absorption_value, max_order=17)
143 | 
144 |     room.add_source(target_location, signal=target, delay=0)
145 |     #room.add_source([3.5, 3.0, 1.76], signal=interf[:len(target)], delay=0)
146 | 
147 |     # add microphone array
148 |     R = microphone_array
149 |     fft_len = 512
150 |     Lg_t = 0.100
151 |     Lg = np.ceil(Lg_t*room.fs)
152 |     mic_array = pra.Beamformer(R, room.fs, N=fft_len, Lg=Lg)
153 |     room.add_microphone_array(mic_array)
154 | 
155 |     # create the room impulse response
156 |     # compute image sources
157 |     room.image_source_model(use_libroom=True)
158 | 
159 |     # microphone speech
160 |     room.simulate()
161 | 
162 |     # reverberant speech in each channel
163 |     clean_rever = amplifier*room.mic_array.signals.astype("int16")
164 | 
165 |     # return
166 |     return clean_rever
167 | 
168 | 
169 | 
170 | 
171 | 
172 | # ================== #
173 | #        Main        #
174 | # ================== #
175 | 
176 | print('Train Dataset')
177 | print('\n')
178 | print(
179 |     'Microphone Array Speech Generator in Room Acoustic [Release 0.3]')
180 | print(
181 |     '==================================================================')
182 | print('\n')
183 | 
184 | # ================== #
185 | #     Parameters     #
186 | # ================== #
187 | 
188 | print(
189 |     'Parameters')
190 | print(
191 |     '------------------------------------------------------------------')
192 | 
193 | print('---- Fixed parameter ----')
194 | # room size
195 | room_size = [4, 3, 3]
196 | # microphone array
197 | microphone_array = np.c_[
198 |     [1.82, 1.5, 0.75],  # mic 0
199 |     [1.86, 1.5, 0.75],  # mic 1
200 |     [1.90, 1.5, 0.75],  # mic 2
201 |     [1.94, 1.5, 0.75],  # mic 3
202 |     [1.98, 1.5, 0.75],  # mic 4
203 |     [2.02, 1.5, 0.75],  # mic 5
204 |     [2.06, 1.5, 0.75],  # mic 6
205 |     [2.10, 1.5, 0.75],  # mic 7
206 |     [2.14, 1.5, 0.75],  # mic 8
207 |     [2.18, 1.5, 0.75],  # mic 9
208 |     ]
209 | # microphone amplification
210 | amplifier = 10
211 | print('room_size')
212 | print(room_size)
213 | print('microphone_array')
214 | print(microphone_array)
215 | print('microphone_amplifier')
216 | print(amplifier)
217 | 
218 | print('---- Variation parameters contained in the dataset ----')
219 | # target source locations
220 | target_location = [
221 |     [0.6, 2.8, 1.3],   # tar 01
222 |     [1.3, 2.8, 1.3],   # tar 02
223 |     [2.0, 2.8, 1.3],   # tar 03
224 |     [2.7, 2.8, 1.3],   # tar 04
225 |     [3.4, 2.8, 1.3],   # tar 05
226 |     [0.2, 2.15, 1.3],  # tar 06
227 |     [1.3, 2.25, 1.3],  # tar 07
228 |     [2.0, 2.25, 1.3],  # tar 08
229 |     [2.7, 2.25, 1.3],  # tar 09
230 |     [0.75, 1.5, 1.3],  # tar 10
231 |     [0.2, 0.85, 1.3],  # tar 11
232 |     [1.3, 0.75, 1.3],  # tar 12
233 |     [2.0, 0.75, 1.3],  # tar 13
234 |     [2.7, 0.75, 1.3],  # tar 14
235 |     [0.6, 0.2, 1.3],   # tar 15
236 |     [1.3, 0.2, 1.3],   # tar 16
237 |     [2.0, 0.2, 1.3],   # tar 17
238 |     [2.7, 0.2, 1.3],   # tar 18
239 |     [3.4, 0.2, 1.3],   # tar 19
240 |     [3.8, 1.5, 1.8],   # tar 20
241 |     ]
242 | # SNR: -5dB, 0dB, 5dB, 10dB
243 | SNR = [-5, 0, 5, 10]
244 | # absorption:
245 | absorption_value = [
246 |     [0.4391, 200],  # alpha=0.4391, RT60=200ms
247 |     [0.2927, 300],  # alpha=0.2927, RT60=300ms
248 |     [0.2195, 400],  # alpha=0.2195, RT60=400ms
249 |     [0.1756, 500],  # alpha=0.1756, RT60=500ms
250 |     [0.1464, 600],  # alpha=0.1464, RT60=600ms
251 |     [0.1255, 700],  # alpha=0.1255, RT60=700ms
252 |     [0.1098, 800],  # alpha=0.1098, RT60=800ms
253 |     ]
254 | print('target_location')
255 | print(target_location)
256 | print('signal_to_noise')
257 | print(SNR)
258 | print('absorption')
259 | print(absorption_value)
260 | 
261 | # target speech file path
262 | MASG_target_path = 'C:\\Projects\\chengrui\\Multi_Data\\MASG_mic_speech_train\\target\\'
263 | # noise signal path
264 | noise_file = 'C:\\Projects\\chengrui\\Multi_Data\\Data\\Noisex92\\babble.wav'
265 | # microphone speech file paths
266 | MASG_clean_path = 'C:\\Projects\\chengrui\\Multi_Data\\MASG_mic_speech_train\\mic_clean\\'
267 | MASG_clean_noise_path = 'C:\\Projects\\chengrui\\Multi_Data\\MASG_mic_speech_train\\mic_clean_noise\\'
268 | MASG_clean_rever_path = 'C:\\Projects\\chengrui\\Multi_Data\\MASG_mic_speech_train\\mic_clean_rever\\'
269 | MASG_clean_rever_noise_path = 'C:\\Projects\\chengrui\\Multi_Data\\MASG_mic_speech_train\\mic_clean_rever_noise\\'
270 | MASG_noise_path = 'C:\\Projects\\chengrui\\Multi_Data\\MASG_mic_speech_train\\mic_noise\\'
271 | 
272 | # read the target speech file names
273 | target_file_names = np.array([])
274 | for file in glob.glob(MASG_target_path + '*.wav'):
275 |     target_file_names = np.append(target_file_names, file.replace(MASG_target_path, ''))  # replace() removes the path prefix; strip() would remove characters, not a prefix
276 | print('target_file_names')
277 | print(target_file_names, target_file_names.shape)
278 | print('\t')
279 | 
280 | print(
281 |     'Generate Microphone Array Speech')
282 | print(
283 |     '------------------------------------------------------------------')
284 | 
285 | # read noise signal
286 | fs, noise = wavfile.read(noise_file)
287 | 
288 | # count index
289 | count = 1
290 | 
291 | # process the utterances one by one
292 | for file_name in target_file_names:
293 | 
294 |     # read target speech
295 |     fs, target = wavfile.read(MASG_target_path+file_name)
296 | 
297 |     # ================= #
298 |     #     Generator     #
299 |     # ================= #
300 | 
301 |     for snr_index in range(4):
302 |         for rt_index in range(7):
303 |             # clean
304 |             clean = mic_clean_generator(room_size, target_location[int(file_name[1:3])-1], target, fs, microphone_array, amplifier)
305 |             # reverberation
306 |             clean_rever = mic_rever_generator(room_size, target_location[int(file_name[1:3])-1], target, fs, microphone_array, amplifier, absorption_value[rt_index][0])
307 |             # add noise
308 |             out_clean_rever_noise, out_clean_noise, out_noise = an.addnoise(clean, clean_rever, noise, SNR[snr_index], fs)
309 | 
310 |             # ===================== #
311 |             #     Wavfile.write     #
312 |             # ===================== #
313 | 
314 |             # write wavfile
315 |             for i in range(microphone_array.shape[1]):
316 |                 # file path
317 |                 file_path = 'ch'+str(i)+'\\'+str(SNR[snr_index])+'dB\\'+str(int(absorption_value[rt_index][1]))+'ms\\'
318 |                 # clean
319 |                 wavfile.write(
320 |                     MASG_clean_path+file_path+file_name[:7]+'_ch'+str(i)+'_clean_'+str(SNR[snr_index])+'dB_'+str(int(absorption_value[rt_index][1]))+'ms.wav', fs, clean[i,:])
321 |                 # clean_noise
322 |                 wavfile.write(
323 |                     MASG_clean_noise_path+file_path+file_name[:7]+'_ch'+str(i)+'_clean_noise_'+str(SNR[snr_index])+'dB_'+str(int(absorption_value[rt_index][1]))+'ms.wav', fs, out_clean_noise[i,:])
324 |                 # clean_rever
325 |                 wavfile.write(
326 |                     MASG_clean_rever_path+file_path+file_name[:7]+'_ch'+str(i)+'_clean_rever_'+str(SNR[snr_index])+'dB_'+str(int(absorption_value[rt_index][1]))+'ms.wav', fs, clean_rever[i,:])
327 |                 # clean_rever_noise
328 |                 wavfile.write(
329 |                     MASG_clean_rever_noise_path+file_path+file_name[:7]+'_ch'+str(i)+'_clean_rever_noise_'+str(SNR[snr_index])+'dB_'+str(int(absorption_value[rt_index][1]))+'ms.wav', fs, out_clean_rever_noise[i,:])
330 |                 # noise
331 |                 wavfile.write(
332 |                     MASG_noise_path+file_path+file_name[:7]+'_ch'+str(i)+'_noise_'+str(SNR[snr_index])+'dB_'+str(int(absorption_value[rt_index][1]))+'ms.wav', fs, out_noise[i,:])
333 | 
334 |     print(file_name[:7], 'generation completed.', '[', count, '/ 400 ]')
335 |     count = count + 1
336 | 
337 | print('channel number:', microphone_array.shape[1])
338 | print('Microphone array speech and noise have been generated.')
339 | print('\t')
340 | plt.show()
341 | 
342 | 
343 | 
--------------------------------------------------------------------------------
/microphone_array_speech_generator/speech_connection.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python
2 | # -*- coding: utf-8 -*-
3 | # @Date : 2019-11-27 09:33:27
4 | # @Author : Cheng Rui (chengrui@emails.bjut.edu.cn)
5 | # @Function : speech connection
6 | # @Version : Release 0.1
7 | 
8 | import numpy as np
9 | from scipy.io import wavfile
10 | import glob
11 | 
12 | # speech connection
13 | def speech_connection(root_path, mic, channels, SNRs, RTs):
14 |     '''
15 |     This function concatenates the generated speech and noise segments into long recordings.
16 | 
17 |     Usage: speech_connection(root_path, mic, channels, SNRs, RTs)
18 | 
19 |     root_path - root directory of the individual speech and noise files
20 |     mic - the speech/noise type folders
21 |     channels - the channel folders
22 |     SNRs - the SNR folders
23 |     RTs - the RT60 folders
24 | 
25 |     Example call:
26 |       speech_connection(root_path, mic, channels, SNRs, RTs)
27 | 
28 |     Author: Rui Cheng
29 |     '''
30 | 
31 |     for s in mic:
32 |         for ch in channels:
33 |             fileNamesTrain = np.array([])
34 |             for snr in SNRs:
35 |                 for rt in RTs:
36 |                     for file in glob.glob(root_path+s+'\\'+ch+'\\'+snr+'\\'+rt+'\\'+'*.wav'):
37 |                         fileNamesTrain = np.append(fileNamesTrain, file)
38 | 
39 |             print(s, ch, 'have been obtained. There are', fileNamesTrain.shape[0], 'segments in total. Speech connection...')
40 | 
41 |             trimmed_output_train = np.array([])
42 |             for fileName in fileNamesTrain:
43 |                 Fs, newFile = wavfile.read(fileName)
44 |                 outputFile = newFile / np.max(np.abs(newFile))
45 |                 trimmed_output_train = np.append(trimmed_output_train, outputFile)
46 | 
47 |             # scale back to the int16 range before casting, to avoid clipping with wavfile.write (a bare astype("int16") would truncate the normalized floats to zero)
48 |             int_array = (trimmed_output_train * 32767).astype("int16")
49 |             print(int_array, int_array.shape)
50 | 
51 |             wavfile.write(root_path+s+'\\'+s+'_'+ch+'_4snr_7rt.wav', 16000, int_array) #TIMIT_577(3)_TRAIN TIMIT_288(32)_TRAIN
52 |             print("Connected and output.")
53 | 
54 | print('\n')
55 | print('Speech Connection')
56 | print('==========================================================================================================')
57 | print('\n')
58 | 
59 | # paths and indices
60 | train_root_path = 'C:\\Projects\\chengrui\\Multi_Data\\MASG_mic_speech_train\\'
61 | test_root_path = 'C:\\Projects\\chengrui\\Multi_Data\\MASG_mic_speech_test\\'
62 | mic = ['mic_clean', 'mic_clean_noise', 'mic_clean_rever', 'mic_clean_rever_noise', 'mic_noise']
63 | channels = ['ch0', 'ch1', 'ch2', 'ch3', 'ch4', 'ch5', 'ch6', 'ch7', 'ch8', 'ch9']
64 | SNRs = ['-5dB', '0dB', '5dB', '10dB']
65 | RTs = ['200ms', '300ms', '400ms', '500ms', '600ms', '700ms', '800ms']
66 | 
67 | '''# speech connection for the test dataset
68 | print('[test dataset]')
69 | speech_connection(test_root_path, mic, channels, SNRs, RTs)
70 | print('test dataset has been connected.')
71 | print('\n')'''
72 | 
73 | # speech connection for the train dataset
74 | print('[train dataset]')
75 | speech_connection(train_root_path, mic, channels, SNRs, RTs)
76 | print('train dataset has been connected.')
77 | print('\n')
78 | 
--------------------------------------------------------------------------------
/pdf/MASG.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vipchengrui/MASG/2eafc9de3d7d2f25a592f4b1c7cb4cdc66b1b89e/pdf/MASG.pdf
--------------------------------------------------------------------------------
/pdf/Pyroomacoustics.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vipchengrui/MASG/2eafc9de3d7d2f25a592f4b1c7cb4cdc66b1b89e/pdf/Pyroomacoustics.pdf
--------------------------------------------------------------------------------