├── .gitignore
├── Dockerfile
├── README.md
├── __init__.py
├── decode_model.py
├── install_model.sh
├── main_denoising.py
├── main_get_vad.py
├── model
│   ├── global_mvn_stats.mat
│   ├── speech_enhancement.model0
│   └── speech_enhancement.model1
├── run_eval.sh
└── utils.py

/.gitignore:
--------------------------------------------------------------------------------
__pycache__/*
\#*
*.pyc
*~

--------------------------------------------------------------------------------
/Dockerfile:
--------------------------------------------------------------------------------
# Start from the NVIDIA CUDA 10.1 runtime image (based on Ubuntu 18.04).
FROM nvidia/cuda:10.1-cudnn8-runtime-ubuntu18.04

# Install system packages.
RUN apt-get update && \
    apt-get install -y --no-install-recommends \
        g++ gfortran \
        openmpi-bin \
        libsndfile-dev \
        software-properties-common \
        emacs && \
    rm -rf /var/lib/apt/lists/*
RUN ln -s /usr/lib/x86_64-linux-gnu/libmpi_cxx.so.20 /usr/lib/x86_64-linux-gnu/libmpi_cxx.so.1 && \
    ln -s /usr/lib/x86_64-linux-gnu/libmpi.so.20.10.1 /usr/lib/x86_64-linux-gnu/libmpi.so.12 && \
    ldconfig

# Install Python 3.6.
RUN add-apt-repository ppa:deadsnakes/ppa
RUN apt-get update && \
    apt-get install -y python3.6 python3-pip
RUN update-alternatives --install /usr/bin/python python /usr/bin/python3 0

# Install Python packages.
RUN pip3 install --upgrade pip && \
    pip3 install numpy scipy librosa joblib webrtcvad wurlitzer cntk-gpu

# Copy the repository into the image at /dihard18.
WORKDIR /dihard18
COPY . .

# Install the pretrained model.
RUN ./install_model.sh

# Make the eval script executable.
RUN chmod +x ./run_eval.sh

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# A quick-use package for speech enhancement based on our DIHARD18 system

Original author: @staplesinLA

Major contributors: @nryant, @mmmaat (many thanks!)

This repository provides tools to reproduce the enhancement results of the
speech preprocessing part of our DIHARD18 system [1]. The deep-learning-based
denoising model is trained on 400 hours of English and Mandarin audio; for full
details see [1, 2, 3]. Currently the tools accept only 16 kHz, 16-bit,
single-channel (mono) WAV files, so please convert your audio to this format in
advance; an example conversion command is given below.
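For reference, one way to perform this conversion is with SoX (this assumes SoX
is installed; the filenames are placeholders, and ffmpeg can be used to the
same effect):

    # Resample to 16 kHz, downmix to a single channel, and write 16-bit PCM.
    sox input.wav -r 16000 -c 1 -b 16 input_16k_mono.wav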
Additionally, this package integrates a voice activity detection (VAD) module
based on [py-webrtcvad](https://github.com/wiseman/py-webrtcvad), which provides
a Python interface to the [WebRTC](https://webrtc.org/) VAD. The default
parameters were tuned on the development set of DIHARD18.

[1] Sun, Lei, et al. "Speaker Diarization with Enhancing Speech for the First
DIHARD Challenge." Proc. Interspeech 2018 (2018):
2793-2797. [PDF](http://home.ustc.edu.cn/~sunlei17/pdf/lei_IS2018.pdf)

[2] Gao, Tian, et al. "Densely Connected Progressive Learning for LSTM-Based
Speech Enhancement." 2018 IEEE International Conference on Acoustics, Speech
and Signal Processing (ICASSP). IEEE,
2018. [PDF](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8461861)

[3] Sun, Lei, et al. "Multiple-Target Deep Learning for LSTM-RNN Based Speech
Enhancement." 2017 Hands-free Speech Communications and Microphone Arrays
(HSCMA). IEEE,
2017. [PDF](http://home.ustc.edu.cn/~sunlei17/pdf/MULTIPLE-TARGET.pdf)

## Main Prerequisites

* [CNTK](https://docs.microsoft.com/en-us/cognitive-toolkit/setup-linux-python?tabs=cntkpy26)
* [webrtcvad](https://github.com/wiseman/py-webrtcvad)
* [NumPy](https://github.com/numpy/numpy)
* [SciPy](https://github.com/scipy/scipy)
* [librosa](https://github.com/librosa/librosa)
* [Wurlitzer](https://github.com/minrk/wurlitzer)
* [joblib](https://github.com/joblib/joblib)

## How to use it?

1. Install all dependencies (Python and pip must already be installed on your
   system):

        sudo apt-get install openmpi-bin
        pip install numpy scipy librosa
        pip install cntk-gpu
        pip install webrtcvad
        pip install wurlitzer
        pip install joblib

   Make sure CNTK installed successfully by querying its version:

        python -c "import cntk; print(cntk.__version__)"

2. Download the speech enhancement repository:

        git clone https://github.com/staplesinLA/denoising_DIHARD18.git

3. Install the pretrained model:

        cd denoising_DIHARD18
        ./install_model.sh

4. Specify parameters in `run_eval.sh`:

   * For the speech enhancement tool (an illustrative filled-in example is
     given below):

         WAV_DIR=
         SE_WAV_DIR=
         USE_GPU=
         GPU_DEVICE_ID=
         TRUNCATE_MINUTES=
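     For illustration only, these variables might be filled in as follows; the
     paths and values are placeholders, and the meaning of each variable is
     inferred from its name rather than prescribed by the authors:

         WAV_DIR=/path/to/input/wav        # directory of original 16 kHz mono WAV files
         SE_WAV_DIR=/path/to/enhanced/wav  # directory to write the enhanced WAV files
         USE_GPU=true                      # whether to run the model on a GPU
         GPU_DEVICE_ID=0                   # id of the GPU device to use
         TRUNCATE_MINUTES=10               # assumed: minutes of audio processed per chunk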