├── README.md
├── book
│   ├── 2018_Book_AudioSourceSeparation.pdf
│   └── Audio_Source_Separation_and_Speech_Enhancement.pdf
├── evaluation
│   └── sdr_pesq_sisdr
│       ├── bss_eval_sources.m
│       ├── composite_pesq
│       │   ├── DC_block.p
│       │   ├── FFTNXCorr.p
│       │   ├── apply_VAD.p
│       │   ├── apply_filter.p
│       │   ├── apply_filters.p
│       │   ├── batch_pesq.p
│       │   ├── batch_pesq2.p
│       │   ├── composite.asv
│       │   ├── composite.m
│       │   ├── convolution_in_timealign.p
│       │   ├── crude_align.p
│       │   ├── enhanced_logmmse.wav
│       │   ├── fix_power_level.p
│       │   ├── id_searchwindows.p
│       │   ├── id_utterances.p
│       │   ├── input_filter.p
│       │   ├── pesq.p
│       │   ├── pesq_debug.p
│       │   ├── pesq_measure.p
│       │   ├── pesq_psychoacoustic_model.p
│       │   ├── pesq_testbench.p
│       │   ├── plot_wav.p
│       │   ├── pow_of.p
│       │   ├── readme.pdf
│       │   ├── readme.txt
│       │   ├── setup_global.p
│       │   ├── sp09.wav
│       │   ├── sp09_babble_sn10.wav
│       │   ├── split_align.p
│       │   ├── time_align.p
│       │   ├── utterance_locate.p
│       │   └── utterance_split.p
│       ├── des_file_name.pdf
│       ├── eval_sdr.m
│       ├── mat_debug.txt
│       ├── run.sh
│       ├── spk2gender
│       ├── spk2gender_cv
│       ├── spk2gender_tr
│       ├── target_ref_dur.txt
│       └── target_ref_dur_backup.txt
├── generation
│   ├── WHAM_and_WHAMR
│   │   ├── wham_scripts.tar.gz
│   │   └── whamr_scripts.tar.gz
│   ├── wsj0-2mix-extr
│   │   ├── mix_2_spk_cv_extr.txt
│   │   ├── mix_2_spk_tr_extr.txt
│   │   ├── mix_2_spk_tt_extr.txt
│   │   └── simulate_2spk_mix.m
│   └── wsj0-2mix
│       ├── create-speaker-mixtures.zip
│       └── spatialize_wsj0-mix.zip
└── slides
    ├── AVSS_Datasets_PanZexu.pdf
    ├── Advances_in_end-to-end_neural_source_separation.pdf
    ├── DeLiangWang_ASRU19.pdf
    ├── HaizhouLi_CCF.pdf
    ├── Speech-Separation-Dataset-GM.pdf
    └── overview-GM.pdf

/README.md:
--------------------------------------------------------------------------------
# Speech Separation and Extraction via Deep Learning

This repo summarizes the tutorials, datasets, papers, code, and tools for the speech separation and speaker extraction tasks. You are kindly invited to submit pull requests.


## Table of Contents

- [Tutorials](#tutorials)
- [Datasets](#datasets)
- [Papers](#papers)
  - [Speech Separation based on Brain Studies](#Speech-Separation-based-on-Brain-Studies)
  - [Pure Speech Separation](#Pure-Speech-Separation)
  - [Multi-Modal Speech Separation](#Multi-Modal-Speech-Separation)
  - [Multi-Channel Speech Separation](#Multi-channel-Speech-Separation)
  - [Speaker Extraction](#Speaker-Extraction)
- [Tools](#Tools)
  - [System Tools](#System-Tools)
  - [Evaluation Tools](#Evaluation-Tools)
- [Results on WSJ0-2mix](#Results-on-WSJ0-2mix)


## Tutorials

- [Speech Separation, Hung-yi Lee, 2020] [[Video (Subtitle)]](https://www.bilibili.com/video/BV1Cf4y1y7FN?from=search&seid=17392360823608929388) [[Video]](https://www.youtube.com/watch?v=tovg5ZxNgIo&t=8s) [[Slide]](http://speech.ee.ntu.edu.tw/~tlkagk/courses/DLHLP20/SP%20(v3).pdf)

- [Advances in End-to-End Neural Source Separation, Yi Luo, 2020] [[Video (BiliBili)]](https://www.bilibili.com/video/BV11T4y1774e) [[Video]](https://www.shenlanxueyuan.com/open/course/62/lesson/57/liveToVideoPreview) [[Slide]](https://github.com/gemengtju/Tutorial_Separation/blob/master/slides/Advances_in_end-to-end_neural_source_separation.pdf)

- [Audio Source Separation and Speech Enhancement, Emmanuel Vincent, 2018] [[Book]](https://github.com/gemengtju/Tutorial_Separation/tree/master/book)

- [Audio Source Separation, Shoji Makino, 2018] [[Book]](https://github.com/gemengtju/Tutorial_Separation/tree/master/book)

- [Overview Papers] [[Paper (Daniel Michelsanti)]](https://arxiv.org/pdf/2008.09586.pdf) [[Paper (DeLiang Wang)]](https://arxiv.org/ftp/arxiv/papers/1708/1708.07524.pdf) [[Paper (Bo Xu)]](http://www.aas.net.cn/article/zdhxb/2019/2/234) [[Paper (Zafar Rafii)]](https://arxiv.org/pdf/1804.08300.pdf) [[Paper (Sharon Gannot)]](https://hal.inria.fr/hal-01414179v2/document)

- [Overview Slides] [[Slide (DeLiang Wang)]](https://github.com/gemengtju/Tutorial_Separation/blob/master/slides/DeLiangWang_ASRU19.pdf) [[Slide (Haizhou Li)]](https://github.com/gemengtju/Tutorial_Separation/blob/master/slides/HaizhouLi_CCF.pdf) [[Slide (Meng Ge)]](https://github.com/gemengtju/Tutorial_Separation/blob/master/slides/overview-GM.pdf)

- [Handbook] [[Ongoing]](https://www.overleaf.com/read/vhdjwcpyryzr)

## Datasets

- [Dataset Introduction] [[Pure Speech Dataset Slide (Meng Ge)]](https://github.com/gemengtju/Tutorial_Separation/blob/master/slides/Speech-Separation-Dataset-GM.pdf) [[Audio-Visual Dataset Slide (Zexu Pan)]](https://github.com/gemengtju/Tutorial_Separation/blob/master/slides/AVSS_Datasets_PanZexu.pdf)

- [WSJ0] [[Dataset]](https://catalog.ldc.upenn.edu/LDC93S6A)

- [WSJ0-2mix] [[Script]](https://github.com/gemengtju/Tutorial_Separation/tree/master/generation/wsj0-2mix)

- [WSJ0-2mix-extr] [[Script]](https://github.com/xuchenglin28/speaker_extraction)

- [WHAM & WHAMR] [[Paper (WHAM)]](https://arxiv.org/pdf/1907.01160.pdf) [[Paper (WHAMR)]](https://arxiv.org/pdf/1910.10279.pdf) [[Dataset]](http://wham.whisper.ai/)

- [LibriMix] [[Paper]](https://arxiv.org/pdf/2005.11262.pdf) [[Script]](https://github.com/JorisCos/LibriMix)

- [LibriCSS] [[Paper]](https://arxiv.org/pdf/2001.11482.pdf) [[Script]](https://github.com/chenzhuo1011/libri_css)

- [SparseLibriMix] [[Script]](https://github.com/popcornell/SparseLibriMix)

- [VCTK-2Mix] [[Script]](https://github.com/JorisCos/VCTK-2Mix)

- [CHIME5 & CHIME6 Challenge] [[Dataset]](https://chimechallenge.github.io/chime6/)

- [AudioSet] [[Dataset]](https://research.google.com/audioset/download.html)

- [Microsoft DNS Challenge] [[Dataset]](https://github.com/microsoft/DNS-Challenge)

- [AVSpeech] [[Dataset]](https://looking-to-listen.github.io/avspeech/download.html)

- [LRW] [[Dataset]](http://www.robots.ox.ac.uk/~vgg/data/lip_reading/lrw1.html)

- [LRS2] [[Dataset]](http://www.robots.ox.ac.uk/~vgg/data/lip_reading/lrs2.html)

- [LRS3] [[Dataset]](http://www.robots.ox.ac.uk/~vgg/data/lip_reading/lrs3.html) [[Script]](https://github.com/JusperLee/LRS3-For-Speech-Separation)

- [VoxCeleb] [[Dataset]](http://www.robots.ox.ac.uk/~vgg/data/voxceleb/)


## Papers

### Speech Separation based on Brain Studies

- [Attentional Selection in a Cocktail Party Environment Can Be Decoded from Single-Trial EEG, James O'Sullivan, Cerebral Cortex 2012] [[Paper]](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4481604/pdf/bht355.pdf)

- [Selective cortical representation of attended speaker in multi-talker speech perception, Nima Mesgarani, Nature 2012] [[Paper]](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3870007/pdf/nihms445767.pdf)

- [Neural decoding of attentional selection in multi-speaker environments without access to clean sources, James O'Sullivan, Journal of Neural Engineering 2017] [[Paper]](https://europepmc.org/article/pmc/pmc5805380#free-full-text)

- [Speech synthesis from neural decoding of spoken sentences, Gopala K. Anumanchipalli, Nature 2019] [[Paper]](https://www.univie.ac.at/mcogneu/lit/anumanchipalli-19.pdf)

- [Towards reconstructing intelligible speech from the human auditory cortex, Hassan Akbari, Scientific Reports 2019] [[Paper]](https://www.nature.com/articles/s41598-018-37359-z.pdf) [[Code]](http://naplab.ee.columbia.edu/naplib.html)

### Pure Speech Separation

- [Joint Optimization of Masks and Deep Recurrent Neural Networks for Monaural Source Separation, Po-Sen Huang, TASLP 2015] [[Paper]](https://arxiv.org/pdf/1502.04149) [[Code (posenhuang)]](https://github.com/posenhuang/deeplearningsourceseparation)

- [Complex Ratio Masking for Monaural Speech Separation, DS Williamson, TASLP 2015] [[Paper]](https://ieeexplore.ieee.org/abstract/document/7364200/)

- [Deep clustering: Discriminative embeddings for segmentation and separation, JR Hershey, ICASSP 2016] [[Paper]](https://arxiv.org/abs/1508.04306) [[Code (Kai Li)]](https://github.com/JusperLee/Deep-Clustering-for-Speech-Separation) [[Code (Jian Wu)]](https://github.com/funcwj/deep-clustering) [[Code (asteroid)]](https://github.com/mpariente/asteroid/blob/master/egs/wsj0-mix/DeepClustering)

- [Single-channel multi-speaker separation using deep clustering, Y Isik, Interspeech 2016] [[Paper]](https://arxiv.org/pdf/1607.02173) [[Code (Kai Li)]](https://github.com/JusperLee/Deep-Clustering-for-Speech-Separation) [[Code (Jian Wu)]](https://github.com/funcwj/deep-clustering)

- [Permutation invariant training of deep models for speaker-independent multi-talker speech separation, Dong Yu, ICASSP 2017] [[Paper]](https://arxiv.org/pdf/1607.00325) [[Code (Kai Li)]](https://github.com/JusperLee/UtterancePIT-Speech-Separation) [[Code (Sining Sun)]](https://github.com/snsun/pit-speech-separation)

- [Recognizing Multi-talker Speech with Permutation Invariant Training, Dong Yu, ICASSP 2017] [[Paper]](https://arxiv.org/pdf/1704.01985)

- [Multitalker speech separation with utterance-level permutation invariant training of deep recurrent neural networks, M Kolbæk, TASLP 2017] [[Paper]](https://arxiv.org/pdf/1703.06284) [[Code (Kai Li)]](https://github.com/JusperLee/UtterancePIT-Speech-Separation)

- [Deep attractor network for single-microphone speaker separation, Zhuo Chen, ICASSP 2017] [[Paper]](https://arxiv.org/abs/1611.08930) [[Code (Kai Li)]](https://github.com/JusperLee/DANet-For-Speech-Separation)

- [Alternative Objective Functions for Deep Clustering, Zhong-Qiu Wang, ICASSP 2018] [[Paper]](http://www.merl.com/publications/docs/TR2018-005.pdf)

- [Listen, Think and Listen Again: Capturing Top-down Auditory Attention for Speaker-independent Speech Separation, Jing Shi, IJCAI 2018] [[Paper]](https://www.ijcai.org/Proceedings/2018/0605.pdf)

- [End-to-End Speech Separation with Unfolded Iterative Phase Reconstruction, Zhong-Qiu Wang et al., 2018] [[Paper]](https://arxiv.org/pdf/1804.10204.pdf)

- [Modeling Attention and Memory for Auditory Selection in a Cocktail Party Environment, Jiaming Xu, AAAI 2018] [[Paper]](https://www.aaai.org/ocs/index.php/AAAI/AAAI18/paper/view/16670/15950) [[Code]](https://github.com/jacoxu/ASAM)

- [Speaker-independent Speech Separation with Deep Attractor Network, Yi Luo, TASLP 2018] [[Paper]](https://arxiv.org/pdf/1707.03634) [[Code (Kai Li)]](https://github.com/JusperLee/DANet-For-Speech-Separation)

- [Listening to Each Speaker One by One with Recurrent Selective Hearing Networks, Keisuke Kinoshita, ICASSP 2018] [[Paper]](http://150.162.46.34:8080/icassp2018/ICASSP18_USB/pdfs/0005064.pdf)

- [TasNet: time-domain audio separation network for real-time, single-channel speech separation, Yi Luo, ICASSP 2018] [[Paper]](https://arxiv.org/pdf/1711.00541) [[Code (Kai Li)]](https://github.com/JusperLee/Conv-TasNet) [[Code (asteroid)]](https://github.com/mpariente/asteroid/blob/master/egs/whamr/TasNet)

- [Conv-TasNet: Surpassing Ideal Time-Frequency Magnitude Masking for Speech Separation, Yi Luo, TASLP 2019] [[Paper]](https://ieeexplore.ieee.org/iel7/6570655/6633080/08707065.pdf) [[Code (Kai Li)]](https://github.com/JusperLee/Conv-TasNet) [[Code (asteroid)]](https://github.com/mpariente/asteroid/blob/master/egs/wham/ConvTasNet)

- [Divide and Conquer: A Deep CASA Approach to Talker-independent Monaural Speaker Separation, Yuzhou Liu, TASLP 2019] [[Paper]](https://arxiv.org/pdf/1904.11148) [[Code]](https://github.com/yuzhou-git/deep-casa)

- [Improved Speech Separation with Time-and-Frequency Cross-domain Joint Embedding and Clustering, Gene-Ping Yang, Interspeech 2019] [[Paper]](https://arxiv.org/pdf/1904.07845v1.pdf) [[Code]](https://github.com/r06944010/Speech-Separation-TF2)

- [Dual-path RNN: efficient long sequence modeling for time-domain single-channel speech separation, Yi Luo, Arxiv 2019] [[Paper]](https://arxiv.org/pdf/1910.06379) [[Code (Kai Li)]](https://github.com/JusperLee/Dual-Path-RNN-Pytorch)

- [A comprehensive study of speech separation: spectrogram vs waveform separation, Fahimeh Bahmaninezhad, Interspeech 2019] [[Paper]](https://arxiv.org/pdf/1905.07497)

- [Discriminative Learning for Monaural Speech Separation Using Deep Embedding Features, Cunhang Fan, Interspeech 2019] [[Paper]](https://www.isca-speech.org/archive/Interspeech_2019/pdfs/1940.pdf)

- [Interrupted and cascaded permutation invariant training for speech separation, Gene-Ping Yang, ICASSP 2020] [[Paper]](https://arxiv.org/abs/1910.12706)

- [FurcaNeXt: End-to-end monaural speech separation with dynamic gated dilated temporal convolutional networks, Liwen Zhang, MMM 2020] [[Paper]](https://arxiv.org/pdf/1902.04891)

- [Filterbank design for end-to-end speech separation, Manuel Pariente et al., ICASSP 2020] [[Paper]](https://arxiv.org/abs/1910.10400)

- [Voice Separation with an Unknown Number of Multiple Speakers, Eliya Nachmani, Arxiv 2020] [[Paper]](https://arxiv.org/pdf/2003.01531.pdf) [[Demo]](https://enk100.github.io/speaker_separation/)

- [An Empirical Study of Conv-TasNet, Berkan Kadıoglu, Arxiv 2020] [[Paper]](https://arxiv.org/pdf/2002.08688.pdf) [[Code]](https://github.com/JusperLee/Deep-Encoder-Decoder-Conv-TasNet)

- [Wavesplit: End-to-End Speech Separation by Speaker Clustering, Neil Zeghidour et al., Arxiv 2020] [[Paper]](https://arxiv.org/abs/2002.08933)

- [La Furca: Iterative Context-Aware End-to-End Monaural Speech Separation Based on Dual-Path Deep Parallel Inter-Intra Bi-LSTM with Attention, Ziqiang Shi, Arxiv 2020] [[Paper]](https://arxiv.org/pdf/2001.08998.pdf)

- [Deep Attention Fusion Feature for Speech Separation with End-to-End Post-filter Method, Cunhang Fan, Arxiv 2020] [[Paper]](https://arxiv.org/abs/2003.07544)

- [Identify Speakers in Cocktail Parties with End-to-End Attention, Junzhe Zhu, Arxiv 2020] [[Paper]](https://arxiv.org/pdf/2005.11408v1.pdf) [[Code]](https://github.com/JunzheJosephZhu/Identify-Speakers-in-Cocktail-Parties-with-E2E-Attention)

- [Sequence to Multi-Sequence Learning via Conditional Chain Mapping for Mixture Signals, Jing Shi, Arxiv 2020] [[Paper]](https://arxiv.org/pdf/2006.14150.pdf) [[Code/Demo]](https://demotoshow.github.io/)

- [Speaker-Conditional Chain Model for Speech Separation and Extraction, Jing Shi, Arxiv 2020] [[Paper]](https://arxiv.org/pdf/2006.14149.pdf) [[Code/Demo]](https://shincling.github.io/)

- [Improving Voice Separation by Incorporating End-to-end Speech Recognition, Naoya Takahashi, ICASSP 2020] [[Paper]](https://arxiv.org/pdf/1911.12928v2.pdf) [[Code]](https://github.com/pragyak412/Improving-Voice-Separation-by-Incorporating-End-To-End-Speech-Recognition)

- [A Multi-Phase Gammatone Filterbank for Speech Separation via TasNet, David Ditter, ICASSP 2020] [[Paper]](https://arxiv.org/pdf/1910.11615v2.pdf) [[Code]](https://github.com/sp-uhh/mp-gtf)

- [Two-Step Sound Source Separation: Training on Learned Latent Targets, Efthymios Tzinis, ICASSP 2020] [[Paper]](https://arxiv.org/pdf/1910.09804v2.pdf) [[Code (Asteroid)]](https://github.com/mpariente/asteroid) [[Code (Tzinis)]](https://github.com/etzinis/two_step_mask_learning)

- [Unsupervised Sound Separation Using Mixtures of Mixtures, Scott Wisdom, Arxiv 2020] [[Paper]](https://arxiv.org/pdf/2006.12701.pdf)

- [Speech Separation Based on Multi-Stage Elaborated Dual-Path Deep BiLSTM with Auxiliary Identity Loss, Ziqiang Shi, 2020] [[Paper]](https://arxiv.org/pdf/2008.03149.pdf)

### Multi-Modal Speech Separation

- [Deep Audio-Visual Learning: A Survey, Hao Zhu, Arxiv 2020] [[Paper]](https://arxiv.org/pdf/2001.04758.pdf)

- [Audio-Visual Speech Enhancement Using Multimodal Deep Convolutional Neural Networks, Jen-Cheng Hou, TETCI 2017] [[Paper]](https://arxiv.org/pdf/1703.10893) [[Code]](https://github.com/avivga/audio-visual-speech-enhancement)

- [The Sound of Pixels, Hang Zhao, ECCV 2018] [[Paper]](http://openaccess.thecvf.com/content_ECCV_2018/papers/Hang_Zhao_The_Sound_of_ECCV_2018_paper.pdf) [[Code]](https://github.com/hangzhaomit/Sound-of-Pixels) [[Demo]](http://sound-of-pixels.csail.mit.edu/)

- [Learning to Separate Object Sounds by Watching Unlabeled Video, Ruohan Gao, ECCV 2018] [[Paper]](https://arxiv.org/pdf/1804.01665.pdf)

- [The Conversation: Deep Audio-Visual Speech Enhancement, Triantafyllos Afouras, Interspeech 2018] [[Paper]](https://arxiv.org/pdf/1804.04121)

- [End-to-end audiovisual speech recognition, Stavros Petridis, ICASSP 2018] [[Paper]](https://arxiv.org/pdf/1802.06424) [[Code]](https://github.com/mpc001/end-to-end-lipreading)

- [Looking to Listen at the Cocktail Party: A Speaker-Independent Audio-Visual Model for Speech Separation, Ariel Ephrat, ACM Transactions on Graphics 2018] [[Paper]](https://arxiv.org/pdf/1804.03619) [[Code]](https://github.com/JusperLee/Looking-to-Listen-at-the-Cocktail-Party)

- [Time domain audio visual speech separation, Jian Wu, Arxiv 2019] [[Paper]](https://arxiv.org/pdf/1904.03760)

- [Co-Separating Sounds of Visual Objects, Ruohan Gao, ICCV 2019] [[Paper]](https://arxiv.org/pdf/1904.07750.pdf) [[Code]](https://github.com/rhgao/co-separation)

- [Recursive Visual Sound Separation Using Minus-Plus Net, Xudong Xu, ICCV 2019] [[Paper]](https://arxiv.org/pdf/1908.11602.pdf)

- [The Sound of Motions, Hang Zhao, ICCV 2019] [[Paper]](https://arxiv.org/pdf/1904.05979.pdf)

- [Audio-Visual Speech Separation and Dereverberation with a Two-Stage Multimodal Network, Ke Tan, Arxiv 2019] [[Paper]](https://arxiv.org/pdf/1909.07352)

- [Face Landmark-based Speaker-Independent Audio-Visual Speech Enhancement in Multi-Talker Environments, Giovanni Morrone, Arxiv 2019] [[Paper]](https://arxiv.org/pdf/1811.02480v3.pdf) [[Code]](https://github.com/dr-pato/audio_visual_speech_enhancement)

- [Music Gesture for Visual Sound Separation, Chuang Gan, CVPR 2020] [[Paper]](https://arxiv.org/pdf/2004.09476.pdf)

- [FaceFilter: Audio-visual speech separation using still images, Soo-Whan Chung, Arxiv 2020] [[Paper]](https://arxiv.org/pdf/2005.07074.pdf)

- [Awesome Audio-Visual, Kranti Kumar Parida] [[GitHub Link]](https://github.com/krantiparida/awesome-audio-visual)

### Multi-channel Speech Separation

- [FaSNet: Low-latency Adaptive Beamforming for Multi-microphone Audio Processing, Yi Luo, Arxiv 2019] [[Paper]](https://arxiv.org/abs/1909.13387)

- [MIMO-SPEECH: End-to-End Multi-Channel Multi-Speaker Speech Recognition, Xuankai Chang et al., ASRU 2020] [[Paper]](https://arxiv.org/pdf/1910.06522.pdf)

- [End-to-end Microphone Permutation and Number Invariant Multi-channel Speech Separation, Yi Luo et al., ICASSP 2020] [[Paper]](https://arxiv.org/pdf/1910.14104.pdf) [[Code]](https://github.com/yluo42/TAC)

- [Enhancing End-to-End Multi-channel Speech Separation via Spatial Feature Learning, Rongzhi Gu, ICASSP 2020] [[Paper]](https://arxiv.org/pdf/2003.03927.pdf)

- [Multi-modal Multi-channel Target Speech Separation, Rongzhi Gu, J-STSP 2020] [[Paper]](https://arxiv.org/pdf/2003.07032.pdf)

### Speaker Extraction

- [Single channel target speaker extraction and recognition with speaker beam, Marc Delcroix, ICASSP 2018] [[Paper]](http://150.162.46.34:8080/icassp2018/ICASSP18_USB/pdfs/0005554.pdf)

- [VoiceFilter: Targeted Voice Separation by Speaker-Conditioned Spectrogram Masking, Quan Wang, Interspeech 2019] [[Paper]](https://arxiv.org/pdf/1810.04826.pdf) [[Code (Jian Wu)]](https://github.com/funcwj/voice-filter)

- [Single-Channel Speech Extraction Using Speaker Inventory and Attention Network, Xiong Xiao et al., ICASSP 2019] [[Paper]](http://150.162.46.34:8080/icassp2019/ICASSP2019/pdfs/0000086.pdf)

- [Optimization of Speaker Extraction Neural Network with Magnitude and Temporal Spectrum Approximation Loss, Chenglin Xu, ICASSP 2019] [[Paper]](https://arxiv.org/pdf/1903.09952.pdf) [[Code]](https://github.com/xuchenglin28/speaker_extraction)

- [Time-domain speaker extraction network, Chenglin Xu, ASRU 2019] [[Paper]](https://arxiv.org/pdf/2004.14762.pdf)

- [SpEx: Multi-Scale Time Domain Speaker Extraction Network, Chenglin Xu, TASLP 2020] [[Paper]](https://arxiv.org/pdf/2004.08326.pdf)

- [Improving speaker discrimination of target speech extraction with time-domain SpeakerBeam, Marc Delcroix, ICASSP 2020] [[Paper]](https://arxiv.org/pdf/2001.08378.pdf)

- [SpEx+: A Complete Time Domain Speaker Extraction Network, Meng Ge, Arxiv 2020] [[Paper]](https://arxiv.org/pdf/2005.04686.pdf) [[Code]](https://github.com/gemengtju/SpEx_Plus/tree/master/nnet)


## Tools

### System Tools

- [Asteroid: the PyTorch-based audio source separation toolkit for researchers, Manuel Pariente et al., ICASSP 2020] [[Tool Link]](https://github.com/mpariente/asteroid)

- [ESPnet-SE: end-to-end speech enhancement and separation toolkit designed for ASR integration, Chenda Li et al., Arxiv 2020] [[Paper Link]](https://arxiv.org/pdf/2011.03706.pdf)

### Evaluation Tools

- [Performance measurement in blind audio source separation, Emmanuel Vincent et al., TASLP 2006] [[Paper]](https://hal.inria.fr/inria-00544230/document) [[Tool Link]](https://github.com/gemengtju/Tutorial_Separation/tree/master/evaluation/sdr_pesq_sisdr)

- [SDR – Half-baked or Well Done?, Jonathan Le Roux, ICASSP 2019] [[Paper]](https://arxiv.org/pdf/1811.02508) [[Tool Link]](https://github.com/gemengtju/Tutorial_Separation/tree/master/evaluation/sdr_pesq_sisdr)


## Results on WSJ0-2mix

Speech separation (SS) and speaker extraction (SE) results on the WSJ0-2mix (8k, min) dataset.
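The SI-SDRi column reports the improvement in scale-invariant SDR (SI-SDR), the metric defined in the "SDR – Half-baked or Well Done?" paper listed under Evaluation Tools. As a minimal sketch (the function name and signals are illustrative, not part of the repo's MATLAB evaluation code, and NumPy is assumed):

```python
import numpy as np

def si_sdr(reference: np.ndarray, estimate: np.ndarray) -> float:
    """Scale-invariant SDR in dB between a reference source and an estimate."""
    # Remove the means, as recommended by Le Roux et al.
    reference = reference - reference.mean()
    estimate = estimate - estimate.mean()
    # Project the estimate onto the reference to obtain the target component;
    # everything orthogonal to the reference counts as distortion.
    alpha = np.dot(estimate, reference) / np.dot(reference, reference)
    target = alpha * reference
    noise = estimate - target
    return 10.0 * np.log10(np.dot(target, target) / np.dot(noise, noise))
```

SI-SDRi for one mixture would then be `si_sdr(reference, estimate) - si_sdr(reference, mixture)`, averaged over the test set; rescaling the estimate leaves the value unchanged, which is the point of the scale-invariant formulation.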
| Task | Methods | Model Size | SDRi | SI-SDRi |
| :------------: | :------------: | :------------: | :------------: | :------------: |
| SS | DPCL++ | 13.6M | - | 10.8 |
| SS | uPIT-BLSTM-ST | 92.7M | 10.0 | - |
| SS | DANet | 9.1M | - | 10.5 |
| SS | cuPIT-Grid-RD | 53.2M | 10.2 | - |
| SS | SDC-G-MTL | 53.9M | 10.5 | - |
| SS | CBLDNN-GAT | 39.5M | 11.0 | - |
| SS | Chimera++ | 32.9M | 12.0 | 11.5 |
| SS | WA-MISI-5 | 32.9M | 13.1 | 12.6 |
| SS | BLSTM-TasNet | 23.6M | 13.6 | 13.2 |
| SS | Conv-TasNet | 5.1M | 15.6 | 15.3 |
| SE | SpEx | 10.8M | 17.0 | 16.6 |
| SE | SpEx+ | 11.1M | 17.6 | 17.4 |
| SS | DeepCASA | 12.8M | 18.0 | 17.7 |
| SS | FurcaNeXt | 51.4M | 18.4 | - |
| SS | DPRNN-TasNet | 2.6M | 19.0 | 18.8 |
| SS | Wavesplit | - | 19.2 | 19.0 |
| SS | Wavesplit + Dynamic mixing | - | 20.6 | 20.4 |
--------------------------------------------------------------------------------
/book/2018_Book_AudioSourceSeparation.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gemengtju/Tutorial_Separation/c47dca746fa2e123868f5fbf2aed70ab3809925b/book/2018_Book_AudioSourceSeparation.pdf
--------------------------------------------------------------------------------
/book/Audio_Source_Separation_and_Speech_Enhancement.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gemengtju/Tutorial_Separation/c47dca746fa2e123868f5fbf2aed70ab3809925b/book/Audio_Source_Separation_and_Speech_Enhancement.pdf
--------------------------------------------------------------------------------
/evaluation/sdr_pesq_sisdr/bss_eval_sources.m:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gemengtju/Tutorial_Separation/c47dca746fa2e123868f5fbf2aed70ab3809925b/evaluation/sdr_pesq_sisdr/bss_eval_sources.m
--------------------------------------------------------------------------------
/evaluation/sdr_pesq_sisdr/composite_pesq/DC_block.p:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gemengtju/Tutorial_Separation/c47dca746fa2e123868f5fbf2aed70ab3809925b/evaluation/sdr_pesq_sisdr/composite_pesq/DC_block.p
--------------------------------------------------------------------------------
/evaluation/sdr_pesq_sisdr/composite_pesq/FFTNXCorr.p:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gemengtju/Tutorial_Separation/c47dca746fa2e123868f5fbf2aed70ab3809925b/evaluation/sdr_pesq_sisdr/composite_pesq/FFTNXCorr.p
--------------------------------------------------------------------------------
/evaluation/sdr_pesq_sisdr/composite_pesq/apply_VAD.p:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gemengtju/Tutorial_Separation/c47dca746fa2e123868f5fbf2aed70ab3809925b/evaluation/sdr_pesq_sisdr/composite_pesq/apply_VAD.p
--------------------------------------------------------------------------------
/evaluation/sdr_pesq_sisdr/composite_pesq/apply_filter.p:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gemengtju/Tutorial_Separation/c47dca746fa2e123868f5fbf2aed70ab3809925b/evaluation/sdr_pesq_sisdr/composite_pesq/apply_filter.p
--------------------------------------------------------------------------------
/evaluation/sdr_pesq_sisdr/composite_pesq/apply_filters.p:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gemengtju/Tutorial_Separation/c47dca746fa2e123868f5fbf2aed70ab3809925b/evaluation/sdr_pesq_sisdr/composite_pesq/apply_filters.p -------------------------------------------------------------------------------- /evaluation/sdr_pesq_sisdr/composite_pesq/batch_pesq.p: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gemengtju/Tutorial_Separation/c47dca746fa2e123868f5fbf2aed70ab3809925b/evaluation/sdr_pesq_sisdr/composite_pesq/batch_pesq.p -------------------------------------------------------------------------------- /evaluation/sdr_pesq_sisdr/composite_pesq/batch_pesq2.p: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gemengtju/Tutorial_Separation/c47dca746fa2e123868f5fbf2aed70ab3809925b/evaluation/sdr_pesq_sisdr/composite_pesq/batch_pesq2.p -------------------------------------------------------------------------------- /evaluation/sdr_pesq_sisdr/composite_pesq/composite.asv: -------------------------------------------------------------------------------- 1 | function [Csig,Cbak,Covl]= composite(cleanFile, enhancedFile); 2 | 3 | % --------- composite objective measure ---------------------- 4 | % 5 | % Center for Robust Speech Systems 6 | % University of Texas-Dallas 7 | % Copyright (c) 2006 8 | % All Rights Reserved. 9 | % 10 | % Description: 11 | % 12 | % This function implements the composite objective measure 13 | % proposed in [1]. It returns three values: The predicted rating of 14 | % overall quality (Covl), the rating of speech distortion (Csig) and 15 | % the rating of background distortion (Cbak). The ratings use the 1-5 MOS scale. 
16 | % In addition, it returns the values of the SNRseg, log-likelihood ratio (LLR), PESQ 17 | % anThe algorithm 18 | 19 | if nargin<2 20 | fprintf('Usage: [Csig,Cbak,Covl]=composite(cleanfile.wav,enhanced.wav)\n'); 21 | fprintf('where ''Csig'' is the predicted rating of speech distortion\n'); 22 | fprintf(' ''Cbak'' is the predicted rating of background distortion\n'); 23 | fprintf(' ''Covl'' is the predicted rating of overall quality.\n\n'); 24 | return; 25 | end 26 | 27 | 28 | alpha= 0.95; 29 | 30 | [data1, Srate1, Nbits1]= wavread(cleanFile); 31 | [data2, Srate2, Nbits2]= wavread(enhancedFile); 32 | if ( Srate1~= Srate2) | ( Nbits1~= Nbits2) 33 | error( 'The two files do not match!\n'); 34 | end 35 | 36 | len= min( length( data1), length( data2)); 37 | data1= data1( 1: len)+eps; 38 | data2= data2( 1: len)+eps; 39 | 40 | 41 | % -- compute the WSS measure --- 42 | % 43 | wss_dist_vec= wss( data1, data2,Srate1); 44 | wss_dist_vec= sort( wss_dist_vec); 45 | wss_dist= mean( wss_dist_vec( 1: round( length( wss_dist_vec)*alpha))); 46 | 47 | % --- compute the LLR measure --------- 48 | % 49 | LLR_dist= llr( data1, data2,Srate1); 50 | LLRs= sort(LLR_dist); 51 | LLR_len= round( length(LLR_dist)* alpha); 52 | llr_mean= mean( LLRs( 1: LLR_len)); 53 | 54 | % --- compute the SNRseg ---------------- 55 | % 56 | [snr_dist, segsnr_dist]= snr( data1, data2,Srate1); 57 | snr_mean= snr_dist; 58 | segSNR= mean( segsnr_dist); 59 | 60 | 61 | % -- compute the pesq ---- 62 | [pesq_mos]= pesq(cleanFile, enhancedFile); 63 | 64 | 65 | % --- now compute the composite measures ------------------ 66 | % 67 | Csig = 3.093 - 1.029*llr_mean + 0.603*pesq_mos-0.009*wss_dist; 68 | Cbak = 1.634 + 0.478 *pesq_mos - 0.007*wss_dist + 0.063*segSNR; 69 | Covl = 1.594 + 0.805*pesq_mos - 0.512*llr_mean - 0.007*wss_dist; 70 | 71 | fprintf('\n LLR=%f SNRseg=%f WSS=%f PESQ=%f\n',llr_mean,segSNR,wss_dist,pesq_mos); 72 | 73 | return; 74 | 75 | % 
---------------------------------------------------------------------- 76 | % 77 | % Weighted Spectral Slope (WSS) Objective Speech Quality Measure 78 | % 79 | % Center for Robust Speech Systems 80 | % University of Texas-Dallas 81 | % Copyright (c) 1998-2006 82 | % All Rights Reserved. 83 | % 84 | % Description: 85 | % 86 | % This function implements the Weighted Spectral Slope (WSS) 87 | % distance measure originally proposed in [1]. The algorithm 88 | % works by first decomposing the speech signal into a set of 89 | % frequency bands (this is done for both the test and reference 90 | % frame). The intensities within each critical band are 91 | % measured. Then, a weighted distances between the measured 92 | % slopes of the log-critical band spectra are computed. 93 | % This measure is also described in Section 2.2.9 (pages 56-58) 94 | % of [2]. 95 | % 96 | % Whereas Klatt's original measure used 36 critical-band 97 | % filters to estimate the smoothed short-time spectrum, this 98 | % implementation considers a bank of 25 filters spanning 99 | % the 4 kHz bandwidth. 100 | % 101 | % Input/Output: 102 | % 103 | % The input is a reference 8kHz sampled speech, and processed 104 | % speech (could be noisy or enhanced). 105 | % 106 | % The function returns the numerical distance between each 107 | % frame of the two input files (one distance per frame). 108 | % 109 | % References: 110 | % 111 | % [1] D. H. Klatt, "Prediction of Perceived Phonetic Distance 112 | % from Critical-Band Spectra: A First Step", Proc. IEEE 113 | % ICASSP'82, Volume 2, pp. 1278-1281, May, 1982. 114 | % 115 | % [2] S. R. Quackenbush, T. P. Barnwell, and M. A. Clements, 116 | % Objective Measures of Speech Quality. Prentice Hall 117 | % Advanced Reference Series, Englewood Cliffs, NJ, 1988, 118 | % ISBN: 0-13-629056-6. 119 | % 120 | % Authors: 121 | % 122 | % Bryan L. Pellom and John H. L. 
Hansen
123 | %
124 | %
125 | % Last Modified:
126 | %
127 | %     July 22, 1998
128 | %     September 12, 2006 by Philipos Loizou
129 | % ----------------------------------------------------------------------
130 |
131 | function distortion = wss(clean_speech, processed_speech, sample_rate)
132 |
133 |
134 | % ----------------------------------------------------------------------
135 | % Check the length of the clean and processed speech. Must be the same.
136 | % ----------------------------------------------------------------------
137 |
138 | clean_length     = length(clean_speech);
139 | processed_length = length(processed_speech);
140 |
141 | if (clean_length ~= processed_length)
142 |     disp('Error: Files must have the same length.');
143 |     return
144 | end
145 |
146 |
147 |
148 | % ----------------------------------------------------------------------
149 | % Global Variables
150 | % ----------------------------------------------------------------------
151 |
152 | % sample_rate = 8000;  % default sample rate
153 | % winlength   = 240;   % window length in samples
154 | % skiprate    = 60;    % window skip in samples
155 | winlength = round(30*sample_rate/1000);  % window length in samples (30 ms; 240 at 8 kHz)
156 | skiprate  = floor(winlength/4);          % window skip in samples
157 | max_freq  = sample_rate/2;               % maximum bandwidth
158 | num_crit  = 25;                          % number of critical bands
159 |
160 | USE_FFT_SPECTRUM = 1;                    % 1: FFT spectrum; 0: 10th-order LP spectrum
161 | %n_fft = 512;                            % FFT size
162 | n_fft    = 2^nextpow2(2*winlength);
163 | n_fftby2 = n_fft/2;                      % FFT size/2
164 | Kmax     = 20;                           % value suggested by Klatt, pg 1280
165 | Klocmax  = 1;                            % value suggested by Klatt, pg 1280
166 |
167 | % ----------------------------------------------------------------------
168 | % Critical Band Filter Definitions (Center Frequency and Bandwidths in Hz)
169 | % ----------------------------------------------------------------------
170 |
171 | cent_freq(1) = 50.0000;   bandwidth(1) = 70.0000;
172 | cent_freq(2) = 120.000;   bandwidth(2) = 70.0000;
173 | cent_freq(3)  = 190.000;   bandwidth(3)  = 70.0000;
174 | cent_freq(4)  = 260.000;   bandwidth(4)  = 70.0000;
175 | cent_freq(5)  = 330.000;   bandwidth(5)  = 70.0000;
176 | cent_freq(6)  = 400.000;   bandwidth(6)  = 70.0000;
177 | cent_freq(7)  = 470.000;   bandwidth(7)  = 70.0000;
178 | cent_freq(8)  = 540.000;   bandwidth(8)  = 77.3724;
179 | cent_freq(9)  = 617.372;   bandwidth(9)  = 86.0056;
180 | cent_freq(10) = 703.378;   bandwidth(10) = 95.3398;
181 | cent_freq(11) = 798.717;   bandwidth(11) = 105.411;
182 | cent_freq(12) = 904.128;   bandwidth(12) = 116.256;
183 | cent_freq(13) = 1020.38;   bandwidth(13) = 127.914;
184 | cent_freq(14) = 1148.30;   bandwidth(14) = 140.423;
185 | cent_freq(15) = 1288.72;   bandwidth(15) = 153.823;
186 | cent_freq(16) = 1442.54;   bandwidth(16) = 168.154;
187 | cent_freq(17) = 1610.70;   bandwidth(17) = 183.457;
188 | cent_freq(18) = 1794.16;   bandwidth(18) = 199.776;
189 | cent_freq(19) = 1993.93;   bandwidth(19) = 217.153;
190 | cent_freq(20) = 2211.08;   bandwidth(20) = 235.631;
191 | cent_freq(21) = 2446.71;   bandwidth(21) = 255.255;
192 | cent_freq(22) = 2701.97;   bandwidth(22) = 276.072;
193 | cent_freq(23) = 2978.04;   bandwidth(23) = 298.126;
194 | cent_freq(24) = 3276.17;   bandwidth(24) = 321.465;
195 | cent_freq(25) = 3597.63;   bandwidth(25) = 346.136;
196 |
197 | bw_min = bandwidth(1);   % minimum critical bandwidth
198 |
199 | % ----------------------------------------------------------------------
200 | % Set up the critical band filters. Note here that Gaussianly shaped
201 | % filters are used. Also, the sum of the filter weights is equivalent
202 | % for each critical band filter. Filter values less than -30 dB are set to
203 | % zero.
204 | % ---------------------------------------------------------------------- 205 | 206 | min_factor = exp (-30.0 / (2.0 * 2.303)); % -30 dB point of filter 207 | 208 | for i = 1:num_crit 209 | f0 = (cent_freq (i) / max_freq) * (n_fftby2); 210 | all_f0(i) = floor(f0); 211 | bw = (bandwidth (i) / max_freq) * (n_fftby2); 212 | norm_factor = log(bw_min) - log(bandwidth(i)); 213 | j = 0:1:n_fftby2-1; 214 | crit_filter(i,:) = exp (-11 *(((j - floor(f0)) ./bw).^2) + norm_factor); 215 | crit_filter(i,:) = crit_filter(i,:).*(crit_filter(i,:) > min_factor); 216 | end 217 | 218 | % ---------------------------------------------------------------------- 219 | % For each frame of input speech, calculate the Weighted Spectral 220 | % Slope Measure 221 | % ---------------------------------------------------------------------- 222 | 223 | num_frames = clean_length/skiprate-(winlength/skiprate); % number of frames 224 | start = 1; % starting sample 225 | window = 0.5*(1 - cos(2*pi*(1:winlength)'/(winlength+1))); 226 | 227 | for frame_count = 1:num_frames 228 | 229 | % ---------------------------------------------------------- 230 | % (1) Get the Frames for the test and reference speech. 231 | % Multiply by Hanning Window. 
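The filter-bank construction above builds one Gaussian row per critical band: the band's center frequency and bandwidth (in Hz) are mapped to FFT-bin units, a normalization term equalizes the filter areas, and weights below the -30 dB point are zeroed. A stdlib-Python transcription of a single filter row, with a hypothetical helper name `critical_band_filter` (not part of this repo):

```python
import math

def critical_band_filter(cent_freq, bandwidth, bw_min, max_freq, n_fftby2):
    """One Gaussian critical-band filter row, mirroring the MATLAB loop above.

    A sketch: bin mapping, the exp(-11 * ...) shape, the area-normalization
    term, and the -30 dB threshold all follow the code in wss().
    """
    min_factor = math.exp(-30.0 / (2.0 * 2.303))          # -30 dB point of filter
    f0 = (cent_freq / max_freq) * n_fftby2                # center, in FFT bins
    bw = (bandwidth / max_freq) * n_fftby2                # bandwidth, in FFT bins
    norm_factor = math.log(bw_min) - math.log(bandwidth)  # equalize filter areas
    filt = []
    for j in range(n_fftby2):
        w = math.exp(-11.0 * ((j - math.floor(f0)) / bw) ** 2 + norm_factor)
        filt.append(w if w > min_factor else 0.0)         # zero the -30 dB tails
    return filt
```

For the first band (50 Hz center, 70 Hz bandwidth) at an 8 kHz sample rate with `n_fftby2 = 256`, the peak lands on bin `floor(50/4000 * 256) = 3` with weight 1.0, and bins far from the center are exactly zero after thresholding.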
232 | % ---------------------------------------------------------- 233 | 234 | clean_frame = clean_speech(start:start+winlength-1); 235 | processed_frame = processed_speech(start:start+winlength-1); 236 | clean_frame = clean_frame.*window; 237 | processed_frame = processed_frame.*window; 238 | 239 | % ---------------------------------------------------------- 240 | % (2) Compute the Power Spectrum of Clean and Processed 241 | % ---------------------------------------------------------- 242 | 243 | if (USE_FFT_SPECTRUM) 244 | clean_spec = (abs(fft(clean_frame,n_fft)).^2); 245 | processed_spec = (abs(fft(processed_frame,n_fft)).^2); 246 | else 247 | a_vec = zeros(1,n_fft); 248 | a_vec(1:11) = lpc(clean_frame,10); 249 | clean_spec = 1.0/(abs(fft(a_vec,n_fft)).^2)'; 250 | 251 | a_vec = zeros(1,n_fft); 252 | a_vec(1:11) = lpc(processed_frame,10); 253 | processed_spec = 1.0/(abs(fft(a_vec,n_fft)).^2)'; 254 | end 255 | 256 | % ---------------------------------------------------------- 257 | % (3) Compute Filterbank Output Energies (in dB scale) 258 | % ---------------------------------------------------------- 259 | 260 | for i = 1:num_crit 261 | clean_energy(i) = sum(clean_spec(1:n_fftby2) ... 262 | .*crit_filter(i,:)'); 263 | processed_energy(i) = sum(processed_spec(1:n_fftby2) ... 264 | .*crit_filter(i,:)'); 265 | end 266 | clean_energy = 10*log10(max(clean_energy,1E-10)); 267 | processed_energy = 10*log10(max(processed_energy,1E-10)); 268 | 269 | % ---------------------------------------------------------- 270 | % (4) Compute Spectral Slope (dB[i+1]-dB[i]) 271 | % ---------------------------------------------------------- 272 | 273 | clean_slope = clean_energy(2:num_crit) - ... 274 | clean_energy(1:num_crit-1); 275 | processed_slope = processed_energy(2:num_crit) - ... 276 | processed_energy(1:num_crit-1); 277 | 278 | % ---------------------------------------------------------- 279 | % (5) Find the nearest peak locations in the spectra to 280 | % each critical band. 
If the slope is negative, we
281 | %     search to the left. If positive, we search to the
282 | %     right.
283 | % ----------------------------------------------------------
284 |
285 | for i = 1:num_crit-1
286 |
287 |     % find the peaks in the clean speech signal
288 |
289 |     if (clean_slope(i) > 0)        % search to the right
290 |         n = i;
291 |         while ((n < num_crit) & (clean_slope(n) > 0))
292 |             n = n+1;
293 |         end
294 |         clean_loc_peak(i) = clean_energy(n-1);
295 |     else                           % search to the left
296 |         n = i;
297 |         while ((n > 0) & (clean_slope(n) <= 0))
298 |             n = n-1;
299 |         end
300 |         clean_loc_peak(i) = clean_energy(n+1);
301 |     end
302 |
303 |     % find the peaks in the processed speech signal
304 |
305 |     if (processed_slope(i) > 0)    % search to the right
306 |         n = i;
307 |         while ((n < num_crit) & (processed_slope(n) > 0))
308 |             n = n+1;
309 |         end
310 |         processed_loc_peak(i) = processed_energy(n-1);
311 |     else                           % search to the left
312 |         n = i;
313 |         while ((n > 0) & (processed_slope(n) <= 0))
314 |             n = n-1;
315 |         end
316 |         processed_loc_peak(i) = processed_energy(n+1);
317 |     end
318 |
319 | end
320 |
321 | % ----------------------------------------------------------
322 | % (6) Compute the WSS Measure for this frame. This
323 | %     includes determination of the weighting function.
324 | % ----------------------------------------------------------
325 |
326 | dBMax_clean     = max(clean_energy);
327 | dBMax_processed = max(processed_energy);
328 |
329 | % The weights are calculated by averaging individual
330 | % weighting factors from the clean and processed frame.
331 | % These weights W_clean and W_processed should range
332 | % from 0 to 1 and place more emphasis on spectral
333 | % peaks and less emphasis on slope differences in spectral
334 | % valleys. This procedure is described on page 1280 of
335 | % Klatt's 1982 ICASSP paper.
336 |
337 | Wmax_clean = Kmax ./ (Kmax + dBMax_clean - ...
338 |     clean_energy(1:num_crit-1));
339 | Wlocmax_clean = Klocmax ./ ( Klocmax + clean_loc_peak - ...
340 | clean_energy(1:num_crit-1)); 341 | W_clean = Wmax_clean .* Wlocmax_clean; 342 | 343 | Wmax_processed = Kmax ./ (Kmax + dBMax_processed - ... 344 | processed_energy(1:num_crit-1)); 345 | Wlocmax_processed = Klocmax ./ ( Klocmax + processed_loc_peak - ... 346 | processed_energy(1:num_crit-1)); 347 | W_processed = Wmax_processed .* Wlocmax_processed; 348 | 349 | W = (W_clean + W_processed)./2.0; 350 | 351 | distortion(frame_count) = sum(W.*(clean_slope(1:num_crit-1) - ... 352 | processed_slope(1:num_crit-1)).^2); 353 | 354 | % this normalization is not part of Klatt's paper, but helps 355 | % to normalize the measure. Here we scale the measure by the 356 | % sum of the weights. 357 | 358 | distortion(frame_count) = distortion(frame_count)/sum(W); 359 | 360 | start = start + skiprate; 361 | 362 | end 363 | 364 | %----------------------------------------------- 365 | function distortion = llr(clean_speech, processed_speech,sample_rate) 366 | 367 | 368 | % ---------------------------------------------------------------------- 369 | % Check the length of the clean and processed speech. Must be the same. 
370 | % ---------------------------------------------------------------------- 371 | 372 | clean_length = length(clean_speech); 373 | processed_length = length(processed_speech); 374 | 375 | if (clean_length ~= processed_length) 376 | disp('Error: Both Speech Files must be same length.'); 377 | return 378 | end 379 | 380 | % ---------------------------------------------------------------------- 381 | % Global Variables 382 | % ---------------------------------------------------------------------- 383 | 384 | % sample_rate = 8000; % default sample rate 385 | % winlength = 240; % window length in samples 386 | % skiprate = 60; % window skip in samples 387 | % P = 10; % LPC Analysis Order 388 | winlength = round(30*sample_rate/1000); % window length in samples 389 | skiprate = floor(winlength/4); % window skip in samples 390 | if sample_rate<10000 391 | P = 10; % LPC Analysis Order 392 | else 393 | P=16; % this could vary depending on sampling frequency. 394 | end 395 | 396 | % ---------------------------------------------------------------------- 397 | % For each frame of input speech, calculate the Log Likelihood Ratio 398 | % ---------------------------------------------------------------------- 399 | 400 | num_frames = clean_length/skiprate-(winlength/skiprate); % number of frames 401 | start = 1; % starting sample 402 | window = 0.5*(1 - cos(2*pi*(1:winlength)'/(winlength+1))); 403 | 404 | for frame_count = 1:num_frames 405 | 406 | % ---------------------------------------------------------- 407 | % (1) Get the Frames for the test and reference speech. 408 | % Multiply by Hanning Window. 
409 | % ---------------------------------------------------------- 410 | 411 | clean_frame = clean_speech(start:start+winlength-1); 412 | processed_frame = processed_speech(start:start+winlength-1); 413 | clean_frame = clean_frame.*window; 414 | processed_frame = processed_frame.*window; 415 | 416 | % ---------------------------------------------------------- 417 | % (2) Get the autocorrelation lags and LPC parameters used 418 | % to compute the LLR measure. 419 | % ---------------------------------------------------------- 420 | 421 | [R_clean, Ref_clean, A_clean] = ... 422 | lpcoeff(clean_frame, P); 423 | [R_processed, Ref_processed, A_processed] = ... 424 | lpcoeff(processed_frame, P); 425 | 426 | % ---------------------------------------------------------- 427 | % (3) Compute the LLR measure 428 | % ---------------------------------------------------------- 429 | 430 | numerator = A_processed*toeplitz(R_clean)*A_processed'; 431 | denominator = A_clean*toeplitz(R_clean)*A_clean'; 432 | distortion(frame_count) = log(numerator/denominator); 433 | start = start + skiprate; 434 | 435 | end 436 | 437 | %--------------------------------------------- 438 | function [acorr, refcoeff, lpparams] = lpcoeff(speech_frame, model_order) 439 | 440 | % ---------------------------------------------------------- 441 | % (1) Compute Autocorrelation Lags 442 | % ---------------------------------------------------------- 443 | 444 | winlength = max(size(speech_frame)); 445 | for k=1:model_order+1 446 | R(k) = sum(speech_frame(1:winlength-k+1) ... 
447 | .*speech_frame(k:winlength)); 448 | end 449 | 450 | % ---------------------------------------------------------- 451 | % (2) Levinson-Durbin 452 | % ---------------------------------------------------------- 453 | 454 | a = ones(1,model_order); 455 | E(1)=R(1); 456 | for i=1:model_order 457 | a_past(1:i-1) = a(1:i-1); 458 | sum_term = sum(a_past(1:i-1).*R(i:-1:2)); 459 | rcoeff(i)=(R(i+1) - sum_term) / E(i); 460 | a(i)=rcoeff(i); 461 | a(1:i-1) = a_past(1:i-1) - rcoeff(i).*a_past(i-1:-1:1); 462 | E(i+1)=(1-rcoeff(i)*rcoeff(i))*E(i); 463 | end 464 | 465 | acorr = R; 466 | refcoeff = rcoeff; 467 | lpparams = [1 -a]; 468 | 469 | 470 | % ---------------------------------------------------------------------- 471 | 472 | function [overall_snr, segmental_snr] = snr(clean_speech, processed_speech,sample_rate) 473 | 474 | % ---------------------------------------------------------------------- 475 | % Check the length of the clean and processed speech. Must be the same. 476 | % ---------------------------------------------------------------------- 477 | 478 | clean_length = length(clean_speech); 479 | processed_length = length(processed_speech); 480 | 481 | if (clean_length ~= processed_length) 482 | disp('Error: Both Speech Files must be same length.'); 483 | return 484 | end 485 | 486 | % ---------------------------------------------------------------------- 487 | % Scale both clean speech and processed speech to have same dynamic 488 | % range. 
Also remove DC component from each signal 489 | % ---------------------------------------------------------------------- 490 | 491 | %clean_speech = clean_speech - mean(clean_speech); 492 | %processed_speech = processed_speech - mean(processed_speech); 493 | 494 | %processed_speech = processed_speech.*(max(abs(clean_speech))/ max(abs(processed_speech))); 495 | 496 | overall_snr = 10* log10( sum(clean_speech.^2)/sum((clean_speech-processed_speech).^2)); 497 | 498 | % ---------------------------------------------------------------------- 499 | % Global Variables 500 | % ---------------------------------------------------------------------- 501 | 502 | % sample_rate = 8000; % default sample rate 503 | % winlength = 240; % window length in samples 504 | % skiprate = 60; % window skip in samples 505 | winlength = round(30*sample_rate/1000); %240; % window length in samples 506 | skiprate = floor(winlength/4); % window skip in samples 507 | MIN_SNR = -10; % minimum SNR in dB 508 | MAX_SNR = 35; % maximum SNR in dB 509 | 510 | % ---------------------------------------------------------------------- 511 | % For each frame of input speech, calculate the Segmental SNR 512 | % ---------------------------------------------------------------------- 513 | 514 | num_frames = clean_length/skiprate-(winlength/skiprate); % number of frames 515 | start = 1; % starting sample 516 | window = 0.5*(1 - cos(2*pi*(1:winlength)'/(winlength+1))); 517 | 518 | for frame_count = 1: num_frames 519 | 520 | % ---------------------------------------------------------- 521 | % (1) Get the Frames for the test and reference speech. 522 | % Multiply by Hanning Window. 
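The segmental SNR loop that follows the framing setup above computes a per-frame SNR, clamps it to [MIN_SNR, MAX_SNR] = [-10, 35] dB, and averages over frames. A stdlib-Python sketch of that clamped frame-wise average, assuming the signals are plain float lists (this simplification uses rectangular frames rather than the Hanning window applied in the MATLAB code):

```python
import math

def segmental_snr(clean, processed, winlength, skiprate,
                  min_snr=-10.0, max_snr=35.0):
    """Mean frame-wise SNR with clamping, mirroring the MATLAB loop.

    Sketch only: no Hanning windowing, and frames are advanced by
    `skiprate` samples exactly as in the snr() function above.
    """
    eps = 2.2e-16                       # roughly MATLAB's eps, to avoid log(0)
    snrs = []
    start = 0
    while start + winlength <= len(clean):
        sig = sum(c * c for c in clean[start:start + winlength])
        noise = sum((c - p) ** 2
                    for c, p in zip(clean[start:start + winlength],
                                    processed[start:start + winlength]))
        s = 10.0 * math.log10(sig / (noise + eps) + eps)
        snrs.append(min(max(s, min_snr), max_snr))   # clamp to [min_snr, max_snr]
        start += skiprate
    return sum(snrs) / len(snrs)
```

The clamping is what distinguishes segmental SNR from the global SNR computed just above it: silent or perfectly reconstructed frames cannot dominate the average.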
523 | % ---------------------------------------------------------- 524 | 525 | clean_frame = clean_speech(start:start+winlength-1); 526 | processed_frame = processed_speech(start:start+winlength-1); 527 | clean_frame = clean_frame.*window; 528 | processed_frame = processed_frame.*window; 529 | 530 | % ---------------------------------------------------------- 531 | % (2) Compute the Segmental SNR 532 | % ---------------------------------------------------------- 533 | 534 | signal_energy = sum(clean_frame.^2); 535 | noise_energy = sum((clean_frame-processed_frame).^2); 536 | segmental_snr(frame_count) = 10*log10(signal_energy/(noise_energy+eps)+eps); 537 | segmental_snr(frame_count) = max(segmental_snr(frame_count),MIN_SNR); 538 | segmental_snr(frame_count) = min(segmental_snr(frame_count),MAX_SNR); 539 | 540 | start = start + skiprate; 541 | 542 | end 543 | 544 | 545 | 546 | -------------------------------------------------------------------------------- /evaluation/sdr_pesq_sisdr/composite_pesq/composite.m: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gemengtju/Tutorial_Separation/c47dca746fa2e123868f5fbf2aed70ab3809925b/evaluation/sdr_pesq_sisdr/composite_pesq/composite.m -------------------------------------------------------------------------------- /evaluation/sdr_pesq_sisdr/composite_pesq/convolution_in_timealign.p: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gemengtju/Tutorial_Separation/c47dca746fa2e123868f5fbf2aed70ab3809925b/evaluation/sdr_pesq_sisdr/composite_pesq/convolution_in_timealign.p -------------------------------------------------------------------------------- /evaluation/sdr_pesq_sisdr/composite_pesq/crude_align.p: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/gemengtju/Tutorial_Separation/c47dca746fa2e123868f5fbf2aed70ab3809925b/evaluation/sdr_pesq_sisdr/composite_pesq/crude_align.p -------------------------------------------------------------------------------- /evaluation/sdr_pesq_sisdr/composite_pesq/enhanced_logmmse.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gemengtju/Tutorial_Separation/c47dca746fa2e123868f5fbf2aed70ab3809925b/evaluation/sdr_pesq_sisdr/composite_pesq/enhanced_logmmse.wav -------------------------------------------------------------------------------- /evaluation/sdr_pesq_sisdr/composite_pesq/fix_power_level.p: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gemengtju/Tutorial_Separation/c47dca746fa2e123868f5fbf2aed70ab3809925b/evaluation/sdr_pesq_sisdr/composite_pesq/fix_power_level.p -------------------------------------------------------------------------------- /evaluation/sdr_pesq_sisdr/composite_pesq/id_searchwindows.p: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gemengtju/Tutorial_Separation/c47dca746fa2e123868f5fbf2aed70ab3809925b/evaluation/sdr_pesq_sisdr/composite_pesq/id_searchwindows.p -------------------------------------------------------------------------------- /evaluation/sdr_pesq_sisdr/composite_pesq/id_utterances.p: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gemengtju/Tutorial_Separation/c47dca746fa2e123868f5fbf2aed70ab3809925b/evaluation/sdr_pesq_sisdr/composite_pesq/id_utterances.p -------------------------------------------------------------------------------- /evaluation/sdr_pesq_sisdr/composite_pesq/input_filter.p: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/gemengtju/Tutorial_Separation/c47dca746fa2e123868f5fbf2aed70ab3809925b/evaluation/sdr_pesq_sisdr/composite_pesq/input_filter.p -------------------------------------------------------------------------------- /evaluation/sdr_pesq_sisdr/composite_pesq/pesq.p: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gemengtju/Tutorial_Separation/c47dca746fa2e123868f5fbf2aed70ab3809925b/evaluation/sdr_pesq_sisdr/composite_pesq/pesq.p -------------------------------------------------------------------------------- /evaluation/sdr_pesq_sisdr/composite_pesq/pesq_debug.p: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gemengtju/Tutorial_Separation/c47dca746fa2e123868f5fbf2aed70ab3809925b/evaluation/sdr_pesq_sisdr/composite_pesq/pesq_debug.p -------------------------------------------------------------------------------- /evaluation/sdr_pesq_sisdr/composite_pesq/pesq_measure.p: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gemengtju/Tutorial_Separation/c47dca746fa2e123868f5fbf2aed70ab3809925b/evaluation/sdr_pesq_sisdr/composite_pesq/pesq_measure.p -------------------------------------------------------------------------------- /evaluation/sdr_pesq_sisdr/composite_pesq/pesq_psychoacoustic_model.p: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gemengtju/Tutorial_Separation/c47dca746fa2e123868f5fbf2aed70ab3809925b/evaluation/sdr_pesq_sisdr/composite_pesq/pesq_psychoacoustic_model.p -------------------------------------------------------------------------------- /evaluation/sdr_pesq_sisdr/composite_pesq/pesq_testbench.p: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/gemengtju/Tutorial_Separation/c47dca746fa2e123868f5fbf2aed70ab3809925b/evaluation/sdr_pesq_sisdr/composite_pesq/pesq_testbench.p -------------------------------------------------------------------------------- /evaluation/sdr_pesq_sisdr/composite_pesq/plot_wav.p: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gemengtju/Tutorial_Separation/c47dca746fa2e123868f5fbf2aed70ab3809925b/evaluation/sdr_pesq_sisdr/composite_pesq/plot_wav.p -------------------------------------------------------------------------------- /evaluation/sdr_pesq_sisdr/composite_pesq/pow_of.p: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gemengtju/Tutorial_Separation/c47dca746fa2e123868f5fbf2aed70ab3809925b/evaluation/sdr_pesq_sisdr/composite_pesq/pow_of.p -------------------------------------------------------------------------------- /evaluation/sdr_pesq_sisdr/composite_pesq/readme.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gemengtju/Tutorial_Separation/c47dca746fa2e123868f5fbf2aed70ab3809925b/evaluation/sdr_pesq_sisdr/composite_pesq/readme.pdf -------------------------------------------------------------------------------- /evaluation/sdr_pesq_sisdr/composite_pesq/readme.txt: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gemengtju/Tutorial_Separation/c47dca746fa2e123868f5fbf2aed70ab3809925b/evaluation/sdr_pesq_sisdr/composite_pesq/readme.txt -------------------------------------------------------------------------------- /evaluation/sdr_pesq_sisdr/composite_pesq/setup_global.p: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/gemengtju/Tutorial_Separation/c47dca746fa2e123868f5fbf2aed70ab3809925b/evaluation/sdr_pesq_sisdr/composite_pesq/setup_global.p -------------------------------------------------------------------------------- /evaluation/sdr_pesq_sisdr/composite_pesq/sp09.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gemengtju/Tutorial_Separation/c47dca746fa2e123868f5fbf2aed70ab3809925b/evaluation/sdr_pesq_sisdr/composite_pesq/sp09.wav -------------------------------------------------------------------------------- /evaluation/sdr_pesq_sisdr/composite_pesq/sp09_babble_sn10.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gemengtju/Tutorial_Separation/c47dca746fa2e123868f5fbf2aed70ab3809925b/evaluation/sdr_pesq_sisdr/composite_pesq/sp09_babble_sn10.wav -------------------------------------------------------------------------------- /evaluation/sdr_pesq_sisdr/composite_pesq/split_align.p: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gemengtju/Tutorial_Separation/c47dca746fa2e123868f5fbf2aed70ab3809925b/evaluation/sdr_pesq_sisdr/composite_pesq/split_align.p -------------------------------------------------------------------------------- /evaluation/sdr_pesq_sisdr/composite_pesq/time_align.p: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gemengtju/Tutorial_Separation/c47dca746fa2e123868f5fbf2aed70ab3809925b/evaluation/sdr_pesq_sisdr/composite_pesq/time_align.p -------------------------------------------------------------------------------- /evaluation/sdr_pesq_sisdr/composite_pesq/utterance_locate.p: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/gemengtju/Tutorial_Separation/c47dca746fa2e123868f5fbf2aed70ab3809925b/evaluation/sdr_pesq_sisdr/composite_pesq/utterance_locate.p -------------------------------------------------------------------------------- /evaluation/sdr_pesq_sisdr/composite_pesq/utterance_split.p: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gemengtju/Tutorial_Separation/c47dca746fa2e123868f5fbf2aed70ab3809925b/evaluation/sdr_pesq_sisdr/composite_pesq/utterance_split.p -------------------------------------------------------------------------------- /evaluation/sdr_pesq_sisdr/des_file_name.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gemengtju/Tutorial_Separation/c47dca746fa2e123868f5fbf2aed70ab3809925b/evaluation/sdr_pesq_sisdr/des_file_name.pdf -------------------------------------------------------------------------------- /evaluation/sdr_pesq_sisdr/eval_sdr.m: -------------------------------------------------------------------------------- 1 | function eval_sdr(dataset, eval_mix, eval_pesq, model_name, fmix, fclean) 2 | addpath('composite_pesq'); 3 | %% WSJ0_2mix_extr 4 | %mixed_wav_dir = ['/export/home/clx214/data/wsj0_2mix_extr/wav8k/max/' dataset '/' fmix '/']; 5 | %spk1_dir = ['/export/home/clx214/data/wsj0_2mix_extr/wav8k/max/' dataset '/' fclean '/']; 6 | 7 | %% WSJ0_2mix 8 | mixed_wav_dir = ['/export/home/clx214/gm/ntu_project/SpEx_SincNetAuxCNNEncoder_MultiOriEncoder_share_min_2spk/data/wsj0_2mix_min_mix_6k/']; 9 | spk1_dir = ['/export/home/clx214/gm/ntu_project/SpEx_SincNetAuxCNNEncoder_MultiOriEncoder_share_min_2spk/data/wsj0_2mix_min_clean_6k/']; 10 | 11 | %% WHAM 12 | %mixed_wav_dir = ['/export/home/clx214/gm/ntu_project/SpEx+_WHAM/data/WHAM_mix_6k/']; 13 | %spk1_dir = ['/export/home/clx214/gm/ntu_project/SpEx+_WHAM/data/WHAM_clean_6k/']; 14 | 15 | % WHAMR reverb 16 | %mixed_wav_dir = 
['/export/home/clx214/gm/ntu_project/SpEx+_WHAMR/data/WHAMR_mix_6k/']; 17 | %spk1_dir = ['/export/home/clx214/gm/ntu_project/SpEx+_WHAMR/data/WHAMR_clean_6k/']; 18 | 19 | % WHAMR noise 20 | %mixed_wav_dir = ['/export/home/clx214/gm/ntu_project/SpEx+_WHAMR2/data/WHAMR_noise_mix_6k/']; 21 | %spk1_dir = ['/export/home/clx214/gm/ntu_project/SpEx+_WHAMR/data/WHAMR_clean_6k/']; % all WHAMR clean data is same 22 | 23 | % WHAMR noise + reverb 24 | %mixed_wav_dir = ['/export/home/clx214/gm/ntu_project/SpEx+_WHAMR_noise/data/WHAMR_noise_reverb_mix_6k/']; 25 | %spk1_dir = ['/export/home/clx214/gm/ntu_project/SpEx+_WHAMR/data/WHAMR_clean_6k/']; % all WHAMR clean data is same 26 | 27 | 28 | sprintf('start, %s\n', model_name) 29 | %rec_wav_dir = ['../data/rec/' dataset '/' model_name '/']; 30 | rec_wav_dir = ['/export/home/clx214/gm/ntu_project/SpEx_SincNetAuxCNNEncoder_MultiOriEncoder_share_min_2spk/rec_aux60/spk1/']; 31 | lists = dir(rec_wav_dir); 32 | len = length(lists) - 2; 33 | SDR = zeros(len, 1); 34 | SIR = zeros(len, 1); 35 | SAR = zeros(len, 1); 36 | SDR_Mix = zeros(len, 1); 37 | SIR_Mix = zeros(len, 1); 38 | SAR_Mix = zeros(len, 1); 39 | PESQ = zeros(len, 1); 40 | PESQ_Mix = zeros(len, 1); 41 | SISDR = zeros(len, 1); 42 | SISDR_Mix = zeros(len, 1); 43 | 44 | target_durs=textscan(fopen('target_ref_dur.txt'), '%s %f'); 45 | 46 | for i = 3:len+2 47 | name = lists(i).name; 48 | part_name = name(1:end-4); 49 | [rec_wav, Fs] = audioread([rec_wav_dir part_name '.wav']); 50 | ori_wav = audioread([spk1_dir part_name '.wav']); 51 | mix_wav = audioread([mixed_wav_dir part_name '.wav']); 52 | 53 | % get ground truth length 54 | utt_tokens = strsplit(part_name, '_'); 55 | idx = find(strcmp(target_durs{1}, utt_tokens{1})); 56 | dur = int32(target_durs{2}(idx)*Fs); 57 | 58 | min_len = min(size(ori_wav, 1), size(rec_wav, 1)); 59 | min_len = int32(min(min_len, dur)); 60 | 61 | rec_wav = rec_wav(1:min_len); 62 | ori_wav = ori_wav(1:min_len); 63 | mix_wav = mix_wav(1:min_len); 64 | 65 | 
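The setup above loads each reconstructed, reference, and mixture waveform and trims them to a common length before scoring. One of the scores used here is the scale-invariant SDR (the repo's `cal_SISDR` helper, whose implementation is not included in this chunk). Its conventional definition projects the estimate onto the reference before computing the signal-to-noise ratio; a stdlib-Python sketch of that standard formula (our own, not the repo's implementation):

```python
import math

def si_sdr(reference, estimate):
    """Scale-invariant SDR in dB (standard definition; a sketch, not
    necessarily identical to this repo's cal_SISDR).

    The reference is rescaled by the optimal least-squares factor so the
    metric is invariant to the estimate's overall gain.
    """
    dot = sum(r * e for r, e in zip(reference, estimate))
    ref_energy = sum(r * r for r in reference)
    scale = dot / ref_energy                        # optimal scaling factor
    target = [scale * r for r in reference]         # projection onto the reference
    noise = [e - t for e, t in zip(estimate, target)]
    return 10.0 * math.log10(sum(t * t for t in target) /
                             sum(n * n for n in noise))
```

Because of the projection step, multiplying the estimate by any nonzero constant leaves the score unchanged, which is why SI-SDR is preferred over plain SNR when separation models do not preserve absolute scale.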
[SDR(i-2),SIR(i-2),SAR(i-2),perm] = bss_eval_sources(rec_wav', ori_wav');
66 | SISDR(i-2) = cal_SISDR(ori_wav', rec_wav');
67 |
68 | if eval_pesq
69 |     fprintf('PESQINDEX: %d\n', i);
70 |     fprintf('PESQINDEX: %s,%s\n', [spk1_dir part_name '.wav'], [rec_wav_dir part_name '.wav']);
71 |     PESQ(i-2) = pesq(8000, [spk1_dir part_name '.wav'], [rec_wav_dir part_name '.wav']);
72 | end
73 |
74 | if eval_mix
75 |     [SDR_Mix(i-2),SIR_Mix(i-2),SAR_Mix(i-2),perm] = bss_eval_sources(mix_wav', ori_wav');
76 |     SISDR_Mix(i-2) = cal_SISDR(ori_wav', mix_wav');
77 |
78 |     if eval_pesq
79 |         fprintf('PESQINDEX_MIX: %d\n', i);
80 |         PESQ_Mix(i-2) = pesq(8000, [spk1_dir part_name '.wav'], [mixed_wav_dir part_name '.wav']);
81 |     end
82 | end
83 |
84 | if mod(i, 200) == 0
85 |     fprintf('number of samples evaluated: %d\n', i);
86 |     fprintf('%s, %s, target:%d, org:%d, rec:%d\n', part_name, utt_tokens{1}, dur, size(ori_wav,1), size(rec_wav,1));
87 | end
88 | end
89 | mean_SDR = mean(SDR);
90 | mean_SIR = mean(SIR);
91 | mean_SAR = mean(SAR);
92 | mean_PESQ = mean(PESQ);
93 | mean_SISDR = mean(SISDR);
94 | fprintf('The mean SDR, SIR, SAR, PESQ, SISDR are: %f ,\t %f ,\t %f ,\t %f, \t %f \n', mean_SDR, mean_SIR, mean_SAR, mean_PESQ, mean_SISDR);
95 | if eval_mix
96 |     mean_SDR_Mix = mean(SDR_Mix);
97 |     mean_SIR_Mix = mean(SIR_Mix);
98 |     mean_SAR_Mix = mean(SAR_Mix);
99 |     mean_PESQ_Mix = mean(PESQ_Mix);
100 |     mean_SISDR_Mix = mean(SISDR_Mix);
101 |     fprintf('The mean SDR, SIR, SAR, PESQ, SISDR of the mixture are: %f ,\t %f ,\t %f ,\t %f, \t %f \n', mean_SDR_Mix, mean_SIR_Mix, mean_SAR_Mix, mean_PESQ_Mix, mean_SISDR_Mix);
102 | end
103 |
104 | % Calculate results for the different gender cases
105 | if strcmp(dataset, 'cv')   % strcmp: '==' errors when the strings differ in length
106 |     [spk, gender] = textread('spk2gender_cv', '%s%d');
107 | else
108 |     [spk, gender] = textread('spk2gender', '%s%d');
109 | end
110 | cmm = 1;
111 | cmf = 1;
112 | cff = 1;
113 | csame = 1;
114 | for i = 1:size(SDR, 1)
115 |     mix_name = lists(i+2).name;
116 |     spk1 = mix_name(1:3);
117 |     tmp = regexp(mix_name, '_');
118
| spk2 = mix_name(tmp(2)+1:tmp(2)+3);
119 |     for j = 1:length(spk)
120 |         if strcmp(spk1, spk{j})
121 |             break
122 |         end
123 |     end
124 |     for k = 1:length(spk)
125 |         if strcmp(spk2, spk{k})
126 |             break
127 |         end
128 |     end
129 |
130 |     if gender(k) == 0 & gender(j) == 0
131 |         SDR_FF(cff) = SDR(i);
132 |         SIR_FF(cff) = SIR(i);
133 |         SAR_FF(cff) = SAR(i);
134 |         PESQ_FF(cff) = PESQ(i);
135 |
136 |         SDR_Same(csame) = SDR(i);
137 |         SIR_Same(csame) = SIR(i);
138 |         SAR_Same(csame) = SAR(i);
139 |         PESQ_Same(csame) = PESQ(i);
140 |
141 |         if eval_mix
142 |             SDR_FF_Mix(cff) = SDR_Mix(i);
143 |             SIR_FF_Mix(cff) = SIR_Mix(i);
144 |             SAR_FF_Mix(cff) = SAR_Mix(i);
145 |             PESQ_FF_Mix(cff) = PESQ_Mix(i);
146 |
147 |             SDR_Same_Mix(csame) = SDR_Mix(i);
148 |             SIR_Same_Mix(csame) = SIR_Mix(i);
149 |             SAR_Same_Mix(csame) = SAR_Mix(i);
150 |             PESQ_Same_Mix(csame) = PESQ_Mix(i);
151 |         end
152 |
153 |         lists_FF{cff} = lists(i+2).name;   % i+2: skip the '.' and '..' dir entries
154 |         cff = cff + 1;
155 |         csame = csame + 1;
156 |
157 |     elseif gender(k) == 1 & gender(j) == 1
158 |         SDR_MM(cmm) = SDR(i);
159 |         SIR_MM(cmm) = SIR(i);
160 |         SAR_MM(cmm) = SAR(i);
161 |         PESQ_MM(cmm) = PESQ(i);
162 |
163 |         SDR_Same(csame) = SDR(i);
164 |         SIR_Same(csame) = SIR(i);
165 |         SAR_Same(csame) = SAR(i);
166 |         PESQ_Same(csame) = PESQ(i);
167 |
168 |         if eval_mix
169 |             SDR_MM_Mix(cmm) = SDR_Mix(i);
170 |             SIR_MM_Mix(cmm) = SIR_Mix(i);
171 |             SAR_MM_Mix(cmm) = SAR_Mix(i);
172 |             PESQ_MM_Mix(cmm) = PESQ_Mix(i);
173 |
174 |             SDR_Same_Mix(csame) = SDR_Mix(i);
175 |             SIR_Same_Mix(csame) = SIR_Mix(i);
176 |             SAR_Same_Mix(csame) = SAR_Mix(i);
177 |             PESQ_Same_Mix(csame) = PESQ_Mix(i);
178 |         end
179 |
180 |         lists_MM{cmm} = lists(i+2).name;
181 |         cmm = cmm + 1;
182 |         csame = csame + 1;
183 |     else
184 |         SDR_MF(cmf) = SDR(i);
185 |         SIR_MF(cmf) = SIR(i);
186 |         SAR_MF(cmf) = SAR(i);
187 |         PESQ_MF(cmf) = PESQ(i);
188 |
189 |         if eval_mix
190 |             SDR_MF_Mix(cmf) = SDR_Mix(i);
191 |             SIR_MF_Mix(cmf) = SIR_Mix(i);
192 |             SAR_MF_Mix(cmf) = SAR_Mix(i);
193 |             PESQ_MF_Mix(cmf) = PESQ_Mix(i);
194 |         end
195 |
196 |
lists_MF{cmf} = lists(i+2).name; 197 | cmf = cmf + 1; 198 | end 199 | end 200 | mean_SDR_MF = mean(SDR_MF); 201 | mean_SDR_FF = mean(SDR_FF); 202 | mean_SDR_MM = mean(SDR_MM); 203 | mean_SDR_Same = mean(SDR_Same); 204 | 205 | mean_SIR_MF = mean(SIR_MF); 206 | mean_SIR_FF = mean(SIR_FF); 207 | mean_SIR_MM = mean(SIR_MM); 208 | mean_SIR_Same = mean(SIR_Same); 209 | 210 | mean_SAR_MF = mean(SAR_MF); 211 | mean_SAR_FF = mean(SAR_FF); 212 | mean_SAR_MM = mean(SAR_MM); 213 | mean_SAR_Same = mean(SAR_Same); 214 | 215 | mean_PESQ_MF = mean(PESQ_MF); 216 | mean_PESQ_FF = mean(PESQ_FF); 217 | mean_PESQ_MM = mean(PESQ_MM); 218 | mean_PESQ_Same = mean(PESQ_Same); 219 | 220 | if eval_mix 221 | mean_SDR_MF_Mix = mean(SDR_MF_Mix); 222 | mean_SDR_FF_Mix = mean(SDR_FF_Mix); 223 | mean_SDR_MM_Mix = mean(SDR_MM_Mix); 224 | mean_SDR_Same_Mix = mean(SDR_Same_Mix); 225 | 226 | mean_SIR_MF_Mix = mean(SIR_MF_Mix); 227 | mean_SIR_FF_Mix = mean(SIR_FF_Mix); 228 | mean_SIR_MM_Mix = mean(SIR_MM_Mix); 229 | mean_SIR_Same_Mix = mean(SIR_Same_Mix); 230 | 231 | mean_SAR_MF_Mix = mean(SAR_MF_Mix); 232 | mean_SAR_FF_Mix = mean(SAR_FF_Mix); 233 | mean_SAR_MM_Mix = mean(SAR_MM_Mix); 234 | mean_SAR_Same_Mix = mean(SAR_Same_Mix); 235 | 236 | mean_PESQ_MF_Mix = mean(PESQ_MF_Mix); 237 | mean_PESQ_FF_Mix = mean(PESQ_FF_Mix); 238 | mean_PESQ_MM_Mix = mean(PESQ_MM_Mix); 239 | mean_PESQ_Same_Mix = mean(PESQ_Same_Mix); 240 | end 241 | 242 | fprintf('The mean SDR, SIR, SAR, PESQ for Male & Female are : %f ,\t %f ,\t %f ,\t %f \n', mean_SDR_MF, mean_SIR_MF, mean_SAR_MF, mean_PESQ_MF); 243 | fprintf('The mean SDR, SIR, SAR, PESQ for Female & Female are : %f ,\t %f ,\t %f ,\t %f \n', mean_SDR_FF, mean_SIR_FF, mean_SAR_FF, mean_PESQ_FF); 244 | fprintf('The mean SDR, SIR, SAR, PESQ for Male & Male are : %f ,\t %f ,\t %f ,\t %f \n', mean_SDR_MM, mean_SIR_MM, mean_SAR_MM, mean_PESQ_MM); 245 | fprintf('The mean SDR, SIR, SAR, PESQ for same gender are : %f ,\t %f ,\t %f ,\t %f \n', mean_SDR_Same, mean_SIR_Same,
mean_SAR_Same, mean_PESQ_Same); 246 | 247 | if eval_mix 248 | fprintf('The mean SDR, SIR, SAR, PESQ for Male & Female mixture are : %f ,\t %f ,\t %f ,\t %f \n', mean_SDR_MF_Mix, mean_SIR_MF_Mix, mean_SAR_MF_Mix, mean_PESQ_MF_Mix); 249 | fprintf('The mean SDR, SIR, SAR, PESQ for Female & Female mixture are : %f ,\t %f ,\t %f ,\t %f \n', mean_SDR_FF_Mix, mean_SIR_FF_Mix, mean_SAR_FF_Mix, mean_PESQ_FF_Mix); 250 | fprintf('The mean SDR, SIR, SAR, PESQ for Male & Male mixture are : %f ,\t %f ,\t %f ,\t %f \n', mean_SDR_MM_Mix, mean_SIR_MM_Mix, mean_SAR_MM_Mix, mean_PESQ_MM_Mix); 251 | fprintf('The mean SDR, SIR, SAR, PESQ for same gender mixture are : %f ,\t %f ,\t %f ,\t %f \n', mean_SDR_Same_Mix, mean_SIR_Same_Mix, mean_SAR_Same_Mix, mean_PESQ_Same_Mix); 252 | end 253 | 254 | if eval_mix 255 | save(['sdr_' model_name '_' dataset '.mat'], 'SDR', 'SIR', 'SAR', 'PESQ', 'SDR_Mix', 'SIR_Mix', 'SAR_Mix', 'PESQ_Mix', 'SISDR', 'SISDR_Mix', 'lists', 'mean_SISDR', 'mean_SISDR_Mix', 'mean_SDR', 'mean_SDR_MF', 'mean_SDR_FF', 'mean_SDR_MM', 'mean_SDR_Same','mean_SIR', 'mean_SIR_MF', 'mean_SIR_FF', 'mean_SIR_MM', 'mean_SIR_Same','mean_SAR', 'mean_SAR_MF', 'mean_SAR_FF', 'mean_SAR_MM', 'mean_SAR_Same', 'mean_PESQ', 'mean_PESQ_MF', 'mean_PESQ_FF', 'mean_PESQ_MM', 'mean_PESQ_Same', 'mean_SDR_Mix', 'mean_SDR_MF_Mix', 'mean_SDR_FF_Mix', 'mean_SDR_MM_Mix', 'mean_SDR_Same_Mix', 'mean_SIR_Mix', 'mean_SIR_MF_Mix', 'mean_SIR_FF_Mix', 'mean_SIR_MM_Mix', 'mean_SIR_Same_Mix', 'mean_SAR_Mix', 'mean_SAR_MF_Mix', 'mean_SAR_FF_Mix', 'mean_SAR_MM_Mix', 'mean_SAR_Same_Mix', 'mean_PESQ_Mix', 'mean_PESQ_MF_Mix', 'mean_PESQ_FF_Mix', 'mean_PESQ_MM_Mix', 'mean_PESQ_Same_Mix'); 256 | else 257 | save(['sdr_' model_name '_' dataset '.mat'], 'SDR', 'SIR', 'SAR', 'PESQ', 'SISDR', 'SDR_MF', 'SDR_FF', 'SDR_MM', 'SDR_Same', 'lists', 'mean_SISDR', 'mean_SDR', 'mean_SDR_MF', 'mean_SDR_FF', 'mean_SDR_MM', 'mean_SDR_Same','mean_SIR', 'mean_SIR_MF', 'mean_SIR_FF', 'mean_SIR_MM', 'mean_SIR_Same','mean_SAR',
'mean_SAR_MF', 'mean_SAR_FF', 'mean_SAR_MM', 'mean_SAR_Same', 'mean_PESQ', 'mean_PESQ_MF', 'mean_PESQ_FF', 'mean_PESQ_MM', 'mean_PESQ_Same'); 258 | end 259 | 260 | end 261 | 262 | function SISDR = cal_SISDR(clean_sig, rec_sig) 263 | clean_sig = clean_sig-mean(clean_sig); 264 | rec_sig = rec_sig-mean(rec_sig); 265 | s_target = dot(rec_sig, clean_sig)*clean_sig/dot(clean_sig, clean_sig); 266 | e_noise = rec_sig - s_target; 267 | SISDR = 10*log10(dot(s_target, s_target)/dot(e_noise, e_noise)); 268 | 269 | end 270 | -------------------------------------------------------------------------------- /evaluation/sdr_pesq_sisdr/mat_debug.txt: -------------------------------------------------------------------------------- 1 | 0.863095 2 | 0.462857 3 | 0.550268 4 | 0.463802 5 | 0.245474 6 | 0.325058 7 | 0.150809 8 | 0.614027 9 | 0.000000 10 | 0.000000 11 | 0.000000 12 | 0.000000 13 | 0.000000 14 | 0.000000 15 | 0.000000 16 | 0.000000 17 | 0.000000 18 | 0.000000 19 | 0.000000 20 | 0.000000 21 | 0.000000 22 | 0.000000 23 | 0.000000 24 | 0.000000 25 | 0.000000 26 | 0.000000 27 | 0.000000 28 | 0.000000 29 | 0.605941 30 | 0.925565 31 | 0.948630 32 | 1.682363 33 | 2.333140 34 | 1.777167 35 | 3.561755 36 | 1.925535 37 | 3.776067 38 | 3.681730 39 | 3.906468 40 | 4.478240 41 | 3.819809 42 | 4.818075 43 | 3.343888 44 | 4.562448 45 | 4.107696 46 | 4.281418 47 | 4.385021 48 | 3.852263 49 | 4.423036 50 | 3.368443 51 | 4.099853 52 | 3.042083 53 | 3.958693 54 | 3.565306 55 | 3.732977 56 | 3.867493 57 | 3.272170 58 | 3.360421 59 | 1.156600 60 | 0.759063 61 | 1.108485 62 | 1.855325 63 | 0.944549 64 | 0.151524 65 | 1.151040 66 | 2.187678 67 | 1.886246 68 | 1.227258 69 | 1.210209 70 | 0.996908 71 | 1.328644 72 | 1.449428 73 | 0.678474 74 | 0.955562 75 | 1.199524 76 | 0.960568 77 | 1.825083 78 | 1.474459 79 | 1.370983 80 | 2.237950 81 | 1.050045 82 | 0.663436 83 | 0.000000 84 | 0.000000 85 | 0.000000 86 | 0.000000 87 | 0.000000 88 | 0.000000 89 | 0.000000 90 | 0.000000 91 | 0.000000 92 | 
0.000000 93 | 0.000000 94 | 0.000000 95 | 0.000000 96 | 0.000000 97 | 0.000000 98 | 0.000000 99 | 0.000000 100 | 1.478239 101 | 1.546286 102 | 1.700638 103 | 2.801465 104 | 1.733314 105 | 2.710384 106 | 2.648619 107 | 2.734961 108 | 3.140087 109 | 2.700942 110 | 2.953761 111 | 3.057257 112 | 3.171422 113 | 3.396051 114 | 2.907187 115 | 3.243408 116 | 3.285299 117 | 3.328564 118 | 3.542964 119 | 3.170438 120 | 3.374711 121 | 3.380003 122 | 3.405784 123 | 3.664868 124 | 3.194730 125 | 3.373873 126 | 3.339097 127 | 3.348929 128 | 3.664257 129 | 3.033946 130 | 3.199356 131 | 2.981475 132 | 3.129450 133 | 3.305681 134 | 2.382595 135 | 2.821280 136 | 2.492983 137 | 2.212874 138 | 2.455352 139 | 1.200645 140 | 2.686173 141 | 0.352226 142 | 1.228576 143 | 0.000000 144 | 0.000000 145 | 0.000000 146 | 0.000000 147 | 0.000000 148 | 0.000000 149 | 0.000000 150 | 0.000000 151 | 1.052022 152 | 2.044547 153 | 0.645529 154 | 1.435122 155 | 0.679301 156 | 0.467676 157 | 0.355657 158 | 0.225350 159 | 0.710659 160 | 0.000000 161 | 0.000000 162 | 0.000000 163 | 0.000000 164 | 0.000000 165 | 0.000000 166 | 0.000000 167 | 0.000000 168 | 0.000000 169 | 0.000000 170 | 0.000000 171 | 0.000000 172 | 0.000000 173 | 0.000000 174 | 0.000000 175 | 0.000000 176 | 0.000000 177 | 0.000000 178 | 0.000000 179 | 0.000000 180 | 0.000000 181 | 1.126354 182 | 2.946321 183 | 3.527620 184 | 3.724596 185 | 4.525799 186 | 4.098446 187 | 4.737467 188 | 4.463562 189 | 4.906339 190 | 4.547538 191 | 4.970280 192 | 4.600758 193 | 4.875367 194 | 4.500842 195 | 4.581512 196 | 4.253643 197 | 3.736142 198 | 3.043051 199 | 2.627129 200 | 1.659920 201 | 0.471276 202 | 0.000000 203 | 0.000000 204 | 0.000000 205 | 0.000000 206 | 0.000000 207 | 0.000000 208 | 0.000000 209 | 0.000000 210 | 0.000000 211 | 0.000000 212 | 0.000000 213 | 0.000000 214 | 0.000000 215 | 0.000000 216 | 0.000000 217 | 0.000000 218 | 0.000000 219 | 0.000000 220 | 1.454111 221 | 2.761900 222 | 3.482584 223 | 3.663216 224 | 3.776409 225 | 4.092366 
226 | 3.808419 227 | 4.397982 228 | 4.047201 229 | 4.215184 230 | 4.320757 231 | 4.211600 232 | 4.614294 233 | 4.338661 234 | 4.532569 235 | 4.505702 236 | 4.654548 237 | 4.700429 238 | 4.479978 239 | 4.862856 240 | 4.891400 241 | 4.545625 242 | 4.875461 243 | 4.598618 244 | 4.877869 245 | 5.415682 246 | 4.847926 247 | 5.142485 248 | 0.358510 249 | 1.133771 250 | 1.621816 251 | 1.978592 252 | 1.586406 253 | 0.000000 254 | 0.000000 255 | 0.000000 256 | 0.000000 257 | 0.000000 258 | 0.000000 259 | 0.000000 260 | 1.397653 261 | 3.022556 262 | 4.605976 263 | 4.519685 264 | 5.112604 265 | 5.070884 266 | 4.822219 267 | 5.194060 268 | 5.100616 269 | 5.214362 270 | 5.282988 271 | 5.034066 272 | 5.440768 273 | 5.315097 274 | 5.523714 275 | 5.854498 276 | 6.243612 277 | 6.579317 278 | 6.908676 279 | 6.649100 280 | 7.230005 281 | 6.255204 282 | 7.024633 283 | 6.355053 284 | 6.570358 285 | 7.019010 286 | 6.505511 287 | 7.000665 288 | 5.982217 289 | 6.490695 290 | 5.754432 291 | 4.211986 292 | 3.749975 293 | 3.145122 294 | 3.065743 295 | 2.058786 296 | 2.553616 297 | 1.179549 298 | 1.858048 299 | 0.555788 300 | 1.589719 301 | 0.000000 302 | 0.000000 303 | 0.000000 304 | 0.000000 305 | 0.000000 306 | 0.000000 307 | 0.000000 308 | 0.000000 309 | 0.000000 310 | 0.000000 311 | 0.000000 312 | 0.000000 313 | 0.000000 314 | 0.000000 315 | 0.000000 316 | 0.000000 317 | 0.000000 318 | 0.000000 319 | 0.000000 320 | 0.000000 321 | 0.000000 322 | 2.107807 323 | 2.054843 324 | 2.953910 325 | 2.741811 326 | 3.327898 327 | 3.210654 328 | 3.805440 329 | 3.659808 330 | 4.038557 331 | 4.231474 332 | 4.466258 333 | 5.329964 334 | 5.229870 335 | 5.862393 336 | 5.286558 337 | 6.023390 338 | 5.300958 339 | 5.396051 340 | 5.363367 341 | 5.060120 342 | 5.593807 343 | 4.103935 344 | 5.839395 345 | 4.820440 346 | 5.375541 347 | 5.862968 348 | 4.752380 349 | 5.800714 350 | 4.937072 351 | 5.117013 352 | 5.825937 353 | 4.707493 354 | 5.767327 355 | 5.416033 356 | 5.230058 357 | 5.919025 358 | 4.869921 359 
| 5.253999 360 | 5.601425 361 | 4.022416 362 | 5.189520 363 | 5.042419 364 | 3.951378 365 | 4.287358 366 | 0.000000 367 | 0.000000 368 | 0.000000 369 | 0.000000 370 | 0.000000 371 | 0.000000 372 | 0.389709 373 | 0.748957 374 | 1.350006 375 | 1.184731 376 | 1.094279 377 | 0.808653 378 | 0.000000 379 | 0.000000 380 | 0.000000 381 | 0.000000 382 | 0.000000 383 | 0.000000 384 | 0.000000 385 | 0.000000 386 | 0.000000 387 | 0.000000 388 | 0.000000 389 | 0.000000 390 | 0.000000 391 | 0.000000 392 | 0.000000 393 | 0.000000 394 | 0.000000 395 | 0.000000 396 | 0.000000 397 | 0.000000 398 | 0.000000 399 | 0.000000 400 | 0.000000 401 | 0.000000 402 | 0.000000 403 | 0.000000 404 | 0.935910 405 | 1.758160 406 | 1.628309 407 | 2.411204 408 | 2.960818 409 | 2.316287 410 | 3.308113 411 | 2.871104 412 | 2.474566 413 | 3.774521 414 | 2.134365 415 | 3.296041 416 | 3.435712 417 | 2.650713 418 | 3.060803 419 | 2.601130 420 | 1.704484 421 | 2.454674 422 | 0.000000 423 | 0.000000 424 | 0.000000 425 | 0.000000 426 | 0.000000 427 | 0.000000 428 | 0.000000 429 | 0.000000 430 | 0.000000 431 | 0.000000 432 | 0.000000 433 | 0.000000 434 | 0.000000 435 | 0.687932 436 | 0.926301 437 | 2.354633 438 | 3.880248 439 | 2.669980 440 | 4.352959 441 | 5.321133 442 | 4.539225 443 | 5.643332 444 | 5.086752 445 | 5.216498 446 | 6.131239 447 | 4.408493 448 | 5.594543 449 | 5.524803 450 | 4.460885 451 | 5.863668 452 | 4.405786 453 | 4.323978 454 | 4.688123 455 | 3.702257 456 | 3.546547 457 | 2.118505 458 | 0.914533 459 | 0.000000 460 | 0.000000 461 | 0.000000 462 | 0.000000 463 | 0.000000 464 | 0.000000 465 | 0.000000 466 | 0.000000 467 | 0.000000 468 | 0.000000 469 | 0.000000 470 | 0.000000 471 | 0.000000 472 | 0.000000 473 | 0.000000 474 | 0.000000 475 | 0.000000 476 | 0.000000 477 | 0.000000 478 | 0.000000 479 | 0.000000 480 | 0.000000 481 | 0.000000 482 | 0.000000 483 | 0.000000 484 | 0.000000 485 | 0.000000 486 | 0.000000 487 | 0.000000 488 | 0.000000 489 | 0.000000 490 | 0.000000 491 | 0.000000 492 | 
0.000000 493 | 0.000000 494 | 0.000000 495 | 0.000000 496 | 0.000000 497 | 0.000000 498 | 0.000000 499 | 0.000000 500 | 0.000000 501 | 0.000000 502 | 0.000000 503 | 0.000000 504 | 0.000000 505 | 0.000000 506 | 0.000000 507 | 0.000000 508 | 0.000000 509 | 0.000000 510 | 0.000000 511 | 0.000000 512 | 0.000000 513 | 0.000000 514 | 0.000000 515 | 0.000000 516 | 0.000000 517 | 0.000000 518 | 0.000000 519 | 0.000000 520 | 0.000000 521 | 0.000000 522 | 0.000000 523 | 0.000000 524 | 0.000000 525 | 0.000000 526 | 0.000000 527 | 0.000000 528 | 0.000000 529 | 0.000000 530 | 0.000000 531 | 0.000000 532 | 0.000000 533 | 0.000000 534 | 0.000000 535 | 0.000000 536 | 0.000000 537 | 0.000000 538 | 0.000000 539 | 0.000000 540 | 0.000000 541 | 0.000000 542 | 0.000000 543 | 0.000000 544 | 0.000000 545 | 0.000000 546 | 0.000000 547 | 0.000000 548 | 0.000000 549 | 0.000000 550 | 0.000000 551 | 0.000000 552 | 0.000000 553 | 0.000000 554 | 0.000000 555 | 0.000000 556 | 0.000000 557 | 0.000000 558 | 0.000000 559 | 0.000000 560 | 0.000000 561 | 0.000000 562 | 0.000000 563 | 0.000000 564 | 0.000000 565 | 0.000000 566 | 0.000000 567 | 0.000000 568 | 0.000000 569 | 0.000000 570 | 0.000000 571 | 0.000000 572 | 0.000000 573 | 0.000000 574 | 0.000000 575 | 0.000000 576 | 0.000000 577 | 0.000000 578 | 0.000000 579 | 0.000000 580 | 0.000000 581 | 0.000000 582 | 0.000000 583 | 0.000000 584 | 0.000000 585 | 0.000000 586 | 0.000000 587 | 0.000000 588 | 0.000000 589 | 0.000000 590 | 0.000000 591 | 0.000000 592 | 0.000000 593 | 0.000000 594 | 0.000000 595 | 0.000000 596 | 0.000000 597 | 0.000000 598 | 0.000000 599 | 0.000000 600 | 0.000000 601 | 0.000000 602 | 0.000000 603 | 0.000000 604 | 0.000000 605 | 0.000000 606 | 0.000000 607 | 0.000000 608 | 0.000000 609 | 0.000000 610 | 0.000000 611 | 0.000000 612 | 0.000000 613 | 0.000000 614 | 0.000000 615 | 0.000000 616 | 0.000000 617 | 0.000000 618 | 0.000000 619 | 0.000000 620 | 0.000000 621 | 0.000000 622 | 0.000000 623 | 0.000000 624 | 0.000000 625 | 
0.000000 626 | 0.000000 627 | 0.000000 628 | 0.000000 629 | 0.000000 630 | 0.000000 631 | 0.000000 632 | 0.000000 633 | 0.000000 634 | 0.000000 635 | 0.000000 636 | 0.000000 637 | 0.000000 638 | 0.000000 639 | 0.000000 640 | 0.000000 641 | 0.000000 642 | 0.000000 643 | 0.000000 644 | 0.000000 645 | 0.000000 646 | 0.000000 647 | 0.000000 648 | 0.000000 649 | 0.000000 650 | 0.000000 651 | 0.000000 652 | 0.000000 653 | 0.000000 654 | 0.000000 655 | 0.000000 656 | 0.000000 657 | 0.000000 658 | 0.000000 659 | 0.000000 660 | 0.000000 661 | 0.000000 662 | 0.000000 663 | 0.000000 664 | 0.000000 665 | 0.000000 666 | 0.000000 667 | 0.000000 668 | 0.000000 669 | 0.000000 670 | 0.000000 671 | 0.000000 672 | 0.000000 673 | 0.000000 674 | 0.000000 675 | 0.000000 676 | 0.000000 677 | 0.000000 678 | 0.000000 679 | 0.000000 680 | 0.000000 681 | 0.000000 682 | 0.000000 683 | 0.000000 684 | 0.000000 685 | 0.000000 686 | 0.000000 687 | 0.000000 688 | 0.000000 689 | 0.000000 690 | 0.000000 691 | 0.000000 692 | 0.000000 693 | 0.000000 694 | 0.000000 695 | 0.000000 696 | 0.000000 697 | 0.000000 698 | 0.000000 699 | 0.000000 700 | 0.000000 701 | 0.000000 702 | 0.000000 703 | 0.000000 704 | 0.000000 705 | 0.000000 706 | 0.000000 707 | 0.000000 708 | 0.000000 709 | 0.000000 710 | 0.000000 711 | 0.000000 712 | 0.000000 713 | 0.000000 714 | 0.000000 715 | 0.000000 716 | 0.000000 717 | 0.000000 718 | 0.000000 719 | 0.000000 720 | 0.000000 721 | 0.000000 722 | 0.000000 723 | 0.000000 724 | 0.000000 725 | 0.000000 726 | 0.000000 727 | 0.000000 728 | 0.000000 729 | 0.000000 730 | 0.000000 731 | 0.000000 732 | 0.000000 733 | 0.000000 734 | 0.000000 735 | 0.000000 736 | 0.000000 737 | 0.000000 738 | 0.000000 739 | 0.000000 740 | 0.000000 741 | 0.000000 742 | 0.000000 743 | 0.000000 744 | 0.000000 745 | 0.000000 746 | 0.000000 747 | 0.000000 748 | 0.000000 749 | 0.000000 750 | 0.000000 751 | 0.000000 752 | 0.000000 753 | 0.000000 754 | 0.000000 755 | 0.000000 756 | 0.000000 757 | 0.000000 758 | 
0.000000 759 | 0.000000 760 | 0.000000 761 | 0.000000 762 | 0.000000 763 | 0.000000 764 | 0.000000 765 | 0.000000 766 | 0.000000 767 | 0.000000 768 | 0.000000 769 | 0.000000 770 | 0.000000 771 | 0.000000 772 | 0.000000 773 | 0.000000 774 | 0.000000 775 | 0.000000 776 | 0.000000 777 | 0.000000 778 | 0.000000 779 | 0.000000 780 | 0.000000 781 | 0.000000 782 | 0.000000 783 | 0.000000 784 | 0.000000 785 | 0.000000 786 | 0.000000 787 | 0.000000 788 | 0.000000 789 | 0.000000 790 | 0.000000 791 | 0.000000 792 | 0.000000 793 | 0.000000 794 | 0.000000 795 | 0.000000 796 | 0.000000 797 | 0.000000 798 | 0.000000 799 | 0.000000 800 | 0.000000 801 | 0.000000 802 | 0.000000 803 | 0.000000 804 | 0.000000 805 | 0.000000 806 | 0.000000 807 | 0.000000 808 | 0.000000 809 | 0.000000 810 | 0.000000 811 | 0.000000 812 | 0.000000 813 | 0.000000 814 | 0.000000 815 | 0.000000 816 | 0.000000 817 | 0.000000 818 | 0.000000 819 | 0.000000 820 | 0.000000 821 | 0.000000 822 | 0.000000 823 | 0.000000 824 | 0.000000 825 | 0.000000 826 | 0.000000 827 | 0.000000 828 | 0.000000 829 | 0.000000 830 | 0.000000 831 | 0.000000 832 | 0.000000 833 | 0.000000 834 | 0.000000 835 | 0.000000 836 | 0.000000 837 | 0.000000 838 | 0.000000 839 | 0.000000 840 | 0.000000 841 | 0.000000 842 | 0.000000 843 | 0.000000 844 | 0.000000 845 | 0.000000 846 | 0.000000 847 | 0.000000 848 | 0.000000 849 | 0.000000 850 | 0.000000 851 | 0.000000 852 | 0.000000 853 | 0.000000 854 | 0.000000 855 | 0.000000 856 | 0.000000 857 | 0.000000 858 | 0.000000 859 | 0.000000 860 | 0.000000 861 | 0.000000 862 | 0.000000 863 | 0.000000 864 | 0.000000 865 | 0.000000 866 | 0.000000 867 | 0.000000 868 | 0.000000 869 | 0.000000 870 | 0.000000 871 | 0.000000 872 | 0.000000 873 | 0.000000 874 | 0.000000 875 | 0.000000 876 | 0.000000 877 | 0.000000 878 | 0.000000 879 | 0.000000 880 | 0.000000 881 | 0.000000 882 | 0.000000 883 | 0.000000 884 | 0.000000 885 | 0.000000 886 | 0.000000 887 | 0.000000 888 | 0.000000 889 | 0.000000 890 | 0.000000 891 | 
0.000000 892 | 0.000000 893 | 0.000000 894 | 0.000000 895 | 0.000000 896 | 0.000000 897 | 0.000000 898 | 0.000000 899 | 0.000000 900 | 0.000000 901 | 0.000000 902 | 0.000000 903 | 0.000000 904 | 0.000000 905 | 0.000000 906 | 0.000000 907 | 0.000000 908 | 0.000000 909 | 0.000000 910 | 0.000000 911 | 0.000000 912 | 0.000000 913 | 0.000000 914 | 0.000000 915 | 0.000000 916 | 0.000000 917 | 0.000000 918 | 0.000000 919 | 0.000000 920 | 0.000000 921 | 0.000000 922 | 0.000000 923 | 0.000000 924 | 0.000000 925 | 0.000000 926 | 0.000000 927 | 0.000000 928 | 0.000000 929 | 0.000000 930 | 0.000000 931 | 0.000000 932 | 0.000000 933 | 0.000000 934 | 0.000000 935 | 0.000000 936 | 0.000000 937 | 0.000000 938 | 0.000000 939 | 0.000000 940 | 0.000000 941 | 0.000000 942 | 0.000000 943 | 0.000000 944 | 0.000000 945 | 0.000000 946 | 0.000000 947 | 0.000000 948 | 0.000000 949 | 0.000000 950 | 0.000000 951 | 0.000000 952 | 0.000000 953 | 0.000000 954 | 0.000000 955 | 0.000000 956 | 0.000000 957 | 0.000000 958 | 0.000000 959 | 0.000000 960 | 0.000000 961 | 0.000000 962 | 0.000000 963 | 0.000000 964 | 0.000000 965 | 0.000000 966 | 0.000000 967 | 0.000000 968 | 0.000000 969 | 0.000000 970 | 0.000000 971 | 0.000000 972 | 0.000000 973 | 0.000000 974 | 0.000000 975 | 0.000000 976 | 0.000000 977 | 0.000000 978 | 0.000000 979 | 0.000000 980 | 0.000000 981 | 0.000000 982 | 0.000000 983 | 0.000000 984 | 0.000000 985 | 0.000000 986 | 0.000000 987 | 0.000000 988 | 0.000000 989 | 0.000000 990 | 0.000000 991 | 0.000000 992 | 0.000000 993 | 0.000000 994 | 0.000000 995 | 0.000000 996 | 0.000000 997 | 0.000000 998 | 0.000000 999 | 0.000000 1000 | 0.000000 1001 | 0.000000 1002 | 0.000000 1003 | 0.000000 1004 | 0.000000 1005 | 0.000000 1006 | 0.000000 1007 | 0.000000 1008 | 0.000000 1009 | 0.000000 1010 | 0.000000 1011 | 0.000000 1012 | 0.000000 1013 | 0.000000 1014 | 0.000000 1015 | 0.000000 1016 | 0.000000 1017 | 0.000000 1018 | 0.000000 1019 | 0.000000 1020 | 0.000000 1021 | 0.000000 1022 | 0.000000 
1023 | 0.000000 1024 | 0.000000 1025 | -------------------------------------------------------------------------------- /evaluation/sdr_pesq_sisdr/run.sh: -------------------------------------------------------------------------------- 1 | #! /bin/bash 2 | 3 | # Copyright 2017 4 | # Author: Chenglin Xu (NTU, Singapore) 5 | # Email: xuchenglin28@gmail.com 6 | # Updated by Chenglin, Dec 2018 7 | 8 | /export/home/clx214/Matlab_R2014A/bin/matlab -nodesktop -nosplash -r "eval_sdr('tt', 0, 0, 'Ext_mfcc_Mix_N256_L20_1L80_2L160_S10_B256_H512_P3_X8_R4_C2_gln_si-sdr_sigmoid_deconv_BLSTM_e400_spk0.2_mscmo_a0.1_b0.1', 'mix', 's1')" 9 | -------------------------------------------------------------------------------- /evaluation/sdr_pesq_sisdr/spk2gender: -------------------------------------------------------------------------------- 1 | 050 0 2 | 051 1 3 | 052 1 4 | 053 0 5 | 22g 1 6 | 22h 1 7 | 420 0 8 | 421 0 9 | 422 1 10 | 423 1 11 | 440 1 12 | 441 0 13 | 442 0 14 | 443 1 15 | 444 0 16 | 445 0 17 | 446 1 18 | 447 1 19 | -------------------------------------------------------------------------------- /evaluation/sdr_pesq_sisdr/spk2gender_cv: -------------------------------------------------------------------------------- 1 | 011 0 2 | 012 1 3 | 013 1 4 | 014 0 5 | 015 1 6 | 016 0 7 | 017 0 8 | 018 0 9 | 019 0 10 | 01a 0 11 | 01b 0 12 | 01c 0 13 | 01d 0 14 | 01e 1 15 | 01f 0 16 | 01g 1 17 | 01i 1 18 | 01j 0 19 | 01k 0 20 | 01l 1 21 | 01m 0 22 | 01n 0 23 | 01o 0 24 | 01p 0 25 | 01q 0 26 | 01r 1 27 | 01s 1 28 | 01t 1 29 | 01u 0 30 | 01v 0 31 | 01w 1 32 | 01x 0 33 | 01y 1 34 | 01z 1 35 | 020 1 36 | 021 1 37 | 022 0 38 | 023 0 39 | 024 1 40 | 025 1 41 | 026 1 42 | 027 0 43 | 028 0 44 | 029 1 45 | 02a 0 46 | 02b 1 47 | 02c 0 48 | 02d 0 49 | 02e 0 50 | 204 0 51 | 205 0 52 | 206 0 53 | 207 1 54 | 208 1 55 | 209 0 56 | 20a 0 57 | 20b 0 58 | 20c 1 59 | 20d 0 60 | 20e 0 61 | 20f 1 62 | 20g 1 63 | 20h 0 64 | 20i 1 65 | 20j 1 66 | 20k 1 67 | 20l 1 68 | 20m 1 69 | 20n 1 70 | 20o 1 71 | 
20p 0 72 | 20q 1 73 | 20r 1 74 | 20s 1 75 | 20t 0 76 | 20u 1 77 | 20v 1 78 | 401 0 79 | 403 1 80 | 404 0 81 | 405 1 82 | 406 1 83 | 407 0 84 | 408 1 85 | 409 0 86 | 40a 1 87 | 40b 1 88 | 40c 1 89 | 40d 1 90 | 40e 0 91 | 40f 1 92 | 40g 0 93 | 40h 0 94 | 40i 1 95 | 40j 1 96 | 40k 1 97 | 40l 0 98 | 40m 0 99 | 40n 1 100 | 40o 0 101 | 40p 0 102 | -------------------------------------------------------------------------------- /evaluation/sdr_pesq_sisdr/spk2gender_tr: -------------------------------------------------------------------------------- 1 | 011 2 | 012 3 | 013 4 | 014 5 | 015 6 | 016 7 | 017 8 | 018 9 | 019 10 | 01a 11 | 01b 12 | 01c 13 | 01d 14 | 01e 15 | 01f 16 | 01g 17 | 01i 18 | 01j 19 | 01k 20 | 01l 21 | 01m 22 | 01n 23 | 01o 24 | 01p 25 | 01q 26 | 01r 27 | 01s 28 | 01t 29 | 01u 30 | 01v 31 | 01w 32 | 01x 33 | 01y 34 | 01z 35 | 020 36 | 021 37 | 022 38 | 023 39 | 024 40 | 025 41 | 026 42 | 027 43 | 028 44 | 029 45 | 02a 46 | 02b 47 | 02c 48 | 02d 49 | 02e 50 | 204 51 | 205 52 | 206 53 | 207 54 | 208 55 | 209 56 | 20a 57 | 20b 58 | 20c 59 | 20d 60 | 20e 61 | 20f 62 | 20g 63 | 20h 64 | 20i 65 | 20j 66 | 20k 67 | 20l 68 | 20m 69 | 20n 70 | 20o 71 | 20p 72 | 20q 73 | 20r 74 | 20s 75 | 20t 76 | 20u 77 | 20v 78 | 401 79 | 403 80 | 404 81 | 405 82 | 406 83 | 407 84 | 408 85 | 409 86 | 40a 87 | 40b 88 | 40c 89 | 40d 90 | 40e 91 | 40f 92 | 40g 93 | 40h 94 | 40i 95 | 40j 96 | 40k 97 | 40l 98 | 40m 99 | 40n 100 | 40o 101 | 40p 102 | -------------------------------------------------------------------------------- /generation/WHAM_and_WHAMR/wham_scripts.tar.gz: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gemengtju/Tutorial_Separation/c47dca746fa2e123868f5fbf2aed70ab3809925b/generation/WHAM_and_WHAMR/wham_scripts.tar.gz -------------------------------------------------------------------------------- /generation/WHAM_and_WHAMR/whamr_scripts.tar.gz: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/gemengtju/Tutorial_Separation/c47dca746fa2e123868f5fbf2aed70ab3809925b/generation/WHAM_and_WHAMR/whamr_scripts.tar.gz -------------------------------------------------------------------------------- /generation/wsj0-2mix-extr/simulate_2spk_mix.m: -------------------------------------------------------------------------------- 1 | function simulate_2spk_mix(data_type, wsj0root, output_dir, fs8k, min_max) 2 | % Simulate 2-speaker mixture data for speaker extraction. 3 | % Call: 4 | % simulate_2spk_mix(data_type, wsj0root, output_dir, fs8k, min_max) 5 | % e.g., simulate_2spk_mix('tt', '/media/clx214/data/wsj/', '/media/clx214/data/wsj0_2mix_extr_tmp/wav8k', 8000, 'max') 6 | % Paras: 7 | % data_type: data set to generate, (tr|cv|tt), e.g., 'tt' 8 | % wsj0root: YOUR_PATH/, the folder containing converted wsj0/, e.g., '/media/clx214/data/wsj/' 9 | % output_dir: the folder to save simulated data for extraction, e.g., '/media/clx214/data/wsj0_2mix_extr_tmp/wav8k' 10 | % fs8k: sampling rate of the simulated data, e.g., 8000 11 | % min_max: use the minimum or maximum wav length when simulating mixture data, e.g., 'max' 12 | % 13 | % The code is based on "create_wav_2speakers_extr.m" from "http://www.merl.com/demos/deep-clustering" 14 | % 15 | % 1. Assume that WSJ0's wv1 sphere files are converted to wav files. The folder 16 | % structure and file names are kept the same under wsj0/, e.g., 17 | % ORG_PATH/wsj0/si_tr_s/01t/01to030v.wv1 is converted to wav and 18 | % stored in YOUR_PATH/wsj0/si_tr_s/01t/01to030v.wav. 19 | % Relevant data ('si_tr_s', 'si_dt_05' and 'si_et_05') are under YOUR_PATH/wsj0/ 20 | % 2. Put the 'voicebox' toolbox in the current folder. (http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/voicebox.html) 21 | % 3. Set your 'YOUR_PATH' and 'OUTPUT_PATH' properly, then run this script in MATLAB.
22 | % (The max length of the wavs is kept when generating the mixture. The sampling rate will be 8 kHz.) 23 | % 24 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 25 | % Copyright (C) 2016 Mitsubishi Electric Research Labs 26 | % (Jonathan Le Roux, John R. Hershey, Zhuo Chen) 27 | % Apache 2.0 (http://www.apache.org/licenses/LICENSE-2.0) 28 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 29 | % 30 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 31 | %Copyright 2018 Chenglin Xu, Nanyang Technological University, Singapore 32 | % 33 | %Licensed under the Apache License, Version 2.0 (the "License"); 34 | %you may not use this file except in compliance with the License. 35 | %You may obtain a copy of the License at 36 | % 37 | % http://www.apache.org/licenses/LICENSE-2.0 38 | % 39 | %Unless required by applicable law or agreed to in writing, software 40 | %distributed under the License is distributed on an "AS IS" BASIS, 41 | %WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 42 | %See the License for the specific language governing permissions and 43 | %limitations under the License.
44 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 45 | 46 | if ~exist([output_dir '/' min_max '/' data_type],'dir') 47 | mkdir([output_dir '/' min_max '/' data_type]); 48 | end 49 | mkdir([output_dir '/' min_max '/' data_type '/s1/']); 50 | mkdir([output_dir '/' min_max '/' data_type '/aux/']); 51 | mkdir([output_dir '/' min_max '/' data_type '/mix/']); 52 | 53 | TaskFile = ['mix_2_spk_' data_type '_extr.txt']; 54 | fid = fopen(TaskFile,'r'); 55 | C = textscan(fid,'%s %f %s %f %s'); 56 | num_files = length(C{1}); 57 | 58 | fprintf(1,'Start to generate data for %s\n', [min_max '_' data_type]); 59 | for i = 1:num_files 60 | [inwav1_dir,invwav1_name,inwav1_ext] = fileparts(C{1}{i}); 61 | [inwav2_dir,invwav2_name,inwav2_ext] = fileparts(C{3}{i}); 62 | [inwav_aux_dir,invwav_aux_name,inwav_aux_ext] = fileparts(C{5}{i}); 63 | 64 | inwav1_snr = C{2}(i); 65 | inwav2_snr = C{4}(i); 66 | mix_name = [invwav1_name,'_',num2str(inwav1_snr),'_',invwav2_name,'_',num2str(inwav2_snr),'_',invwav_aux_name]; 67 | 68 | % get input wavs 69 | [s1, fs] = audioread([wsj0root C{1}{i}]); 70 | s2 = audioread([wsj0root C{3}{i}]); 71 | s_aux = audioread([wsj0root C{5}{i}]); 72 | 73 | % resample, normalize to 8 kHz file 74 | s1_8k = resample(s1,fs8k,fs); 75 | [s1_8k,lev1] = activlev(s1_8k,fs8k,'n'); % y_norm = y /sqrt(lev); 76 | s2_8k = resample(s2,fs8k,fs); 77 | [s2_8k,lev2] = activlev(s2_8k,fs8k,'n'); 78 | s_aux_8k = resample(s_aux,fs8k,fs); 79 | [s_aux_8k,lev_aux] = activlev(s_aux_8k,fs8k,'n'); 80 | 81 | weight_1 = 10^(inwav1_snr/20); 82 | weight_2 = 10^(inwav2_snr/20); 83 | 84 | s1_8k = weight_1 * s1_8k; 85 | s2_8k = weight_2 * s2_8k; 86 | 87 | switch min_max 88 | case 'max' 89 | mix_8k_length = max(length(s1_8k),length(s2_8k)); 90 | s1_8k = cat(1,s1_8k,zeros(mix_8k_length - length(s1_8k),1)); 91 | s2_8k = cat(1,s2_8k,zeros(mix_8k_length - length(s2_8k),1)); 92 | case 'min' 93 | mix_8k_length = min(length(s1_8k),length(s2_8k)); 94 | s1_8k = 
s1_8k(1:mix_8k_length); 95 | s2_8k = s2_8k(1:mix_8k_length); 96 | end 97 | mix_8k = s1_8k + s2_8k; 98 | 99 | max_amp_8k = max(cat(1,abs(mix_8k(:)),abs(s1_8k(:)),abs(s2_8k(:)),abs(s_aux_8k(:)))); 100 | mix_scaling_8k = 1/max_amp_8k*0.9; 101 | s1_8k = mix_scaling_8k * s1_8k; 102 | mix_8k = mix_scaling_8k * mix_8k; 103 | s_aux_8k = mix_scaling_8k * s_aux_8k; 104 | 105 | audiowrite([output_dir '/' min_max '/' data_type '/s1/' mix_name '.wav'],s1_8k,fs8k); 106 | audiowrite([output_dir '/' min_max '/' data_type '/aux/' mix_name '.wav'],s_aux_8k,fs8k); 107 | audiowrite([output_dir '/' min_max '/' data_type '/mix/' mix_name '.wav'],mix_8k,fs8k); 108 | end 109 | fclose(fid); 110 | fprintf(1,'End of generating data for %s\n', [min_max '_' data_type]); 111 | end 112 | -------------------------------------------------------------------------------- /generation/wsj0-2mix/create-speaker-mixtures.zip: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gemengtju/Tutorial_Separation/c47dca746fa2e123868f5fbf2aed70ab3809925b/generation/wsj0-2mix/create-speaker-mixtures.zip -------------------------------------------------------------------------------- /generation/wsj0-2mix/spatialize_wsj0-mix.zip: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gemengtju/Tutorial_Separation/c47dca746fa2e123868f5fbf2aed70ab3809925b/generation/wsj0-2mix/spatialize_wsj0-mix.zip -------------------------------------------------------------------------------- /slides/AVSS_Datasets_PanZexu.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gemengtju/Tutorial_Separation/c47dca746fa2e123868f5fbf2aed70ab3809925b/slides/AVSS_Datasets_PanZexu.pdf -------------------------------------------------------------------------------- /slides/Advances_in_end-to-end_neural_source_separation.pdf: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/gemengtju/Tutorial_Separation/c47dca746fa2e123868f5fbf2aed70ab3809925b/slides/Advances_in_end-to-end_neural_source_separation.pdf -------------------------------------------------------------------------------- /slides/DeLiangWang_ASRU19.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gemengtju/Tutorial_Separation/c47dca746fa2e123868f5fbf2aed70ab3809925b/slides/DeLiangWang_ASRU19.pdf -------------------------------------------------------------------------------- /slides/HaizhouLi_CCF.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gemengtju/Tutorial_Separation/c47dca746fa2e123868f5fbf2aed70ab3809925b/slides/HaizhouLi_CCF.pdf -------------------------------------------------------------------------------- /slides/Speech-Separation-Dataset-GM.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gemengtju/Tutorial_Separation/c47dca746fa2e123868f5fbf2aed70ab3809925b/slides/Speech-Separation-Dataset-GM.pdf -------------------------------------------------------------------------------- /slides/overview-GM.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gemengtju/Tutorial_Separation/c47dca746fa2e123868f5fbf2aed70ab3809925b/slides/overview-GM.pdf --------------------------------------------------------------------------------
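
The SI-SDR metric implemented by `cal_SISDR` in `eval_sdr.m` (remove the mean, project the estimate onto the clean reference, take the energy ratio in dB) can be sketched in Python/NumPy as follows; this is an illustrative port, not a file in this repository:

```python
import numpy as np

def si_sdr(clean, rec):
    """Scale-invariant SDR in dB, mirroring cal_SISDR in eval_sdr.m."""
    clean = clean - np.mean(clean)  # remove DC offset from both signals
    rec = rec - np.mean(rec)
    # project the estimate onto the reference to obtain the target component
    s_target = np.dot(rec, clean) * clean / np.dot(clean, clean)
    e_noise = rec - s_target        # residual orthogonal to the target
    return 10 * np.log10(np.dot(s_target, s_target) / np.dot(e_noise, e_noise))
```

Because the estimate is projected onto the reference, multiplying `rec` by any nonzero constant leaves the score unchanged; this scale invariance is what distinguishes SI-SDR from a plain SNR.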
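
The core mixing arithmetic of `simulate_2spk_mix.m` — weight each level-normalized source by 10^(SNR/20), zero-pad to the longer signal in 'max' mode or truncate to the shorter in 'min' mode, then rescale jointly to 0.9 of full scale — can be sketched in Python/NumPy as below. This is an illustrative sketch with a hypothetical function name: resampling, `activlev` level normalization, and the auxiliary (target-speaker) utterance are omitted.

```python
import numpy as np

def mix_two_speakers(s1, s2, snr1_db, snr2_db, min_max="max"):
    """Mix two level-normalized sources at the given SNR offsets,
    mirroring the core of simulate_2spk_mix.m (sketch, not the original)."""
    s1 = 10 ** (snr1_db / 20) * np.asarray(s1, dtype=float)
    s2 = 10 ** (snr2_db / 20) * np.asarray(s2, dtype=float)
    if min_max == "max":            # zero-pad the shorter source
        n = max(len(s1), len(s2))
        s1 = np.pad(s1, (0, n - len(s1)))
        s2 = np.pad(s2, (0, n - len(s2)))
    else:                           # 'min': truncate the longer source
        n = min(len(s1), len(s2))
        s1, s2 = s1[:n], s2[:n]
    mix = s1 + s2
    # joint rescaling with 0.9 headroom to avoid clipping, as in the script
    scale = 0.9 / np.max(np.abs(np.concatenate([mix, s1, s2])))
    return scale * mix, scale * s1, scale * s2
```

Scaling mixture and sources by the same factor keeps the per-source SNRs intact while guaranteeing all written wav files stay within [-0.9, 0.9].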