├── README.md
├── book
│   ├── 2018_Book_AudioSourceSeparation.pdf
│   └── Audio_Source_Separation_and_Speech_Enhancement.pdf
├── evaluation
│   └── sdr_pesq_sisdr
│       ├── bss_eval_sources.m
│       ├── composite_pesq
│       │   ├── DC_block.p
│       │   ├── FFTNXCorr.p
│       │   ├── apply_VAD.p
│       │   ├── apply_filter.p
│       │   ├── apply_filters.p
│       │   ├── batch_pesq.p
│       │   ├── batch_pesq2.p
│       │   ├── composite.asv
│       │   ├── composite.m
│       │   ├── convolution_in_timealign.p
│       │   ├── crude_align.p
│       │   ├── enhanced_logmmse.wav
│       │   ├── fix_power_level.p
│       │   ├── id_searchwindows.p
│       │   ├── id_utterances.p
│       │   ├── input_filter.p
│       │   ├── pesq.p
│       │   ├── pesq_debug.p
│       │   ├── pesq_measure.p
│       │   ├── pesq_psychoacoustic_model.p
│       │   ├── pesq_testbench.p
│       │   ├── plot_wav.p
│       │   ├── pow_of.p
│       │   ├── readme.pdf
│       │   ├── readme.txt
│       │   ├── setup_global.p
│       │   ├── sp09.wav
│       │   ├── sp09_babble_sn10.wav
│       │   ├── split_align.p
│       │   ├── time_align.p
│       │   ├── utterance_locate.p
│       │   └── utterance_split.p
│       ├── des_file_name.pdf
│       ├── eval_sdr.m
│       ├── mat_debug.txt
│       ├── run.sh
│       ├── spk2gender
│       ├── spk2gender_cv
│       ├── spk2gender_tr
│       ├── target_ref_dur.txt
│       └── target_ref_dur_backup.txt
├── generation
│   ├── WHAM_and_WHAMR
│   │   ├── wham_scripts.tar.gz
│   │   └── whamr_scripts.tar.gz
│   ├── wsj0-2mix-extr
│   │   ├── mix_2_spk_cv_extr.txt
│   │   ├── mix_2_spk_tr_extr.txt
│   │   ├── mix_2_spk_tt_extr.txt
│   │   └── simulate_2spk_mix.m
│   └── wsj0-2mix
│       ├── create-speaker-mixtures.zip
│       └── spatialize_wsj0-mix.zip
└── slides
    ├── AVSS_Datasets_PanZexu.pdf
    ├── Advances_in_end-to-end_neural_source_separation.pdf
    ├── DeLiangWang_ASRU19.pdf
    ├── HaizhouLi_CCF.pdf
    ├── Speech-Separation-Dataset-GM.pdf
    └── overview-GM.pdf

/README.md:
--------------------------------------------------------------------------------
# Speech Separation and Extraction via Deep Learning

This repo summarizes the tutorials, datasets, papers, code, and tools for the speech separation and speaker extraction tasks. You are kindly invited to submit pull requests.


## Table of Contents

- [Tutorials](#tutorials)
- [Datasets](#datasets)
- [Papers](#papers)
  - [Speech Separation based on Brain Studies](#Speech-Separation-based-on-Brain-Studies)
  - [Pure Speech Separation](#Pure-Speech-Separation)
  - [Multi-Modal Speech Separation](#Multi-Modal-Speech-Separation)
  - [Multi-Channel Speech Separation](#Multi-channel-Speech-Separation)
  - [Speaker Extraction](#Speaker-Extraction)
- [Tools](#Tools)
  - [System Tools](#System-Tools)
  - [Evaluation Tools](#Evaluation-Tools)
- [Results on WSJ0-2mix](#Results-on-WSJ0-2mix)


## Tutorials

- [Speech Separation, Hung-yi Lee, 2020] [[Video (Subtitle)]](https://www.bilibili.com/video/BV1Cf4y1y7FN?from=search&seid=17392360823608929388) [[Video]](https://www.youtube.com/watch?v=tovg5ZxNgIo&t=8s) [[Slide]](http://speech.ee.ntu.edu.tw/~tlkagk/courses/DLHLP20/SP%20(v3).pdf)

- [Advances in End-to-End Neural Source Separation, Yi Luo, 2020] [[Video (BiliBili)]](https://www.bilibili.com/video/BV11T4y1774e) [[Video]](https://www.shenlanxueyuan.com/open/course/62/lesson/57/liveToVideoPreview) [[Slide]](https://github.com/gemengtju/Tutorial_Separation/blob/master/slides/Advances_in_end-to-end_neural_source_separation.pdf)

- [Audio Source Separation and Speech Enhancement, Emmanuel Vincent, 2018] [[Book]](https://github.com/gemengtju/Tutorial_Separation/tree/master/book)

- [Audio Source Separation, Shoji Makino, 2018] [[Book]](https://github.com/gemengtju/Tutorial_Separation/tree/master/book)

- [Overview Papers] [[Paper (Daniel Michelsanti)]](https://arxiv.org/pdf/2008.09586.pdf) [[Paper (DeLiang Wang)]](https://arxiv.org/ftp/arxiv/papers/1708/1708.07524.pdf) [[Paper (Bo Xu)]](http://www.aas.net.cn/article/zdhxb/2019/2/234) [[Paper (Zafar Rafii)]](https://arxiv.org/pdf/1804.08300.pdf) [[Paper (Sharon Gannot)]](https://hal.inria.fr/hal-01414179v2/document)

- [Overview Slides] [[Slide (DeLiang Wang)]](https://github.com/gemengtju/Tutorial_Separation/blob/master/slides/DeLiangWang_ASRU19.pdf) [[Slide (Haizhou Li)]](https://github.com/gemengtju/Tutorial_Separation/blob/master/slides/HaizhouLi_CCF.pdf) [[Slide (Meng Ge)]](https://github.com/gemengtju/Tutorial_Separation/blob/master/slides/overview-GM.pdf)

- [Handbook] [[Ongoing]](https://www.overleaf.com/read/vhdjwcpyryzr)

## Datasets

- [Dataset Introduction] [[Pure Speech Dataset Slide (Meng Ge)]](https://github.com/gemengtju/Tutorial_Separation/blob/master/slides/Speech-Separation-Dataset-GM.pdf) [[Audio-Visual Dataset Slide (Zexu Pan)]](https://github.com/gemengtju/Tutorial_Separation/blob/master/slides/AVSS_Datasets_PanZexu.pdf)

- [WSJ0] [[Dataset]](https://catalog.ldc.upenn.edu/LDC93S6A)

- [WSJ0-2mix] [[Script]](https://github.com/gemengtju/Tutorial_Separation/tree/master/generation/wsj0-2mix)

- [WSJ0-2mix-extr] [[Script]](https://github.com/xuchenglin28/speaker_extraction)

- [WHAM & WHAMR] [[Paper (WHAM)]](https://arxiv.org/pdf/1907.01160.pdf) [[Paper (WHAMR)]](https://arxiv.org/pdf/1910.10279.pdf) [[Dataset]](http://wham.whisper.ai/)

- [LibriMix] [[Paper]](https://arxiv.org/pdf/2005.11262.pdf) [[Script]](https://github.com/JorisCos/LibriMix)

- [LibriCSS] [[Paper]](https://arxiv.org/pdf/2001.11482.pdf) [[Script]](https://github.com/chenzhuo1011/libri_css)

- [SparseLibriMix] [[Script]](https://github.com/popcornell/SparseLibriMix)

- [VCTK-2Mix] [[Script]](https://github.com/JorisCos/VCTK-2Mix)

- [CHIME5 & CHIME6 Challenge] [[Dataset]](https://chimechallenge.github.io/chime6/)

- [AudioSet] [[Dataset]](https://research.google.com/audioset/download.html)

- [Microsoft DNS Challenge] [[Dataset]](https://github.com/microsoft/DNS-Challenge)

- [AVSpeech] [[Dataset]](https://looking-to-listen.github.io/avspeech/download.html)

- [LRW] [[Dataset]](http://www.robots.ox.ac.uk/~vgg/data/lip_reading/lrw1.html)

- [LRS2] [[Dataset]](http://www.robots.ox.ac.uk/~vgg/data/lip_reading/lrs2.html)

- [LRS3] [[Dataset]](http://www.robots.ox.ac.uk/~vgg/data/lip_reading/lrs3.html) [[Script]](https://github.com/JusperLee/LRS3-For-Speech-Separation)

- [VoxCeleb] [[Dataset]](http://www.robots.ox.ac.uk/~vgg/data/voxceleb/)


## Papers

### Speech Separation based on Brain Studies

- [Attentional Selection in a Cocktail Party Environment Can Be Decoded from Single-Trial EEG, James O'Sullivan, Cerebral Cortex 2012] [[Paper]](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4481604/pdf/bht355.pdf)

- [Selective cortical representation of attended speaker in multi-talker speech perception, Nima Mesgarani, Nature 2012] [[Paper]](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3870007/pdf/nihms445767.pdf)

- [Neural decoding of attentional selection in multi-speaker environments without access to clean sources, James O'Sullivan, Journal of Neural Engineering 2017] [[Paper]](https://europepmc.org/article/pmc/pmc5805380#free-full-text)

- [Speech synthesis from neural decoding of spoken sentences, Gopala K. Anumanchipalli, Nature 2019] [[Paper]](https://www.univie.ac.at/mcogneu/lit/anumanchipalli-19.pdf)

- [Towards reconstructing intelligible speech from the human auditory cortex, Hassan Akbari, Scientific Reports 2019] [[Paper]](https://www.nature.com/articles/s41598-018-37359-z.pdf) [[Code]](http://naplab.ee.columbia.edu/naplib.html)

### Pure Speech Separation

- [Joint Optimization of Masks and Deep Recurrent Neural Networks for Monaural Source Separation, Po-Sen Huang, TASLP 2015] [[Paper]](https://arxiv.org/pdf/1502.04149) [[Code (posenhuang)]](https://github.com/posenhuang/deeplearningsourceseparation)

- [Complex Ratio Masking for Monaural Speech Separation, DS Williamson, TASLP 2015] [[Paper]](https://ieeexplore.ieee.org/abstract/document/7364200/)

- [Deep clustering: Discriminative embeddings for segmentation and separation, JR Hershey, ICASSP 2016] [[Paper]](https://arxiv.org/abs/1508.04306) [[Code (Kai Li)]](https://github.com/JusperLee/Deep-Clustering-for-Speech-Separation) [[Code (Jian Wu)]](https://github.com/funcwj/deep-clustering) [[Code (asteroid)]](https://github.com/mpariente/asteroid/blob/master/egs/wsj0-mix/DeepClustering)

- [Single-channel multi-speaker separation using deep clustering, Y Isik, Interspeech 2016] [[Paper]](https://arxiv.org/pdf/1607.02173) [[Code (Kai Li)]](https://github.com/JusperLee/Deep-Clustering-for-Speech-Separation) [[Code (Jian Wu)]](https://github.com/funcwj/deep-clustering)

- [Permutation invariant training of deep models for speaker-independent multi-talker speech separation, Dong Yu, ICASSP 2017] [[Paper]](https://arxiv.org/pdf/1607.00325) [[Code (Kai Li)]](https://github.com/JusperLee/UtterancePIT-Speech-Separation) [[Code (Sining Sun)]](https://github.com/snsun/pit-speech-separation)

- [Recognizing Multi-talker Speech with Permutation Invariant Training, Dong Yu, ICASSP 2017] [[Paper]](https://arxiv.org/pdf/1704.01985)

- [Multitalker speech separation with utterance-level permutation invariant training of deep recurrent neural networks, M Kolbæk, TASLP 2017] [[Paper]](https://arxiv.org/pdf/1703.06284) [[Code (Kai Li)]](https://github.com/JusperLee/UtterancePIT-Speech-Separation)

- [Deep attractor network for single-microphone speaker separation, Zhuo Chen, ICASSP 2017] [[Paper]](https://arxiv.org/abs/1611.08930) [[Code (Kai Li)]](https://github.com/JusperLee/DANet-For-Speech-Separation)

- [Alternative Objective Functions for Deep Clustering, Zhong-Qiu Wang, ICASSP 2018] [[Paper]](http://www.merl.com/publications/docs/TR2018-005.pdf)

- [Listen, Think and Listen Again: Capturing Top-down Auditory Attention for Speaker-independent Speech Separation, Jing Shi, IJCAI 2018] [[Paper]](https://www.ijcai.org/Proceedings/2018/0605.pdf)

- [End-to-End Speech Separation with Unfolded Iterative Phase Reconstruction, Zhong-Qiu Wang et al., 2018] [[Paper]](https://arxiv.org/pdf/1804.10204.pdf)

- [Modeling Attention and Memory for Auditory Selection in a Cocktail Party Environment, Jiaming Xu, AAAI 2018] [[Paper]](https://www.aaai.org/ocs/index.php/AAAI/AAAI18/paper/view/16670/15950) [[Code]](https://github.com/jacoxu/ASAM)

- [Speaker-independent Speech Separation with Deep Attractor Network, Yi Luo, TASLP 2018] [[Paper]](https://arxiv.org/pdf/1707.03634) [[Code (Kai Li)]](https://github.com/JusperLee/DANet-For-Speech-Separation)

- [Listening to Each Speaker One by One with Recurrent Selective Hearing Networks, Keisuke Kinoshita, ICASSP 2018] [[Paper]](http://150.162.46.34:8080/icassp2018/ICASSP18_USB/pdfs/0005064.pdf)

- [TasNet: time-domain audio separation network for real-time, single-channel speech separation, Yi Luo, ICASSP 2018] [[Paper]](https://arxiv.org/pdf/1711.00541) [[Code (Kai Li)]](https://github.com/JusperLee/Conv-TasNet) [[Code (asteroid)]](https://github.com/mpariente/asteroid/blob/master/egs/whamr/TasNet)

- [Conv-TasNet: Surpassing Ideal Time-Frequency Magnitude Masking for Speech Separation, Yi Luo, TASLP 2019] [[Paper]](https://ieeexplore.ieee.org/iel7/6570655/6633080/08707065.pdf) [[Code (Kai Li)]](https://github.com/JusperLee/Conv-TasNet) [[Code (asteroid)]](https://github.com/mpariente/asteroid/blob/master/egs/wham/ConvTasNet)

- [Divide and Conquer: A Deep CASA Approach to Talker-independent Monaural Speaker Separation, Yuzhou Liu, TASLP 2019] [[Paper]](https://arxiv.org/pdf/1904.11148) [[Code]](https://github.com/yuzhou-git/deep-casa)

- [Improved Speech Separation with Time-and-Frequency Cross-domain Joint Embedding and Clustering, Gene-Ping Yang, Interspeech 2019] [[Paper]](https://arxiv.org/pdf/1904.07845v1.pdf) [[Code]](https://github.com/r06944010/Speech-Separation-TF2)

- [Dual-path RNN: efficient long sequence modeling for time-domain single-channel speech separation, Yi Luo, Arxiv 2019] [[Paper]](https://arxiv.org/pdf/1910.06379) [[Code (Kai Li)]](https://github.com/JusperLee/Dual-Path-RNN-Pytorch)

- [A comprehensive study of speech separation: spectrogram vs waveform separation, Fahimeh Bahmaninezhad, Interspeech 2019] [[Paper]](https://arxiv.org/pdf/1905.07497)

- [Discriminative Learning for Monaural Speech Separation Using Deep Embedding Features, Cunhang Fan, Interspeech 2019] [[Paper]](https://www.isca-speech.org/archive/Interspeech_2019/pdfs/1940.pdf)

- [Interrupted and cascaded permutation invariant training for speech separation, Gene-Ping Yang, ICASSP 2020] [[Paper]](https://arxiv.org/abs/1910.12706)

- [FurcaNeXt: End-to-end monaural speech separation with dynamic gated dilated temporal convolutional networks, Liwen Zhang, MMM 2020] [[Paper]](https://arxiv.org/pdf/1902.04891)

- [Filterbank design for end-to-end speech separation, Manuel Pariente et al., ICASSP 2020] [[Paper]](https://arxiv.org/abs/1910.10400)

- [Voice Separation with an Unknown Number of Multiple Speakers, Eliya Nachmani, Arxiv 2020] [[Paper]](https://arxiv.org/pdf/2003.01531.pdf) [[Demo]](https://enk100.github.io/speaker_separation/)

- [An Empirical Study of Conv-TasNet, Berkan Kadıoglu, Arxiv 2020] [[Paper]](https://arxiv.org/pdf/2002.08688.pdf) [[Code]](https://github.com/JusperLee/Deep-Encoder-Decoder-Conv-TasNet)

- [Wavesplit: End-to-End Speech Separation by Speaker Clustering, Neil Zeghidour et al., Arxiv 2020] [[Paper]](https://arxiv.org/abs/2002.08933)

- [La Furca: Iterative Context-Aware End-to-End Monaural Speech Separation Based on Dual-Path Deep Parallel Inter-Intra Bi-LSTM with Attention, Ziqiang Shi, Arxiv 2020] [[Paper]](https://arxiv.org/pdf/2001.08998.pdf)

- [Deep Attention Fusion Feature for Speech Separation with End-to-End Post-filter Method, Cunhang Fan, Arxiv 2020] [[Paper]](https://arxiv.org/abs/2003.07544)

- [Identify Speakers in Cocktail Parties with End-to-End Attention, Junzhe Zhu, Arxiv 2020] [[Paper]](https://arxiv.org/pdf/2005.11408v1.pdf) [[Code]](https://github.com/JunzheJosephZhu/Identify-Speakers-in-Cocktail-Parties-with-E2E-Attention)

- [Sequence to Multi-Sequence Learning via Conditional Chain Mapping for Mixture Signals, Jing Shi, Arxiv 2020] [[Paper]](https://arxiv.org/pdf/2006.14150.pdf) [[Code/Demo]](https://demotoshow.github.io/)

- [Speaker-Conditional Chain Model for Speech Separation and Extraction, Jing Shi, Arxiv 2020] [[Paper]](https://arxiv.org/pdf/2006.14149.pdf) [[Code/Demo]](https://shincling.github.io/)

- [Improving Voice Separation by Incorporating End-to-end Speech Recognition, Naoya Takahashi, ICASSP 2020] [[Paper]](https://arxiv.org/pdf/1911.12928v2.pdf) [[Code]](https://github.com/pragyak412/Improving-Voice-Separation-by-Incorporating-End-To-End-Speech-Recognition)

- [A Multi-Phase Gammatone Filterbank for Speech Separation via TasNet, David Ditter, ICASSP 2020] [[Paper]](https://arxiv.org/pdf/1910.11615v2.pdf) [[Code]](https://github.com/sp-uhh/mp-gtf)

- [Two-Step Sound Source Separation: Training on Learned Latent Targets, Efthymios Tzinis, ICASSP 2020] [[Paper]](https://arxiv.org/pdf/1910.09804v2.pdf) [[Code (Asteroid)]](https://github.com/mpariente/asteroid) [[Code (Tzinis)]](https://github.com/etzinis/two_step_mask_learning)

- [Unsupervised Sound Separation Using Mixtures of Mixtures, Scott Wisdom, Arxiv 2020] [[Paper]](https://arxiv.org/pdf/2006.12701.pdf)

- [Speech Separation Based on Multi-Stage Elaborated Dual-Path Deep BiLSTM with Auxiliary Identity Loss, Ziqiang Shi, 2020] [[Paper]](https://arxiv.org/pdf/2008.03149.pdf)

### Multi-Modal Speech Separation

- [Deep Audio-Visual Learning: A Survey, Hao Zhu, Arxiv 2020] [[Paper]](https://arxiv.org/pdf/2001.04758.pdf)

- [Audio-Visual Speech Enhancement Using Multimodal Deep Convolutional Neural Networks, Jen-Cheng Hou, TETCI 2017] [[Paper]](https://arxiv.org/pdf/1703.10893) [[Code]](https://github.com/avivga/audio-visual-speech-enhancement)

- [The Sound of Pixels, Hang Zhao, ECCV 2018] [[Paper]](http://openaccess.thecvf.com/content_ECCV_2018/papers/Hang_Zhao_The_Sound_of_ECCV_2018_paper.pdf) [[Code]](https://github.com/hangzhaomit/Sound-of-Pixels) [[Demo]](http://sound-of-pixels.csail.mit.edu/)

- [Learning to Separate Object Sounds by Watching Unlabeled Video, Ruohan Gao, ECCV 2018] [[Paper]](https://arxiv.org/pdf/1804.01665.pdf)

- [The Conversation: Deep Audio-Visual Speech Enhancement, Triantafyllos Afouras, Interspeech 2018] [[Paper]](https://arxiv.org/pdf/1804.04121)

- [End-to-end audiovisual speech recognition, Stavros Petridis, ICASSP 2018] [[Paper]](https://arxiv.org/pdf/1802.06424) [[Code]](https://github.com/mpc001/end-to-end-lipreading)

- [Looking to Listen at the Cocktail Party: A Speaker-Independent Audio-Visual Model for Speech Separation, Ariel Ephrat, ACM Transactions on Graphics 2018] [[Paper]](https://arxiv.org/pdf/1804.03619) [[Code]](https://github.com/JusperLee/Looking-to-Listen-at-the-Cocktail-Party)

- [Time domain audio visual speech separation, Jian Wu, Arxiv 2019] [[Paper]](https://arxiv.org/pdf/1904.03760)

- [Co-Separating Sounds of Visual Objects, Ruohan Gao, ICCV 2019] [[Paper]](https://arxiv.org/pdf/1904.07750.pdf) [[Code]](https://github.com/rhgao/co-separation)

- [Recursive Visual Sound Separation Using Minus-Plus Net, Xudong Xu, ICCV 2019] [[Paper]](https://arxiv.org/pdf/1908.11602.pdf)

- [The Sound of Motions, Hang Zhao, ICCV 2019] [[Paper]](https://arxiv.org/pdf/1904.05979.pdf)

- [Audio-Visual Speech Separation and Dereverberation with a Two-Stage Multimodal Network, Ke Tan, Arxiv 2019] [[Paper]](https://arxiv.org/pdf/1909.07352)

- [Face Landmark-based Speaker-Independent Audio-Visual Speech Enhancement in Multi-Talker Environments, Giovanni Morrone, Arxiv 2019] [[Paper]](https://arxiv.org/pdf/1811.02480v3.pdf) [[Code]](https://github.com/dr-pato/audio_visual_speech_enhancement)

- [Music Gesture for Visual Sound Separation, Chuang Gan, CVPR 2020] [[Paper]](https://arxiv.org/pdf/2004.09476.pdf)

- [FaceFilter: Audio-visual speech separation using still images, Soo-Whan Chung, Arxiv 2020] [[Paper]](https://arxiv.org/pdf/2005.07074.pdf)

- [Awesome Audio-Visual, Kranti Kumar Parida] [[GitHub Link]](https://github.com/krantiparida/awesome-audio-visual)

### Multi-channel Speech Separation

- [FaSNet: Low-latency Adaptive Beamforming for Multi-microphone Audio Processing, Yi Luo, Arxiv 2019] [[Paper]](https://arxiv.org/abs/1909.13387)

- [MIMO-SPEECH: End-to-End Multi-Channel Multi-Speaker Speech Recognition, Xuankai Chang et al., ASRU 2020] [[Paper]](https://arxiv.org/pdf/1910.06522.pdf)

- [End-to-end Microphone Permutation and Number Invariant Multi-channel Speech Separation, Yi Luo et al., ICASSP 2020] [[Paper]](https://arxiv.org/pdf/1910.14104.pdf) [[Code]](https://github.com/yluo42/TAC)

- [Enhancing End-to-End Multi-channel Speech Separation via Spatial Feature Learning, Rongzhi Gu, ICASSP 2020] [[Paper]](https://arxiv.org/pdf/2003.03927.pdf)

- [Multi-modal Multi-channel Target Speech Separation, Rongzhi Gu, J-STSP 2020] [[Paper]](https://arxiv.org/pdf/2003.07032.pdf)

### Speaker Extraction

- [Single channel target speaker extraction and recognition with speaker beam, Marc Delcroix, ICASSP 2018] [[Paper]](http://150.162.46.34:8080/icassp2018/ICASSP18_USB/pdfs/0005554.pdf)

- [VoiceFilter: Targeted Voice Separation by Speaker-Conditioned Spectrogram Masking, Quan Wang, Interspeech 2019] [[Paper]](https://arxiv.org/pdf/1810.04826.pdf) [[Code (Jian Wu)]](https://github.com/funcwj/voice-filter)

- [Single-Channel Speech Extraction Using Speaker Inventory and Attention Network, Xiong Xiao et al., ICASSP 2019] [[Paper]](http://150.162.46.34:8080/icassp2019/ICASSP2019/pdfs/0000086.pdf)

- [Optimization of Speaker Extraction Neural Network with Magnitude and Temporal Spectrum Approximation Loss, Chenglin Xu, ICASSP 2019] [[Paper]](https://arxiv.org/pdf/1903.09952.pdf) [[Code]](https://github.com/xuchenglin28/speaker_extraction)

- [Time-domain speaker extraction network, Chenglin Xu, ASRU 2019] [[Paper]](https://arxiv.org/pdf/2004.14762.pdf)

- [SpEx: Multi-Scale Time Domain Speaker Extraction Network, Chenglin Xu, TASLP 2020] [[Paper]](https://arxiv.org/pdf/2004.08326.pdf)

- [Improving speaker discrimination of target speech extraction with time-domain SpeakerBeam, Marc Delcroix, ICASSP 2020] [[Paper]](https://arxiv.org/pdf/2001.08378.pdf)

- [SpEx+: A Complete Time Domain Speaker Extraction Network, Meng Ge, Arxiv 2020] [[Paper]](https://arxiv.org/pdf/2005.04686.pdf) [[Code]](https://github.com/gemengtju/SpEx_Plus/tree/master/nnet)


## Tools

### System Tools

- [Asteroid: the PyTorch-based audio source separation toolkit for researchers, Manuel Pariente et al., ICASSP 2020] [[Tool Link]](https://github.com/mpariente/asteroid)

- [ESPnet-SE: end-to-end speech enhancement and separation toolkit designed for ASR integration, Chenda Li et al., Arxiv 2020] [[Paper Link]](https://arxiv.org/pdf/2011.03706.pdf)

### Evaluation Tools

- [Performance measurement in blind audio source separation, Emmanuel Vincent et al., TASLP 2006] [[Paper]](https://hal.inria.fr/inria-00544230/document) [[Tool Link]](https://github.com/gemengtju/Tutorial_Separation/tree/master/evaluation/sdr_pesq_sisdr)

- [SDR – Half-baked or Well Done?, Jonathan Le Roux, ICASSP 2019] [[Paper]](https://arxiv.org/pdf/1811.02508) [[Tool Link]](https://github.com/gemengtju/Tutorial_Separation/tree/master/evaluation/sdr_pesq_sisdr)


## Results on WSJ0-2mix

Speech separation (SS) and speaker extraction (SE) results on the WSJ0-2mix (8k, min) dataset.
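The SI-SDRi column reports the improvement in scale-invariant SDR (SI-SDR), the metric defined in the "SDR – Half-baked or Well Done?" paper listed under Evaluation Tools. As a minimal sketch (the function name and signals are illustrative, not part of the repo's MATLAB evaluation code, and NumPy is assumed):

```python
import numpy as np

def si_sdr(reference: np.ndarray, estimate: np.ndarray) -> float:
    """Scale-invariant SDR in dB between a reference source and an estimate."""
    # Remove the means, as recommended by Le Roux et al.
    reference = reference - reference.mean()
    estimate = estimate - estimate.mean()
    # Project the estimate onto the reference to obtain the target component;
    # everything orthogonal to the reference counts as distortion.
    alpha = np.dot(estimate, reference) / np.dot(reference, reference)
    target = alpha * reference
    noise = estimate - target
    return 10.0 * np.log10(np.dot(target, target) / np.dot(noise, noise))
```

SI-SDRi for one mixture would then be `si_sdr(reference, estimate) - si_sdr(reference, mixture)`, averaged over the test set; rescaling the estimate leaves the value unchanged, which is the point of the scale-invariant formulation.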
| Task | Methods | Model Size | SDRi | SI-SDRi |
| :------------: | :------------: | :------------: | :------------: | :------------: |
| SS | DPCL++ | 13.6M | - | 10.8 |
| SS | uPIT-BLSTM-ST | 92.7M | 10.0 | - |
| SS | DANet | 9.1M | - | 10.5 |
| SS | cuPIT-Grid-RD | 53.2M | 10.2 | - |
| SS | SDC-G-MTL | 53.9M | 10.5 | - |
| SS | CBLDNN-GAT | 39.5M | 11.0 | - |
| SS | Chimera++ | 32.9M | 12.0 | 11.5 |
| SS | WA-MISI-5 | 32.9M | 13.1 | 12.6 |
| SS | BLSTM-TasNet | 23.6M | 13.6 | 13.2 |
| SS | Conv-TasNet | 5.1M | 15.6 | 15.3 |
| SE | SpEx | 10.8M | 17.0 | 16.6 |
| SE | SpEx+ | 11.1M | 17.6 | 17.4 |
| SS | DeepCASA | 12.8M | 18.0 | 17.7 |
| SS | FurcaNeXt | 51.4M | 18.4 | - |
| SS | DPRNN-TasNet | 2.6M | 19.0 | 18.8 |
| SS | Wavesplit | - | 19.2 | 19.0 |
| SS | Wavesplit + Dynamic mixing | - | 20.6 | 20.4 |
--------------------------------------------------------------------------------
/book/2018_Book_AudioSourceSeparation.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gemengtju/Tutorial_Separation/c47dca746fa2e123868f5fbf2aed70ab3809925b/book/2018_Book_AudioSourceSeparation.pdf
--------------------------------------------------------------------------------
/book/Audio_Source_Separation_and_Speech_Enhancement.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gemengtju/Tutorial_Separation/c47dca746fa2e123868f5fbf2aed70ab3809925b/book/Audio_Source_Separation_and_Speech_Enhancement.pdf
--------------------------------------------------------------------------------
/evaluation/sdr_pesq_sisdr/bss_eval_sources.m:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gemengtju/Tutorial_Separation/c47dca746fa2e123868f5fbf2aed70ab3809925b/evaluation/sdr_pesq_sisdr/bss_eval_sources.m
--------------------------------------------------------------------------------
/evaluation/sdr_pesq_sisdr/composite_pesq/DC_block.p:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gemengtju/Tutorial_Separation/c47dca746fa2e123868f5fbf2aed70ab3809925b/evaluation/sdr_pesq_sisdr/composite_pesq/DC_block.p
--------------------------------------------------------------------------------
/evaluation/sdr_pesq_sisdr/composite_pesq/FFTNXCorr.p:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gemengtju/Tutorial_Separation/c47dca746fa2e123868f5fbf2aed70ab3809925b/evaluation/sdr_pesq_sisdr/composite_pesq/FFTNXCorr.p
--------------------------------------------------------------------------------
/evaluation/sdr_pesq_sisdr/composite_pesq/apply_VAD.p:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gemengtju/Tutorial_Separation/c47dca746fa2e123868f5fbf2aed70ab3809925b/evaluation/sdr_pesq_sisdr/composite_pesq/apply_VAD.p
--------------------------------------------------------------------------------
/evaluation/sdr_pesq_sisdr/composite_pesq/apply_filter.p:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gemengtju/Tutorial_Separation/c47dca746fa2e123868f5fbf2aed70ab3809925b/evaluation/sdr_pesq_sisdr/composite_pesq/apply_filter.p
--------------------------------------------------------------------------------
/evaluation/sdr_pesq_sisdr/composite_pesq/apply_filters.p:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gemengtju/Tutorial_Separation/c47dca746fa2e123868f5fbf2aed70ab3809925b/evaluation/sdr_pesq_sisdr/composite_pesq/apply_filters.p -------------------------------------------------------------------------------- /evaluation/sdr_pesq_sisdr/composite_pesq/batch_pesq.p: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gemengtju/Tutorial_Separation/c47dca746fa2e123868f5fbf2aed70ab3809925b/evaluation/sdr_pesq_sisdr/composite_pesq/batch_pesq.p -------------------------------------------------------------------------------- /evaluation/sdr_pesq_sisdr/composite_pesq/batch_pesq2.p: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gemengtju/Tutorial_Separation/c47dca746fa2e123868f5fbf2aed70ab3809925b/evaluation/sdr_pesq_sisdr/composite_pesq/batch_pesq2.p -------------------------------------------------------------------------------- /evaluation/sdr_pesq_sisdr/composite_pesq/composite.asv: -------------------------------------------------------------------------------- 1 | function [Csig,Cbak,Covl]= composite(cleanFile, enhancedFile); 2 | 3 | % --------- composite objective measure ---------------------- 4 | % 5 | % Center for Robust Speech Systems 6 | % University of Texas-Dallas 7 | % Copyright (c) 2006 8 | % All Rights Reserved. 9 | % 10 | % Description: 11 | % 12 | % This function implements the composite objective measure 13 | % proposed in [1]. It returns three values: The predicted rating of 14 | % overall quality (Covl), the rating of speech distortion (Csig) and 15 | % the rating of background distortion (Cbak). The ratings use the 1-5 MOS scale. 
16 | % In addition, it returns the values of the SNRseg, log-likelihood ratio (LLR), PESQ 17 | % anThe algorithm 18 | 19 | if nargin<2 20 | fprintf('Usage: [Csig,Cbak,Covl]=composite(cleanfile.wav,enhanced.wav)\n'); 21 | fprintf('where ''Csig'' is the predicted rating of speech distortion\n'); 22 | fprintf(' ''Cbak'' is the predicted rating of background distortion\n'); 23 | fprintf(' ''Covl'' is the predicted rating of overall quality.\n\n'); 24 | return; 25 | end 26 | 27 | 28 | alpha= 0.95; 29 | 30 | [data1, Srate1, Nbits1]= wavread(cleanFile); 31 | [data2, Srate2, Nbits2]= wavread(enhancedFile); 32 | if ( Srate1~= Srate2) | ( Nbits1~= Nbits2) 33 | error( 'The two files do not match!\n'); 34 | end 35 | 36 | len= min( length( data1), length( data2)); 37 | data1= data1( 1: len)+eps; 38 | data2= data2( 1: len)+eps; 39 | 40 | 41 | % -- compute the WSS measure --- 42 | % 43 | wss_dist_vec= wss( data1, data2,Srate1); 44 | wss_dist_vec= sort( wss_dist_vec); 45 | wss_dist= mean( wss_dist_vec( 1: round( length( wss_dist_vec)*alpha))); 46 | 47 | % --- compute the LLR measure --------- 48 | % 49 | LLR_dist= llr( data1, data2,Srate1); 50 | LLRs= sort(LLR_dist); 51 | LLR_len= round( length(LLR_dist)* alpha); 52 | llr_mean= mean( LLRs( 1: LLR_len)); 53 | 54 | % --- compute the SNRseg ---------------- 55 | % 56 | [snr_dist, segsnr_dist]= snr( data1, data2,Srate1); 57 | snr_mean= snr_dist; 58 | segSNR= mean( segsnr_dist); 59 | 60 | 61 | % -- compute the pesq ---- 62 | [pesq_mos]= pesq(cleanFile, enhancedFile); 63 | 64 | 65 | % --- now compute the composite measures ------------------ 66 | % 67 | Csig = 3.093 - 1.029*llr_mean + 0.603*pesq_mos-0.009*wss_dist; 68 | Cbak = 1.634 + 0.478 *pesq_mos - 0.007*wss_dist + 0.063*segSNR; 69 | Covl = 1.594 + 0.805*pesq_mos - 0.512*llr_mean - 0.007*wss_dist; 70 | 71 | fprintf('\n LLR=%f SNRseg=%f WSS=%f PESQ=%f\n',llr_mean,segSNR,wss_dist,pesq_mos); 72 | 73 | return; 74 | 75 | % 
---------------------------------------------------------------------- 76 | % 77 | % Weighted Spectral Slope (WSS) Objective Speech Quality Measure 78 | % 79 | % Center for Robust Speech Systems 80 | % University of Texas-Dallas 81 | % Copyright (c) 1998-2006 82 | % All Rights Reserved. 83 | % 84 | % Description: 85 | % 86 | % This function implements the Weighted Spectral Slope (WSS) 87 | % distance measure originally proposed in [1]. The algorithm 88 | % works by first decomposing the speech signal into a set of 89 | % frequency bands (this is done for both the test and reference 90 | % frame). The intensities within each critical band are 91 | % measured. Then, a weighted distances between the measured 92 | % slopes of the log-critical band spectra are computed. 93 | % This measure is also described in Section 2.2.9 (pages 56-58) 94 | % of [2]. 95 | % 96 | % Whereas Klatt's original measure used 36 critical-band 97 | % filters to estimate the smoothed short-time spectrum, this 98 | % implementation considers a bank of 25 filters spanning 99 | % the 4 kHz bandwidth. 100 | % 101 | % Input/Output: 102 | % 103 | % The input is a reference 8kHz sampled speech, and processed 104 | % speech (could be noisy or enhanced). 105 | % 106 | % The function returns the numerical distance between each 107 | % frame of the two input files (one distance per frame). 108 | % 109 | % References: 110 | % 111 | % [1] D. H. Klatt, "Prediction of Perceived Phonetic Distance 112 | % from Critical-Band Spectra: A First Step", Proc. IEEE 113 | % ICASSP'82, Volume 2, pp. 1278-1281, May, 1982. 114 | % 115 | % [2] S. R. Quackenbush, T. P. Barnwell, and M. A. Clements, 116 | % Objective Measures of Speech Quality. Prentice Hall 117 | % Advanced Reference Series, Englewood Cliffs, NJ, 1988, 118 | % ISBN: 0-13-629056-6. 119 | % 120 | % Authors: 121 | % 122 | % Bryan L. Pellom and John H. L. 
Hansen
123 | %
124 | %
125 | % Last Modified:
126 | %
127 | %     July 22, 1998
128 | %     September 12, 2006 by Philipos Loizou
129 | % ----------------------------------------------------------------------
130 |
131 | function distortion = wss(clean_speech, processed_speech, sample_rate)
132 |
133 |
134 | % ----------------------------------------------------------------------
135 | % Check the length of the clean and processed speech. Must be the same.
136 | % ----------------------------------------------------------------------
137 |
138 | clean_length     = length(clean_speech);
139 | processed_length = length(processed_speech);
140 |
141 | if (clean_length ~= processed_length)
142 |     disp('Error: Files must have the same length.');
143 |     return
144 | end
145 |
146 |
147 |
148 | % ----------------------------------------------------------------------
149 | % Global Variables
150 | % ----------------------------------------------------------------------
151 |
152 | % sample_rate = 8000;  % default sample rate
153 | % winlength   = 240;   % window length in samples
154 | % skiprate    = 60;    % window skip in samples
155 | winlength = round(30*sample_rate/1000);  % window length in samples (30 ms; 240 at 8 kHz)
156 | skiprate  = floor(winlength/4);          % window skip in samples
157 | max_freq  = sample_rate/2;               % maximum bandwidth
158 | num_crit  = 25;                          % number of critical bands
159 |
160 | USE_FFT_SPECTRUM = 1;                    % 1: FFT spectrum; 0: 10th-order LP spectrum
161 | %n_fft = 512;                            % FFT size
162 | n_fft    = 2^nextpow2(2*winlength);
163 | n_fftby2 = n_fft/2;                      % FFT size/2
164 | Kmax     = 20;                           % value suggested by Klatt, pg 1280
165 | Klocmax  = 1;                            % value suggested by Klatt, pg 1280
166 |
167 | % ----------------------------------------------------------------------
168 | % Critical Band Filter Definitions (Center Frequency and Bandwidths in Hz)
169 | % ----------------------------------------------------------------------
170 |
171 | cent_freq(1) = 50.0000;   bandwidth(1) = 70.0000;
172 | cent_freq(2) = 120.000;   bandwidth(2) = 70.0000;
173 | cent_freq(3)  = 190.000;   bandwidth(3)  = 70.0000;
174 | cent_freq(4)  = 260.000;   bandwidth(4)  = 70.0000;
175 | cent_freq(5)  = 330.000;   bandwidth(5)  = 70.0000;
176 | cent_freq(6)  = 400.000;   bandwidth(6)  = 70.0000;
177 | cent_freq(7)  = 470.000;   bandwidth(7)  = 70.0000;
178 | cent_freq(8)  = 540.000;   bandwidth(8)  = 77.3724;
179 | cent_freq(9)  = 617.372;   bandwidth(9)  = 86.0056;
180 | cent_freq(10) = 703.378;   bandwidth(10) = 95.3398;
181 | cent_freq(11) = 798.717;   bandwidth(11) = 105.411;
182 | cent_freq(12) = 904.128;   bandwidth(12) = 116.256;
183 | cent_freq(13) = 1020.38;   bandwidth(13) = 127.914;
184 | cent_freq(14) = 1148.30;   bandwidth(14) = 140.423;
185 | cent_freq(15) = 1288.72;   bandwidth(15) = 153.823;
186 | cent_freq(16) = 1442.54;   bandwidth(16) = 168.154;
187 | cent_freq(17) = 1610.70;   bandwidth(17) = 183.457;
188 | cent_freq(18) = 1794.16;   bandwidth(18) = 199.776;
189 | cent_freq(19) = 1993.93;   bandwidth(19) = 217.153;
190 | cent_freq(20) = 2211.08;   bandwidth(20) = 235.631;
191 | cent_freq(21) = 2446.71;   bandwidth(21) = 255.255;
192 | cent_freq(22) = 2701.97;   bandwidth(22) = 276.072;
193 | cent_freq(23) = 2978.04;   bandwidth(23) = 298.126;
194 | cent_freq(24) = 3276.17;   bandwidth(24) = 321.465;
195 | cent_freq(25) = 3597.63;   bandwidth(25) = 346.136;
196 |
197 | bw_min = bandwidth(1);   % minimum critical bandwidth
198 |
199 | % ----------------------------------------------------------------------
200 | % Set up the critical band filters. Note here that Gaussianly shaped
201 | % filters are used. Also, the sum of the filter weights is equivalent
202 | % for each critical band filter. Filter values less than -30 dB are set to
203 | % zero.
204 | % ---------------------------------------------------------------------- 205 | 206 | min_factor = exp (-30.0 / (2.0 * 2.303)); % -30 dB point of filter 207 | 208 | for i = 1:num_crit 209 | f0 = (cent_freq (i) / max_freq) * (n_fftby2); 210 | all_f0(i) = floor(f0); 211 | bw = (bandwidth (i) / max_freq) * (n_fftby2); 212 | norm_factor = log(bw_min) - log(bandwidth(i)); 213 | j = 0:1:n_fftby2-1; 214 | crit_filter(i,:) = exp (-11 *(((j - floor(f0)) ./bw).^2) + norm_factor); 215 | crit_filter(i,:) = crit_filter(i,:).*(crit_filter(i,:) > min_factor); 216 | end 217 | 218 | % ---------------------------------------------------------------------- 219 | % For each frame of input speech, calculate the Weighted Spectral 220 | % Slope Measure 221 | % ---------------------------------------------------------------------- 222 | 223 | num_frames = clean_length/skiprate-(winlength/skiprate); % number of frames 224 | start = 1; % starting sample 225 | window = 0.5*(1 - cos(2*pi*(1:winlength)'/(winlength+1))); 226 | 227 | for frame_count = 1:num_frames 228 | 229 | % ---------------------------------------------------------- 230 | % (1) Get the Frames for the test and reference speech. 231 | % Multiply by Hanning Window. 
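The filter-bank construction above builds one Gaussian row per critical band: the band's center frequency and bandwidth (in Hz) are mapped to FFT-bin units, a normalization term equalizes the filter areas, and weights below the -30 dB point are zeroed. A stdlib-Python transcription of a single filter row, with a hypothetical helper name `critical_band_filter` (not part of this repo):

```python
import math

def critical_band_filter(cent_freq, bandwidth, bw_min, max_freq, n_fftby2):
    """One Gaussian critical-band filter row, mirroring the MATLAB loop above.

    A sketch: bin mapping, the exp(-11 * ...) shape, the area-normalization
    term, and the -30 dB threshold all follow the code in wss().
    """
    min_factor = math.exp(-30.0 / (2.0 * 2.303))          # -30 dB point of filter
    f0 = (cent_freq / max_freq) * n_fftby2                # center, in FFT bins
    bw = (bandwidth / max_freq) * n_fftby2                # bandwidth, in FFT bins
    norm_factor = math.log(bw_min) - math.log(bandwidth)  # equalize filter areas
    filt = []
    for j in range(n_fftby2):
        w = math.exp(-11.0 * ((j - math.floor(f0)) / bw) ** 2 + norm_factor)
        filt.append(w if w > min_factor else 0.0)         # zero the -30 dB tails
    return filt
```

For the first band (50 Hz center, 70 Hz bandwidth) at an 8 kHz sample rate with `n_fftby2 = 256`, the peak lands on bin `floor(50/4000 * 256) = 3` with weight 1.0, and bins far from the center are exactly zero after thresholding.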
232 | % ---------------------------------------------------------- 233 | 234 | clean_frame = clean_speech(start:start+winlength-1); 235 | processed_frame = processed_speech(start:start+winlength-1); 236 | clean_frame = clean_frame.*window; 237 | processed_frame = processed_frame.*window; 238 | 239 | % ---------------------------------------------------------- 240 | % (2) Compute the Power Spectrum of Clean and Processed 241 | % ---------------------------------------------------------- 242 | 243 | if (USE_FFT_SPECTRUM) 244 | clean_spec = (abs(fft(clean_frame,n_fft)).^2); 245 | processed_spec = (abs(fft(processed_frame,n_fft)).^2); 246 | else 247 | a_vec = zeros(1,n_fft); 248 | a_vec(1:11) = lpc(clean_frame,10); 249 | clean_spec = 1.0/(abs(fft(a_vec,n_fft)).^2)'; 250 | 251 | a_vec = zeros(1,n_fft); 252 | a_vec(1:11) = lpc(processed_frame,10); 253 | processed_spec = 1.0/(abs(fft(a_vec,n_fft)).^2)'; 254 | end 255 | 256 | % ---------------------------------------------------------- 257 | % (3) Compute Filterbank Output Energies (in dB scale) 258 | % ---------------------------------------------------------- 259 | 260 | for i = 1:num_crit 261 | clean_energy(i) = sum(clean_spec(1:n_fftby2) ... 262 | .*crit_filter(i,:)'); 263 | processed_energy(i) = sum(processed_spec(1:n_fftby2) ... 264 | .*crit_filter(i,:)'); 265 | end 266 | clean_energy = 10*log10(max(clean_energy,1E-10)); 267 | processed_energy = 10*log10(max(processed_energy,1E-10)); 268 | 269 | % ---------------------------------------------------------- 270 | % (4) Compute Spectral Slope (dB[i+1]-dB[i]) 271 | % ---------------------------------------------------------- 272 | 273 | clean_slope = clean_energy(2:num_crit) - ... 274 | clean_energy(1:num_crit-1); 275 | processed_slope = processed_energy(2:num_crit) - ... 276 | processed_energy(1:num_crit-1); 277 | 278 | % ---------------------------------------------------------- 279 | % (5) Find the nearest peak locations in the spectra to 280 | % each critical band. 
If the slope is negative, we
281 | %     search to the left. If positive, we search to the
282 | %     right.
283 | % ----------------------------------------------------------
284 |
285 | for i = 1:num_crit-1
286 |
287 |     % find the peaks in the clean speech signal
288 |
289 |     if (clean_slope(i) > 0)        % search to the right
290 |         n = i;
291 |         while ((n < num_crit) & (clean_slope(n) > 0))
292 |             n = n+1;
293 |         end
294 |         clean_loc_peak(i) = clean_energy(n-1);
295 |     else                           % search to the left
296 |         n = i;
297 |         while ((n > 0) & (clean_slope(n) <= 0))
298 |             n = n-1;
299 |         end
300 |         clean_loc_peak(i) = clean_energy(n+1);
301 |     end
302 |
303 |     % find the peaks in the processed speech signal
304 |
305 |     if (processed_slope(i) > 0)    % search to the right
306 |         n = i;
307 |         while ((n < num_crit) & (processed_slope(n) > 0))
308 |             n = n+1;
309 |         end
310 |         processed_loc_peak(i) = processed_energy(n-1);
311 |     else                           % search to the left
312 |         n = i;
313 |         while ((n > 0) & (processed_slope(n) <= 0))
314 |             n = n-1;
315 |         end
316 |         processed_loc_peak(i) = processed_energy(n+1);
317 |     end
318 |
319 | end
320 |
321 | % ----------------------------------------------------------
322 | % (6) Compute the WSS Measure for this frame. This
323 | %     includes determination of the weighting function.
324 | % ----------------------------------------------------------
325 |
326 | dBMax_clean     = max(clean_energy);
327 | dBMax_processed = max(processed_energy);
328 |
329 | % The weights are calculated by averaging individual
330 | % weighting factors from the clean and processed frame.
331 | % These weights W_clean and W_processed should range
332 | % from 0 to 1 and place more emphasis on spectral
333 | % peaks and less emphasis on slope differences in spectral
334 | % valleys. This procedure is described on page 1280 of
335 | % Klatt's 1982 ICASSP paper.
336 |
337 | Wmax_clean = Kmax ./ (Kmax + dBMax_clean - ...
338 |     clean_energy(1:num_crit-1));
339 | Wlocmax_clean = Klocmax ./ ( Klocmax + clean_loc_peak - ...
340 | clean_energy(1:num_crit-1)); 341 | W_clean = Wmax_clean .* Wlocmax_clean; 342 | 343 | Wmax_processed = Kmax ./ (Kmax + dBMax_processed - ... 344 | processed_energy(1:num_crit-1)); 345 | Wlocmax_processed = Klocmax ./ ( Klocmax + processed_loc_peak - ... 346 | processed_energy(1:num_crit-1)); 347 | W_processed = Wmax_processed .* Wlocmax_processed; 348 | 349 | W = (W_clean + W_processed)./2.0; 350 | 351 | distortion(frame_count) = sum(W.*(clean_slope(1:num_crit-1) - ... 352 | processed_slope(1:num_crit-1)).^2); 353 | 354 | % this normalization is not part of Klatt's paper, but helps 355 | % to normalize the measure. Here we scale the measure by the 356 | % sum of the weights. 357 | 358 | distortion(frame_count) = distortion(frame_count)/sum(W); 359 | 360 | start = start + skiprate; 361 | 362 | end 363 | 364 | %----------------------------------------------- 365 | function distortion = llr(clean_speech, processed_speech,sample_rate) 366 | 367 | 368 | % ---------------------------------------------------------------------- 369 | % Check the length of the clean and processed speech. Must be the same. 
370 | % ---------------------------------------------------------------------- 371 | 372 | clean_length = length(clean_speech); 373 | processed_length = length(processed_speech); 374 | 375 | if (clean_length ~= processed_length) 376 | disp('Error: Both Speech Files must be same length.'); 377 | return 378 | end 379 | 380 | % ---------------------------------------------------------------------- 381 | % Global Variables 382 | % ---------------------------------------------------------------------- 383 | 384 | % sample_rate = 8000; % default sample rate 385 | % winlength = 240; % window length in samples 386 | % skiprate = 60; % window skip in samples 387 | % P = 10; % LPC Analysis Order 388 | winlength = round(30*sample_rate/1000); % window length in samples 389 | skiprate = floor(winlength/4); % window skip in samples 390 | if sample_rate<10000 391 | P = 10; % LPC Analysis Order 392 | else 393 | P=16; % this could vary depending on sampling frequency. 394 | end 395 | 396 | % ---------------------------------------------------------------------- 397 | % For each frame of input speech, calculate the Log Likelihood Ratio 398 | % ---------------------------------------------------------------------- 399 | 400 | num_frames = clean_length/skiprate-(winlength/skiprate); % number of frames 401 | start = 1; % starting sample 402 | window = 0.5*(1 - cos(2*pi*(1:winlength)'/(winlength+1))); 403 | 404 | for frame_count = 1:num_frames 405 | 406 | % ---------------------------------------------------------- 407 | % (1) Get the Frames for the test and reference speech. 408 | % Multiply by Hanning Window. 
409 | % ---------------------------------------------------------- 410 | 411 | clean_frame = clean_speech(start:start+winlength-1); 412 | processed_frame = processed_speech(start:start+winlength-1); 413 | clean_frame = clean_frame.*window; 414 | processed_frame = processed_frame.*window; 415 | 416 | % ---------------------------------------------------------- 417 | % (2) Get the autocorrelation lags and LPC parameters used 418 | % to compute the LLR measure. 419 | % ---------------------------------------------------------- 420 | 421 | [R_clean, Ref_clean, A_clean] = ... 422 | lpcoeff(clean_frame, P); 423 | [R_processed, Ref_processed, A_processed] = ... 424 | lpcoeff(processed_frame, P); 425 | 426 | % ---------------------------------------------------------- 427 | % (3) Compute the LLR measure 428 | % ---------------------------------------------------------- 429 | 430 | numerator = A_processed*toeplitz(R_clean)*A_processed'; 431 | denominator = A_clean*toeplitz(R_clean)*A_clean'; 432 | distortion(frame_count) = log(numerator/denominator); 433 | start = start + skiprate; 434 | 435 | end 436 | 437 | %--------------------------------------------- 438 | function [acorr, refcoeff, lpparams] = lpcoeff(speech_frame, model_order) 439 | 440 | % ---------------------------------------------------------- 441 | % (1) Compute Autocorrelation Lags 442 | % ---------------------------------------------------------- 443 | 444 | winlength = max(size(speech_frame)); 445 | for k=1:model_order+1 446 | R(k) = sum(speech_frame(1:winlength-k+1) ... 
447 | .*speech_frame(k:winlength)); 448 | end 449 | 450 | % ---------------------------------------------------------- 451 | % (2) Levinson-Durbin 452 | % ---------------------------------------------------------- 453 | 454 | a = ones(1,model_order); 455 | E(1)=R(1); 456 | for i=1:model_order 457 | a_past(1:i-1) = a(1:i-1); 458 | sum_term = sum(a_past(1:i-1).*R(i:-1:2)); 459 | rcoeff(i)=(R(i+1) - sum_term) / E(i); 460 | a(i)=rcoeff(i); 461 | a(1:i-1) = a_past(1:i-1) - rcoeff(i).*a_past(i-1:-1:1); 462 | E(i+1)=(1-rcoeff(i)*rcoeff(i))*E(i); 463 | end 464 | 465 | acorr = R; 466 | refcoeff = rcoeff; 467 | lpparams = [1 -a]; 468 | 469 | 470 | % ---------------------------------------------------------------------- 471 | 472 | function [overall_snr, segmental_snr] = snr(clean_speech, processed_speech,sample_rate) 473 | 474 | % ---------------------------------------------------------------------- 475 | % Check the length of the clean and processed speech. Must be the same. 476 | % ---------------------------------------------------------------------- 477 | 478 | clean_length = length(clean_speech); 479 | processed_length = length(processed_speech); 480 | 481 | if (clean_length ~= processed_length) 482 | disp('Error: Both Speech Files must be same length.'); 483 | return 484 | end 485 | 486 | % ---------------------------------------------------------------------- 487 | % Scale both clean speech and processed speech to have same dynamic 488 | % range. 
Also remove DC component from each signal 489 | % ---------------------------------------------------------------------- 490 | 491 | %clean_speech = clean_speech - mean(clean_speech); 492 | %processed_speech = processed_speech - mean(processed_speech); 493 | 494 | %processed_speech = processed_speech.*(max(abs(clean_speech))/ max(abs(processed_speech))); 495 | 496 | overall_snr = 10* log10( sum(clean_speech.^2)/sum((clean_speech-processed_speech).^2)); 497 | 498 | % ---------------------------------------------------------------------- 499 | % Global Variables 500 | % ---------------------------------------------------------------------- 501 | 502 | % sample_rate = 8000; % default sample rate 503 | % winlength = 240; % window length in samples 504 | % skiprate = 60; % window skip in samples 505 | winlength = round(30*sample_rate/1000); %240; % window length in samples 506 | skiprate = floor(winlength/4); % window skip in samples 507 | MIN_SNR = -10; % minimum SNR in dB 508 | MAX_SNR = 35; % maximum SNR in dB 509 | 510 | % ---------------------------------------------------------------------- 511 | % For each frame of input speech, calculate the Segmental SNR 512 | % ---------------------------------------------------------------------- 513 | 514 | num_frames = clean_length/skiprate-(winlength/skiprate); % number of frames 515 | start = 1; % starting sample 516 | window = 0.5*(1 - cos(2*pi*(1:winlength)'/(winlength+1))); 517 | 518 | for frame_count = 1: num_frames 519 | 520 | % ---------------------------------------------------------- 521 | % (1) Get the Frames for the test and reference speech. 522 | % Multiply by Hanning Window. 
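The segmental SNR loop that follows the framing setup above computes a per-frame SNR, clamps it to [MIN_SNR, MAX_SNR] = [-10, 35] dB, and averages over frames. A stdlib-Python sketch of that clamped frame-wise average, assuming the signals are plain float lists (this simplification uses rectangular frames rather than the Hanning window applied in the MATLAB code):

```python
import math

def segmental_snr(clean, processed, winlength, skiprate,
                  min_snr=-10.0, max_snr=35.0):
    """Mean frame-wise SNR with clamping, mirroring the MATLAB loop.

    Sketch only: no Hanning windowing, and frames are advanced by
    `skiprate` samples exactly as in the snr() function above.
    """
    eps = 2.2e-16                       # roughly MATLAB's eps, to avoid log(0)
    snrs = []
    start = 0
    while start + winlength <= len(clean):
        sig = sum(c * c for c in clean[start:start + winlength])
        noise = sum((c - p) ** 2
                    for c, p in zip(clean[start:start + winlength],
                                    processed[start:start + winlength]))
        s = 10.0 * math.log10(sig / (noise + eps) + eps)
        snrs.append(min(max(s, min_snr), max_snr))   # clamp to [min_snr, max_snr]
        start += skiprate
    return sum(snrs) / len(snrs)
```

The clamping is what distinguishes segmental SNR from the global SNR computed just above it: silent or perfectly reconstructed frames cannot dominate the average.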
523 | % ---------------------------------------------------------- 524 | 525 | clean_frame = clean_speech(start:start+winlength-1); 526 | processed_frame = processed_speech(start:start+winlength-1); 527 | clean_frame = clean_frame.*window; 528 | processed_frame = processed_frame.*window; 529 | 530 | % ---------------------------------------------------------- 531 | % (2) Compute the Segmental SNR 532 | % ---------------------------------------------------------- 533 | 534 | signal_energy = sum(clean_frame.^2); 535 | noise_energy = sum((clean_frame-processed_frame).^2); 536 | segmental_snr(frame_count) = 10*log10(signal_energy/(noise_energy+eps)+eps); 537 | segmental_snr(frame_count) = max(segmental_snr(frame_count),MIN_SNR); 538 | segmental_snr(frame_count) = min(segmental_snr(frame_count),MAX_SNR); 539 | 540 | start = start + skiprate; 541 | 542 | end 543 | 544 | 545 | 546 | -------------------------------------------------------------------------------- /evaluation/sdr_pesq_sisdr/composite_pesq/composite.m: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gemengtju/Tutorial_Separation/c47dca746fa2e123868f5fbf2aed70ab3809925b/evaluation/sdr_pesq_sisdr/composite_pesq/composite.m -------------------------------------------------------------------------------- /evaluation/sdr_pesq_sisdr/composite_pesq/convolution_in_timealign.p: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gemengtju/Tutorial_Separation/c47dca746fa2e123868f5fbf2aed70ab3809925b/evaluation/sdr_pesq_sisdr/composite_pesq/convolution_in_timealign.p -------------------------------------------------------------------------------- /evaluation/sdr_pesq_sisdr/composite_pesq/crude_align.p: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/gemengtju/Tutorial_Separation/c47dca746fa2e123868f5fbf2aed70ab3809925b/evaluation/sdr_pesq_sisdr/composite_pesq/crude_align.p -------------------------------------------------------------------------------- /evaluation/sdr_pesq_sisdr/composite_pesq/enhanced_logmmse.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gemengtju/Tutorial_Separation/c47dca746fa2e123868f5fbf2aed70ab3809925b/evaluation/sdr_pesq_sisdr/composite_pesq/enhanced_logmmse.wav -------------------------------------------------------------------------------- /evaluation/sdr_pesq_sisdr/composite_pesq/fix_power_level.p: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gemengtju/Tutorial_Separation/c47dca746fa2e123868f5fbf2aed70ab3809925b/evaluation/sdr_pesq_sisdr/composite_pesq/fix_power_level.p -------------------------------------------------------------------------------- /evaluation/sdr_pesq_sisdr/composite_pesq/id_searchwindows.p: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gemengtju/Tutorial_Separation/c47dca746fa2e123868f5fbf2aed70ab3809925b/evaluation/sdr_pesq_sisdr/composite_pesq/id_searchwindows.p -------------------------------------------------------------------------------- /evaluation/sdr_pesq_sisdr/composite_pesq/id_utterances.p: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gemengtju/Tutorial_Separation/c47dca746fa2e123868f5fbf2aed70ab3809925b/evaluation/sdr_pesq_sisdr/composite_pesq/id_utterances.p -------------------------------------------------------------------------------- /evaluation/sdr_pesq_sisdr/composite_pesq/input_filter.p: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/gemengtju/Tutorial_Separation/c47dca746fa2e123868f5fbf2aed70ab3809925b/evaluation/sdr_pesq_sisdr/composite_pesq/input_filter.p -------------------------------------------------------------------------------- /evaluation/sdr_pesq_sisdr/composite_pesq/pesq.p: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gemengtju/Tutorial_Separation/c47dca746fa2e123868f5fbf2aed70ab3809925b/evaluation/sdr_pesq_sisdr/composite_pesq/pesq.p -------------------------------------------------------------------------------- /evaluation/sdr_pesq_sisdr/composite_pesq/pesq_debug.p: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gemengtju/Tutorial_Separation/c47dca746fa2e123868f5fbf2aed70ab3809925b/evaluation/sdr_pesq_sisdr/composite_pesq/pesq_debug.p -------------------------------------------------------------------------------- /evaluation/sdr_pesq_sisdr/composite_pesq/pesq_measure.p: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gemengtju/Tutorial_Separation/c47dca746fa2e123868f5fbf2aed70ab3809925b/evaluation/sdr_pesq_sisdr/composite_pesq/pesq_measure.p -------------------------------------------------------------------------------- /evaluation/sdr_pesq_sisdr/composite_pesq/pesq_psychoacoustic_model.p: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gemengtju/Tutorial_Separation/c47dca746fa2e123868f5fbf2aed70ab3809925b/evaluation/sdr_pesq_sisdr/composite_pesq/pesq_psychoacoustic_model.p -------------------------------------------------------------------------------- /evaluation/sdr_pesq_sisdr/composite_pesq/pesq_testbench.p: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/gemengtju/Tutorial_Separation/c47dca746fa2e123868f5fbf2aed70ab3809925b/evaluation/sdr_pesq_sisdr/composite_pesq/pesq_testbench.p -------------------------------------------------------------------------------- /evaluation/sdr_pesq_sisdr/composite_pesq/plot_wav.p: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gemengtju/Tutorial_Separation/c47dca746fa2e123868f5fbf2aed70ab3809925b/evaluation/sdr_pesq_sisdr/composite_pesq/plot_wav.p -------------------------------------------------------------------------------- /evaluation/sdr_pesq_sisdr/composite_pesq/pow_of.p: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gemengtju/Tutorial_Separation/c47dca746fa2e123868f5fbf2aed70ab3809925b/evaluation/sdr_pesq_sisdr/composite_pesq/pow_of.p -------------------------------------------------------------------------------- /evaluation/sdr_pesq_sisdr/composite_pesq/readme.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gemengtju/Tutorial_Separation/c47dca746fa2e123868f5fbf2aed70ab3809925b/evaluation/sdr_pesq_sisdr/composite_pesq/readme.pdf -------------------------------------------------------------------------------- /evaluation/sdr_pesq_sisdr/composite_pesq/readme.txt: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gemengtju/Tutorial_Separation/c47dca746fa2e123868f5fbf2aed70ab3809925b/evaluation/sdr_pesq_sisdr/composite_pesq/readme.txt -------------------------------------------------------------------------------- /evaluation/sdr_pesq_sisdr/composite_pesq/setup_global.p: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/gemengtju/Tutorial_Separation/c47dca746fa2e123868f5fbf2aed70ab3809925b/evaluation/sdr_pesq_sisdr/composite_pesq/setup_global.p -------------------------------------------------------------------------------- /evaluation/sdr_pesq_sisdr/composite_pesq/sp09.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gemengtju/Tutorial_Separation/c47dca746fa2e123868f5fbf2aed70ab3809925b/evaluation/sdr_pesq_sisdr/composite_pesq/sp09.wav -------------------------------------------------------------------------------- /evaluation/sdr_pesq_sisdr/composite_pesq/sp09_babble_sn10.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gemengtju/Tutorial_Separation/c47dca746fa2e123868f5fbf2aed70ab3809925b/evaluation/sdr_pesq_sisdr/composite_pesq/sp09_babble_sn10.wav -------------------------------------------------------------------------------- /evaluation/sdr_pesq_sisdr/composite_pesq/split_align.p: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gemengtju/Tutorial_Separation/c47dca746fa2e123868f5fbf2aed70ab3809925b/evaluation/sdr_pesq_sisdr/composite_pesq/split_align.p -------------------------------------------------------------------------------- /evaluation/sdr_pesq_sisdr/composite_pesq/time_align.p: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gemengtju/Tutorial_Separation/c47dca746fa2e123868f5fbf2aed70ab3809925b/evaluation/sdr_pesq_sisdr/composite_pesq/time_align.p -------------------------------------------------------------------------------- /evaluation/sdr_pesq_sisdr/composite_pesq/utterance_locate.p: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/gemengtju/Tutorial_Separation/c47dca746fa2e123868f5fbf2aed70ab3809925b/evaluation/sdr_pesq_sisdr/composite_pesq/utterance_locate.p -------------------------------------------------------------------------------- /evaluation/sdr_pesq_sisdr/composite_pesq/utterance_split.p: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gemengtju/Tutorial_Separation/c47dca746fa2e123868f5fbf2aed70ab3809925b/evaluation/sdr_pesq_sisdr/composite_pesq/utterance_split.p -------------------------------------------------------------------------------- /evaluation/sdr_pesq_sisdr/des_file_name.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gemengtju/Tutorial_Separation/c47dca746fa2e123868f5fbf2aed70ab3809925b/evaluation/sdr_pesq_sisdr/des_file_name.pdf -------------------------------------------------------------------------------- /evaluation/sdr_pesq_sisdr/eval_sdr.m: -------------------------------------------------------------------------------- 1 | function eval_sdr(dataset, eval_mix, eval_pesq, model_name, fmix, fclean) 2 | addpath('composite_pesq'); 3 | %% WSJ0_2mix_extr 4 | %mixed_wav_dir = ['/export/home/clx214/data/wsj0_2mix_extr/wav8k/max/' dataset '/' fmix '/']; 5 | %spk1_dir = ['/export/home/clx214/data/wsj0_2mix_extr/wav8k/max/' dataset '/' fclean '/']; 6 | 7 | %% WSJ0_2mix 8 | mixed_wav_dir = ['/export/home/clx214/gm/ntu_project/SpEx_SincNetAuxCNNEncoder_MultiOriEncoder_share_min_2spk/data/wsj0_2mix_min_mix_6k/']; 9 | spk1_dir = ['/export/home/clx214/gm/ntu_project/SpEx_SincNetAuxCNNEncoder_MultiOriEncoder_share_min_2spk/data/wsj0_2mix_min_clean_6k/']; 10 | 11 | %% WHAM 12 | %mixed_wav_dir = ['/export/home/clx214/gm/ntu_project/SpEx+_WHAM/data/WHAM_mix_6k/']; 13 | %spk1_dir = ['/export/home/clx214/gm/ntu_project/SpEx+_WHAM/data/WHAM_clean_6k/']; 14 | 15 | % WHAMR reverb 16 | %mixed_wav_dir = 
['/export/home/clx214/gm/ntu_project/SpEx+_WHAMR/data/WHAMR_mix_6k/']; 17 | %spk1_dir = ['/export/home/clx214/gm/ntu_project/SpEx+_WHAMR/data/WHAMR_clean_6k/']; 18 | 19 | % WHAMR noise 20 | %mixed_wav_dir = ['/export/home/clx214/gm/ntu_project/SpEx+_WHAMR2/data/WHAMR_noise_mix_6k/']; 21 | %spk1_dir = ['/export/home/clx214/gm/ntu_project/SpEx+_WHAMR/data/WHAMR_clean_6k/']; % all WHAMR clean data is same 22 | 23 | % WHAMR noise + reverb 24 | %mixed_wav_dir = ['/export/home/clx214/gm/ntu_project/SpEx+_WHAMR_noise/data/WHAMR_noise_reverb_mix_6k/']; 25 | %spk1_dir = ['/export/home/clx214/gm/ntu_project/SpEx+_WHAMR/data/WHAMR_clean_6k/']; % all WHAMR clean data is same 26 | 27 | 28 | sprintf('start, %s\n', model_name) 29 | %rec_wav_dir = ['../data/rec/' dataset '/' model_name '/']; 30 | rec_wav_dir = ['/export/home/clx214/gm/ntu_project/SpEx_SincNetAuxCNNEncoder_MultiOriEncoder_share_min_2spk/rec_aux60/spk1/']; 31 | lists = dir(rec_wav_dir); 32 | len = length(lists) - 2; 33 | SDR = zeros(len, 1); 34 | SIR = zeros(len, 1); 35 | SAR = zeros(len, 1); 36 | SDR_Mix = zeros(len, 1); 37 | SIR_Mix = zeros(len, 1); 38 | SAR_Mix = zeros(len, 1); 39 | PESQ = zeros(len, 1); 40 | PESQ_Mix = zeros(len, 1); 41 | SISDR = zeros(len, 1); 42 | SISDR_Mix = zeros(len, 1); 43 | 44 | target_durs=textscan(fopen('target_ref_dur.txt'), '%s %f'); 45 | 46 | for i = 3:len+2 47 | name = lists(i).name; 48 | part_name = name(1:end-4); 49 | [rec_wav, Fs] = audioread([rec_wav_dir part_name '.wav']); 50 | ori_wav = audioread([spk1_dir part_name '.wav']); 51 | mix_wav = audioread([mixed_wav_dir part_name '.wav']); 52 | 53 | % get ground truth length 54 | utt_tokens = strsplit(part_name, '_'); 55 | idx = find(strcmp(target_durs{1}, utt_tokens{1})); 56 | dur = int32(target_durs{2}(idx)*Fs); 57 | 58 | min_len = min(size(ori_wav, 1), size(rec_wav, 1)); 59 | min_len = int32(min(min_len, dur)); 60 | 61 | rec_wav = rec_wav(1:min_len); 62 | ori_wav = ori_wav(1:min_len); 63 | mix_wav = mix_wav(1:min_len); 64 | 65 | 
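The setup above loads each reconstructed, reference, and mixture waveform and trims them to a common length before scoring. One of the scores used here is the scale-invariant SDR (the repo's `cal_SISDR` helper, whose implementation is not included in this chunk). Its conventional definition projects the estimate onto the reference before computing the signal-to-noise ratio; a stdlib-Python sketch of that standard formula (our own, not the repo's implementation):

```python
import math

def si_sdr(reference, estimate):
    """Scale-invariant SDR in dB (standard definition; a sketch, not
    necessarily identical to this repo's cal_SISDR).

    The reference is rescaled by the optimal least-squares factor so the
    metric is invariant to the estimate's overall gain.
    """
    dot = sum(r * e for r, e in zip(reference, estimate))
    ref_energy = sum(r * r for r in reference)
    scale = dot / ref_energy                        # optimal scaling factor
    target = [scale * r for r in reference]         # projection onto the reference
    noise = [e - t for e, t in zip(estimate, target)]
    return 10.0 * math.log10(sum(t * t for t in target) /
                             sum(n * n for n in noise))
```

Because of the projection step, multiplying the estimate by any nonzero constant leaves the score unchanged, which is why SI-SDR is preferred over plain SNR when separation models do not preserve absolute scale.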
[SDR(i-2),SIR(i-2),SAR(i-2),perm] = bss_eval_sources(rec_wav', ori_wav');
66 | SISDR(i-2) = cal_SISDR(ori_wav', rec_wav');
67 |
68 | if eval_pesq
69 |     fprintf('PESQINDEX: %d\n', i);
70 |     fprintf('PESQINDEX: %s,%s\n', [spk1_dir part_name '.wav'], [rec_wav_dir part_name '.wav']);
71 |     PESQ(i-2) = pesq(8000, [spk1_dir part_name '.wav'], [rec_wav_dir part_name '.wav']);
72 | end
73 |
74 | if eval_mix
75 |     [SDR_Mix(i-2),SIR_Mix(i-2),SAR_Mix(i-2),perm] = bss_eval_sources(mix_wav', ori_wav');
76 |     SISDR_Mix(i-2) = cal_SISDR(ori_wav', mix_wav');
77 |
78 |     if eval_pesq
79 |         fprintf('PESQINDEX_MIX: %d\n', i);
80 |         PESQ_Mix(i-2) = pesq(8000, [spk1_dir part_name '.wav'], [mixed_wav_dir part_name '.wav']);
81 |     end
82 | end
83 |
84 | if mod(i, 200) == 0
85 |     fprintf('number of samples evaluated: %d\n', i);
86 |     fprintf('%s, %s, target:%d, org:%d, rec:%d\n', part_name, utt_tokens{1}, dur, size(ori_wav,1), size(rec_wav,1));
87 | end
88 | end
89 | mean_SDR = mean(SDR);
90 | mean_SIR = mean(SIR);
91 | mean_SAR = mean(SAR);
92 | mean_PESQ = mean(PESQ);
93 | mean_SISDR = mean(SISDR);
94 | fprintf('The mean SDR, SIR, SAR, PESQ, SISDR are: %f ,\t %f ,\t %f ,\t %f, \t %f \n', mean_SDR, mean_SIR, mean_SAR, mean_PESQ, mean_SISDR);
95 | if eval_mix
96 |     mean_SDR_Mix = mean(SDR_Mix);
97 |     mean_SIR_Mix = mean(SIR_Mix);
98 |     mean_SAR_Mix = mean(SAR_Mix);
99 |     mean_PESQ_Mix = mean(PESQ_Mix);
100 |     mean_SISDR_Mix = mean(SISDR_Mix);
101 |     fprintf('The mean SDR, SIR, SAR, PESQ, SISDR of the mixture are: %f ,\t %f ,\t %f ,\t %f, \t %f \n', mean_SDR_Mix, mean_SIR_Mix, mean_SAR_Mix, mean_PESQ_Mix, mean_SISDR_Mix);
102 | end
103 |
104 | % Calculate results for the different gender cases
105 | if strcmp(dataset, 'cv')   % strcmp: '==' errors when the strings differ in length
106 |     [spk, gender] = textread('spk2gender_cv', '%s%d');
107 | else
108 |     [spk, gender] = textread('spk2gender', '%s%d');
109 | end
110 | cmm = 1;
111 | cmf = 1;
112 | cff = 1;
113 | csame = 1;
114 | for i = 1:size(SDR, 1)
115 |     mix_name = lists(i+2).name;
116 |     spk1 = mix_name(1:3);
117 |     tmp = regexp(mix_name, '_');
118
| spk2 = mix_name(tmp(2)+1:tmp(2)+3);
119 |     for j = 1:length(spk)
120 |         if strcmp(spk1, spk{j})
121 |             break
122 |         end
123 |     end
124 |     for k = 1:length(spk)
125 |         if strcmp(spk2, spk{k})
126 |             break
127 |         end
128 |     end
129 |
130 |     if gender(k) == 0 & gender(j) == 0
131 |         SDR_FF(cff) = SDR(i);
132 |         SIR_FF(cff) = SIR(i);
133 |         SAR_FF(cff) = SAR(i);
134 |         PESQ_FF(cff) = PESQ(i);
135 |
136 |         SDR_Same(csame) = SDR(i);
137 |         SIR_Same(csame) = SIR(i);
138 |         SAR_Same(csame) = SAR(i);
139 |         PESQ_Same(csame) = PESQ(i);
140 |
141 |         if eval_mix
142 |             SDR_FF_Mix(cff) = SDR_Mix(i);
143 |             SIR_FF_Mix(cff) = SIR_Mix(i);
144 |             SAR_FF_Mix(cff) = SAR_Mix(i);
145 |             PESQ_FF_Mix(cff) = PESQ_Mix(i);
146 |
147 |             SDR_Same_Mix(csame) = SDR_Mix(i);
148 |             SIR_Same_Mix(csame) = SIR_Mix(i);
149 |             SAR_Same_Mix(csame) = SAR_Mix(i);
150 |             PESQ_Same_Mix(csame) = PESQ_Mix(i);
151 |         end
152 |
153 |         lists_FF{cff} = lists(i+2).name;   % i+2: skip the '.' and '..' dir entries
154 |         cff = cff + 1;
155 |         csame = csame + 1;
156 |
157 |     elseif gender(k) == 1 & gender(j) == 1
158 |         SDR_MM(cmm) = SDR(i);
159 |         SIR_MM(cmm) = SIR(i);
160 |         SAR_MM(cmm) = SAR(i);
161 |         PESQ_MM(cmm) = PESQ(i);
162 |
163 |         SDR_Same(csame) = SDR(i);
164 |         SIR_Same(csame) = SIR(i);
165 |         SAR_Same(csame) = SAR(i);
166 |         PESQ_Same(csame) = PESQ(i);
167 |
168 |         if eval_mix
169 |             SDR_MM_Mix(cmm) = SDR_Mix(i);
170 |             SIR_MM_Mix(cmm) = SIR_Mix(i);
171 |             SAR_MM_Mix(cmm) = SAR_Mix(i);
172 |             PESQ_MM_Mix(cmm) = PESQ_Mix(i);
173 |
174 |             SDR_Same_Mix(csame) = SDR_Mix(i);
175 |             SIR_Same_Mix(csame) = SIR_Mix(i);
176 |             SAR_Same_Mix(csame) = SAR_Mix(i);
177 |             PESQ_Same_Mix(csame) = PESQ_Mix(i);
178 |         end
179 |
180 |         lists_MM{cmm} = lists(i+2).name;
181 |         cmm = cmm + 1;
182 |         csame = csame + 1;
183 |     else
184 |         SDR_MF(cmf) = SDR(i);
185 |         SIR_MF(cmf) = SIR(i);
186 |         SAR_MF(cmf) = SAR(i);
187 |         PESQ_MF(cmf) = PESQ(i);
188 |
189 |         if eval_mix
190 |             SDR_MF_Mix(cmf) = SDR_Mix(i);
191 |             SIR_MF_Mix(cmf) = SIR_Mix(i);
192 |             SAR_MF_Mix(cmf) = SAR_Mix(i);
193 |             PESQ_MF_Mix(cmf) = PESQ_Mix(i);
194 |         end
195 |
196 |
lists_MF{cmf} = lists(i+2).name; 197 | cmf = cmf + 1; 198 | end 199 | end 200 | mean_SDR_MF = mean(SDR_MF); 201 | mean_SDR_FF = mean(SDR_FF); 202 | mean_SDR_MM = mean(SDR_MM); 203 | mean_SDR_Same = mean(SDR_Same); 204 | 205 | mean_SIR_MF = mean(SIR_MF); 206 | mean_SIR_FF = mean(SIR_FF); 207 | mean_SIR_MM = mean(SIR_MM); 208 | mean_SIR_Same = mean(SIR_Same); 209 | 210 | mean_SAR_MF = mean(SAR_MF); 211 | mean_SAR_FF = mean(SAR_FF); 212 | mean_SAR_MM = mean(SAR_MM); 213 | mean_SAR_Same = mean(SAR_Same); 214 | 215 | mean_PESQ_MF = mean(PESQ_MF); 216 | mean_PESQ_FF = mean(PESQ_FF); 217 | mean_PESQ_MM = mean(PESQ_MM); 218 | mean_PESQ_Same = mean(PESQ_Same); 219 | 220 | if eval_mix 221 | mean_SDR_MF_Mix = mean(SDR_MF_Mix); 222 | mean_SDR_FF_Mix = mean(SDR_FF_Mix); 223 | mean_SDR_MM_Mix = mean(SDR_MM_Mix); 224 | mean_SDR_Same_Mix = mean(SDR_Same_Mix); 225 | 226 | mean_SIR_MF_Mix = mean(SIR_MF_Mix); 227 | mean_SIR_FF_Mix = mean(SIR_FF_Mix); 228 | mean_SIR_MM_Mix = mean(SIR_MM_Mix); 229 | mean_SIR_Same_Mix = mean(SIR_Same_Mix); 230 | 231 | mean_SAR_MF_Mix = mean(SAR_MF_Mix); 232 | mean_SAR_FF_Mix = mean(SAR_FF_Mix); 233 | mean_SAR_MM_Mix = mean(SAR_MM_Mix); 234 | mean_SAR_Same_Mix = mean(SAR_Same_Mix); 235 | 236 | mean_PESQ_MF_Mix = mean(PESQ_MF_Mix); 237 | mean_PESQ_FF_Mix = mean(PESQ_FF_Mix); 238 | mean_PESQ_MM_Mix = mean(PESQ_MM_Mix); 239 | mean_PESQ_Same_Mix = mean(PESQ_Same_Mix); 240 | end 241 | 242 | fprintf('The mean SDR, SIR, SAR, PESQ for Male & Female are : %f ,\t %f ,\t %f ,\t %f \n', mean_SDR_MF, mean_SIR_MF, mean_SAR_MF, mean_PESQ_MF); 243 | fprintf('The mean SDR, SIR, SAR, PESQ for Female & Female are : %f ,\t %f ,\t %f ,\t %f \n', mean_SDR_FF, mean_SIR_FF, mean_SAR_FF, mean_PESQ_FF); 244 | fprintf('The mean SDR, SIR, SAR, PESQ for Male & Male are : %f ,\t %f ,\t %f ,\t %f \n', mean_SDR_MM, mean_SIR_MM, mean_SAR_MM, mean_PESQ_MM); 245 | fprintf('The mean SDR, SIR, SAR, PESQ for same gender are : %f ,\t %f ,\t %f ,\t %f \n', mean_SDR_Same, mean_SIR_Same,
mean_SAR_Same, mean_PESQ_Same); 246 | 247 | if eval_mix 248 | fprintf('The mean SDR, SIR, SAR, PESQ for Male & Female mixture are : %f ,\t %f ,\t %f ,\t %f \n', mean_SDR_MF_Mix, mean_SIR_MF_Mix, mean_SAR_MF_Mix, mean_PESQ_MF_Mix); 249 | fprintf('The mean SDR, SIR, SAR, PESQ for Female & Female mixture are : %f ,\t %f ,\t %f ,\t %f \n', mean_SDR_FF_Mix, mean_SIR_FF_Mix, mean_SAR_FF_Mix, mean_PESQ_FF_Mix); 250 | fprintf('The mean SDR, SIR, SAR, PESQ for Male & Male mixture are : %f ,\t %f ,\t %f ,\t %f \n', mean_SDR_MM_Mix, mean_SIR_MM_Mix, mean_SAR_MM_Mix, mean_PESQ_MM_Mix); 251 | fprintf('The mean SDR, SIR, SAR, PESQ for same gender mixture are : %f ,\t %f ,\t %f ,\t %f \n', mean_SDR_Same_Mix, mean_SIR_Same_Mix, mean_SAR_Same_Mix, mean_PESQ_Same_Mix); 252 | end 253 | 254 | if eval_mix 255 | save(['sdr_' model_name '_' dataset '.mat'], 'SDR', 'SIR', 'SAR', 'PESQ', 'SDR_Mix', 'SIR_Mix', 'SAR_Mix', 'PESQ_Mix', 'SISDR', 'SISDR_Mix', 'lists', 'mean_SISDR', 'mean_SISDR_Mix', 'mean_SDR', 'mean_SDR_MF', 'mean_SDR_FF', 'mean_SDR_MM', 'mean_SDR_Same','mean_SIR', 'mean_SIR_MF', 'mean_SIR_FF', 'mean_SIR_MM', 'mean_SIR_Same','mean_SAR', 'mean_SAR_MF', 'mean_SAR_FF', 'mean_SAR_MM', 'mean_SAR_Same', 'mean_PESQ', 'mean_PESQ_MF', 'mean_PESQ_FF', 'mean_PESQ_MM', 'mean_PESQ_Same', 'mean_SDR_Mix', 'mean_SDR_MF_Mix', 'mean_SDR_FF_Mix', 'mean_SDR_MM_Mix', 'mean_SDR_Same_Mix', 'mean_SIR_Mix', 'mean_SIR_MF_Mix', 'mean_SIR_FF_Mix', 'mean_SIR_MM_Mix', 'mean_SIR_Same_Mix', 'mean_SAR_Mix', 'mean_SAR_MF_Mix', 'mean_SAR_FF_Mix', 'mean_SAR_MM_Mix', 'mean_SAR_Same_Mix', 'mean_PESQ_Mix', 'mean_PESQ_MF_Mix', 'mean_PESQ_FF_Mix', 'mean_PESQ_MM_Mix', 'mean_PESQ_Same_Mix'); 256 | else 257 | save(['sdr_' model_name '_' dataset '.mat'], 'SDR', 'SIR', 'SAR', 'PESQ', 'SISDR', 'SDR_MF', 'SDR_FF', 'SDR_MM', 'SDR_Same', 'lists', 'mean_SISDR', 'mean_SDR', 'mean_SDR_MF', 'mean_SDR_FF', 'mean_SDR_MM', 'mean_SDR_Same','mean_SIR', 'mean_SIR_MF', 'mean_SIR_FF', 'mean_SIR_MM', 'mean_SIR_Same','mean_SAR',
'mean_SAR_MF', 'mean_SAR_FF', 'mean_SAR_MM', 'mean_SAR_Same', 'mean_PESQ', 'mean_PESQ_MF', 'mean_PESQ_FF', 'mean_PESQ_MM', 'mean_PESQ_Same'); 258 | end 259 | 260 | end 261 | 262 | function SISDR = cal_SISDR(clean_sig, rec_sig) 263 | clean_sig = clean_sig-mean(clean_sig); 264 | rec_sig = rec_sig-mean(rec_sig); 265 | s_target = dot(rec_sig, clean_sig)*clean_sig/dot(clean_sig, clean_sig); 266 | e_noise = rec_sig - s_target; 267 | SISDR = 10*log10(dot(s_target, s_target)/dot(e_noise, e_noise)); 268 | 269 | end 270 | -------------------------------------------------------------------------------- /evaluation/sdr_pesq_sisdr/mat_debug.txt: -------------------------------------------------------------------------------- 1 | 0.863095 2 | 0.462857 3 | 0.550268 4 | 0.463802 5 | 0.245474 6 | 0.325058 7 | 0.150809 8 | 0.614027 9 | 0.000000 10 | 0.000000 11 | 0.000000 12 | 0.000000 13 | 0.000000 14 | 0.000000 15 | 0.000000 16 | 0.000000 17 | 0.000000 18 | 0.000000 19 | 0.000000 20 | 0.000000 21 | 0.000000 22 | 0.000000 23 | 0.000000 24 | 0.000000 25 | 0.000000 26 | 0.000000 27 | 0.000000 28 | 0.000000 29 | 0.605941 30 | 0.925565 31 | 0.948630 32 | 1.682363 33 | 2.333140 34 | 1.777167 35 | 3.561755 36 | 1.925535 37 | 3.776067 38 | 3.681730 39 | 3.906468 40 | 4.478240 41 | 3.819809 42 | 4.818075 43 | 3.343888 44 | 4.562448 45 | 4.107696 46 | 4.281418 47 | 4.385021 48 | 3.852263 49 | 4.423036 50 | 3.368443 51 | 4.099853 52 | 3.042083 53 | 3.958693 54 | 3.565306 55 | 3.732977 56 | 3.867493 57 | 3.272170 58 | 3.360421 59 | 1.156600 60 | 0.759063 61 | 1.108485 62 | 1.855325 63 | 0.944549 64 | 0.151524 65 | 1.151040 66 | 2.187678 67 | 1.886246 68 | 1.227258 69 | 1.210209 70 | 0.996908 71 | 1.328644 72 | 1.449428 73 | 0.678474 74 | 0.955562 75 | 1.199524 76 | 0.960568 77 | 1.825083 78 | 1.474459 79 | 1.370983 80 | 2.237950 81 | 1.050045 82 | 0.663436 83 | 0.000000 84 | 0.000000 85 | 0.000000 86 | 0.000000 87 | 0.000000 88 | 0.000000 89 | 0.000000 90 | 0.000000 91 | 0.000000 92 | 
0.000000 93 | 0.000000 94 | 0.000000 95 | 0.000000 96 | 0.000000 97 | 0.000000 98 | 0.000000 99 | 0.000000 100 | 1.478239 101 | 1.546286 102 | 1.700638 103 | 2.801465 104 | 1.733314 105 | 2.710384 106 | 2.648619 107 | 2.734961 108 | 3.140087 109 | 2.700942 110 | 2.953761 111 | 3.057257 112 | 3.171422 113 | 3.396051 114 | 2.907187 115 | 3.243408 116 | 3.285299 117 | 3.328564 118 | 3.542964 119 | 3.170438 120 | 3.374711 121 | 3.380003 122 | 3.405784 123 | 3.664868 124 | 3.194730 125 | 3.373873 126 | 3.339097 127 | 3.348929 128 | 3.664257 129 | 3.033946 130 | 3.199356 131 | 2.981475 132 | 3.129450 133 | 3.305681 134 | 2.382595 135 | 2.821280 136 | 2.492983 137 | 2.212874 138 | 2.455352 139 | 1.200645 140 | 2.686173 141 | 0.352226 142 | 1.228576 143 | 0.000000 144 | 0.000000 145 | 0.000000 146 | 0.000000 147 | 0.000000 148 | 0.000000 149 | 0.000000 150 | 0.000000 151 | 1.052022 152 | 2.044547 153 | 0.645529 154 | 1.435122 155 | 0.679301 156 | 0.467676 157 | 0.355657 158 | 0.225350 159 | 0.710659 160 | 0.000000 161 | 0.000000 162 | 0.000000 163 | 0.000000 164 | 0.000000 165 | 0.000000 166 | 0.000000 167 | 0.000000 168 | 0.000000 169 | 0.000000 170 | 0.000000 171 | 0.000000 172 | 0.000000 173 | 0.000000 174 | 0.000000 175 | 0.000000 176 | 0.000000 177 | 0.000000 178 | 0.000000 179 | 0.000000 180 | 0.000000 181 | 1.126354 182 | 2.946321 183 | 3.527620 184 | 3.724596 185 | 4.525799 186 | 4.098446 187 | 4.737467 188 | 4.463562 189 | 4.906339 190 | 4.547538 191 | 4.970280 192 | 4.600758 193 | 4.875367 194 | 4.500842 195 | 4.581512 196 | 4.253643 197 | 3.736142 198 | 3.043051 199 | 2.627129 200 | 1.659920 201 | 0.471276 202 | 0.000000 203 | 0.000000 204 | 0.000000 205 | 0.000000 206 | 0.000000 207 | 0.000000 208 | 0.000000 209 | 0.000000 210 | 0.000000 211 | 0.000000 212 | 0.000000 213 | 0.000000 214 | 0.000000 215 | 0.000000 216 | 0.000000 217 | 0.000000 218 | 0.000000 219 | 0.000000 220 | 1.454111 221 | 2.761900 222 | 3.482584 223 | 3.663216 224 | 3.776409 225 | 4.092366 
226 | 3.808419 227 | 4.397982 228 | 4.047201 229 | 4.215184 230 | 4.320757 231 | 4.211600 232 | 4.614294 233 | 4.338661 234 | 4.532569 235 | 4.505702 236 | 4.654548 237 | 4.700429 238 | 4.479978 239 | 4.862856 240 | 4.891400 241 | 4.545625 242 | 4.875461 243 | 4.598618 244 | 4.877869 245 | 5.415682 246 | 4.847926 247 | 5.142485 248 | 0.358510 249 | 1.133771 250 | 1.621816 251 | 1.978592 252 | 1.586406 253 | 0.000000 254 | 0.000000 255 | 0.000000 256 | 0.000000 257 | 0.000000 258 | 0.000000 259 | 0.000000 260 | 1.397653 261 | 3.022556 262 | 4.605976 263 | 4.519685 264 | 5.112604 265 | 5.070884 266 | 4.822219 267 | 5.194060 268 | 5.100616 269 | 5.214362 270 | 5.282988 271 | 5.034066 272 | 5.440768 273 | 5.315097 274 | 5.523714 275 | 5.854498 276 | 6.243612 277 | 6.579317 278 | 6.908676 279 | 6.649100 280 | 7.230005 281 | 6.255204 282 | 7.024633 283 | 6.355053 284 | 6.570358 285 | 7.019010 286 | 6.505511 287 | 7.000665 288 | 5.982217 289 | 6.490695 290 | 5.754432 291 | 4.211986 292 | 3.749975 293 | 3.145122 294 | 3.065743 295 | 2.058786 296 | 2.553616 297 | 1.179549 298 | 1.858048 299 | 0.555788 300 | 1.589719 301 | 0.000000 302 | 0.000000 303 | 0.000000 304 | 0.000000 305 | 0.000000 306 | 0.000000 307 | 0.000000 308 | 0.000000 309 | 0.000000 310 | 0.000000 311 | 0.000000 312 | 0.000000 313 | 0.000000 314 | 0.000000 315 | 0.000000 316 | 0.000000 317 | 0.000000 318 | 0.000000 319 | 0.000000 320 | 0.000000 321 | 0.000000 322 | 2.107807 323 | 2.054843 324 | 2.953910 325 | 2.741811 326 | 3.327898 327 | 3.210654 328 | 3.805440 329 | 3.659808 330 | 4.038557 331 | 4.231474 332 | 4.466258 333 | 5.329964 334 | 5.229870 335 | 5.862393 336 | 5.286558 337 | 6.023390 338 | 5.300958 339 | 5.396051 340 | 5.363367 341 | 5.060120 342 | 5.593807 343 | 4.103935 344 | 5.839395 345 | 4.820440 346 | 5.375541 347 | 5.862968 348 | 4.752380 349 | 5.800714 350 | 4.937072 351 | 5.117013 352 | 5.825937 353 | 4.707493 354 | 5.767327 355 | 5.416033 356 | 5.230058 357 | 5.919025 358 | 4.869921 359 
| 5.253999 360 | 5.601425 361 | 4.022416 362 | 5.189520 363 | 5.042419 364 | 3.951378 365 | 4.287358 366 | 0.000000 367 | 0.000000 368 | 0.000000 369 | 0.000000 370 | 0.000000 371 | 0.000000 372 | 0.389709 373 | 0.748957 374 | 1.350006 375 | 1.184731 376 | 1.094279 377 | 0.808653 378 | 0.000000 379 | 0.000000 380 | 0.000000 381 | 0.000000 382 | 0.000000 383 | 0.000000 384 | 0.000000 385 | 0.000000 386 | 0.000000 387 | 0.000000 388 | 0.000000 389 | 0.000000 390 | 0.000000 391 | 0.000000 392 | 0.000000 393 | 0.000000 394 | 0.000000 395 | 0.000000 396 | 0.000000 397 | 0.000000 398 | 0.000000 399 | 0.000000 400 | 0.000000 401 | 0.000000 402 | 0.000000 403 | 0.000000 404 | 0.935910 405 | 1.758160 406 | 1.628309 407 | 2.411204 408 | 2.960818 409 | 2.316287 410 | 3.308113 411 | 2.871104 412 | 2.474566 413 | 3.774521 414 | 2.134365 415 | 3.296041 416 | 3.435712 417 | 2.650713 418 | 3.060803 419 | 2.601130 420 | 1.704484 421 | 2.454674 422 | 0.000000 423 | 0.000000 424 | 0.000000 425 | 0.000000 426 | 0.000000 427 | 0.000000 428 | 0.000000 429 | 0.000000 430 | 0.000000 431 | 0.000000 432 | 0.000000 433 | 0.000000 434 | 0.000000 435 | 0.687932 436 | 0.926301 437 | 2.354633 438 | 3.880248 439 | 2.669980 440 | 4.352959 441 | 5.321133 442 | 4.539225 443 | 5.643332 444 | 5.086752 445 | 5.216498 446 | 6.131239 447 | 4.408493 448 | 5.594543 449 | 5.524803 450 | 4.460885 451 | 5.863668 452 | 4.405786 453 | 4.323978 454 | 4.688123 455 | 3.702257 456 | 3.546547 457 | 2.118505 458 | 0.914533 459 | 0.000000 460 | 0.000000 461 | 0.000000 462 | 0.000000 463 | 0.000000 464 | 0.000000 465 | 0.000000 466 | 0.000000 467 | 0.000000 468 | 0.000000 469 | 0.000000 470 | 0.000000 471 | 0.000000 472 | 0.000000 473 | 0.000000 474 | 0.000000 475 | 0.000000 476 | 0.000000 477 | 0.000000 478 | 0.000000 479 | 0.000000 480 | 0.000000 481 | 0.000000 482 | 0.000000 483 | 0.000000 484 | 0.000000 485 | 0.000000 486 | 0.000000 487 | 0.000000 488 | 0.000000 489 | 0.000000 490 | 0.000000 491 | 0.000000 492 | 
0.000000 493 | 0.000000 494 | 0.000000 495 | 0.000000 496 | 0.000000 497 | 0.000000 498 | 0.000000 499 | 0.000000 500 | 0.000000 501 | 0.000000 502 | 0.000000 503 | 0.000000 504 | 0.000000 505 | 0.000000 506 | 0.000000 507 | 0.000000 508 | 0.000000 509 | 0.000000 510 | 0.000000 511 | 0.000000 512 | 0.000000 513 | 0.000000 514 | 0.000000 515 | 0.000000 516 | 0.000000 517 | 0.000000 518 | 0.000000 519 | 0.000000 520 | 0.000000 521 | 0.000000 522 | 0.000000 523 | 0.000000 524 | 0.000000 525 | 0.000000 526 | 0.000000 527 | 0.000000 528 | 0.000000 529 | 0.000000 530 | 0.000000 531 | 0.000000 532 | 0.000000 533 | 0.000000 534 | 0.000000 535 | 0.000000 536 | 0.000000 537 | 0.000000 538 | 0.000000 539 | 0.000000 540 | 0.000000 541 | 0.000000 542 | 0.000000 543 | 0.000000 544 | 0.000000 545 | 0.000000 546 | 0.000000 547 | 0.000000 548 | 0.000000 549 | 0.000000 550 | 0.000000 551 | 0.000000 552 | 0.000000 553 | 0.000000 554 | 0.000000 555 | 0.000000 556 | 0.000000 557 | 0.000000 558 | 0.000000 559 | 0.000000 560 | 0.000000 561 | 0.000000 562 | 0.000000 563 | 0.000000 564 | 0.000000 565 | 0.000000 566 | 0.000000 567 | 0.000000 568 | 0.000000 569 | 0.000000 570 | 0.000000 571 | 0.000000 572 | 0.000000 573 | 0.000000 574 | 0.000000 575 | 0.000000 576 | 0.000000 577 | 0.000000 578 | 0.000000 579 | 0.000000 580 | 0.000000 581 | 0.000000 582 | 0.000000 583 | 0.000000 584 | 0.000000 585 | 0.000000 586 | 0.000000 587 | 0.000000 588 | 0.000000 589 | 0.000000 590 | 0.000000 591 | 0.000000 592 | 0.000000 593 | 0.000000 594 | 0.000000 595 | 0.000000 596 | 0.000000 597 | 0.000000 598 | 0.000000 599 | 0.000000 600 | 0.000000 601 | 0.000000 602 | 0.000000 603 | 0.000000 604 | 0.000000 605 | 0.000000 606 | 0.000000 607 | 0.000000 608 | 0.000000 609 | 0.000000 610 | 0.000000 611 | 0.000000 612 | 0.000000 613 | 0.000000 614 | 0.000000 615 | 0.000000 616 | 0.000000 617 | 0.000000 618 | 0.000000 619 | 0.000000 620 | 0.000000 621 | 0.000000 622 | 0.000000 623 | 0.000000 624 | 0.000000 625 | 
0.000000 626 | 0.000000 627 | 0.000000 628 | 0.000000 629 | 0.000000 630 | 0.000000 631 | 0.000000 632 | 0.000000 633 | 0.000000 634 | 0.000000 635 | 0.000000 636 | 0.000000 637 | 0.000000 638 | 0.000000 639 | 0.000000 640 | 0.000000 641 | 0.000000 642 | 0.000000 643 | 0.000000 644 | 0.000000 645 | 0.000000 646 | 0.000000 647 | 0.000000 648 | 0.000000 649 | 0.000000 650 | 0.000000 651 | 0.000000 652 | 0.000000 653 | 0.000000 654 | 0.000000 655 | 0.000000 656 | 0.000000 657 | 0.000000 658 | 0.000000 659 | 0.000000 660 | 0.000000 661 | 0.000000 662 | 0.000000 663 | 0.000000 664 | 0.000000 665 | 0.000000 666 | 0.000000 667 | 0.000000 668 | 0.000000 669 | 0.000000 670 | 0.000000 671 | 0.000000 672 | 0.000000 673 | 0.000000 674 | 0.000000 675 | 0.000000 676 | 0.000000 677 | 0.000000 678 | 0.000000 679 | 0.000000 680 | 0.000000 681 | 0.000000 682 | 0.000000 683 | 0.000000 684 | 0.000000 685 | 0.000000 686 | 0.000000 687 | 0.000000 688 | 0.000000 689 | 0.000000 690 | 0.000000 691 | 0.000000 692 | 0.000000 693 | 0.000000 694 | 0.000000 695 | 0.000000 696 | 0.000000 697 | 0.000000 698 | 0.000000 699 | 0.000000 700 | 0.000000 701 | 0.000000 702 | 0.000000 703 | 0.000000 704 | 0.000000 705 | 0.000000 706 | 0.000000 707 | 0.000000 708 | 0.000000 709 | 0.000000 710 | 0.000000 711 | 0.000000 712 | 0.000000 713 | 0.000000 714 | 0.000000 715 | 0.000000 716 | 0.000000 717 | 0.000000 718 | 0.000000 719 | 0.000000 720 | 0.000000 721 | 0.000000 722 | 0.000000 723 | 0.000000 724 | 0.000000 725 | 0.000000 726 | 0.000000 727 | 0.000000 728 | 0.000000 729 | 0.000000 730 | 0.000000 731 | 0.000000 732 | 0.000000 733 | 0.000000 734 | 0.000000 735 | 0.000000 736 | 0.000000 737 | 0.000000 738 | 0.000000 739 | 0.000000 740 | 0.000000 741 | 0.000000 742 | 0.000000 743 | 0.000000 744 | 0.000000 745 | 0.000000 746 | 0.000000 747 | 0.000000 748 | 0.000000 749 | 0.000000 750 | 0.000000 751 | 0.000000 752 | 0.000000 753 | 0.000000 754 | 0.000000 755 | 0.000000 756 | 0.000000 757 | 0.000000 758 | 
0.000000 759 | 0.000000 760 | 0.000000 761 | 0.000000 762 | 0.000000 763 | 0.000000 764 | 0.000000 765 | 0.000000 766 | 0.000000 767 | 0.000000 768 | 0.000000 769 | 0.000000 770 | 0.000000 771 | 0.000000 772 | 0.000000 773 | 0.000000 774 | 0.000000 775 | 0.000000 776 | 0.000000 777 | 0.000000 778 | 0.000000 779 | 0.000000 780 | 0.000000 781 | 0.000000 782 | 0.000000 783 | 0.000000 784 | 0.000000 785 | 0.000000 786 | 0.000000 787 | 0.000000 788 | 0.000000 789 | 0.000000 790 | 0.000000 791 | 0.000000 792 | 0.000000 793 | 0.000000 794 | 0.000000 795 | 0.000000 796 | 0.000000 797 | 0.000000 798 | 0.000000 799 | 0.000000 800 | 0.000000 801 | 0.000000 802 | 0.000000 803 | 0.000000 804 | 0.000000 805 | 0.000000 806 | 0.000000 807 | 0.000000 808 | 0.000000 809 | 0.000000 810 | 0.000000 811 | 0.000000 812 | 0.000000 813 | 0.000000 814 | 0.000000 815 | 0.000000 816 | 0.000000 817 | 0.000000 818 | 0.000000 819 | 0.000000 820 | 0.000000 821 | 0.000000 822 | 0.000000 823 | 0.000000 824 | 0.000000 825 | 0.000000 826 | 0.000000 827 | 0.000000 828 | 0.000000 829 | 0.000000 830 | 0.000000 831 | 0.000000 832 | 0.000000 833 | 0.000000 834 | 0.000000 835 | 0.000000 836 | 0.000000 837 | 0.000000 838 | 0.000000 839 | 0.000000 840 | 0.000000 841 | 0.000000 842 | 0.000000 843 | 0.000000 844 | 0.000000 845 | 0.000000 846 | 0.000000 847 | 0.000000 848 | 0.000000 849 | 0.000000 850 | 0.000000 851 | 0.000000 852 | 0.000000 853 | 0.000000 854 | 0.000000 855 | 0.000000 856 | 0.000000 857 | 0.000000 858 | 0.000000 859 | 0.000000 860 | 0.000000 861 | 0.000000 862 | 0.000000 863 | 0.000000 864 | 0.000000 865 | 0.000000 866 | 0.000000 867 | 0.000000 868 | 0.000000 869 | 0.000000 870 | 0.000000 871 | 0.000000 872 | 0.000000 873 | 0.000000 874 | 0.000000 875 | 0.000000 876 | 0.000000 877 | 0.000000 878 | 0.000000 879 | 0.000000 880 | 0.000000 881 | 0.000000 882 | 0.000000 883 | 0.000000 884 | 0.000000 885 | 0.000000 886 | 0.000000 887 | 0.000000 888 | 0.000000 889 | 0.000000 890 | 0.000000 891 | 
0.000000 892 | 0.000000 893 | 0.000000 894 | 0.000000 895 | 0.000000 896 | 0.000000 897 | 0.000000 898 | 0.000000 899 | 0.000000 900 | 0.000000 901 | 0.000000 902 | 0.000000 903 | 0.000000 904 | 0.000000 905 | 0.000000 906 | 0.000000 907 | 0.000000 908 | 0.000000 909 | 0.000000 910 | 0.000000 911 | 0.000000 912 | 0.000000 913 | 0.000000 914 | 0.000000 915 | 0.000000 916 | 0.000000 917 | 0.000000 918 | 0.000000 919 | 0.000000 920 | 0.000000 921 | 0.000000 922 | 0.000000 923 | 0.000000 924 | 0.000000 925 | 0.000000 926 | 0.000000 927 | 0.000000 928 | 0.000000 929 | 0.000000 930 | 0.000000 931 | 0.000000 932 | 0.000000 933 | 0.000000 934 | 0.000000 935 | 0.000000 936 | 0.000000 937 | 0.000000 938 | 0.000000 939 | 0.000000 940 | 0.000000 941 | 0.000000 942 | 0.000000 943 | 0.000000 944 | 0.000000 945 | 0.000000 946 | 0.000000 947 | 0.000000 948 | 0.000000 949 | 0.000000 950 | 0.000000 951 | 0.000000 952 | 0.000000 953 | 0.000000 954 | 0.000000 955 | 0.000000 956 | 0.000000 957 | 0.000000 958 | 0.000000 959 | 0.000000 960 | 0.000000 961 | 0.000000 962 | 0.000000 963 | 0.000000 964 | 0.000000 965 | 0.000000 966 | 0.000000 967 | 0.000000 968 | 0.000000 969 | 0.000000 970 | 0.000000 971 | 0.000000 972 | 0.000000 973 | 0.000000 974 | 0.000000 975 | 0.000000 976 | 0.000000 977 | 0.000000 978 | 0.000000 979 | 0.000000 980 | 0.000000 981 | 0.000000 982 | 0.000000 983 | 0.000000 984 | 0.000000 985 | 0.000000 986 | 0.000000 987 | 0.000000 988 | 0.000000 989 | 0.000000 990 | 0.000000 991 | 0.000000 992 | 0.000000 993 | 0.000000 994 | 0.000000 995 | 0.000000 996 | 0.000000 997 | 0.000000 998 | 0.000000 999 | 0.000000 1000 | 0.000000 1001 | 0.000000 1002 | 0.000000 1003 | 0.000000 1004 | 0.000000 1005 | 0.000000 1006 | 0.000000 1007 | 0.000000 1008 | 0.000000 1009 | 0.000000 1010 | 0.000000 1011 | 0.000000 1012 | 0.000000 1013 | 0.000000 1014 | 0.000000 1015 | 0.000000 1016 | 0.000000 1017 | 0.000000 1018 | 0.000000 1019 | 0.000000 1020 | 0.000000 1021 | 0.000000 1022 | 0.000000 
1023 | 0.000000 1024 | 0.000000 1025 | -------------------------------------------------------------------------------- /evaluation/sdr_pesq_sisdr/run.sh: -------------------------------------------------------------------------------- 1 | #! /bin/bash 2 | 3 | # Copyright 2017 4 | # Author: Chenglin Xu (NTU, Singapore) 5 | # Email: xuchenglin28@gmail.com 6 | # Updated by Chenglin, Dec 2018 7 | 8 | /export/home/clx214/Matlab_R2014A/bin/matlab -nodesktop -nosplash -r "eval_sdr('tt', 0, 0, 'Ext_mfcc_Mix_N256_L20_1L80_2L160_S10_B256_H512_P3_X8_R4_C2_gln_si-sdr_sigmoid_deconv_BLSTM_e400_spk0.2_mscmo_a0.1_b0.1', 'mix', 's1')" 9 | -------------------------------------------------------------------------------- /evaluation/sdr_pesq_sisdr/spk2gender: -------------------------------------------------------------------------------- 1 | 050 0 2 | 051 1 3 | 052 1 4 | 053 0 5 | 22g 1 6 | 22h 1 7 | 420 0 8 | 421 0 9 | 422 1 10 | 423 1 11 | 440 1 12 | 441 0 13 | 442 0 14 | 443 1 15 | 444 0 16 | 445 0 17 | 446 1 18 | 447 1 19 | -------------------------------------------------------------------------------- /evaluation/sdr_pesq_sisdr/spk2gender_cv: -------------------------------------------------------------------------------- 1 | 011 0 2 | 012 1 3 | 013 1 4 | 014 0 5 | 015 1 6 | 016 0 7 | 017 0 8 | 018 0 9 | 019 0 10 | 01a 0 11 | 01b 0 12 | 01c 0 13 | 01d 0 14 | 01e 1 15 | 01f 0 16 | 01g 1 17 | 01i 1 18 | 01j 0 19 | 01k 0 20 | 01l 1 21 | 01m 0 22 | 01n 0 23 | 01o 0 24 | 01p 0 25 | 01q 0 26 | 01r 1 27 | 01s 1 28 | 01t 1 29 | 01u 0 30 | 01v 0 31 | 01w 1 32 | 01x 0 33 | 01y 1 34 | 01z 1 35 | 020 1 36 | 021 1 37 | 022 0 38 | 023 0 39 | 024 1 40 | 025 1 41 | 026 1 42 | 027 0 43 | 028 0 44 | 029 1 45 | 02a 0 46 | 02b 1 47 | 02c 0 48 | 02d 0 49 | 02e 0 50 | 204 0 51 | 205 0 52 | 206 0 53 | 207 1 54 | 208 1 55 | 209 0 56 | 20a 0 57 | 20b 0 58 | 20c 1 59 | 20d 0 60 | 20e 0 61 | 20f 1 62 | 20g 1 63 | 20h 0 64 | 20i 1 65 | 20j 1 66 | 20k 1 67 | 20l 1 68 | 20m 1 69 | 20n 1 70 | 20o 1 71 | 
20p 0 72 | 20q 1 73 | 20r 1 74 | 20s 1 75 | 20t 0 76 | 20u 1 77 | 20v 1 78 | 401 0 79 | 403 1 80 | 404 0 81 | 405 1 82 | 406 1 83 | 407 0 84 | 408 1 85 | 409 0 86 | 40a 1 87 | 40b 1 88 | 40c 1 89 | 40d 1 90 | 40e 0 91 | 40f 1 92 | 40g 0 93 | 40h 0 94 | 40i 1 95 | 40j 1 96 | 40k 1 97 | 40l 0 98 | 40m 0 99 | 40n 1 100 | 40o 0 101 | 40p 0 102 | -------------------------------------------------------------------------------- /evaluation/sdr_pesq_sisdr/spk2gender_tr: -------------------------------------------------------------------------------- 1 | 011 2 | 012 3 | 013 4 | 014 5 | 015 6 | 016 7 | 017 8 | 018 9 | 019 10 | 01a 11 | 01b 12 | 01c 13 | 01d 14 | 01e 15 | 01f 16 | 01g 17 | 01i 18 | 01j 19 | 01k 20 | 01l 21 | 01m 22 | 01n 23 | 01o 24 | 01p 25 | 01q 26 | 01r 27 | 01s 28 | 01t 29 | 01u 30 | 01v 31 | 01w 32 | 01x 33 | 01y 34 | 01z 35 | 020 36 | 021 37 | 022 38 | 023 39 | 024 40 | 025 41 | 026 42 | 027 43 | 028 44 | 029 45 | 02a 46 | 02b 47 | 02c 48 | 02d 49 | 02e 50 | 204 51 | 205 52 | 206 53 | 207 54 | 208 55 | 209 56 | 20a 57 | 20b 58 | 20c 59 | 20d 60 | 20e 61 | 20f 62 | 20g 63 | 20h 64 | 20i 65 | 20j 66 | 20k 67 | 20l 68 | 20m 69 | 20n 70 | 20o 71 | 20p 72 | 20q 73 | 20r 74 | 20s 75 | 20t 76 | 20u 77 | 20v 78 | 401 79 | 403 80 | 404 81 | 405 82 | 406 83 | 407 84 | 408 85 | 409 86 | 40a 87 | 40b 88 | 40c 89 | 40d 90 | 40e 91 | 40f 92 | 40g 93 | 40h 94 | 40i 95 | 40j 96 | 40k 97 | 40l 98 | 40m 99 | 40n 100 | 40o 101 | 40p 102 | -------------------------------------------------------------------------------- /generation/WHAM_and_WHAMR/wham_scripts.tar.gz: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gemengtju/Tutorial_Separation/c47dca746fa2e123868f5fbf2aed70ab3809925b/generation/WHAM_and_WHAMR/wham_scripts.tar.gz -------------------------------------------------------------------------------- /generation/WHAM_and_WHAMR/whamr_scripts.tar.gz: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/gemengtju/Tutorial_Separation/c47dca746fa2e123868f5fbf2aed70ab3809925b/generation/WHAM_and_WHAMR/whamr_scripts.tar.gz -------------------------------------------------------------------------------- /generation/wsj0-2mix-extr/simulate_2spk_mix.m: -------------------------------------------------------------------------------- 1 | function simulate_2spk_mix(data_type, wsj0root, output_dir, fs8k, min_max) 2 | % Simulate 2-speaker mixture data for speaker extraction. 3 | % Call: 4 | % simulate_2spk_mix(data_type, wsj0root, output_dir, fs8k, min_max) 5 | % e.g., simulate_2spk_mix('tt', '/media/clx214/data/wsj/', '/media/clx214/data/wsj0_2mix_extr_tmp/wav8k', 8000, 'max') 6 | % Paras: 7 | % data_type: data set to generate, (tr|cv|tt), e.g., 'tt' 8 | % wsj0root: YOUR_PATH/, the folder containing converted wsj0/, e.g., '/media/clx214/data/wsj/' 9 | % output_dir: the folder to save simulated data for extraction, e.g., '/media/clx214/data/wsj0_2mix_extr_tmp/wav8k' 10 | % fs8k: sampling rate of the simulated data, e.g., 8000 11 | % min_max: use the minimum or maximum wav length when simulating mixture data, e.g., 'max' 12 | % 13 | % The code is based on "create_wav_2speakers_extr.m" from "http://www.merl.com/demos/deep-clustering" 14 | % 15 | % 1. Assume that WSJ0's wv1 sphere files are converted to wav files. The folder 16 | % structure and file names are kept the same under wsj0/, e.g., 17 | % ORG_PATH/wsj0/si_tr_s/01t/01to030v.wv1 is converted to wav and 18 | % stored in YOUR_PATH/wsj0/si_tr_s/01t/01to030v.wav. 19 | % Relevant data ('si_tr_s', 'si_dt_05' and 'si_et_05') are under YOUR_PATH/wsj0/ 20 | % 2. Put the 'voicebox' toolbox in the current folder. (http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/voicebox.html) 21 | % 3. Set your 'YOUR_PATH' and 'OUTPUT_PATH' properly, then run this script in MATLAB.
22 | % (The max length of the wavs is kept when generating the mixture. The sampling rate will be 8 kHz.) 23 | % 24 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 25 | % Copyright (C) 2016 Mitsubishi Electric Research Labs 26 | % (Jonathan Le Roux, John R. Hershey, Zhuo Chen) 27 | % Apache 2.0 (http://www.apache.org/licenses/LICENSE-2.0) 28 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 29 | % 30 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 31 | %Copyright 2018 Chenglin Xu, Nanyang Technological University, Singapore 32 | % 33 | %Licensed under the Apache License, Version 2.0 (the "License"); 34 | %you may not use this file except in compliance with the License. 35 | %You may obtain a copy of the License at 36 | % 37 | % http://www.apache.org/licenses/LICENSE-2.0 38 | % 39 | %Unless required by applicable law or agreed to in writing, software 40 | %distributed under the License is distributed on an "AS IS" BASIS, 41 | %WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 42 | %See the License for the specific language governing permissions and 43 | %limitations under the License.
44 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 45 | 46 | if ~exist([output_dir '/' min_max '/' data_type],'dir') 47 | mkdir([output_dir '/' min_max '/' data_type]); 48 | end 49 | mkdir([output_dir '/' min_max '/' data_type '/s1/']); 50 | mkdir([output_dir '/' min_max '/' data_type '/aux/']); 51 | mkdir([output_dir '/' min_max '/' data_type '/mix/']); 52 | 53 | TaskFile = ['mix_2_spk_' data_type '_extr.txt']; 54 | fid = fopen(TaskFile,'r'); 55 | C = textscan(fid,'%s %f %s %f %s'); 56 | num_files = length(C{1}); 57 | 58 | fprintf(1,'Start to generate data for %s\n', [min_max '_' data_type]); 59 | for i = 1:num_files 60 | [inwav1_dir,invwav1_name,inwav1_ext] = fileparts(C{1}{i}); 61 | [inwav2_dir,invwav2_name,inwav2_ext] = fileparts(C{3}{i}); 62 | [inwav_aux_dir,invwav_aux_name,inwav_aux_ext] = fileparts(C{5}{i}); 63 | 64 | inwav1_snr = C{2}(i); 65 | inwav2_snr = C{4}(i); 66 | mix_name = [invwav1_name,'_',num2str(inwav1_snr),'_',invwav2_name,'_',num2str(inwav2_snr),'_',invwav_aux_name]; 67 | 68 | % get input wavs 69 | [s1, fs] = audioread([wsj0root C{1}{i}]); 70 | s2 = audioread([wsj0root C{3}{i}]); 71 | s_aux = audioread([wsj0root C{5}{i}]); 72 | 73 | % resample, normalize to 8 kHz file 74 | s1_8k = resample(s1,fs8k,fs); 75 | [s1_8k,lev1] = activlev(s1_8k,fs8k,'n'); % y_norm = y /sqrt(lev); 76 | s2_8k = resample(s2,fs8k,fs); 77 | [s2_8k,lev2] = activlev(s2_8k,fs8k,'n'); 78 | s_aux_8k = resample(s_aux,fs8k,fs); 79 | [s_aux_8k,lev_aux] = activlev(s_aux_8k,fs8k,'n'); 80 | 81 | weight_1 = 10^(inwav1_snr/20); 82 | weight_2 = 10^(inwav2_snr/20); 83 | 84 | s1_8k = weight_1 * s1_8k; 85 | s2_8k = weight_2 * s2_8k; 86 | 87 | switch min_max 88 | case 'max' 89 | mix_8k_length = max(length(s1_8k),length(s2_8k)); 90 | s1_8k = cat(1,s1_8k,zeros(mix_8k_length - length(s1_8k),1)); 91 | s2_8k = cat(1,s2_8k,zeros(mix_8k_length - length(s2_8k),1)); 92 | case 'min' 93 | mix_8k_length = min(length(s1_8k),length(s2_8k)); 94 | s1_8k = 
s1_8k(1:mix_8k_length); 95 | s2_8k = s2_8k(1:mix_8k_length); 96 | end 97 | mix_8k = s1_8k + s2_8k; 98 | 99 | max_amp_8k = max(cat(1,abs(mix_8k(:)),abs(s1_8k(:)),abs(s2_8k(:)),abs(s_aux_8k(:)))); 100 | mix_scaling_8k = 1/max_amp_8k*0.9; 101 | s1_8k = mix_scaling_8k * s1_8k; 102 | mix_8k = mix_scaling_8k * mix_8k; 103 | s_aux_8k = mix_scaling_8k * s_aux_8k; 104 | 105 | audiowrite([output_dir '/' min_max '/' data_type '/s1/' mix_name '.wav'],s1_8k,fs8k); 106 | audiowrite([output_dir '/' min_max '/' data_type '/aux/' mix_name '.wav'],s_aux_8k,fs8k); 107 | audiowrite([output_dir '/' min_max '/' data_type '/mix/' mix_name '.wav'],mix_8k,fs8k); 108 | end 109 | fclose(fid); 110 | fprintf(1,'End of generating data for %s\n', [min_max '_' data_type]); 111 | end 112 | -------------------------------------------------------------------------------- /generation/wsj0-2mix/create-speaker-mixtures.zip: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gemengtju/Tutorial_Separation/c47dca746fa2e123868f5fbf2aed70ab3809925b/generation/wsj0-2mix/create-speaker-mixtures.zip -------------------------------------------------------------------------------- /generation/wsj0-2mix/spatialize_wsj0-mix.zip: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gemengtju/Tutorial_Separation/c47dca746fa2e123868f5fbf2aed70ab3809925b/generation/wsj0-2mix/spatialize_wsj0-mix.zip -------------------------------------------------------------------------------- /slides/AVSS_Datasets_PanZexu.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gemengtju/Tutorial_Separation/c47dca746fa2e123868f5fbf2aed70ab3809925b/slides/AVSS_Datasets_PanZexu.pdf -------------------------------------------------------------------------------- /slides/Advances_in_end-to-end_neural_source_separation.pdf: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/gemengtju/Tutorial_Separation/c47dca746fa2e123868f5fbf2aed70ab3809925b/slides/Advances_in_end-to-end_neural_source_separation.pdf -------------------------------------------------------------------------------- /slides/DeLiangWang_ASRU19.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gemengtju/Tutorial_Separation/c47dca746fa2e123868f5fbf2aed70ab3809925b/slides/DeLiangWang_ASRU19.pdf -------------------------------------------------------------------------------- /slides/HaizhouLi_CCF.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gemengtju/Tutorial_Separation/c47dca746fa2e123868f5fbf2aed70ab3809925b/slides/HaizhouLi_CCF.pdf -------------------------------------------------------------------------------- /slides/Speech-Separation-Dataset-GM.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gemengtju/Tutorial_Separation/c47dca746fa2e123868f5fbf2aed70ab3809925b/slides/Speech-Separation-Dataset-GM.pdf -------------------------------------------------------------------------------- /slides/overview-GM.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gemengtju/Tutorial_Separation/c47dca746fa2e123868f5fbf2aed70ab3809925b/slides/overview-GM.pdf --------------------------------------------------------------------------------
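
The SI-SDR metric implemented by `cal_SISDR` in `eval_sdr.m` (remove the mean, project the estimate onto the clean reference, take the energy ratio in dB) can be sketched in Python/NumPy as follows; this is an illustrative port, not a file in this repository:

```python
import numpy as np

def si_sdr(clean, rec):
    """Scale-invariant SDR in dB, mirroring cal_SISDR in eval_sdr.m."""
    clean = clean - np.mean(clean)  # remove DC offset from both signals
    rec = rec - np.mean(rec)
    # project the estimate onto the reference to obtain the target component
    s_target = np.dot(rec, clean) * clean / np.dot(clean, clean)
    e_noise = rec - s_target        # residual orthogonal to the target
    return 10 * np.log10(np.dot(s_target, s_target) / np.dot(e_noise, e_noise))
```

Because the estimate is projected onto the reference, multiplying `rec` by any nonzero constant leaves the score unchanged; this scale invariance is what distinguishes SI-SDR from a plain SNR.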
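
The core mixing arithmetic of `simulate_2spk_mix.m` — weight each level-normalized source by 10^(SNR/20), zero-pad to the longer signal in 'max' mode or truncate to the shorter in 'min' mode, then rescale jointly to 0.9 of full scale — can be sketched in Python/NumPy as below. This is an illustrative sketch with a hypothetical function name: resampling, `activlev` level normalization, and the auxiliary (target-speaker) utterance are omitted.

```python
import numpy as np

def mix_two_speakers(s1, s2, snr1_db, snr2_db, min_max="max"):
    """Mix two level-normalized sources at the given SNR offsets,
    mirroring the core of simulate_2spk_mix.m (sketch, not the original)."""
    s1 = 10 ** (snr1_db / 20) * np.asarray(s1, dtype=float)
    s2 = 10 ** (snr2_db / 20) * np.asarray(s2, dtype=float)
    if min_max == "max":            # zero-pad the shorter source
        n = max(len(s1), len(s2))
        s1 = np.pad(s1, (0, n - len(s1)))
        s2 = np.pad(s2, (0, n - len(s2)))
    else:                           # 'min': truncate the longer source
        n = min(len(s1), len(s2))
        s1, s2 = s1[:n], s2[:n]
    mix = s1 + s2
    # joint rescaling with 0.9 headroom to avoid clipping, as in the script
    scale = 0.9 / np.max(np.abs(np.concatenate([mix, s1, s2])))
    return scale * mix, scale * s1, scale * s2
```

Scaling mixture and sources by the same factor keeps the per-source SNRs intact while guaranteeing all written wav files stay within [-0.9, 0.9].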