├── LICENSE ├── README.md └── overview.svg /LICENSE: -------------------------------------------------------------------------------- 1 | GNU GENERAL PUBLIC LICENSE 2 | Version 2, June 1991 3 | 4 | Copyright (C) 1989, 1991 Free Software Foundation, Inc., 5 | 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA 6 | Everyone is permitted to copy and distribute verbatim copies 7 | of this license document, but changing it is not allowed. 8 | 9 | Preamble 10 | 11 | The licenses for most software are designed to take away your 12 | freedom to share and change it. By contrast, the GNU General Public 13 | License is intended to guarantee your freedom to share and change free 14 | software--to make sure the software is free for all its users. This 15 | General Public License applies to most of the Free Software 16 | Foundation's software and to any other program whose authors commit to 17 | using it. (Some other Free Software Foundation software is covered by 18 | the GNU Lesser General Public License instead.) You can apply it to 19 | your programs, too. 20 | 21 | When we speak of free software, we are referring to freedom, not 22 | price. Our General Public Licenses are designed to make sure that you 23 | have the freedom to distribute copies of free software (and charge for 24 | this service if you wish), that you receive source code or can get it 25 | if you want it, that you can change the software or use pieces of it 26 | in new free programs; and that you know you can do these things. 27 | 28 | To protect your rights, we need to make restrictions that forbid 29 | anyone to deny you these rights or to ask you to surrender the rights. 30 | These restrictions translate to certain responsibilities for you if you 31 | distribute copies of the software, or if you modify it. 32 | 33 | For example, if you distribute copies of such a program, whether 34 | gratis or for a fee, you must give the recipients all the rights that 35 | you have. You must make sure that they, too, receive or can get the 36 | source code. And you must show them these terms so they know their 37 | rights. 38 | 39 | We protect your rights with two steps: (1) copyright the software, and 40 | (2) offer you this license which gives you legal permission to copy, 41 | distribute and/or modify the software. 42 | 43 | Also, for each author's protection and ours, we want to make certain 44 | that everyone understands that there is no warranty for this free 45 | software. If the software is modified by someone else and passed on, we 46 | want its recipients to know that what they have is not the original, so 47 | that any problems introduced by others will not reflect on the original 48 | authors' reputations. 49 | 50 | Finally, any free program is threatened constantly by software 51 | patents. We wish to avoid the danger that redistributors of a free 52 | program will individually obtain patent licenses, in effect making the 53 | program proprietary. To prevent this, we have made it clear that any 54 | patent must be licensed for everyone's free use or not licensed at all. 55 | 56 | The precise terms and conditions for copying, distribution and 57 | modification follow. 58 | 59 | GNU GENERAL PUBLIC LICENSE 60 | TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION 61 | 62 | 0. This License applies to any program or other work which contains 63 | a notice placed by the copyright holder saying it may be distributed 64 | under the terms of this General Public License. 
The "Program", below, 65 | refers to any such program or work, and a "work based on the Program" 66 | means either the Program or any derivative work under copyright law: 67 | that is to say, a work containing the Program or a portion of it, 68 | either verbatim or with modifications and/or translated into another 69 | language. (Hereinafter, translation is included without limitation in 70 | the term "modification".) Each licensee is addressed as "you". 71 | 72 | Activities other than copying, distribution and modification are not 73 | covered by this License; they are outside its scope. The act of 74 | running the Program is not restricted, and the output from the Program 75 | is covered only if its contents constitute a work based on the 76 | Program (independent of having been made by running the Program). 77 | Whether that is true depends on what the Program does. 78 | 79 | 1. You may copy and distribute verbatim copies of the Program's 80 | source code as you receive it, in any medium, provided that you 81 | conspicuously and appropriately publish on each copy an appropriate 82 | copyright notice and disclaimer of warranty; keep intact all the 83 | notices that refer to this License and to the absence of any warranty; 84 | and give any other recipients of the Program a copy of this License 85 | along with the Program. 86 | 87 | You may charge a fee for the physical act of transferring a copy, and 88 | you may at your option offer warranty protection in exchange for a fee. 89 | 90 | 2. You may modify your copy or copies of the Program or any portion 91 | of it, thus forming a work based on the Program, and copy and 92 | distribute such modifications or work under the terms of Section 1 93 | above, provided that you also meet all of these conditions: 94 | 95 | a) You must cause the modified files to carry prominent notices 96 | stating that you changed the files and the date of any change. 97 | 98 | b) You must cause any work that you distribute or publish, that in 99 | whole or in part contains or is derived from the Program or any 100 | part thereof, to be licensed as a whole at no charge to all third 101 | parties under the terms of this License. 102 | 103 | c) If the modified program normally reads commands interactively 104 | when run, you must cause it, when started running for such 105 | interactive use in the most ordinary way, to print or display an 106 | announcement including an appropriate copyright notice and a 107 | notice that there is no warranty (or else, saying that you provide 108 | a warranty) and that users may redistribute the program under 109 | these conditions, and telling the user how to view a copy of this 110 | License. (Exception: if the Program itself is interactive but 111 | does not normally print such an announcement, your work based on 112 | the Program is not required to print an announcement.) 113 | 114 | These requirements apply to the modified work as a whole. If 115 | identifiable sections of that work are not derived from the Program, 116 | and can be reasonably considered independent and separate works in 117 | themselves, then this License, and its terms, do not apply to those 118 | sections when you distribute them as separate works. 
But when you 119 | distribute the same sections as part of a whole which is a work based 120 | on the Program, the distribution of the whole must be on the terms of 121 | this License, whose permissions for other licensees extend to the 122 | entire whole, and thus to each and every part regardless of who wrote it. 123 | 124 | Thus, it is not the intent of this section to claim rights or contest 125 | your rights to work written entirely by you; rather, the intent is to 126 | exercise the right to control the distribution of derivative or 127 | collective works based on the Program. 128 | 129 | In addition, mere aggregation of another work not based on the Program 130 | with the Program (or with a work based on the Program) on a volume of 131 | a storage or distribution medium does not bring the other work under 132 | the scope of this License. 133 | 134 | 3. You may copy and distribute the Program (or a work based on it, 135 | under Section 2) in object code or executable form under the terms of 136 | Sections 1 and 2 above provided that you also do one of the following: 137 | 138 | a) Accompany it with the complete corresponding machine-readable 139 | source code, which must be distributed under the terms of Sections 140 | 1 and 2 above on a medium customarily used for software interchange; or, 141 | 142 | b) Accompany it with a written offer, valid for at least three 143 | years, to give any third party, for a charge no more than your 144 | cost of physically performing source distribution, a complete 145 | machine-readable copy of the corresponding source code, to be 146 | distributed under the terms of Sections 1 and 2 above on a medium 147 | customarily used for software interchange; or, 148 | 149 | c) Accompany it with the information you received as to the offer 150 | to distribute corresponding source code. (This alternative is 151 | allowed only for noncommercial distribution and only if you 152 | received the program in object code or executable form with such 153 | an offer, in accord with Subsection b above.) 154 | 155 | The source code for a work means the preferred form of the work for 156 | making modifications to it. For an executable work, complete source 157 | code means all the source code for all modules it contains, plus any 158 | associated interface definition files, plus the scripts used to 159 | control compilation and installation of the executable. However, as a 160 | special exception, the source code distributed need not include 161 | anything that is normally distributed (in either source or binary 162 | form) with the major components (compiler, kernel, and so on) of the 163 | operating system on which the executable runs, unless that component 164 | itself accompanies the executable. 165 | 166 | If distribution of executable or object code is made by offering 167 | access to copy from a designated place, then offering equivalent 168 | access to copy the source code from the same place counts as 169 | distribution of the source code, even though third parties are not 170 | compelled to copy the source along with the object code. 171 | 172 | 4. You may not copy, modify, sublicense, or distribute the Program 173 | except as expressly provided under this License. Any attempt 174 | otherwise to copy, modify, sublicense or distribute the Program is 175 | void, and will automatically terminate your rights under this License. 
176 | However, parties who have received copies, or rights, from you under 177 | this License will not have their licenses terminated so long as such 178 | parties remain in full compliance. 179 | 180 | 5. You are not required to accept this License, since you have not 181 | signed it. However, nothing else grants you permission to modify or 182 | distribute the Program or its derivative works. These actions are 183 | prohibited by law if you do not accept this License. Therefore, by 184 | modifying or distributing the Program (or any work based on the 185 | Program), you indicate your acceptance of this License to do so, and 186 | all its terms and conditions for copying, distributing or modifying 187 | the Program or works based on it. 188 | 189 | 6. Each time you redistribute the Program (or any work based on the 190 | Program), the recipient automatically receives a license from the 191 | original licensor to copy, distribute or modify the Program subject to 192 | these terms and conditions. You may not impose any further 193 | restrictions on the recipients' exercise of the rights granted herein. 194 | You are not responsible for enforcing compliance by third parties to 195 | this License. 196 | 197 | 7. If, as a consequence of a court judgment or allegation of patent 198 | infringement or for any other reason (not limited to patent issues), 199 | conditions are imposed on you (whether by court order, agreement or 200 | otherwise) that contradict the conditions of this License, they do not 201 | excuse you from the conditions of this License. If you cannot 202 | distribute so as to satisfy simultaneously your obligations under this 203 | License and any other pertinent obligations, then as a consequence you 204 | may not distribute the Program at all. For example, if a patent 205 | license would not permit royalty-free redistribution of the Program by 206 | all those who receive copies directly or indirectly through you, then 207 | the only way you could satisfy both it and this License would be to 208 | refrain entirely from distribution of the Program. 209 | 210 | If any portion of this section is held invalid or unenforceable under 211 | any particular circumstance, the balance of the section is intended to 212 | apply and the section as a whole is intended to apply in other 213 | circumstances. 214 | 215 | It is not the purpose of this section to induce you to infringe any 216 | patents or other property right claims or to contest validity of any 217 | such claims; this section has the sole purpose of protecting the 218 | integrity of the free software distribution system, which is 219 | implemented by public license practices. Many people have made 220 | generous contributions to the wide range of software distributed 221 | through that system in reliance on consistent application of that 222 | system; it is up to the author/donor to decide if he or she is willing 223 | to distribute software through any other system and a licensee cannot 224 | impose that choice. 225 | 226 | This section is intended to make thoroughly clear what is believed to 227 | be a consequence of the rest of this License. 228 | 229 | 8. 
If the distribution and/or use of the Program is restricted in 230 | certain countries either by patents or by copyrighted interfaces, the 231 | original copyright holder who places the Program under this License 232 | may add an explicit geographical distribution limitation excluding 233 | those countries, so that distribution is permitted only in or among 234 | countries not thus excluded. In such case, this License incorporates 235 | the limitation as if written in the body of this License. 236 | 237 | 9. The Free Software Foundation may publish revised and/or new versions 238 | of the General Public License from time to time. Such new versions will 239 | be similar in spirit to the present version, but may differ in detail to 240 | address new problems or concerns. 241 | 242 | Each version is given a distinguishing version number. If the Program 243 | specifies a version number of this License which applies to it and "any 244 | later version", you have the option of following the terms and conditions 245 | either of that version or of any later version published by the Free 246 | Software Foundation. If the Program does not specify a version number of 247 | this License, you may choose any version ever published by the Free Software 248 | Foundation. 249 | 250 | 10. If you wish to incorporate parts of the Program into other free 251 | programs whose distribution conditions are different, write to the author 252 | to ask for permission. For software which is copyrighted by the Free 253 | Software Foundation, write to the Free Software Foundation; we sometimes 254 | make exceptions for this. Our decision will be guided by the two goals 255 | of preserving the free status of all derivatives of our free software and 256 | of promoting the sharing and reuse of software generally. 257 | 258 | NO WARRANTY 259 | 260 | 11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY 261 | FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN 262 | OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES 263 | PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED 264 | OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF 265 | MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS 266 | TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE 267 | PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, 268 | REPAIR OR CORRECTION. 269 | 270 | 12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING 271 | WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR 272 | REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, 273 | INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING 274 | OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED 275 | TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY 276 | YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER 277 | PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE 278 | POSSIBILITY OF SUCH DAMAGES. 279 | 280 | END OF TERMS AND CONDITIONS 281 | 282 | How to Apply These Terms to Your New Programs 283 | 284 | If you develop a new program, and you want it to be of the greatest 285 | possible use to the public, the best way to achieve this is to make it 286 | free software which everyone can redistribute and change under these terms. 
287 | 288 | To do so, attach the following notices to the program. It is safest 289 | to attach them to the start of each source file to most effectively 290 | convey the exclusion of warranty; and each file should have at least 291 | the "copyright" line and a pointer to where the full notice is found. 292 | 293 | <one line to give the program's name and a brief idea of what it does.> 294 | Copyright (C) <year> <name of author> 295 | 296 | This program is free software; you can redistribute it and/or modify 297 | it under the terms of the GNU General Public License as published by 298 | the Free Software Foundation; either version 2 of the License, or 299 | (at your option) any later version. 300 | 301 | This program is distributed in the hope that it will be useful, 302 | but WITHOUT ANY WARRANTY; without even the implied warranty of 303 | MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 304 | GNU General Public License for more details. 305 | 306 | You should have received a copy of the GNU General Public License along 307 | with this program; if not, write to the Free Software Foundation, Inc., 308 | 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. 309 | 310 | Also add information on how to contact you by electronic and paper mail. 311 | 312 | If the program is interactive, make it output a short notice like this 313 | when it starts in an interactive mode: 314 | 315 | Gnomovision version 69, Copyright (C) year name of author 316 | Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'. 317 | This is free software, and you are welcome to redistribute it 318 | under certain conditions; type `show c' for details. 319 | 320 | The hypothetical commands `show w' and `show c' should show the appropriate 321 | parts of the General Public License. Of course, the commands you use may 322 | be called something other than `show w' and `show c'; they could even be 323 | mouse-clicks or menu items--whatever suits your program. 324 | 325 | You should also get your employer (if you work as a programmer) or your 326 | school, if any, to sign a "copyright disclaimer" for the program, if 327 | necessary. Here is a sample; alter the names: 328 | 329 | Yoyodyne, Inc., hereby disclaims all copyright interest in the program 330 | `Gnomovision' (which makes passes at compilers) written by James Hacker. 331 | 332 | <signature of Ty Coon>, 1 April 1989 333 | Ty Coon, President of Vice 334 | 335 | This General Public License does not permit incorporating your program into 336 | proprietary programs. If your program is a subroutine library, you may 337 | consider it more useful to permit linking proprietary applications with the 338 | library. If this is what you want to do, use the GNU Lesser General 339 | Public License instead of this License. 340 | -------------------------------------------------------------------------------- /README.md: --------------------------------------------------------------------------------

# Awesome Speech Enhancement

This repository summarizes papers, code, and tools for single- and multi-channel speech enhancement and speech separation. Pull requests are welcome.
## Contents
- [Speech Enhancement](#speech-enhancement)
- [Dereverberation](#dereverberation)
- [Speech Separation](#speech-separation-single-channel)
- [Array Signal Processing](#array-signal-processing)
- [Tools](#tools)
- [Books](#books)
- [Resources](#resources)
- [Sound Event Detection](#sound-event-detection)

## Speech Enhancement

![Speech Enhancement Tree](https://github.com/WenzheLiu-Speech/awesome-speech-enhancement/blob/master/overview.svg)

### Magnitude spectrogram
#### Spectral masking
* 2014, On Training Targets for Supervised Speech Separation, Wang. [[Paper]](https://ieeexplore.ieee.org/document/6887314)
* 2018, A Hybrid DSP/Deep Learning Approach to Real-Time Full-Band Speech Enhancement, [Valin](https://github.com/jmvalin). [[Paper]](https://ieeexplore.ieee.org/document/8547084/) [[RNNoise]](https://github.com/xiph/rnnoise) [[RNNoise16k]](https://github.com/YongyuG/rnnoise_16k)
* 2020, A Perceptually-Motivated Approach for Low-Complexity, Real-Time Enhancement of Fullband Speech, [Valin](https://github.com/jmvalin). [[Paper]](https://arxiv.org/abs/2008.04259) [[PercepNet]](https://github.com/jzi040941/PercepNet)
* 2020, Online Monaural Speech Enhancement Using Delayed Subband LSTM, Li. [[Paper]](https://arxiv.org/abs/2005.05037)
* 2020, FullSubNet: A Full-Band and Sub-Band Fusion Model for Real-Time Single-Channel Speech Enhancement, [Hao](https://github.com/haoxiangsnr). [[Paper]](https://arxiv.org/pdf/2010.15508.pdf) [[FullSubNet]](https://github.com/haoxiangsnr/FullSubNet)
* 2020, Weighted Speech Distortion Losses for Neural-Network-Based Real-Time Speech Enhancement, Xia. [[Paper]](https://www.microsoft.com/en-us/research/uploads/prod/2020/05/0000871.pdf) [[NSNet]](https://github.com/GuillaumeVW/NSNet)
* 2020, RNNoise-like fixed-point model deployed on a microcontroller using the NNoM inference framework. [[Example]](https://github.com/majianjia/nnom/tree/master/examples/rnn-denoise) [[NNoM]](https://github.com/majianjia/nnom)
* 2021, RNNoise-Ex: Hybrid Speech Enhancement System based on RNN and Spectral Features. [[Paper]](https://arxiv.org/abs/2105.11813) [[RNNoise-Ex]](https://github.com/CedArctic/rnnoise-ex)
* Other IRM-based SE repositories: [[IRM-SE-LSTM]](https://github.com/haoxiangsnr/IRM-based-Speech-Enhancement-using-LSTM) [[nn-irm]](https://github.com/zhaoforever/nn-irm) [[rnn-se]](https://github.com/amaas/rnn-speech-denoising) [[DL4SE]](https://github.com/miralv/Deep-Learning-for-Speech-Enhancement)
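Masking-based systems share one inference pattern: estimate a time-frequency mask from noisy features, multiply it onto the noisy spectrogram, and resynthesize with the noisy phase. A minimal oracle sketch of that pipeline (NumPy/SciPy; the ideal ratio mask of Wang, 2014 is computed here from the known clean and noise signals, standing in for the mask a trained network would predict):

```python
import numpy as np
from scipy.signal import stft, istft

def oracle_irm_enhance(clean, noise, fs=16000, nfft=512, hop=256):
    """Enhance clean+noise with an oracle ideal ratio mask (IRM)."""
    noisy = clean + noise
    _, _, S = stft(clean, fs, nperseg=nfft, noverlap=nfft - hop)
    _, _, N = stft(noise, fs, nperseg=nfft, noverlap=nfft - hop)
    _, _, Y = stft(noisy, fs, nperseg=nfft, noverlap=nfft - hop)
    # IRM = sqrt(|S|^2 / (|S|^2 + |N|^2)), bounded in [0, 1];
    # a DNN would predict this mask from noisy features instead.
    irm = np.sqrt(np.abs(S) ** 2 / (np.abs(S) ** 2 + np.abs(N) ** 2 + 1e-8))
    _, enhanced = istft(irm * Y, fs, nperseg=nfft, noverlap=nfft - hop)
    return enhanced[: len(noisy)]  # masked magnitude + noisy phase

# toy usage: a 440 Hz tone in white noise
rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
enhanced = oracle_irm_enhance(clean, 0.3 * rng.standard_normal(16000))
```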
#### Spectral mapping
* 2014, An Experimental Study on Speech Enhancement Based on Deep Neural Networks, [Xu](https://github.com/yongxuUSTC). [[Paper]](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6665000)
* 2014, A Regression Approach to Speech Enhancement Based on Deep Neural Networks, [Xu](https://github.com/yongxuUSTC). [[Paper]](https://ieeexplore.ieee.org/document/6932438) [[sednn]](https://github.com/yongxuUSTC/sednn) [[DNN-SE-Xu]](https://github.com/yongxuUSTC/DNN-Speech-enhancement-demo-tool) [[DNN-SE-Li]](https://github.com/hyli666/DNN-SpeechEnhancement)
* 2015, Speech Enhancement with LSTM Recurrent Neural Networks and its Application to Noise-Robust ASR, Weninger. [[Paper]](https://hal.inria.fr/hal-01163493/file/weninger_LVA15.pdf)
* 2016, A Fully Convolutional Neural Network for Speech Enhancement, Park. [[Paper]](https://arxiv.org/abs/1609.07132) [[CNN4SE]](https://github.com/zhr1201/CNN-for-single-channel-speech-enhancement)
* 2017, Long Short-Term Memory for Speaker Generalization in Supervised Speech Separation, Chen. [[Paper]](http://web.cse.ohio-state.edu/~wang.77/papers/Chen-Wang.jasa17.pdf)
* 2018, A Convolutional Recurrent Neural Network for Real-Time Speech Enhancement, [Tan](https://github.com/JupiterEthan). [[Paper]](https://web.cse.ohio-state.edu/~wang.77/papers/Tan-Wang1.interspeech18.pdf) [[CRN-Tan]](https://github.com/JupiterEthan/CRN-causal)
* 2018, Convolutional-Recurrent Neural Networks for Speech Enhancement, Zhao. [[Paper]](https://arxiv.org/pdf/1805.00579.pdf) [[CRN-Hao]](https://github.com/haoxiangsnr/A-Convolutional-Recurrent-Neural-Network-for-Real-Time-Speech-Enhancement)
* Other DNN magnitude-spectrum-mapping-based SE repositories: [[SE toolkit]](https://github.com/jtkim-kaist/Speech-enhancement) [[TensorFlow-SE]](https://github.com/linan2/TensorFlow-speech-enhancement-Chinese) [[UNetSE]](https://github.com/vbelz/Speech-enhancement)

### Complex domain
* 2017, Complex Spectrogram Enhancement by Convolutional Neural Network with Multi-Metrics Learning, [Fu](https://github.com/JasonSWFu). [[Paper]](https://arxiv.org/pdf/1704.08504.pdf)
* 2017, Time-Frequency Masking in the Complex Domain for Speech Dereverberation and Denoising, Williamson. [[Paper]](https://ieeexplore.ieee.org/abstract/document/7906509)
* 2019, PHASEN: A Phase-and-Harmonics-Aware Speech Enhancement Network, Yin. [[Paper]](https://arxiv.org/abs/1911.04697) [[PHASEN]](https://github.com/huyanxin/phasen)
* 2019, Phase-Aware Speech Enhancement with Deep Complex U-Net, Choi. [[Paper]](https://arxiv.org/abs/1903.03107) [[DC-UNet]](https://github.com/chanil1218/DCUnet.pytorch)
* 2020, Learning Complex Spectral Mapping With Gated Convolutional Recurrent Networks for Monaural Speech Enhancement, [Tan](https://github.com/JupiterEthan). [[Paper]](https://web.cse.ohio-state.edu/~wang.77/papers/Tan-Wang.taslp20.pdf) [[GCRN]](https://github.com/JupiterEthan/GCRN-complex)
* 2020, DCCRN: Deep Complex Convolution Recurrent Network for Phase-Aware Speech Enhancement, [Hu](https://github.com/huyanxin). [[Paper]](https://isca-speech.org/archive/Interspeech_2020/pdfs/2537.pdf) [[DCCRN]](https://github.com/huyanxin/DeepComplexCRN)
* 2020, T-GSA: Transformer with Gaussian-Weighted Self-Attention for Speech Enhancement, Kim. [[Paper]](https://ieeexplore.ieee.org/document/9053591)
* 2020, Phase-Aware Single-Stage Speech Denoising and Dereverberation with U-Net, Choi. [[Paper]](https://arxiv.org/abs/2006.00687)
* 2021, DPCRN: Dual-Path Convolution Recurrent Network for Single Channel Speech Enhancement, [Le](https://github.com/Le-Xiaohuai-speech). [[Paper]](https://www.isca-speech.org/archive/pdfs/interspeech_2021/le21b_interspeech.pdf) [[DPCRN]](https://github.com/Le-Xiaohuai-speech/DPCRN_DNS3)
* 2021, Real-Time Denoising and Dereverberation with Tiny Recurrent U-Net, Choi. [[Paper]](https://arxiv.org/pdf/2102.03207.pdf)
* 2021, DCCRN+: Channel-wise Subband DCCRN with SNR Estimation for Speech Enhancement, [Lv](https://github.com/IMYBo/). [[Paper]](https://arxiv.org/abs/2106.08672)
* 2022, FullSubNet+: Channel Attention FullSubNet with Complex Spectrograms for Speech Enhancement, [Chen](https://github.com/hit-thusz-RookieCJ). [[Paper]](https://arxiv.org/abs/2203.12188) [[FullSubNet+]](https://github.com/hit-thusz-RookieCJ/FullSubNet-plus)
* 2022, Dual-Branch Attention-In-Attention Transformer for Single-Channel Speech Enhancement, [Yu](https://github.com/yuguochencuc). [[Paper]](https://arxiv.org/abs/2110.06467)
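Unlike magnitude masking, complex-domain models modify the phase as well: the network outputs a complex-valued mask (or the real/imaginary spectrum directly) that rotates and scales each T-F bin. A minimal sketch of one common parameterization, the tanh-bounded complex ratio mask in the spirit of Deep Complex U-Net (the two mask channels below are placeholders for network outputs, and details vary across the models above):

```python
import numpy as np

def apply_bounded_complex_mask(noisy_spec, mask_real, mask_imag):
    """Apply a complex ratio mask with tanh-bounded magnitude.

    noisy_spec: complex STFT array (freq, time).
    mask_real, mask_imag: real arrays of the same shape, standing in
    for the two output channels of a complex-domain network.
    """
    mask = mask_real + 1j * mask_imag
    mag = np.tanh(np.abs(mask))            # bound the mask magnitude to [0, 1)
    unit = mask / (np.abs(mask) + 1e-8)    # unit-modulus phase factor
    # complex multiplication scales the magnitude AND shifts the phase
    return mag * unit * noisy_spec
```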
### Time domain
* 2018, Improved Speech Enhancement with the Wave-U-Net, Macartney. [[Paper]](https://arxiv.org/pdf/1811.11307.pdf) [[WaveUNet]](https://github.com/YosukeSugiura/Wave-U-Net-for-Speech-Enhancement-NNabla)
* 2019, A New Framework for CNN-Based Speech Enhancement in the Time Domain, [Pandey](https://github.com/ashutosh620). [[Paper]](https://ieeexplore.ieee.org/document/8701652)
* 2019, TCNN: Temporal Convolutional Neural Network for Real-Time Speech Enhancement in the Time Domain, [Pandey](https://github.com/ashutosh620). [[Paper]](https://ieeexplore.ieee.org/document/8683634)
* 2020, Real Time Speech Enhancement in the Waveform Domain, Defossez. [[Paper]](https://arxiv.org/abs/2006.12847) [[facebookDenoiser]](https://github.com/facebookresearch/denoiser)
* 2020, Monaural Speech Enhancement Through Deep Wave-U-Net, Guimarães. [[Paper]](https://www.sciencedirect.com/science/article/pii/S0957417420304061) [[SEWUNet]](https://github.com/Hguimaraes/SEWUNet)
* 2020, Speech Enhancement Using Dilated Wave-U-Net: An Experimental Analysis, Ali. [[Paper]](https://ieeexplore.ieee.org/document/9211072)
* 2020, Densely Connected Neural Network with Dilated Convolutions for Real-Time Speech Enhancement in the Time Domain, [Pandey](https://github.com/ashutosh620). [[Paper]](https://ashutosh620.github.io/files/DDAEC_ICASSP_2020.pdf) [[DDAEC]](https://github.com/ashutosh620/DDAEC)
* 2021, Dense CNN With Self-Attention for Time-Domain Speech Enhancement, [Pandey](https://github.com/ashutosh620). [[Paper]](https://ieeexplore.ieee.org/document/9372863)
* 2021, Dual-Path Self-Attention RNN for Real-Time Speech Enhancement, [Pandey](https://github.com/ashutosh620). [[Paper]](https://arxiv.org/abs/2010.12713)
* 2022, Speech Denoising in the Waveform Domain with Self-Attention, Kong. [[Paper]](https://arxiv.org/abs/2202.07790)
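Time-domain models operate directly on waveforms, so they are typically trained with waveform-level objectives; scale-invariant SDR (SI-SDR) is one widely used choice (an assumption of this sketch: the individual papers above use a variety of losses, e.g. L1 or multi-resolution STFT terms). During training the negative SI-SDR is minimized:

```python
import numpy as np

def si_sdr(estimate, target, eps=1e-8):
    """Scale-invariant signal-to-distortion ratio in dB (higher is better)."""
    target = target - target.mean()
    estimate = estimate - estimate.mean()
    # project the estimate onto the target to get the scaled reference
    alpha = np.dot(estimate, target) / (np.dot(target, target) + eps)
    s_target = alpha * target
    e_noise = estimate - s_target
    return 10 * np.log10(np.dot(s_target, s_target) / (np.dot(e_noise, e_noise) + eps))
```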
### Generative Model
#### GAN
* 2017, SEGAN: Speech Enhancement Generative Adversarial Network, Pascual. [[Paper]](https://arxiv.org/pdf/1703.09452.pdf) [[SEGAN]](https://github.com/santi-pdp/segan_pytorch)
* 2019, SERGAN: Speech Enhancement Using Relativistic Generative Adversarial Networks with Gradient Penalty, [Deepak Baby](https://github.com/deepakbaby). [[Paper]](https://biblio.ugent.be/publication/8613639/file/8646769.pdf) [[SERGAN]](https://github.com/deepakbaby/se_relativisticgan)
* 2019, MetricGAN: Generative Adversarial Networks based Black-box Metric Scores Optimization for Speech Enhancement, [Fu](https://github.com/JasonSWFu). [[Paper]](https://arxiv.org/pdf/1905.04874.pdf) [[MetricGAN]](https://github.com/JasonSWFu/MetricGAN)
* 2020, HiFi-GAN: High-Fidelity Denoising and Dereverberation Based on Speech Deep Features in Adversarial Networks, Su. [[Paper]](https://arxiv.org/abs/2006.05694) [[HifiGAN]](https://github.com/rishikksh20/hifigan-denoiser)
* 2021, MetricGAN+: An Improved Version of MetricGAN for Speech Enhancement, [Fu](https://github.com/JasonSWFu). [[Paper]](https://arxiv.org/abs/2104.03538) [[MetricGAN+]](https://github.com/speechbrain/speechbrain/tree/develop/recipes/Voicebank/enhance/MetricGAN)
* 2022, CMGAN: Conformer-Based Metric GAN for Monaural Speech Enhancement, [Abdulatif](https://github.com/SherifAbdulatif), [Cao](https://github.com/ruizhecao96) & Yang. [[Paper]](https://arxiv.org/abs/2209.11112) [[CMGAN]](https://github.com/ruizhecao96/CMGAN)

#### Flow
* 2021, A Flow-Based Neural Network for Time Domain Speech Enhancement, Strauss & [Edler](https://www.audiolabs-erlangen.de/fau/professor/edler). [[Paper]](https://arxiv.org/abs/2106.09008)

#### VAE
* 2018, A Variance Modeling Framework Based on Variational Autoencoders for Speech Enhancement, [Leglaive](https://gitlab.inria.fr/sileglai). [[Paper]](https://hal.inria.fr/hal-01832826v1/document) [[mlsp]](https://gitlab.inria.fr/sileglai/mlsp-2018)
* 2020, Speech Enhancement with Stochastic Temporal Convolutional Networks, Richter. [[Paper]](https://www.isca-speech.org/archive_v0/Interspeech_2020/pdfs/2588.pdf) [[STCN-NMF]](https://github.com/sp-uhh/stcn-nmf)

#### Diffusion Model
* 2022, Speech Enhancement with Score-Based Generative Models in the Complex STFT Domain, Welker. [[Paper]](https://www.isca-speech.org/archive/interspeech_2022/welker22_interspeech.html) [[SGMSE]](https://github.com/sp-uhh/sgmse)
* 2022, StoRM: A Stochastic Regeneration Model for Speech Enhancement and Dereverberation, Lemercier. [[Paper]](https://arxiv.org/abs/2212.11851) [[StoRM]](https://github.com/sp-uhh/storm)
* 2022, Conditional Diffusion Probabilistic Model for Speech Enhancement, Lu. [[Paper]](https://arxiv.org/abs/2202.05256) [[CDiffuSE]](https://github.com/neillu23/CDiffuSE)
* 2023, Speech Enhancement and Dereverberation with Diffusion-Based Generative Models, Richter. [[Paper]](https://ieeexplore.ieee.org/abstract/document/10149431) [[SGMSE]](https://github.com/sp-uhh/sgmse)

### Hybrid SE
* 2019, Deep Xi as a Front-End for Robust Automatic Speech Recognition, [Nicolson](https://github.com/anicolson). [[Paper]](https://arxiv.org/abs/1906.07319) [[DeepXi]](https://github.com/anicolson/DeepXi)
* 2019, Using Generalized Gaussian Distributions to Improve Regression Error Modeling for Deep-Learning-Based Speech Enhancement, [Li](https://github.com/LiChaiUSTC). [[Paper]](http://staff.ustc.edu.cn/~jundu/Publications/publications/chaili2019trans.pdf) [[SE-MLC]](https://github.com/LiChaiUSTC/Speech-enhancement-based-on-a-maximum-likelihood-criterion)
* 2020, Deep Residual-Dense Lattice Network for Speech Enhancement, [Nikzad](https://github.com/nick-nikzad). [[Paper]](https://arxiv.org/pdf/2002.12794.pdf) [[RDL-SE]](https://github.com/nick-nikzad/RDL-SE)
* 2020, DeepMMSE: A Deep Learning Approach to MMSE-based Noise Power Spectral Density Estimation, [Zhang](https://github.com/yunzqq). [[Paper]](https://ieeexplore.ieee.org/document/9066933)
* 2020, Speech Enhancement Using a DNN-Augmented Colored-Noise Kalman Filter, [Yu](https://github.com/Hongjiang-Yu). [[Paper]](https://www.sciencedirect.com/science/article/pii/S0167639320302831) [[DNN-Kalman]](https://github.com/Hongjiang-Yu/DNN_Kalman_Filter)
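Hybrid systems keep a classical statistical back-end and let the network estimate its parameters: DeepXi predicts the a priori SNR and DeepMMSE the noise PSD, after which a conventional gain rule produces the enhanced spectrum. A minimal sketch of that final step (the a priori SNR here is a placeholder for a network estimate; real systems may apply MMSE-STSA or MMSE-LSA gains rather than the plain Wiener gain shown):

```python
import numpy as np

def wiener_gain(xi):
    """Wiener gain G = xi / (1 + xi), applied per T-F bin,
    where xi is the a priori SNR (a linear power ratio, not dB)."""
    return xi / (1.0 + xi)

# toy usage: a network predicted xi for three T-F bins
xi = np.array([0.1, 1.0, 10.0])
print(wiener_gain(xi))  # -> [0.0909... 0.5 0.9090...]
# enhanced_spec = wiener_gain(xi) * noisy_spec  (per-bin multiplication)
```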
### Decoupling-style
* 2020, A Recursive Network with Dynamic Attention for Monaural Speech Enhancement, [Li](https://github.com/Andong-Li-speech). [[Paper]](https://arxiv.org/abs/2003.12973) [[DARCN]](https://github.com/Andong-Li-speech/DARCN)
* 2020, Masking and Inpainting: A Two-Stage Speech Enhancement Approach for Low SNR and Non-Stationary Noise, [Hao](https://github.com/haoxiangsnr). [[Paper]](https://ieeexplore.ieee.org/document/9053188/)
* 2020, A Joint Framework of Denoising Autoencoder and Generative Vocoder for Monaural Speech Enhancement, Du. [[Paper]](https://ieeexplore.ieee.org/document/9082858)
* 2020, Dual-Signal Transformation LSTM Network for Real-Time Noise Suppression, [Westhausen](https://github.com/breizhn). [[Paper]](https://www.isca-speech.org/archive/Interspeech_2020/pdfs/2631.pdf) [[DTLN]](https://github.com/breizhn/DTLN)
* 2020, Listening to Sounds of Silence for Speech Denoising, [Xu](https://github.com/henryxrl). [[Paper]](http://www.cs.columbia.edu/cg/listen_to_the_silence/paper.pdf) [[LSS]](https://github.com/henryxrl/Listening-to-Sound-of-Silence-for-Speech-Denoising)
* 2021, ICASSP 2021 Deep Noise Suppression Challenge: Decoupling Magnitude and Phase Optimization with a Two-Stage Deep Network, [Li](https://github.com/Andong-Li-speech). [[Paper]](https://arxiv.org/abs/2102.04198)
* 2021, Multi-Task Audio Source Separation, [Zhang](https://github.com/Windstudent). [[Paper]](https://arxiv.org/abs/2107.06467) [[Code]](https://github.com/Windstudent/Complex-MTASSNet)
* 2022, Glance and Gaze: A Collaborative Learning Framework for Single-Channel Speech Enhancement, [Li](https://github.com/Andong-Li-speech/GaGNet). [[Paper]](https://www.sciencedirect.com/science/article/pii/S0003682X21005934)
* 2022, HGCN: Harmonic Gated Compensation Network for Speech Enhancement, [Wang](https://github.com/wangtianrui/HGCN). [[Paper]](https://arxiv.org/pdf/2201.12755.pdf)
* 2022, Uformer: A UNet-Based Dilated Complex & Real Dual-Path Conformer Network for Simultaneous Speech Enhancement and Dereverberation, [Fu](https://github.com/felixfuyihui). [[Paper]](https://arxiv.org/abs/2111.06015) [[Uformer]](https://github.com/felixfuyihui/Uformer)
* 2022, DeepFilterNet2: Towards Real-Time Speech Enhancement on Embedded Devices for Full-Band Audio, [Schröter](https://github.com/Rikorose). [[Paper]](https://arxiv.org/abs/2205.05474) [[DeepFilterNet]](https://github.com/Rikorose/DeepFilterNet)
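The common thread in this family is to decouple the hard phase problem from the easier magnitude problem: a first stage enhances the magnitude under the noisy phase, and a second stage refines the result in the complex domain. A minimal sketch of that two-stage composition (both network outputs are placeholders; the stage designs differ considerably across the systems above):

```python
import numpy as np

def two_stage_enhance(noisy_spec, mag_mask, complex_residual):
    """Decoupled magnitude-then-complex enhancement.

    noisy_spec: complex STFT (freq, time).
    mag_mask: real-valued stage-1 mask (acts on magnitude only).
    complex_residual: complex stage-2 correction that implicitly
    repairs the phase left untouched by stage 1.
    """
    noisy_phase = noisy_spec / (np.abs(noisy_spec) + 1e-8)
    coarse = mag_mask * np.abs(noisy_spec) * noisy_phase  # stage 1
    return coarse + complex_residual                      # stage 2
```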
### Data collection
* 2021, Speech Denoising Without Clean Training Data: A Noise2Noise Approach, Kashyap. [[Paper]](https://arxiv.org/pdf/2104.03838.pdf) [[Noise2Noise]](https://github.com/madhavmk/Noise2Noise-audio_denoising_without_clean_training_data)

### Loss
* Quality-Net: non-intrusive speech quality estimation, usable as a learned training objective. [[Quality-Net]](https://github.com/JasonSWFu/Quality-Net)

### Challenge
* DNS Challenge [[DNS Interspeech2020]](https://www.microsoft.com/en-us/research/academic-program/deep-noise-suppression-challenge-interspeech-2020/) [[DNS ICASSP2021]](https://www.microsoft.com/en-us/research/academic-program/deep-noise-suppression-challenge-icassp-2021/) [[DNS Interspeech2021]](https://www.microsoft.com/en-us/research/academic-program/deep-noise-suppression-challenge-interspeech-2021/)

### Other repositories
* Collection of papers, datasets and tools on the topic of Speech Dereverberation and Speech Enhancement [[Link]](https://github.com/jonashaag/speech-enhancement)
* nanahou's awesome speech enhancement [[Link]](https://github.com/nanahou/Awesome-Speech-Enhancement)

## Dereverberation
### Traditional method
* SPENDRED [[Paper]](https://ieeexplore.ieee.org/document/7795155) [[SPENDRED]](https://github.com/csd111/dereverberation)
* WPE (MCLP) [[Paper]](https://ieeexplore.ieee.org/document/6255769) [[nara-WPE]](https://github.com/fgnt/nara_wpe)
* GWPE [[Code]](https://github.com/snsun/gwpe-speech-dereverb)
* LP residual [[Paper]](https://ieeexplore.ieee.org/abstract/document/1621193) [[LP_residual]](https://github.com/shamim-hussain/speech_dereverbaration_using_lp_residual)
* dereverberate [[Paper]](https://www.aes.org/e-lib/browse.cfm?elib=15675) [[Code]](https://github.com/matangover/dereverberate)
* NMF [[Paper]](https://ieeexplore.ieee.org/document/7471656/) [[NMF]](https://github.com/deepakbaby/dereverberation-and-denoising)
### Hybrid method
* DNN_WPE [[Paper]](https://ieeexplore.ieee.org/document/7471656/) [[Code]](https://github.com/nttcslab-sp/dnn_wpe)
### NN-based dereverberation
* Dereverberation-toolkit-for-REVERB-challenge [[Code]](https://github.com/hshi-speech/Dereverberation-toolkit-for-REVERB-challenge)
* SkipConvNet [[Paper]](https://arxiv.org/pdf/2007.09131.pdf) [[Code]](https://github.com/zehuachenImperial/SkipConvNet)
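Supervised dereverberation models are usually trained on simulated pairs: convolve clean speech with a room impulse response (RIR) and build the target from the direct/early part of the RIR. A minimal NumPy sketch (assumption: a 50 ms early-reflection boundary, a common but not universal convention):

```python
import numpy as np

def make_dereverb_pair(clean, rir, fs=16000, early_ms=50):
    """Build a (reverberant input, early-speech target) training pair.

    clean: dry speech; rir: measured or simulated room impulse response
    (e.g. from RIR-Generator or gpuRIR, listed under Tools below).
    """
    split = int(fs * early_ms / 1000)
    early_rir = rir.copy()
    early_rir[split:] = 0.0  # keep only direct path + early reflections
    reverberant = np.convolve(clean, rir)[: len(clean)]
    target = np.convolve(clean, early_rir)[: len(clean)]
    return reverberant, target
```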
## Speech Separation (single channel)
* Tutorial on speech separation (awesome-style list) [[Link]](https://github.com/gemengtju/Tutorial_Separation)
### NN-based separation
* 2015, Deep Clustering: Discriminative Embeddings for Segmentation and Separation, Hershey and Chen. [[Paper]](https://arxiv.org/abs/1508.04306) [[Code1]](https://github.com/JusperLee/Deep-Clustering-for-Speech-Separation) [[Code2]](https://github.com/simonsuthers/Speech-Separation) [[Code3]](https://github.com/funcwj/deep-clustering)
* 2016, DANet: Deep Attractor Network for Single-Channel Speech Separation, Chen. [[Paper]](https://arxiv.org/abs/1611.08930) [[Code]](https://github.com/naplab/DANet)
* 2017, Multitalker Speech Separation with Utterance-Level Permutation Invariant Training of Deep Recurrent Neural Networks, Yu (see the uPIT loss sketch after this list). [[Paper]](https://ai.tencent.com/ailab/media/publications/Multi-talker_Speech_Separation_with_Utterance-level.pdf) [[Code]](https://github.com/funcwj/uPIT-for-speech-separation)
* 2018, LSTM_PIT_Speech_Separation [[Code]](https://github.com/pchao6/LSTM_PIT_Speech_Separation)
* 2018, TasNet: Time-Domain Audio Separation Network for Real-Time, Single-Channel Speech Separation, Luo. [[Paper]](https://arxiv.org/abs/1711.00541v2) [[Code]](https://github.com/mpariente/asteroid/blob/master/egs/whamr/TasNet)
* 2019, Conv-TasNet: Surpassing Ideal Time-Frequency Masking for Speech Separation, Luo. [[Paper]](https://arxiv.org/pdf/1809.07454.pdf) [[Code]](https://github.com/kaituoxu/Conv-TasNet)
* 2019, Dual-Path RNN: Efficient Long Sequence Modeling for Time-Domain Single-Channel Speech Separation, Luo. [[Paper]](https://arxiv.org/abs/1910.06379v1) [[Code1]](https://github.com/ShiZiqiang/dual-path-RNNs-DPRNNs-based-speech-separation) [[Code2]](https://github.com/JusperLee/Dual-Path-RNN-Pytorch)
* 2019, TAC: End-to-End Microphone Permutation and Number Invariant Multi-Channel Speech Separation, Luo. [[Paper]](https://arxiv.org/abs/1910.14104) [[Code]](https://github.com/yluo42/TAC)
* 2020, Continuous Speech Separation with Conformer, Chen. [[Paper]](https://arxiv.org/abs/2008.05773) [[Code]](https://github.com/Sanyuan-Chen/CSS_with_Conformer)
* 2020, Dual-Path Transformer Network: Direct Context-Aware Modeling for End-to-End Monaural Speech Separation, Chen. [[Paper]](https://arxiv.org/abs/2007.13975) [[Code]](https://github.com/ujscjj/DPTNet)
* 2020, Wavesplit: End-to-End Speech Separation by Speaker Clustering, Zeghidour. [[Paper]](https://arxiv.org/abs/2002.08933)
* 2021, Attention Is All You Need in Speech Separation, Subakan. [[Paper]](https://arxiv.org/abs/2010.13154) [[Code]](https://github.com/speechbrain/speechbrain/tree/develop/recipes/WSJ0Mix/separation)
* 2021, Ultra Fast Speech Separation Model with Teacher Student Learning, Chen. [[Paper]](https://www.isca-speech.org/archive/pdfs/interspeech_2021/chen21l_interspeech.pdf)
* Sound separation (Google) [[Code]](https://github.com/google-research/sound-separation)
* source_separation: deep-learning-based speech source separation using PyTorch [[Code]](https://github.com/AppleHolic/source_separation)
* music-source-separation [[Code]](https://github.com/andabi/music-source-separation)
* Singing-Voice-Separation [[Code]](https://github.com/Jeongseungwoo/Singing-Voice-Separation)
* Comparison-of-Blind-Source-Separation-techniques [[Code]](https://github.com/TUIlmenauAMS/Comparison-of-Blind-Source-Separation-techniques)
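Separation networks face a label-permutation ambiguity: nothing ties output channel 1 to speaker 1. Permutation invariant training resolves it by scoring every output-to-target assignment and backpropagating only the best one. A minimal utterance-level (uPIT) sketch with an MSE criterion (an assumption for brevity: modern systems usually substitute negative SI-SNR for MSE):

```python
import itertools
import numpy as np

def upit_mse(estimates, targets):
    """Utterance-level permutation invariant MSE.

    estimates, targets: arrays of shape (num_speakers, num_samples).
    Returns the loss under the best speaker permutation, so the network
    may emit sources in any order without being penalized.
    """
    n = estimates.shape[0]
    return min(
        np.mean((estimates[list(perm)] - targets) ** 2)
        for perm in itertools.permutations(range(n))
    )
```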
### BSS/ICA method
* FastICA [[Code]](https://github.com/ShubhamAgarwal1616/FastICA)
* A localisation- and precedence-based binaural separation algorithm [[Download]](http://iosr.uk/software/downloads/PrecSep_toolbox.zip)
* Convolutive Transfer Function Invariant SDR [[Code]](https://github.com/fgnt/ci_sdr)

## Array Signal Processing
* MASP: Microphone Array Speech Processing [[Code]](https://github.com/ZitengWang/MASP)
* BeamformingSpeechEnhancer [[Code]](https://github.com/hkmogul/BeamformingSpeechEnhancer)
* TSENet [[Code]](https://github.com/felixfuyihui/felixfuyihui.github.io)
* steernet [[Code]](https://github.com/FrancoisGrondin/steernet)
* DNN_Localization_And_Separation [[Code]](https://github.com/shaharhoch/DNN_Localization_And_Separation)
* nn-gev: neural-network-supported GEV beamformer for CHiME-3 [[Code]](https://github.com/fgnt/nn-gev)
* chime4-nn-mask: implementation of an NN-based mask estimator in PyTorch (reuses some code from nn-gev) [[Code]](https://github.com/funcwj/chime4-nn-mask)
* beamformit_matlab: a MATLAB implementation of the CHiME4 baseline BeamformIt [[Code]](https://github.com/gogyzzz/beamformit_matlab)
* pb_chime5: speech enhancement system for the CHiME-5 dinner party scenario [[Code]](https://github.com/fgnt/pb_chime5)
* beamformit: microphone array algorithms (ZH) [[Code]](https://github.com/592595/beamformit)
* Beamforming-for-speech-enhancement [[Code]](https://github.com/AkojimaSLP/Beamforming-for-speech-enhancement)
* deepBeam [[Code]](https://github.com/auspicious3000/deepbeam)
* NN_MASK [[Code]](https://github.com/ZitengWang/nn_mask)
* Cone-of-Silence [[Code]](https://github.com/vivjay30/Cone-of-Silence)
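The simplest fixed beamformer behind many of these systems is delay-and-sum: time-align the microphone channels for a chosen look direction and average them, which reinforces the target and attenuates sound from other directions. A minimal frequency-domain sketch under a far-field plane-wave assumption (note that sign conventions for the steering phase differ across texts):

```python
import numpy as np

def delay_and_sum(mic_signals, mic_positions, look_dir, fs=16000, c=343.0):
    """Frequency-domain delay-and-sum beamformer.

    mic_signals: (num_mics, num_samples) time signals.
    mic_positions: (num_mics, 3) coordinates in meters.
    look_dir: unit vector pointing from the array toward the source.
    """
    num_mics, num_samples = mic_signals.shape
    freqs = np.fft.rfftfreq(num_samples, d=1.0 / fs)
    spectra = np.fft.rfft(mic_signals, axis=1)
    # a plane wave from look_dir reaches mic m earlier by (p_m . u) / c,
    # so multiply by exp(-j 2 pi f tau_m) to time-align all channels
    tau = mic_positions @ look_dir / c                      # (num_mics,)
    steering = np.exp(-2j * np.pi * freqs[None, :] * tau[:, None])
    return np.fft.irfft((spectra * steering).mean(axis=0), n=num_samples)
```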
## Tools
* APS: a workspace for single-/multi-channel speech recognition, enhancement, and separation [[Code]](https://github.com/funcwj/aps)
* AKtools: an open software toolbox for signal acquisition, processing, and inspection in acoustics [[SVN Code]](https://svn.ak.tu-berlin.de/svn/AKtools) (username: aktools; password: ak)
* espnet [[Code]](https://github.com/espnet/espnet)
* asteroid: a PyTorch-based audio source separation toolkit for researchers [[PDF]](https://arxiv.org/pdf/2005.04132.pdf) [[Code]](https://github.com/mpariente/asteroid)
* pytorch_complex [[Code]](https://github.com/kamo-naoyuki/pytorch_complex)
* ONSSEN: an open-source speech separation and enhancement library [[Code]](https://github.com/speechLabBcCuny/onssen)
* separation_data_preparation [[Code]](https://github.com/YongyuG/separation_data_preparation)
* MatlabToolbox [[Code]](https://github.com/IoSR-Surrey/MatlabToolbox)
* athena-signal [[Code]](https://github.com/athena-team/athena-signal)
* python_speech_features [[Code]](https://github.com/jameslyons/python_speech_features)
* speechFeatures [[Code]](https://github.com/SusannaWull/speechFeatures)
* sap-voicebox [[Code]](https://github.com/ImperialCollegeLondon/sap-voicebox)
* Calculate-SNR-SDR [[Code]](https://github.com/JusperLee/Calculate-SNR-SDR)
* RIR-Generator [[Code]](https://github.com/ehabets/RIR-Generator)
* Signal-Generator (for moving sources or a moving array) [[Code]](https://github.com/ehabets/Signal-Generator)
* gpuRIR: Python library for room impulse response (RIR) simulation with GPU acceleration [[Code]](https://github.com/DavidDiazGuerra/gpuRIR)
* ROOMSIM: binaural image source simulation [[Code]](https://github.com/Wenzhe-Liu/ROOMSIM)
* binaural-image-source-model [[Code]](https://github.com/iCorv/binaural-image-source-model)
* PESQ [[Code]](https://github.com/vBaiCai/python-pesq)
* SETK: Speech Enhancement Tools integrated with Kaldi [[Code]](https://github.com/funcwj/setk)
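Objective metrics such as PESQ are how most of the papers above report their gains. As an illustration, here is typical usage of the `pesq` package from PyPI (an assumption: this is the ludlows/python-pesq distribution, a different implementation from the repository linked above; both wrap ITU-T P.862):

```python
import numpy as np
from pesq import pesq  # pip install pesq

fs = 16000
rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 440 * np.arange(fs) / fs).astype(np.float32)
degraded = (clean + 0.1 * rng.standard_normal(fs)).astype(np.float32)

# 'wb' selects wideband P.862.2 (use 'nb' for 8 kHz narrowband signals);
# scores range roughly from -0.5 to 4.5, higher is better
print(pesq(fs, clean, degraded, 'wb'))
```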
## Books
* P. C. Loizou: Speech Enhancement: Theory and Practice
* J. Benesty, Y. Huang: Adaptive Signal Processing: Applications to Real-World Problems
* S. Haykin: Adaptive Filter Theory
* Eberhard Hänsler, Gerhard Schmidt: Single-Channel Acoustic Echo Cancellation and Topics in Acoustic Echo and Noise Control
* J. Benesty, S. Makino, J. Chen: Speech Enhancement
* J. Benesty, M. M. Sondhi, Y. Huang: Handbook of Speech Processing
* Ivan J. Tashev: Sound Capture and Processing: Practical Approaches
* I. Cohen, J. Benesty, S. Gannot: Speech Processing in Modern Communication
* E. Vincent, T. Virtanen, S. Gannot: Audio Source Separation and Speech Enhancement
* J. Benesty et al.: A Perspective on Stereophonic Acoustic Echo Cancellation
* J. Benesty et al.: Advances in Network and Acoustic Echo Cancellation
* T. F. Quatieri: Discrete-Time Speech Signal Processing: Principles and Practice
* Song Zhiyong: Applications of MATLAB in Speech Signal Analysis and Synthesis (ZH)
* Harry L. Van Trees: Optimum Array Processing
* Wang Yongliang: Spatial Spectrum Estimation Theory and Algorithms (ZH)
* Yan Shefeng: Optimized Array Signal Processing (ZH)
* Zhang Xiaofei: Array Signal Processing and Its MATLAB Implementation (ZH)
* Zhao Yongjun: Theory and Methods of DOA Estimation for Wideband Array Signals (ZH)
* [The-guidebook-of-speech-enhancement](https://github.com/WenzheLiu-Speech/The-guidebook-of-speech-enhancement)

## Resources
* Sixty-years-of-frequency-domain-monaural-speech-enhancement [[se_overview]](https://github.com/cszheng-ioa/Sixty-years-of-frequency-domain-monaural-speech-enhancement)
* Speech Signal Processing Course (ZH) [[Link]](https://github.com/veenveenveen/SpeechSignalProcessingCourse)
* Speech Algorithms (ZH) [[Link]](https://github.com/Ryuk17/SpeechAlgorithms)
* Speech Resources [[Link]](https://github.com/ddlBoJack/Speech-Resources)
* Sound capture and speech enhancement for speech-enabled devices [[Link]](https://www.microsoft.com/en-us/research/uploads/prod/2022/01/Sound-capture-and-speech-enhancement-for-speech-enabled-devices-ASA-181.pdf)
* CCF Speech Dialogue and Hearing SIG: Frontier Workshop on Speech Dialogue and Hearing (ZH) [[Link]](https://www.bilibili.com/video/BV1MV411k7iJ)

### Localization and DOA estimation
* binauralLocalization [[Code]](https://github.com/nicolasobin/binauralLocalization)
* robotaudition_examples: simplified robot audition examples (sound source localization and separation) in Octave/MATLAB [[Code]](https://github.com/balkce/robotaudition_examples)
* WSCM-MUSIC [[Code]](https://github.com/xuchenglin28/WSCM-MUSIC)
* doa-tools [[Code]](https://github.com/morriswmz/doa-tools)
* Regression and Classification for Direction-of-Arrival Estimation with Convolutional Recurrent Neural Networks [[Code]](https://github.com/RoyJames/doa-release) [[PDF]](https://arxiv.org/pdf/1904.08452v3.pdf)
* messl: Model-based EM Source Separation and Localization [[Code]](https://github.com/mim/messl)
* messlJsalt15: MESSL wrappers etc. for JSALT 2015, including CHiME-3 [[Code]](https://github.com/speechLabBcCuny/messlJsalt15)
* fast_sound_source_localization_using_TLSSC: Fast Sound Source Localization Using Two-Level Search Space Clustering [[Code]](https://github.com/LeeTaewoo/fast_sound_source_localization_using_TLSSC)
* Binaural-Auditory-Localization-System [[Code]](https://github.com/r04942117/Binaural-Auditory-Localization-System)
* Binaural_Localization: ITD-based localization of sound sources in complex acoustic environments [[Code]](https://github.com/Hardcorehobel/Binaural_Localization)
* Dual_Channel_Beamformer_and_Postfilter [[Code]](https://github.com/XiaoxiangGao/Dual_Channel_Beamformer_and_Postfilter)
* Microphone sound source localization (ZH) [[Code]](https://github.com/xiaoli1368/Microphone-sound-source-localization)
* RTF-based-LCMV-GSC [[Code]](https://github.com/Tungluai/RTF-based-LCMV-GSC)
* DOA [[Code]](https://github.com/wangwei2009/DOA)

## Sound Event Detection
* sed_eval: evaluation toolbox for sound event detection [[Code]](https://github.com/TUT-ARG/sed_eval)
* Benchmark for the sound event localization task of the DCASE 2019 challenge [[Code]](https://github.com/sharathadavanne/seld-dcase2019)
* sed-crnn: DCASE 2017 real-life sound event detection winning method [[Code]](https://github.com/sharathadavanne/sed-crnn)
* seld-net [[Code]](https://github.com/sharathadavanne/seld-net)
-------------------------------------------------------------------------------- /overview.svg: --------------------------------------------------------------------------------
[SVG figure: taxonomy tree of DNN-based speech enhancement with magnitude-spectrogram, complex-spectrogram, time-domain, decoupling-style, and hybrid branches, plus representative models (e.g. DNN-IRM, RNNoise, PercepNet, GCRN, DCCRN, TasNet, DEMUCS, DTLN, GaGNet). Legend: IBM/IRM/PSM/cIRM: masks; LPS/Mag/STFT: log-power spectrum / magnitude spectrum / complex spectrum; cprs: compressed spectrum; MP: magnitude and phase estimation; pcRM: polar-coordinate-wise complex mask; PL: progressive learning.]
--------------------------------------------------------------------------------