# awesome-speech
This is a treasure house of speech: a curated collection of toolkits, code, corpora, and tutorials for speech recognition, synthesis, speaker recognition, dialogue systems, and front-end processing.

## Contents
* [Speech Recognition (ASR, STT)](#1)
  * [page](#1.1)
  * [open source library/toolbox/code](#1.2)
  * [corpus/dataset](#1.3)
  * [Tutorial](#1.4)
* [Speech Synthesis (TTS)](#2)
  * [page](#2.1)
  * [open source library/toolbox/code](#2.2)
  * [corpus/dataset](#2.3)
  * [Tutorial](#2.4)
* [Speaker Recognition](#3)
  * [page](#3.1)
  * [open source library/toolbox/code](#3.2)
  * [corpus/dataset](#3.3)
  * [Tutorial](#3.4)
* [Dialogue Systems](#4)
  * [page](#4.1)
  * [open source library/toolbox/code](#4.2)
  * [corpus/dataset](#4.3)
  * [Tutorial](#4.4)
* [Front End](#5)
  * [Speech Processing](#5.1)
  * [Audio I/O](#5.2)
  * [Sound Source Separation](#5.3)
  * [Feature Extraction](#5.4)
  * [VAD](#5.5)
* [Resources](#6)
  * [code/tool/data](#6.1)
  * [Tutorial](#6.2)
  * [paper](#6.3)
* [Pages](#7)

## <a name="1"></a>Speech Recognition

### <a name="1.1"></a>page
#### Xingyu Na
* http://naxingyu.github.io/
* https://github.com/naxingyu?tab=repositories
#### Language Processing and Pattern Recognition Group, RWTH Aachen University
* https://www-i6.informatik.rwth-aachen.de/web/Software/index.html
#### Fernando de la Calle Silos
* http://www.tsc.uc3m.es/~fsilos/
* https://github.com/fernandodelacalle?tab=repositories

### <a name="1.2"></a>open source library/toolbox/code
#### HTK
* http://htk.eng.cam.ac.uk/download.shtml
#### Py2HTK
* https://github.com/g-leech/Py2HTK
#### parallel-htk
* https://github.com/jpuigcerver/parallel-htk
#### HTK_C_MATLAB_tools
* https://github.com/sinb/HTK_C_MATLAB_tools

#### Kaldi
* https://github.com/kaldi-asr/kaldi
#### Kaldi official documentation (Chinese translation)
* http://blog.geekidentity.com/asr/kaldi/kaldi_tutorial/
#### Kaldi models
* http://kaldi-asr.org/models.html
#### Corpus Phonetics Tutorial
* https://www.eleanorchodroff.com/tutorial/kaldi/kaldi-intro.html
#### Kaldi Python wrappers (pykaldi, py-kaldi-asr, and others)
* https://github.com/pykaldi/pykaldi
* https://github.com/gooofy/py-kaldi-asr
* https://github.com/UFAL-DSG/pykaldi
* https://github.com/janchorowski/kaldi-python
#### Dan's DNN implementation
* http://kaldi-asr.org/doc/dnn2.html
#### pytorch-kaldi
* https://github.com/mravanelli/pytorch-kaldi/
#### kaldi-lstm
* https://github.com/dophist/kaldi-lstm
#### kaldi-ctc
* https://github.com/lingochamp/kaldi-ctc
#### keras-kaldi
* https://github.com/dspavankumar/keras-kaldi
#### Python wrapper for the Kaldi online decoder
* https://github.com/funcwj/pydecoder
#### Kaldi+PDNN
* https://github.com/yajiemiao/kaldipdnn
#### tfkaldi
* https://github.com/vrenkens/tfkaldi
#### Kaldi_CNTK_AMI
* https://github.com/chenguoguo/Kaldi_CNTK_AMI
#### kaldi-io-for-python
* https://github.com/vesis84/kaldi-io-for-python
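
A minimal usage sketch for kaldi-io-for-python; the `feats.ark`/`feats.scp` paths are placeholders for archives produced by Kaldi:

```python
import kaldi_io

# Stream (utterance_id, numpy matrix) pairs from a Kaldi archive.
for utt_id, feats in kaldi_io.read_mat_ark("feats.ark"):
    print(utt_id, feats.shape)  # e.g. (num_frames, feat_dim)

# The same interface works through an .scp index file.
for utt_id, feats in kaldi_io.read_mat_scp("feats.scp"):
    pass
```
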
#### kaldi-pyio
* https://github.com/funcwj/kaldi-pyio
#### kaldi-tree-conv
* https://github.com/dophist/kaldi-tree-conv
#### kaldi-ivector
* https://github.com/idiap/kaldi-ivector
#### kaldi-yesno-tutorial
* https://github.com/keighrim/kaldi-yesno-tutorial
#### Kaldi nnet3 tutorial
* https://gist.github.com/candlewill/f6c789059bf28b99cee8e18b99c20bfd
#### Josh Meyer's Website
* http://jrmeyer.github.io/
#### Adapting your own Language Model for Kaldi
* https://github.com/srvk/lm_build
#### Some Kaldi Notes
* http://jrmeyer.github.io/asr/2016/02/01/Kaldi-notes.html
* http://sentiment-mining.blogspot.com/
* http://pages.jh.edu/~echodro1/tutorial/kaldi/
#### kaldi_tutorial
* https://github.com/hyung8758/kaldi_tutorial
#### Online decoder for Kaldi NNET2 and GMM speech recognition models with Python bindings
* https://github.com/UFAL-DSG/alex-asr
#### ResNet-Kaldi-Tensorflow-ASR
* https://github.com/fernandodelacalle/ResNet-Kaldi-Tensorflow-ASR
#### Kaldi ASR: Extending the ASpIRE model
* https://chrisearch.wordpress.com/2017/03/11/speech-recognition-using-kaldi-extending-and-using-the-aspire-model/
#### FastCGI support for Kaldi ASR
* https://github.com/dialogflow/asr-server
#### alignUsingKaldi
* https://github.com/Sundy1219/alignUsingKaldi
#### kaldi-readers-for-tensorflow
* https://github.com/t13m/kaldi-readers-for-tensorflow
#### kaldi-iot
* https://github.com/dophist/kaldi-iot
#### lattice-info
* https://github.com/jpuigcerver/lattice-info
#### lattice-char-to-word
* https://github.com/jpuigcerver/lattice-char-to-word
#### lattice-word-length-distribution
* https://github.com/jpuigcerver/lattice-word-length-distribution
#### kaldi-lattice-word-index
* https://github.com/jpuigcerver/kaldi-lattice-word-index
#### kaldi-decoders
* https://github.com/jpuigcerver/kaldi-decoders
#### lattice-remove-ctc-blank
* https://github.com/jpuigcerver/lattice-remove-ctc-blank
#### kaldi-lattice-search
* https://github.com/jpuigcerver/kaldi-lattice-search
#### htk2kaldi
* https://github.com/jpuigcerver/htk2kaldi
#### parallel-kaldi
* https://github.com/jpuigcerver/parallel-kaldi
#### Building an online Chinese ASR system with Kaldi
* https://blog.csdn.net/shichaog/article/details/73655628
#### kaldi-docker
* https://github.com/golbin/kaldi-docker
#### CSLT-Sparse-DNN-Toolkit
* https://github.com/wyq730/CSLT-Sparse-DNN-Toolkit
#### featxtra
* https://github.com/mvansegbroeck/featxtra
#### Sphinx
* https://cmusphinx.github.io/
* https://github.com/cmusphinx
* https://github.com/cmusphinx/pocketsphinx
#### OpenFst
* http://www.openfst.org/twiki/bin/view/FST/WebHome
* https://github.com/UFAL-DSG/openfst
* https://github.com/benob/openfst-utils
* https://github.com/vchahun/pyfst
#### MIT Spoken Language Systems
* https://groups.csail.mit.edu/sls/downloads/
#### Julius
* http://julius.osdn.jp/en_index.php
* https://github.com/julius-speech/julius
#### Bavieca
* http://www.bavieca.org/
#### Simon
* https://simon.kde.org/
#### SIDEKIT
* http://www-lium.univ-lemans.fr/sidekit/
#### SRILM
* https://www.sri.com/engage/products-solutions/sri-language-modeling-toolkit
* http://www.speech.sri.com/projects/srilm/
* https://github.com/nuance1979/srilm-python
* https://github.com/njsmith/pysrilm
#### awd-lstm-lm
* https://github.com/salesforce/awd-lstm-lm
#### ISIP
* https://www.isip.piconepress.com/projects/speech/

#### MIT Finite-State Transducer (FST) Toolkit
* http://groups.csail.mit.edu/sls/downloads/
#### MIT Language Modeling (MITLM) Toolkit
* http://groups.csail.mit.edu/sls/downloads/
#### OpenGrm
* http://www.openfst.org/twiki/bin/view/GRM/WebHome
#### RNNLM
* http://www.fit.vutbr.cz/~imikolov/rnnlm/
* https://github.com/IntelLabs/rnnlm
* https://github.com/glecorve/rnnlm2wfst
#### faster-rnnlm
* https://github.com/yandex/faster-rnnlm
#### CUED-RNNLM Toolkit
* http://mi.eng.cam.ac.uk/projects/cued-rnnlm/
#### Using an RNNLM to rescore sentences in a Chinese ASR system
* https://github.com/Sundy1219/RNNLM
#### KenLM
* https://github.com/kpu/kenlm
* https://kheafield.com/code/kenlm/
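
A minimal sketch of KenLM's Python bindings; `model.arpa` is a placeholder for a model you have built (e.g. with KenLM's `lmplz`) or downloaded:

```python
import kenlm

# Load an ARPA or binarized KenLM model (placeholder path).
model = kenlm.Model("model.arpa")

sentence = "this is a test"
print(model.score(sentence, bos=True, eos=True))  # log10 probability
print(model.perplexity(sentence))
```
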
#### rwthlm
* https://www-i6.informatik.rwth-aachen.de/web/Software/rwthlm.php
#### word-rnn-tensorflow
* https://github.com/hunkim/word-rnn-tensorflow
#### tensorlm
* https://github.com/batzner/tensorlm
#### SpeechRecognition
* https://github.com/Uberi/speech_recognition
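
A minimal sketch of the SpeechRecognition API above, transcribing a WAV file with the free Google Web Speech engine (network access required; other engines are available):

```python
import speech_recognition as sr

r = sr.Recognizer()
with sr.AudioFile("test.wav") as source:  # any mono WAV/AIFF/FLAC file
    audio = r.record(source)

print(r.recognize_google(audio))
```
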
#### SpeechPy
* https://github.com/astorfi/speechpy
#### Aalto
* https://github.com/aalto-speech/AaltoASR
#### google-cloud-speech
* https://pypi.org/project/google-cloud-speech/
#### apiai
* https://pypi.org/project/apiai/
#### wit
* https://github.com/wit-ai/pywit
#### Nabu
* https://github.com/vrenkens/nabu
#### asr-study
* https://github.com/igormq/asr-study
#### dejavu
* https://github.com/worldveil/dejavu
#### uSpeech
* https://github.com/arjo129/uSpeech
#### Juicer
* https://github.com/idiap/juicer
#### PMLS
* http://pmls.readthedocs.io/en/latest/dnn-speech.html
#### dragonfly
* https://github.com/t4ngo/dragonfly
#### SPTK
* https://github.com/r9y9/SPTK
* https://github.com/sp-nitech/SPTK
* http://sp-tk.sourceforge.net/
#### pysptk
* https://github.com/r9y9/pysptk
#### RWTH ASR
* https://www-i6.informatik.rwth-aachen.de/rwth-asr/
#### Palaver
* https://github.com/JamezQ/Palaver
#### Praat
* http://www.fon.hum.uva.nl/praat/
* https://github.com/kylebgorman/textgrid
#### Speech Recognition Grammar Specification
* https://www.w3.org/TR/speech-grammar/
#### Automatic_Speech_Recognition
* https://github.com/zzw922cn/Automatic_Speech_Recognition
#### speech-to-text-wavenet
* https://github.com/buriburisuri/speech-to-text-wavenet
#### tensorflow-speech-recognition
* https://github.com/pannous/tensorflow-speech-recognition
#### tensorflow_end2end_speech_recognition
* https://github.com/hirofumi0810/tensorflow_end2end_speech_recognition
#### tensorflow_speech_recognition_demo
* https://github.com/llSourcell/tensorflow_speech_recognition_demo
#### AVSR-Deep-Speech
* https://github.com/pandeydivesh15/AVSR-Deep-Speech
#### TTS and ASR
* https://github.com/roboticslab-uc3m/speech
#### CTC + Tensorflow Example for ASR
* https://github.com/igormq/ctc_tensorflow_example
#### tensorflow-ctc-speech-recognition
* https://github.com/philipperemy/tensorflow-ctc-speech-recognition
#### speechT
* https://github.com/timediv/speechT
#### end2endASR
* https://github.com/cdyangbo/end2endASR
#### DTW (Dynamic Time Warping) python module
* https://github.com/pierre-rouanet/dtw
#### Various scripts and tools for speech recognition model building
* https://github.com/gooofy/speech
#### Deep-learning-based Chinese speech recognition system built with CNN, LSTM, and CTC
* https://github.com/nl8590687/ASRT_SpeechRecognition
#### tacotron_asr
* https://github.com/Kyubyong/tacotron_asr
#### ASR_Keras
* https://github.com/Chriskamphuis/ASR
#### Kaggle Tensorflow Speech Recognition Challenge
* https://dinantdatascientist.blogspot.dk/2018/02/kaggle-tensorflow-speech-recognition.html
#### Speech recognition script for Asterisk that uses Google's speech engine
* https://github.com/zaf/asterisk-speech-recog
#### Libraries and scripts for manipulating and handling ASR output/n-bests/etc.
* https://github.com/belambert/asr-tools
#### Some scripts and commands for working with ASR
* https://github.com/JRMeyer/asr
#### PySpeechGrammar
* https://github.com/ynop/pyspeechgrammar
#### Python module for evaluating ASR hypotheses
* https://github.com/belambert/asr-evaluation
#### edit-distance
* https://github.com/belambert/edit-distance
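
As an illustration of scoring ASR hypotheses with an edit-distance package such as the two entries above, a small sketch computing word error rate, WER = (S + D + I) / N; the `SequenceMatcher` interface shown follows belambert/edit-distance, so treat the exact names as an assumption:

```python
import edit_distance

ref = "the cat sat on the mat".split()
hyp = "the cat sat mat".split()

# Minimum number of substitutions, deletions, and insertions
# needed to turn hyp into ref.
sm = edit_distance.SequenceMatcher(a=ref, b=hyp)
wer = sm.distance() / len(ref)
print(f"WER: {wer:.2%}")  # 2 deletions / 6 reference words ≈ 33.33%
```
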
### <a name="1.3"></a>dataset
#### VoxForge
* http://www.voxforge.org/home
* http://www.voxforge.org/zh
* http://www.repository.voxforge1.org/downloads/SpeechCorpus/Trunk/Audio/Main/16kHz_16bit/
#### ASR Audio Data Links
* https://github.com/robmsmt/ASR_Audio_Data_Links
#### The CMU Pronouncing Dictionary
* http://www.speech.cs.cmu.edu/cgi-bin/cmudict
#### TIMIT
* https://catalog.ldc.upenn.edu/LDC93S1
* https://github.com/syhw/timit_tools
* https://github.com/philipperemy/timit
#### GlobalPhone Language Models
* http://www.csl.uni-bremen.de/GlobalPhone/
#### 1 Billion Word Language Model Benchmark
* https://github.com/ciprian-chelba/1-billion-word-language-modeling-benchmark
* http://www.statmt.org/lm-benchmark/
#### DaCiDian-Develop
* https://github.com/dophist/DaCiDian-Develop
#### CC-CEDICT
* https://www.mdbg.net/chinese/dictionary?page=cc-cedict
#### TED-LIUM
* https://lium.univ-lemans.fr/ted-lium3/
#### open-asr-lexicon
* https://github.com/dophist/open-asr-lexicon

### <a name="1.4"></a>Tutorial
#### University of Edinburgh ASR2017-18
* http://www.inf.ed.ac.uk/teaching/courses/asr/
#### Stanford CS224S
* https://web.stanford.edu/class/cs224s/syllabus.html
#### NYU asr12
* https://cs.nyu.edu/~mohri/asr12/
#### Speech Recognition with Neural Networks
* http://andrew.gibiansky.com/blog/machine-learning/speech-recognition-neural-networks/

## <a name="2"></a>Speech Synthesis

### <a name="2.1"></a>page
#### CSTR-Edinburgh
* https://github.com/CSTR-Edinburgh

### <a name="2.2"></a>open source library/toolbox
#### WORLD
* https://github.com/mmorise/World
#### HTS
* http://hts.sp.nitech.ac.jp/
* http://hts-engine.sourceforge.net/
* https://github.com/shamidreza/HTS-demo_CMU-ARCTIC-SLT-Formant
* https://github.com/MattShannon/HTS-demo_CMU-ARCTIC-SLT-STRAIGHT-AR-decision-tree
#### Tacotron
* https://github.com/Kyubyong/tacotron
* https://github.com/Kyubyong/expressive_tacotron
* https://github.com/keithito/tacotron
* https://github.com/GSByeon/multi-speaker-tacotron-tensorflow
* https://github.com/r9y9/tacotron_pytorch
* https://github.com/soobinseo/Tacotron-pytorch
#### Tacotron2
* https://github.com/NVIDIA/tacotron2
* https://github.com/riverphoenix/tacotron2
* https://github.com/A-Jacobson/tacotron2
* https://github.com/selap91/Tacotron2
* https://github.com/LGizkde/Tacotron2_Tao_Shujie
* https://github.com/rlawns1016/Tacotron2
* https://github.com/CapstoneInha/Tacotron2-rehearsal
#### Merlin
* https://github.com/CSTR-Edinburgh/merlin
#### Mozilla TTS
* https://github.com/mozilla/TTS
#### Flite
* http://www.speech.cs.cmu.edu/flite/
* https://github.com/festvox/flite
#### Speect
* http://speect.sourceforge.net/
#### Festival
* https://github.com/festvox/festival
#### eSpeak
* http://espeak.sourceforge.net/
* https://github.com/gooofy/py-espeak-ng
#### nnmnkwii
* https://github.com/r9y9/nnmnkwii
#### Ossian
* https://github.com/CSTR-Edinburgh/Ossian
#### gTTS
* https://github.com/pndurette/gTTS
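
A minimal gTTS sketch (it calls Google Translate's TTS endpoint, so it needs network access):

```python
from gtts import gTTS

# Synthesize a short English utterance and write it to an MP3 file.
tts = gTTS("hello world", lang="en")
tts.save("hello.mp3")
```
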
#### gnuspeech
* http://git.savannah.gnu.org/cgit/gnuspeech.git
#### supercollider
* https://github.com/supercollider/supercollider
#### sc3-plugins
* https://github.com/supercollider/sc3-plugins
#### Neural_Network_Voices
* https://github.com/llSourcell/Neural_Network_Voices
#### pggan-pytorch
* https://github.com/deepsound-project/pggan-pytorch
#### cainteoir-engine
* https://github.com/rhdunn/cainteoir-engine
#### loop
* https://github.com/facebookresearch/loop
#### TTS and ASR
* https://github.com/roboticslab-uc3m/speech
#### musa_tts
* https://github.com/santi-pdp/musa_tts
#### marytts (Java)
* https://github.com/marytts/marytts

## <a name="3"></a>Speaker Recognition

### <a name="3.2"></a>open source library/toolbox
#### Alize
* http://mistral.univ-avignon.fr/
#### speaker-recognition-py3
* https://github.com/crouchred/speaker-recognition-py3
#### openVP
* https://github.com/dake/openVP
* https://github.com/swshon?tab=repositories
#### Gender recognition by voice and speech analysis
* http://www.primaryobjects.com/2016/06/22/identifying-the-gender-of-a-voice-using-machine-learning/
* https://github.com/primaryobjects/voice-gender

## <a name="4"></a>Dialogue Systems

### <a name="4.1"></a>pages
#### NTU
* http://miulab.tw/
* https://github.com/MiuLab
* https://www.csie.ntu.edu.tw/~yvchen/publication.html
#### Tsung-Hsien Wen
* https://shawnwun.github.io/

### <a name="4.2"></a>open source library/toolbox
#### PyDial
* http://www.camdial.org/pydial/
#### alex
* https://github.com/UFAL-DSG/alex
#### ROS speech interaction system
* https://github.com/hntea/ros-speech
#### Chinese speech interaction system built on the ROS framework
* https://github.com/hntea/speech-system-zh

## <a name="5"></a>Front End

### <a name="5.1"></a>Speech Processing
#### madmom
* https://github.com/CPJKU/madmom
#### pydub
* https://github.com/jiaaro/pydub
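
A small pydub sketch, slicing and converting a clip (ffmpeg is required for non-WAV formats):

```python
from pydub import AudioSegment

audio = AudioSegment.from_wav("input.wav")
clip = audio[1000:5000]                            # milliseconds 1000-5000
clip = clip.set_frame_rate(16000).set_channels(1)  # 16 kHz mono
clip.export("clip.wav", format="wav")
```
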
#### kapre: Keras Audio Preprocessors
* https://github.com/keunwoochoi/kapre
#### BTK
* http://distantspeechrecognition.sourceforge.net/
#### ESPnet
* https://github.com/espnet/espnet
#### Signal-Processing
* https://github.com/mathEnthusaistCodes/Signal-Processing
#### pyroomacoustics
* https://github.com/LCAV/pyroomacoustics
#### librosa
* https://github.com/librosa/librosa
* https://github.com/librosa/librosa_gallery
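
A minimal librosa sketch, loading audio and computing two common front-end features:

```python
import librosa

# Load audio resampled to 16 kHz mono, then compute common features.
y, sr = librosa.load("input.wav", sr=16000)
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)           # (13, num_frames)
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=80)  # (80, num_frames)
print(mfcc.shape, mel.shape)
```
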
#### REAPER
* https://github.com/google/REAPER
#### MSD_split_for_tagging
* https://github.com/keunwoochoi/MSD_split_for_tagging
#### VOICEBOX
* http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/voicebox.html
#### liquid-dsp
* https://github.com/jgaeddert/liquid-dsp
#### ffts
* https://github.com/anthonix/ffts
#### mir_eval
* https://github.com/craffel/mir_eval
#### aupyom
* https://github.com/pierre-rouanet/aupyom
#### Pitch Detection
* http://note.sonots.com/SciSoftware/Pitch.html
#### TFTB
* http://tftb.nongnu.org/
#### maracas
* https://github.com/jfsantos/maracas
#### SRMRpy
* https://github.com/jfsantos/SRMRpy
#### ssp
* https://github.com/idiap/ssp
* https://github.com/idiap/libssp
#### iss
* https://github.com/idiap/iss
* https://github.com/idiap/iss-dicts
#### asr_preprocessing
* https://github.com/hirofumi0810/asr_preprocessing
#### asrt
* https://github.com/idiap/asrt
#### Audio super resolution using NN
* https://github.com/kuleshov/audio-super-res
#### RNN training for noise reduction in robust ASR
* https://github.com/amaas/rnn-speech-denoising
#### RNN for audio noise reduction
* https://github.com/xiph/rnnoise
#### muda
* https://github.com/bmcfee/muda
#### Efficient sample rate conversion in Python
* https://github.com/bmcfee/resampy
#### Smarc audio rate converter
* http://audio-smarc.sourceforge.net/
#### Python scripts to compute f0 of a wave file
* https://github.com/t13m/pyPitchCom

### <a name="5.2"></a>Audio I/O
#### PortAudio
* http://www.portaudio.com/
#### audiolab
* https://github.com/cournape/audiolab
#### pytorch audio
* https://github.com/pytorch/audio
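
A minimal sketch of the pytorch audio (torchaudio) API listed above:

```python
import torchaudio

# Load a waveform as a (channels, samples) float tensor.
waveform, sample_rate = torchaudio.load("input.wav")

# Compute 13 MFCC coefficients per frame.
mfcc = torchaudio.transforms.MFCC(sample_rate=sample_rate, n_mfcc=13)(waveform)
print(waveform.shape, mfcc.shape)
```
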
#### Digital Speech Decoder
* https://github.com/szechyjs/dsd
#### audioread
* https://github.com/beetbox/audioread
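
A small audioread sketch; it decodes through whichever backend is available (e.g. GStreamer, FFmpeg, MAD, or the standard library) and yields raw 16-bit PCM buffers:

```python
import audioread

with audioread.audio_open("input.mp3") as f:
    print(f.channels, f.samplerate, f.duration)
    for buf in f:
        pass  # buf is a bytes-like block of 16-bit little-endian PCM
```
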
#### audacity.py
* https://github.com/davidavdav/audacity.py

### <a name="5.3"></a>Sound Source Separation

#### HARK
* https://www.hark.jp/wiki.cgi?page=HARK+Installation+Instructions
#### Deep RNN for Source Separation
* https://github.com/posenhuang/deeplearningsourceseparation
#### nussl
* https://github.com/interactiveaudiolab/nussl
#### DNN for Music Source Separation in Tensorflow
* https://andabi.github.io/music-source-separation/
#### Alexey Ozerov
* http://www.irisa.fr/metiss/ozerov/
#### University of Surrey CVSSP
* https://github.com/CVSSP
#### Source separation using CNN
* https://github.com/emma-mens/ASR

### <a name="5.4"></a>Feature Extraction
#### openSMILE
* https://audeering.com/technology/opensmile/
* https://github.com/naxingyu/opensmile
#### veles.sound_feature_extraction
* https://github.com/Samsung/veles.sound_feature_extraction
#### vamp-plugin-sdk
* https://github.com/c4dm/vamp-plugin-sdk
#### Yaafe
* http://yaafe.sourceforge.net/
#### py_bank
* https://github.com/wil-j-wil/py_bank
#### AuditoryFilterbanks
* https://github.com/jfsantos/AuditoryFilterbanks
#### python_speech_features
* https://github.com/jameslyons/python_speech_features
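
A minimal python_speech_features sketch, computing MFCCs and log Mel filterbank energies from a WAV file:

```python
import scipy.io.wavfile as wav
from python_speech_features import mfcc, logfbank

rate, sig = wav.read("input.wav")
mfcc_feat = mfcc(sig, rate, numcep=13)  # (num_frames, 13)
fbank_feat = logfbank(sig, rate)        # log Mel filterbank energies
print(mfcc_feat.shape, fbank_feat.shape)
```
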
### <a name="5.5"></a>VAD
* https://github.com/jtkim-kaist/VAD
* https://github.com/jtkim-kaist/VAD_DNN
* https://github.com/marsbroshok/VAD-python
* https://github.com/shiweixingcn/vad
* https://github.com/fedden/RenderMan
#### rVAD
* http://kom.aau.dk/~zt/online/readme.htm
#### Aurora 2 VAD
* http://kom.aau.dk/~zt/online/readme.htm
#### IsraelCohen
* http://webee.technion.ac.il/people/IsraelCohen/Info/Software.html
#### Python interface to the WebRTC Voice Activity Detector
* https://github.com/wiseman/py-webrtcvad
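
A minimal py-webrtcvad sketch; the detector accepts only 10, 20, or 30 ms frames of 16-bit mono PCM at 8, 16, 32, or 48 kHz:

```python
import webrtcvad

vad = webrtcvad.Vad(2)  # aggressiveness mode 0 (least) .. 3 (most)

sample_rate = 16000
frame_ms = 30
frame = b"\x00\x00" * (sample_rate * frame_ms // 1000)  # 30 ms of silence

print(vad.is_speech(frame, sample_rate))  # expected False for pure silence
```
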

## <a name="6"></a>Resources

### <a name="6.1"></a>code/tool/data
#### cmusphinx
* https://github.com/cmusphinx
#### julius-speech
* https://github.com/julius-speech
#### OpenSLR
* http://www.openslr.org/
#### List of speech recognition software
* https://en.wikipedia.org/wiki/List_of_speech_recognition_software
#### KTH
* http://www.speech.kth.se/software/
#### VERBIO
* http://www.verbio.com/webverbiotm/html/productes.php?id=2
#### timeview
* https://github.com/lxkain/timeview
#### Speech at CMU Web Page
* http://www.speech.cs.cmu.edu/
#### CMU Robust Speech Group
* http://www.cs.cmu.edu/~robust/code.html
#### Speech Software at CMU
* http://www.speech.cs.cmu.edu/hephaestus.html
#### Aalto Speech Research
* https://github.com/aalto-speech
#### CMU Festvox Project
* https://github.com/festvox?tab=repositories
* http://www.festvox.org/
#### CSTR
* http://www.cstr.ed.ac.uk/research/
* http://www.cstr.ed.ac.uk/downloads/
#### Xiph
* https://github.com/xiph
#### Brno University of Technology Speech Processing Group
* http://speech.fit.vutbr.cz/software
#### SoX
* http://sox.sourceforge.net/
#### STRAIGHT
* https://github.com/shuaijiang/STRAIGHT
* http://www.wakayama-u.ac.jp/~kawahara/STRAIGHTadv/index_e.html
#### Idiap Research Institute
* https://github.com/idiap
#### Transcriber
* http://trans.sourceforge.net/en/presentation.php
#### Amirsina Torfi
* https://github.com/astorfi?tab=repositories
#### The Speech Recognition Virtual Kitchen
* https://github.com/srvk
* http://www.clsp.jhu.edu/~sriram/software/soft.html
#### Sparse Representation & Dictionary Learning Algorithms with Applications in Denoising, Separation, Localisation and Tracking
* http://personal.ee.surrey.ac.uk/Personal/W.Wang/codes.html
#### Audacity
* https://www.audacityteam.org/
#### beetbox
* https://github.com/beetbox
#### CAQE
* https://github.com/interactiveaudiolab/CAQE
#### UCL Speech Filing System
* http://www.phon.ucl.ac.uk/resource/sfs/
#### Ryuichi Yamamoto
* https://github.com/r9y9?tab=repositories
#### Kyubyong Park
* https://github.com/Kyubyong?tab=repositories
#### Hideyuki Tachibana
* https://github.com/tachi-hi?tab=repositories
#### Colin Raffel
* https://github.com/craffel?tab=repositories
#### Paul Dixon
* https://github.com/edobashira?tab=repositories
#### smacpy
* https://github.com/danstowell/smacpy
#### c4dm
* http://c4dm.eecs.qmul.ac.uk/software_data.html
#### Matt Shannon
* https://github.com/MattShannon?tab=repositories
#### Keunwoo Choi
* https://github.com/keunwoochoi?tab=repositories
#### ADASP
* http://www.tsi.telecom-paristech.fr/aao/en/software-and-database/
#### uchicago Speech and Language @ TTIC
* http://ttic.uchicago.edu/~klivescu/SLATTIC/resources.htm
#### Justin Salamon
* http://www.justinsalamon.com/codedata.html
#### COLEA
* http://ecs.utdallas.edu/loizou/speech/colea.htm
#### openAUDIO
* http://www.openaudio.eu/
#### Praat
* http://www.fon.hum.uva.nl/praat/
* https://github.com/timmahrt/praatIO
#### librosa
* https://github.com/librosa
#### Essentia
* https://github.com/MTG/essentia
#### timmahrt
* https://github.com/timmahrt?tab=repositories
#### Lefteris Zafiris
* https://github.com/zaf?tab=repositories
#### audio-to-audio and audio-to-midi alignment
* https://github.com/cataska/scorealign
#### DNN based hotword and wake word detection toolkit
* https://github.com/Kitt-AI/snowboy
#### free-spoken-digit-dataset
* https://github.com/Jakobovski/free-spoken-digit-dataset
#### Chinese Linguistic Data Consortium (中文语言资源联盟)
* http://www.chineseldc.org/resource_list.php?begin=0&count=20
#### Institute of Formal and Applied Linguistics – Dialogue Systems Group
* https://github.com/UFAL-DSG

* https://github.com/edobashira/speech-language-processing
* https://github.com/andabi?tab=repositories
* https://code.soundsoftware.ac.uk/projects

### <a name="6.2"></a>tutorial
#### DL for Computer Vision, Speech, and Language
* http://llcao.net/cu-deeplearning17/resource.html
#### NTU Introduction to Digital Speech Processing (臺大數位語音處理概論)
* http://speech.ee.ntu.edu.tw/courses.html
* http://ocw.aca.ntu.edu.tw/ntu-ocw/ocw/cou/104S204
#### IISc Speech Information Processing
* http://www.ee.iisc.ac.in/new/people/faculty/prasantg/e9261_speech_jan2018.html
* http://www.practicalcryptography.com/miscellaneous/machine-learning/

### <a name="6.3"></a>paper
* https://arxiv.org/search/?query=speech&searchtype=all&source=header
* https://www.isca-speech.org/iscaweb/index.php/archive/online-archive
* https://www.aclweb.org/anthology/
* https://github.com/zzw922cn/awesome-speech-recognition-speech-synthesis-papers
#### State of the art and recent results (bibliography) on speech recognition
* https://github.com/syhw/wer_are_we

## <a name="7"></a>Pages
#### Dan Povey
* http://www.danielpovey.com/publications.html
#### cmusphinx
* https://github.com/cmusphinx
#### CMU Language Technologies Institute
* https://www.lti.cs.cmu.edu/work
#### CMU SPEECH@SV
* http://speech.sv.cmu.edu/publications.html
#### Mitsubishi Electric Research Laboratories
* http://www.merl.com/publications/
#### MIT Spoken Language Systems
* https://groups.csail.mit.edu/sls/downloads/
#### Brno University of Technology Speech Processing Group
* http://speech.fit.vutbr.cz/software
#### IISc
* https://spire.ee.iisc.ac.in/spire/allPublications.php
#### uchicago Speech and Language @ TTIC
* http://ttic.uchicago.edu/~klivescu/SLATTIC/resources.htm
#### RWTH Aachen University
* https://www-i6.informatik.rwth-aachen.de/web/Software/index.html
#### TOKUDA and NANKAKU LABORATORY
* http://www.sp.nitech.ac.jp/index.php?HOME%2FSOFTWARE
#### Institute of Formal and Applied Linguistics – Dialogue Systems Group
* https://github.com/UFAL-DSG
#### Ohio State University speech separation
* http://web.cse.ohio-state.edu/pnl/software.html
#### LEAP Laboratory
* http://www.leap.ee.iisc.ac.in/publications/
#### Hainan Xu
* https://www.cs.jhu.edu/~hxu/
#### Mark Gales
* http://mi.eng.cam.ac.uk/~mjfg/
#### Karen Livescu
* http://ttic.uchicago.edu/~klivescu/
#### Shubham Toshniwal
* http://ttic.uchicago.edu/~shtoshni/#pubs
* https://github.com/shtoshni92?tab=repositories
#### Adrien Ycart
* http://www.eecs.qmul.ac.uk/~ay304/code.html
#### Ron Weiss
* https://ronw.github.io//
#### Yajie Miao
* https://www.cs.cmu.edu/~ymiao/
#### Scott T Wisdom
* https://sites.google.com/site/scottwisdomhomepage/publications
#### Alan W Black
* https://www.cs.cmu.edu/~awb/
#### Amirsina Torfi
* https://www.amirsinatorfi.com/publications
#### Liang Lu
* http://ttic.uchicago.edu/~llu/
#### Zhizheng Wu
* http://www.zhizheng.org/
#### Justin Salamon
* http://www.justinsalamon.com/codedata.html
#### Keith Vertanen
* http://www.keithv.com/software/
#### Aviv Gabbay
* http://www.cs.huji.ac.il/~avivga/
#### Mehryar Mohri
* https://cs.nyu.edu/~mohri/
#### Jonathan Le Roux
* http://www.jonathanleroux.org/
#### Suyoun Kim
* https://synetkim.github.io/
#### DeepSound
* http://deepsound.io/
#### Lei Xie
* http://lxie.npu-aslp.org/