# awesome-speech
This is a treasure house of speech: a curated collection of toolkits, code, corpora, and tutorials for speech recognition, synthesis, speaker recognition, dialogue systems, and front-end processing.

## Contents
* [Speech Recognition (ASR, STT)](#1)
  * [page](#1.1)
  * [open source library/toolbox/code](#1.2)
  * [corpus/dataset](#1.3)
  * [Tutorial](#1.4)
* [Speech Synthesis (TTS)](#2)
  * [page](#2.1)
  * [open source library/toolbox/code](#2.2)
  * [corpus/dataset](#2.3)
  * [Tutorial](#2.4)
* [Speaker Recognition](#3)
  * [page](#3.1)
  * [open source library/toolbox/code](#3.2)
  * [corpus/dataset](#3.3)
  * [Tutorial](#3.4)
* [Dialogue Systems](#4)
  * [page](#4.1)
  * [open source library/toolbox/code](#4.2)
  * [corpus/dataset](#4.3)
  * [Tutorial](#4.4)
* [Front End](#5)
  * [Speech Processing](#5.1)
  * [Audio I/O](#5.2)
  * [Sound Source Separation](#5.3)
  * [Feature Extraction](#5.4)
  * [VAD](#5.5)
* [Resources](#6)
  * [code/tool/data](#6.1)
  * [Tutorial](#6.2)
  * [paper](#6.3)
* [Pages](#7)

## <a name="1"></a>Speech Recognition

### <a name="1.1"></a>page
#### Xingyu Na
* http://naxingyu.github.io/
* https://github.com/naxingyu?tab=repositories
#### Language Processing and Pattern Recognition Group, RWTH Aachen University
* https://www-i6.informatik.rwth-aachen.de/web/Software/index.html
#### Fernando de la Calle Silos
* http://www.tsc.uc3m.es/~fsilos/
* https://github.com/fernandodelacalle?tab=repositories

### <a name="1.2"></a>open source library/toolbox/code
#### HTK
* http://htk.eng.cam.ac.uk/download.shtml
#### Py2HTK
* https://github.com/g-leech/Py2HTK
#### parallel-htk
* https://github.com/jpuigcerver/parallel-htk
#### HTK_C_MATLAB_tools
* https://github.com/sinb/HTK_C_MATLAB_tools

#### Kaldi
* https://github.com/kaldi-asr/kaldi
#### Kaldi official documentation (Chinese translation)
* http://blog.geekidentity.com/asr/kaldi/kaldi_tutorial/
#### Kaldi models
* http://kaldi-asr.org/models.html
#### Corpus Phonetics Tutorial
* https://www.eleanorchodroff.com/tutorial/kaldi/kaldi-intro.html
#### Kaldi Python wrappers (pykaldi, py-kaldi-asr, and others)
* https://github.com/pykaldi/pykaldi
* https://github.com/gooofy/py-kaldi-asr
* https://github.com/UFAL-DSG/pykaldi
* https://github.com/janchorowski/kaldi-python
#### Dan's DNN implementation
* http://kaldi-asr.org/doc/dnn2.html
#### pytorch-kaldi
* https://github.com/mravanelli/pytorch-kaldi/
#### kaldi-lstm
* https://github.com/dophist/kaldi-lstm
#### kaldi-ctc
* https://github.com/lingochamp/kaldi-ctc
#### keras-kaldi
* https://github.com/dspavankumar/keras-kaldi
#### Python wrapper for the Kaldi online decoder
* https://github.com/funcwj/pydecoder
#### Kaldi+PDNN
* https://github.com/yajiemiao/kaldipdnn
#### tfkaldi
* https://github.com/vrenkens/tfkaldi
#### Kaldi_CNTK_AMI
* https://github.com/chenguoguo/Kaldi_CNTK_AMI
#### kaldi-io-for-python
* https://github.com/vesis84/kaldi-io-for-python
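
A minimal usage sketch for kaldi-io-for-python; the `feats.ark`/`feats.scp` paths are placeholders for archives produced by Kaldi:

```python
import kaldi_io

# Stream (utterance_id, numpy matrix) pairs from a Kaldi archive.
for utt_id, feats in kaldi_io.read_mat_ark("feats.ark"):
    print(utt_id, feats.shape)  # e.g. (num_frames, feat_dim)

# The same interface works through an .scp index file.
for utt_id, feats in kaldi_io.read_mat_scp("feats.scp"):
    pass
```
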
#### kaldi-pyio
* https://github.com/funcwj/kaldi-pyio
#### kaldi-tree-conv
* https://github.com/dophist/kaldi-tree-conv
#### kaldi-ivector
* https://github.com/idiap/kaldi-ivector
#### kaldi-yesno-tutorial
* https://github.com/keighrim/kaldi-yesno-tutorial
#### Kaldi nnet3 tutorial
* https://gist.github.com/candlewill/f6c789059bf28b99cee8e18b99c20bfd
#### Josh Meyer's Website
* http://jrmeyer.github.io/
#### Adapting your own Language Model for Kaldi
* https://github.com/srvk/lm_build
#### Some Kaldi Notes
* http://jrmeyer.github.io/asr/2016/02/01/Kaldi-notes.html
* http://sentiment-mining.blogspot.com/
* http://pages.jh.edu/~echodro1/tutorial/kaldi/
#### kaldi_tutorial
* https://github.com/hyung8758/kaldi_tutorial
#### Online decoder for Kaldi NNET2 and GMM speech recognition models with Python bindings
* https://github.com/UFAL-DSG/alex-asr
#### ResNet-Kaldi-Tensorflow-ASR
* https://github.com/fernandodelacalle/ResNet-Kaldi-Tensorflow-ASR
#### Kaldi ASR: Extending the ASpIRE model
* https://chrisearch.wordpress.com/2017/03/11/speech-recognition-using-kaldi-extending-and-using-the-aspire-model/
#### FastCGI support for Kaldi ASR
* https://github.com/dialogflow/asr-server
#### alignUsingKaldi
* https://github.com/Sundy1219/alignUsingKaldi
#### kaldi-readers-for-tensorflow
* https://github.com/t13m/kaldi-readers-for-tensorflow
#### kaldi-iot
* https://github.com/dophist/kaldi-iot
#### lattice-info
* https://github.com/jpuigcerver/lattice-info
#### lattice-char-to-word
* https://github.com/jpuigcerver/lattice-char-to-word
#### lattice-word-length-distribution
* https://github.com/jpuigcerver/lattice-word-length-distribution
#### kaldi-lattice-word-index
* https://github.com/jpuigcerver/kaldi-lattice-word-index
#### kaldi-decoders
* https://github.com/jpuigcerver/kaldi-decoders
#### lattice-remove-ctc-blank
* https://github.com/jpuigcerver/lattice-remove-ctc-blank
#### kaldi-lattice-search
* https://github.com/jpuigcerver/kaldi-lattice-search
#### htk2kaldi
* https://github.com/jpuigcerver/htk2kaldi
#### parallel-kaldi
* https://github.com/jpuigcerver/parallel-kaldi
#### Building an online Chinese ASR system with Kaldi
* https://blog.csdn.net/shichaog/article/details/73655628
#### kaldi-docker
* https://github.com/golbin/kaldi-docker
#### CSLT-Sparse-DNN-Toolkit
* https://github.com/wyq730/CSLT-Sparse-DNN-Toolkit
#### featxtra
* https://github.com/mvansegbroeck/featxtra
#### Sphinx
* https://cmusphinx.github.io/
* https://github.com/cmusphinx
* https://github.com/cmusphinx/pocketsphinx
#### OpenFst
* http://www.openfst.org/twiki/bin/view/FST/WebHome
* https://github.com/UFAL-DSG/openfst
* https://github.com/benob/openfst-utils
* https://github.com/vchahun/pyfst
#### MIT Spoken Language Systems
* https://groups.csail.mit.edu/sls/downloads/
#### Julius
* http://julius.osdn.jp/en_index.php
* https://github.com/julius-speech/julius
#### Bavieca
* http://www.bavieca.org/
#### Simon
* https://simon.kde.org/
#### SIDEKIT
* http://www-lium.univ-lemans.fr/sidekit/
#### SRILM
* https://www.sri.com/engage/products-solutions/sri-language-modeling-toolkit
* http://www.speech.sri.com/projects/srilm/
* https://github.com/nuance1979/srilm-python
* https://github.com/njsmith/pysrilm
#### awd-lstm-lm
* https://github.com/salesforce/awd-lstm-lm
#### ISIP
* https://www.isip.piconepress.com/projects/speech/

#### MIT Finite-State Transducer (FST) Toolkit
* http://groups.csail.mit.edu/sls/downloads/
#### MIT Language Modeling (MITLM) Toolkit
* http://groups.csail.mit.edu/sls/downloads/
#### OpenGrm
* http://www.openfst.org/twiki/bin/view/GRM/WebHome
#### RNNLM
* http://www.fit.vutbr.cz/~imikolov/rnnlm/
* https://github.com/IntelLabs/rnnlm
* https://github.com/glecorve/rnnlm2wfst
#### faster-rnnlm
* https://github.com/yandex/faster-rnnlm
#### CUED-RNNLM Toolkit
* http://mi.eng.cam.ac.uk/projects/cued-rnnlm/
#### Using an RNNLM to rescore sentences in a Chinese ASR system
* https://github.com/Sundy1219/RNNLM
#### KenLM
* https://github.com/kpu/kenlm
* https://kheafield.com/code/kenlm/
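
A minimal sketch of KenLM's Python bindings; `model.arpa` is a placeholder for a model you have built (e.g. with KenLM's `lmplz`) or downloaded:

```python
import kenlm

# Load an ARPA or binarized KenLM model (placeholder path).
model = kenlm.Model("model.arpa")

sentence = "this is a test"
print(model.score(sentence, bos=True, eos=True))  # log10 probability
print(model.perplexity(sentence))
```
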
#### rwthlm
* https://www-i6.informatik.rwth-aachen.de/web/Software/rwthlm.php
#### word-rnn-tensorflow
* https://github.com/hunkim/word-rnn-tensorflow
#### tensorlm
* https://github.com/batzner/tensorlm
#### SpeechRecognition
* https://github.com/Uberi/speech_recognition
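
A minimal sketch of the SpeechRecognition API above, transcribing a WAV file with the free Google Web Speech engine (network access required; other engines are available):

```python
import speech_recognition as sr

r = sr.Recognizer()
with sr.AudioFile("test.wav") as source:  # any mono WAV/AIFF/FLAC file
    audio = r.record(source)

print(r.recognize_google(audio))
```
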
#### SpeechPy
* https://github.com/astorfi/speechpy
#### Aalto
* https://github.com/aalto-speech/AaltoASR
#### google-cloud-speech
* https://pypi.org/project/google-cloud-speech/
#### apiai
* https://pypi.org/project/apiai/
#### wit
* https://github.com/wit-ai/pywit
#### Nabu
* https://github.com/vrenkens/nabu
#### asr-study
* https://github.com/igormq/asr-study
#### dejavu
* https://github.com/worldveil/dejavu
#### uSpeech
* https://github.com/arjo129/uSpeech
#### Juicer
* https://github.com/idiap/juicer
#### PMLS
* http://pmls.readthedocs.io/en/latest/dnn-speech.html
#### dragonfly
* https://github.com/t4ngo/dragonfly
#### SPTK
* https://github.com/r9y9/SPTK
* https://github.com/sp-nitech/SPTK
* http://sp-tk.sourceforge.net/
#### pysptk
* https://github.com/r9y9/pysptk
#### RWTH ASR
* https://www-i6.informatik.rwth-aachen.de/rwth-asr/
#### Palaver
* https://github.com/JamezQ/Palaver
#### Praat
* http://www.fon.hum.uva.nl/praat/
* https://github.com/kylebgorman/textgrid
#### Speech Recognition Grammar Specification
* https://www.w3.org/TR/speech-grammar/
#### Automatic_Speech_Recognition
* https://github.com/zzw922cn/Automatic_Speech_Recognition
#### speech-to-text-wavenet
* https://github.com/buriburisuri/speech-to-text-wavenet
#### tensorflow-speech-recognition
* https://github.com/pannous/tensorflow-speech-recognition
#### tensorflow_end2end_speech_recognition
* https://github.com/hirofumi0810/tensorflow_end2end_speech_recognition
#### tensorflow_speech_recognition_demo
* https://github.com/llSourcell/tensorflow_speech_recognition_demo
#### AVSR-Deep-Speech
* https://github.com/pandeydivesh15/AVSR-Deep-Speech
#### TTS and ASR
* https://github.com/roboticslab-uc3m/speech
#### CTC + Tensorflow Example for ASR
* https://github.com/igormq/ctc_tensorflow_example
#### tensorflow-ctc-speech-recognition
* https://github.com/philipperemy/tensorflow-ctc-speech-recognition
#### speechT
* https://github.com/timediv/speechT
#### end2endASR
* https://github.com/cdyangbo/end2endASR
#### DTW (Dynamic Time Warping) python module
* https://github.com/pierre-rouanet/dtw
#### Various scripts and tools for speech recognition model building
* https://github.com/gooofy/speech
#### Deep-learning-based Chinese speech recognition system built with CNN, LSTM, and CTC
* https://github.com/nl8590687/ASRT_SpeechRecognition
#### tacotron_asr
* https://github.com/Kyubyong/tacotron_asr
#### ASR_Keras
* https://github.com/Chriskamphuis/ASR
#### Kaggle Tensorflow Speech Recognition Challenge
* https://dinantdatascientist.blogspot.dk/2018/02/kaggle-tensorflow-speech-recognition.html
#### Speech recognition script for Asterisk that uses Google's speech engine
* https://github.com/zaf/asterisk-speech-recog
#### Libraries and scripts for manipulating and handling ASR output/n-bests/etc.
* https://github.com/belambert/asr-tools
#### Some scripts and commands for working with ASR
* https://github.com/JRMeyer/asr
#### PySpeechGrammar
* https://github.com/ynop/pyspeechgrammar
#### Python module for evaluating ASR hypotheses
* https://github.com/belambert/asr-evaluation
#### edit-distance
* https://github.com/belambert/edit-distance
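
As an illustration of scoring ASR hypotheses with an edit-distance package such as the two entries above, a small sketch computing word error rate, WER = (S + D + I) / N; the `SequenceMatcher` interface shown follows belambert/edit-distance, so treat the exact names as an assumption:

```python
import edit_distance

ref = "the cat sat on the mat".split()
hyp = "the cat sat mat".split()

# Minimum number of substitutions, deletions, and insertions
# needed to turn hyp into ref.
sm = edit_distance.SequenceMatcher(a=ref, b=hyp)
wer = sm.distance() / len(ref)
print(f"WER: {wer:.2%}")  # 2 deletions / 6 reference words ≈ 33.33%
```
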
### <a name="1.3"></a>dataset
#### VoxForge
* http://www.voxforge.org/home
* http://www.voxforge.org/zh
* http://www.repository.voxforge1.org/downloads/SpeechCorpus/Trunk/Audio/Main/16kHz_16bit/
#### ASR Audio Data Links
* https://github.com/robmsmt/ASR_Audio_Data_Links
#### The CMU Pronouncing Dictionary
* http://www.speech.cs.cmu.edu/cgi-bin/cmudict
#### TIMIT
* https://catalog.ldc.upenn.edu/LDC93S1
* https://github.com/syhw/timit_tools
* https://github.com/philipperemy/timit
#### GlobalPhone Language Models
* http://www.csl.uni-bremen.de/GlobalPhone/
#### 1 Billion Word Language Model Benchmark
* https://github.com/ciprian-chelba/1-billion-word-language-modeling-benchmark
* http://www.statmt.org/lm-benchmark/
#### DaCiDian-Develop
* https://github.com/dophist/DaCiDian-Develop
#### CC-CEDICT
* https://www.mdbg.net/chinese/dictionary?page=cc-cedict
#### TED-LIUM
* https://lium.univ-lemans.fr/ted-lium3/
#### open-asr-lexicon
* https://github.com/dophist/open-asr-lexicon

### <a name="1.4"></a>Tutorial
#### University of Edinburgh ASR2017-18
* http://www.inf.ed.ac.uk/teaching/courses/asr/
#### Stanford CS224S
* https://web.stanford.edu/class/cs224s/syllabus.html
#### NYU asr12
* https://cs.nyu.edu/~mohri/asr12/
#### Speech Recognition with Neural Networks
* http://andrew.gibiansky.com/blog/machine-learning/speech-recognition-neural-networks/

## <a name="2"></a>Speech Synthesis

### <a name="2.1"></a>page
#### CSTR-Edinburgh
* https://github.com/CSTR-Edinburgh

### <a name="2.2"></a>open source library/toolbox
#### WORLD
* https://github.com/mmorise/World
#### HTS
* http://hts.sp.nitech.ac.jp/
* http://hts-engine.sourceforge.net/
* https://github.com/shamidreza/HTS-demo_CMU-ARCTIC-SLT-Formant
* https://github.com/MattShannon/HTS-demo_CMU-ARCTIC-SLT-STRAIGHT-AR-decision-tree
#### Tacotron
* https://github.com/Kyubyong/tacotron
* https://github.com/Kyubyong/expressive_tacotron
* https://github.com/keithito/tacotron
* https://github.com/GSByeon/multi-speaker-tacotron-tensorflow
* https://github.com/r9y9/tacotron_pytorch
* https://github.com/soobinseo/Tacotron-pytorch
#### Tacotron2
* https://github.com/NVIDIA/tacotron2
* https://github.com/riverphoenix/tacotron2
* https://github.com/A-Jacobson/tacotron2
* https://github.com/selap91/Tacotron2
* https://github.com/LGizkde/Tacotron2_Tao_Shujie
* https://github.com/rlawns1016/Tacotron2
* https://github.com/CapstoneInha/Tacotron2-rehearsal
#### Merlin
* https://github.com/CSTR-Edinburgh/merlin
#### Mozilla TTS
* https://github.com/mozilla/TTS
#### Flite
* http://www.speech.cs.cmu.edu/flite/
* https://github.com/festvox/flite
#### Speect
* http://speect.sourceforge.net/
#### Festival
* https://github.com/festvox/festival
#### eSpeak
* http://espeak.sourceforge.net/
* https://github.com/gooofy/py-espeak-ng
#### nnmnkwii
* https://github.com/r9y9/nnmnkwii
#### Ossian
* https://github.com/CSTR-Edinburgh/Ossian
#### gTTS
* https://github.com/pndurette/gTTS
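
A minimal gTTS sketch (it calls Google Translate's TTS endpoint, so it needs network access):

```python
from gtts import gTTS

# Synthesize a short English utterance and write it to an MP3 file.
tts = gTTS("hello world", lang="en")
tts.save("hello.mp3")
```
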
#### gnuspeech
* http://git.savannah.gnu.org/cgit/gnuspeech.git
#### supercollider
* https://github.com/supercollider/supercollider
#### sc3-plugins
* https://github.com/supercollider/sc3-plugins
#### Neural_Network_Voices
* https://github.com/llSourcell/Neural_Network_Voices
#### pggan-pytorch
* https://github.com/deepsound-project/pggan-pytorch
#### cainteoir-engine
* https://github.com/rhdunn/cainteoir-engine
#### loop
* https://github.com/facebookresearch/loop
#### TTS and ASR
* https://github.com/roboticslab-uc3m/speech
#### musa_tts
* https://github.com/santi-pdp/musa_tts
#### marytts (Java)
* https://github.com/marytts/marytts

## <a name="3"></a>Speaker Recognition

### <a name="3.2"></a>open source library/toolbox
#### Alize
* http://mistral.univ-avignon.fr/
#### speaker-recognition-py3
* https://github.com/crouchred/speaker-recognition-py3
#### openVP
* https://github.com/dake/openVP
* https://github.com/swshon?tab=repositories
#### Gender recognition by voice and speech analysis
* http://www.primaryobjects.com/2016/06/22/identifying-the-gender-of-a-voice-using-machine-learning/
* https://github.com/primaryobjects/voice-gender

## <a name="4"></a>Dialogue Systems

### <a name="4.1"></a>pages
#### NTU
* http://miulab.tw/
* https://github.com/MiuLab
* https://www.csie.ntu.edu.tw/~yvchen/publication.html
#### Tsung-Hsien Wen
* https://shawnwun.github.io/

### <a name="4.2"></a>open source library/toolbox
#### PyDial
* http://www.camdial.org/pydial/
#### alex
* https://github.com/UFAL-DSG/alex
#### ROS speech interaction system
* https://github.com/hntea/ros-speech
#### Chinese speech interaction system built on the ROS framework
* https://github.com/hntea/speech-system-zh

## <a name="5"></a>Front End

### <a name="5.1"></a>Speech Processing
#### madmom
* https://github.com/CPJKU/madmom
#### pydub
* https://github.com/jiaaro/pydub
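
A small pydub sketch, slicing and converting a clip (ffmpeg is required for non-WAV formats):

```python
from pydub import AudioSegment

audio = AudioSegment.from_wav("input.wav")
clip = audio[1000:5000]                            # milliseconds 1000-5000
clip = clip.set_frame_rate(16000).set_channels(1)  # 16 kHz mono
clip.export("clip.wav", format="wav")
```
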
#### kapre: Keras Audio Preprocessors
* https://github.com/keunwoochoi/kapre
#### BTK
* http://distantspeechrecognition.sourceforge.net/
#### ESPnet
* https://github.com/espnet/espnet
#### Signal-Processing
* https://github.com/mathEnthusaistCodes/Signal-Processing
#### pyroomacoustics
* https://github.com/LCAV/pyroomacoustics
#### librosa
* https://github.com/librosa/librosa
* https://github.com/librosa/librosa_gallery
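
A minimal librosa sketch, loading audio and computing two common front-end features:

```python
import librosa

# Load audio resampled to 16 kHz mono, then compute common features.
y, sr = librosa.load("input.wav", sr=16000)
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)           # (13, num_frames)
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=80)  # (80, num_frames)
print(mfcc.shape, mel.shape)
```
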
#### REAPER
* https://github.com/google/REAPER
#### MSD_split_for_tagging
* https://github.com/keunwoochoi/MSD_split_for_tagging
#### VOICEBOX
* http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/voicebox.html
#### liquid-dsp
* https://github.com/jgaeddert/liquid-dsp
#### ffts
* https://github.com/anthonix/ffts
#### mir_eval
* https://github.com/craffel/mir_eval
#### aupyom
* https://github.com/pierre-rouanet/aupyom
#### Pitch Detection
* http://note.sonots.com/SciSoftware/Pitch.html
#### TFTB
* http://tftb.nongnu.org/
#### maracas
* https://github.com/jfsantos/maracas
#### SRMRpy
* https://github.com/jfsantos/SRMRpy
#### ssp
* https://github.com/idiap/ssp
* https://github.com/idiap/libssp
#### iss
* https://github.com/idiap/iss
* https://github.com/idiap/iss-dicts
#### asr_preprocessing
* https://github.com/hirofumi0810/asr_preprocessing
#### asrt
* https://github.com/idiap/asrt
#### Audio super resolution using NN
* https://github.com/kuleshov/audio-super-res
#### RNN training for noise reduction in robust ASR
* https://github.com/amaas/rnn-speech-denoising
#### RNN for audio noise reduction
* https://github.com/xiph/rnnoise
#### muda
* https://github.com/bmcfee/muda
#### Efficient sample rate conversion in Python
* https://github.com/bmcfee/resampy
#### Smarc audio rate converter
* http://audio-smarc.sourceforge.net/
#### Python scripts to compute f0 of a wave file
* https://github.com/t13m/pyPitchCom

### <a name="5.2"></a>Audio I/O
#### PortAudio
* http://www.portaudio.com/
#### audiolab
* https://github.com/cournape/audiolab
#### pytorch audio
* https://github.com/pytorch/audio
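
A minimal sketch of the pytorch audio (torchaudio) API listed above:

```python
import torchaudio

# Load a waveform as a (channels, samples) float tensor.
waveform, sample_rate = torchaudio.load("input.wav")

# Compute 13 MFCC coefficients per frame.
mfcc = torchaudio.transforms.MFCC(sample_rate=sample_rate, n_mfcc=13)(waveform)
print(waveform.shape, mfcc.shape)
```
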
#### Digital Speech Decoder
* https://github.com/szechyjs/dsd
#### audioread
* https://github.com/beetbox/audioread
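
A small audioread sketch; it decodes through whichever backend is available (e.g. GStreamer, FFmpeg, MAD, or the standard library) and yields raw 16-bit PCM buffers:

```python
import audioread

with audioread.audio_open("input.mp3") as f:
    print(f.channels, f.samplerate, f.duration)
    for buf in f:
        pass  # buf is a bytes-like block of 16-bit little-endian PCM
```
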
#### audacity.py
* https://github.com/davidavdav/audacity.py

### <a name="5.3"></a>Sound Source Separation

#### HARK
* https://www.hark.jp/wiki.cgi?page=HARK+Installation+Instructions
#### Deep RNN for Source Separation
* https://github.com/posenhuang/deeplearningsourceseparation
#### nussl
* https://github.com/interactiveaudiolab/nussl
#### DNN for Music Source Separation in Tensorflow
* https://andabi.github.io/music-source-separation/
#### Alexey Ozerov
* http://www.irisa.fr/metiss/ozerov/
#### University of Surrey CVSSP
* https://github.com/CVSSP
#### Source separation using CNN
* https://github.com/emma-mens/ASR

### <a name="5.4"></a>Feature Extraction
#### openSMILE
* https://audeering.com/technology/opensmile/
* https://github.com/naxingyu/opensmile
#### veles.sound_feature_extraction
* https://github.com/Samsung/veles.sound_feature_extraction
#### vamp-plugin-sdk
* https://github.com/c4dm/vamp-plugin-sdk
#### Yaafe
* http://yaafe.sourceforge.net/
#### py_bank
* https://github.com/wil-j-wil/py_bank
#### AuditoryFilterbanks
* https://github.com/jfsantos/AuditoryFilterbanks
#### python_speech_features
* https://github.com/jameslyons/python_speech_features
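
A minimal python_speech_features sketch, computing MFCCs and log Mel filterbank energies from a WAV file:

```python
import scipy.io.wavfile as wav
from python_speech_features import mfcc, logfbank

rate, sig = wav.read("input.wav")
mfcc_feat = mfcc(sig, rate, numcep=13)  # (num_frames, 13)
fbank_feat = logfbank(sig, rate)        # log Mel filterbank energies
print(mfcc_feat.shape, fbank_feat.shape)
```
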
### <a name="5.5"></a>VAD
* https://github.com/jtkim-kaist/VAD
* https://github.com/jtkim-kaist/VAD_DNN
* https://github.com/marsbroshok/VAD-python
* https://github.com/shiweixingcn/vad
* https://github.com/fedden/RenderMan
#### rVAD
* http://kom.aau.dk/~zt/online/readme.htm
#### Aurora 2 VAD
* http://kom.aau.dk/~zt/online/readme.htm
#### IsraelCohen
* http://webee.technion.ac.il/people/IsraelCohen/Info/Software.html
#### Python interface to the WebRTC Voice Activity Detector
* https://github.com/wiseman/py-webrtcvad
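
A minimal py-webrtcvad sketch; the detector accepts only 10, 20, or 30 ms frames of 16-bit mono PCM at 8, 16, 32, or 48 kHz:

```python
import webrtcvad

vad = webrtcvad.Vad(2)  # aggressiveness mode 0 (least) .. 3 (most)

sample_rate = 16000
frame_ms = 30
frame = b"\x00\x00" * (sample_rate * frame_ms // 1000)  # 30 ms of silence

print(vad.is_speech(frame, sample_rate))  # expected False for pure silence
```
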

## <a name="6"></a>Resources

### <a name="6.1"></a>code/tool/data
#### cmusphinx
* https://github.com/cmusphinx
#### julius-speech
* https://github.com/julius-speech
#### OpenSLR
* http://www.openslr.org/
#### List of speech recognition software
* https://en.wikipedia.org/wiki/List_of_speech_recognition_software
#### KTH
* http://www.speech.kth.se/software/
#### VERBIO
* http://www.verbio.com/webverbiotm/html/productes.php?id=2
#### timeview
* https://github.com/lxkain/timeview
#### Speech at CMU Web Page
* http://www.speech.cs.cmu.edu/
#### CMU Robust Speech Group
* http://www.cs.cmu.edu/~robust/code.html
#### Speech Software at CMU
* http://www.speech.cs.cmu.edu/hephaestus.html
#### Aalto Speech Research
* https://github.com/aalto-speech
#### CMU Festvox Project
* https://github.com/festvox?tab=repositories
* http://www.festvox.org/
#### CSTR
* http://www.cstr.ed.ac.uk/research/
* http://www.cstr.ed.ac.uk/downloads/
#### Xiph
* https://github.com/xiph
#### Brno University of Technology Speech Processing Group
* http://speech.fit.vutbr.cz/software
#### SoX
* http://sox.sourceforge.net/
#### STRAIGHT
* https://github.com/shuaijiang/STRAIGHT
* http://www.wakayama-u.ac.jp/~kawahara/STRAIGHTadv/index_e.html
#### Idiap Research Institute
* https://github.com/idiap
#### Transcriber
* http://trans.sourceforge.net/en/presentation.php
#### Amirsina Torfi
* https://github.com/astorfi?tab=repositories
#### The Speech Recognition Virtual Kitchen
* https://github.com/srvk
* http://www.clsp.jhu.edu/~sriram/software/soft.html
#### Sparse Representation & Dictionary Learning Algorithms with Applications in Denoising, Separation, Localisation and Tracking
* http://personal.ee.surrey.ac.uk/Personal/W.Wang/codes.html
#### Audacity
* https://www.audacityteam.org/
#### beetbox
* https://github.com/beetbox
#### CAQE
* https://github.com/interactiveaudiolab/CAQE
#### UCL Speech Filing System
* http://www.phon.ucl.ac.uk/resource/sfs/
#### Ryuichi Yamamoto
* https://github.com/r9y9?tab=repositories
#### Kyubyong Park
* https://github.com/Kyubyong?tab=repositories
#### Hideyuki Tachibana
* https://github.com/tachi-hi?tab=repositories
#### Colin Raffel
* https://github.com/craffel?tab=repositories
#### Paul Dixon
* https://github.com/edobashira?tab=repositories
#### smacpy
* https://github.com/danstowell/smacpy
#### c4dm
* http://c4dm.eecs.qmul.ac.uk/software_data.html
#### Matt Shannon
* https://github.com/MattShannon?tab=repositories
#### Keunwoo Choi
* https://github.com/keunwoochoi?tab=repositories
#### ADASP
* http://www.tsi.telecom-paristech.fr/aao/en/software-and-database/
#### uchicago Speech and Language @ TTIC
* http://ttic.uchicago.edu/~klivescu/SLATTIC/resources.htm
#### Justin Salamon
* http://www.justinsalamon.com/codedata.html
#### COLEA
* http://ecs.utdallas.edu/loizou/speech/colea.htm
#### openAUDIO
* http://www.openaudio.eu/
#### Praat
* http://www.fon.hum.uva.nl/praat/
* https://github.com/timmahrt/praatIO
#### librosa
* https://github.com/librosa
#### Essentia
* https://github.com/MTG/essentia
#### timmahrt
* https://github.com/timmahrt?tab=repositories
#### Lefteris Zafiris
* https://github.com/zaf?tab=repositories
#### audio-to-audio and audio-to-midi alignment
* https://github.com/cataska/scorealign
#### DNN based hotword and wake word detection toolkit
* https://github.com/Kitt-AI/snowboy
#### free-spoken-digit-dataset
* https://github.com/Jakobovski/free-spoken-digit-dataset
#### Chinese Linguistic Data Consortium (中文语言资源联盟)
* http://www.chineseldc.org/resource_list.php?begin=0&count=20
#### Institute of Formal and Applied Linguistics – Dialogue Systems Group
* https://github.com/UFAL-DSG

* https://github.com/edobashira/speech-language-processing
* https://github.com/andabi?tab=repositories
* https://code.soundsoftware.ac.uk/projects

### <a name="6.2"></a>tutorial
#### DL for Computer Vision, Speech, and Language
* http://llcao.net/cu-deeplearning17/resource.html
#### NTU Introduction to Digital Speech Processing (臺大數位語音處理概論)
* http://speech.ee.ntu.edu.tw/courses.html
* http://ocw.aca.ntu.edu.tw/ntu-ocw/ocw/cou/104S204
#### IISc Speech Information Processing
* http://www.ee.iisc.ac.in/new/people/faculty/prasantg/e9261_speech_jan2018.html
* http://www.practicalcryptography.com/miscellaneous/machine-learning/

### <a name="6.3"></a>paper
* https://arxiv.org/search/?query=speech&searchtype=all&source=header
* https://www.isca-speech.org/iscaweb/index.php/archive/online-archive
* https://www.aclweb.org/anthology/
* https://github.com/zzw922cn/awesome-speech-recognition-speech-synthesis-papers
#### State of the art and recent results (bibliography) on speech recognition
* https://github.com/syhw/wer_are_we

## <a name="7"></a>Pages
#### Dan Povey
* http://www.danielpovey.com/publications.html
#### cmusphinx
* https://github.com/cmusphinx
#### CMU Language Technologies Institute
* https://www.lti.cs.cmu.edu/work
#### CMU SPEECH@SV
* http://speech.sv.cmu.edu/publications.html
#### Mitsubishi Electric Research Laboratories
* http://www.merl.com/publications/
#### MIT Spoken Language Systems
* https://groups.csail.mit.edu/sls/downloads/
#### Brno University of Technology Speech Processing Group
* http://speech.fit.vutbr.cz/software
#### IISc
* https://spire.ee.iisc.ac.in/spire/allPublications.php
#### uchicago Speech and Language @ TTIC
* http://ttic.uchicago.edu/~klivescu/SLATTIC/resources.htm
#### RWTH Aachen University
* https://www-i6.informatik.rwth-aachen.de/web/Software/index.html
#### TOKUDA and NANKAKU LABORATORY
* http://www.sp.nitech.ac.jp/index.php?HOME%2FSOFTWARE
#### Institute of Formal and Applied Linguistics – Dialogue Systems Group
* https://github.com/UFAL-DSG
#### Ohio State University speech separation
* http://web.cse.ohio-state.edu/pnl/software.html
#### LEAP Laboratory
* http://www.leap.ee.iisc.ac.in/publications/
#### Hainan Xu
* https://www.cs.jhu.edu/~hxu/
#### Mark Gales
* http://mi.eng.cam.ac.uk/~mjfg/
#### Karen Livescu
* http://ttic.uchicago.edu/~klivescu/
#### Shubham Toshniwal
* http://ttic.uchicago.edu/~shtoshni/#pubs
* https://github.com/shtoshni92?tab=repositories
#### Adrien Ycart
* http://www.eecs.qmul.ac.uk/~ay304/code.html
#### Ron Weiss
* https://ronw.github.io//
#### Yajie Miao
* https://www.cs.cmu.edu/~ymiao/
#### Scott T Wisdom
* https://sites.google.com/site/scottwisdomhomepage/publications
#### Alan W Black
* https://www.cs.cmu.edu/~awb/
#### Amirsina Torfi
* https://www.amirsinatorfi.com/publications
#### Liang Lu
* http://ttic.uchicago.edu/~llu/
#### Zhizheng Wu
* http://www.zhizheng.org/
#### Justin Salamon
* http://www.justinsalamon.com/codedata.html
#### Keith Vertanen
* http://www.keithv.com/software/
#### Aviv Gabbay
* http://www.cs.huji.ac.il/~avivga/
#### Mehryar Mohri
* https://cs.nyu.edu/~mohri/
#### Jonathan Le Roux
* http://www.jonathanleroux.org/
#### Suyoun Kim
* https://synetkim.github.io/
#### DeepSound
* http://deepsound.io/
#### Lei Xie
* http://lxie.npu-aslp.org/