# awesome-speech
A treasure-house of speech resources: recognition, synthesis, speaker recognition, dialogue systems, and front-end processing.

## Table of Contents
* [Speech Recognition (ASR, STT)](#1)
    * [page](#1.1)
    * [open source library/toolbox/code](#1.2)
    * [corpus/dataset](#1.3)
    * [Tutorial](#1.4)
* [Speech Synthesis (TTS)](#2)
    * [page](#2.1)
    * [open source library/toolbox/code](#2.2)
    * [corpus/dataset](#2.3)
    * [Tutorial](#2.4)
* [Speaker Recognition](#3)
    * [page](#3.1)
    * [open source library/toolbox/code](#3.2)
    * [corpus/dataset](#3.3)
    * [Tutorial](#3.4)
* [Dialogue Systems](#4)
    * [page](#4.1)
    * [open source library/toolbox/code](#4.2)
    * [corpus/dataset](#4.3)
    * [Tutorial](#4.4)
* [Front End](#5)
    * [Speech Processing](#5.1)
    * [Audio I/O](#5.2)
    * [Sound Source Separation](#5.3)
    * [Feature Extraction](#5.4)
    * [VAD](#5.5)
* [Resources](#6)
    * [code/tool/data](#6.1)
    * [Tutorial](#6.2)
    * [paper](#6.3)
* [Pages](#7)

## Speech Recognition (ASR, STT)
### page
#### Xingyu Na
* http://naxingyu.github.io/
* https://github.com/naxingyu?tab=repositories
#### Language Processing and Pattern Recognition group, RWTH Aachen University
* https://www-i6.informatik.rwth-aachen.de/web/Software/index.html
#### Fernando de la Calle Silos
* http://www.tsc.uc3m.es/~fsilos/
* https://github.com/fernandodelacalle?tab=repositories

### open source library/toolbox/code
#### HTK
* http://htk.eng.cam.ac.uk/download.shtml
#### Py2HTK
* https://github.com/g-leech/Py2HTK
#### parallel-htk
* https://github.com/jpuigcerver/parallel-htk
#### HTK_C_MATLAB_tools
* https://github.com/sinb/HTK_C_MATLAB_tools

#### Kaldi
* https://github.com/kaldi-asr/kaldi
#### Kaldi official documentation (Chinese translation)
* http://blog.geekidentity.com/asr/kaldi/kaldi_tutorial/
#### Kaldi models
* http://kaldi-asr.org/models.html
#### Corpus Phonetics Tutorial
* https://www.eleanorchodroff.com/tutorial/kaldi/kaldi-intro.html
#### Python wrappers for Kaldi
* https://github.com/pykaldi/pykaldi
* https://github.com/gooofy/py-kaldi-asr
* https://github.com/UFAL-DSG/pykaldi
* https://github.com/janchorowski/kaldi-python
#### Dan's DNN implementation
* http://kaldi-asr.org/doc/dnn2.html
#### pytorch-kaldi
* https://github.com/mravanelli/pytorch-kaldi/
#### kaldi-lstm
* https://github.com/dophist/kaldi-lstm
#### kaldi-ctc
* https://github.com/lingochamp/kaldi-ctc
#### keras-kaldi
* https://github.com/dspavankumar/keras-kaldi
#### Python wrapper for the Kaldi online decoder
* https://github.com/funcwj/pydecoder
#### Kaldi+PDNN
* https://github.com/yajiemiao/kaldipdnn
#### tfkaldi
* https://github.com/vrenkens/tfkaldi
#### Kaldi_CNTK_AMI
* https://github.com/chenguoguo/Kaldi_CNTK_AMI
#### kaldi-io-for-python
* https://github.com/vesis84/kaldi-io-for-python
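A minimal sketch of reading Kaldi feature archives with this package (assumes `pip install kaldi_io`; `feats.ark` is a placeholder for an archive you produced yourself, e.g. with Kaldi's `compute-mfcc-feats`):

```python
# Sketch: iterate over a Kaldi archive of feature matrices with kaldi_io.
import kaldi_io

for utt_id, mat in kaldi_io.read_mat_ark('feats.ark'):  # feats.ark is hypothetical
    # mat is a NumPy array of shape (num_frames, feat_dim)
    print(utt_id, mat.shape)
```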
#### kaldi-pyio
* https://github.com/funcwj/kaldi-pyio
#### kaldi-tree-conv
* https://github.com/dophist/kaldi-tree-conv
#### kaldi-ivector
* https://github.com/idiap/kaldi-ivector
#### kaldi-yesno-tutorial
* https://github.com/keighrim/kaldi-yesno-tutorial
#### Kaldi nnet3 tutorial
* https://gist.github.com/candlewill/f6c789059bf28b99cee8e18b99c20bfd
#### Josh Meyer's Website
* http://jrmeyer.github.io/
#### Adapting your own Language Model for Kaldi
* https://github.com/srvk/lm_build
#### Some Kaldi Notes
* http://jrmeyer.github.io/asr/2016/02/01/Kaldi-notes.html
* http://sentiment-mining.blogspot.com/
* http://pages.jh.edu/~echodro1/tutorial/kaldi/
#### kaldi_tutorial
* https://github.com/hyung8758/kaldi_tutorial
#### Online decoder for Kaldi NNET2 and GMM speech recognition models with Python bindings
* https://github.com/UFAL-DSG/alex-asr
#### ResNet-Kaldi-Tensorflow-ASR
* https://github.com/fernandodelacalle/ResNet-Kaldi-Tensorflow-ASR
#### Kaldi ASR: Extending the ASpIRE model
* https://chrisearch.wordpress.com/2017/03/11/speech-recognition-using-kaldi-extending-and-using-the-aspire-model/
#### FastCGI support for Kaldi ASR
* https://github.com/dialogflow/asr-server
#### alignUsingKaldi
* https://github.com/Sundy1219/alignUsingKaldi
#### kaldi-readers-for-tensorflow
* https://github.com/t13m/kaldi-readers-for-tensorflow
#### kaldi-iot
* https://github.com/dophist/kaldi-iot
#### lattice-info
* https://github.com/jpuigcerver/lattice-info
#### lattice-char-to-word
* https://github.com/jpuigcerver/lattice-char-to-word
#### lattice-word-length-distribution
* https://github.com/jpuigcerver/lattice-word-length-distribution
#### kaldi-lattice-word-index
* https://github.com/jpuigcerver/kaldi-lattice-word-index
#### kaldi-decoders
* https://github.com/jpuigcerver/kaldi-decoders
#### lattice-remove-ctc-blank
* https://github.com/jpuigcerver/lattice-remove-ctc-blank
#### kaldi-lattice-search
* https://github.com/jpuigcerver/kaldi-lattice-search
#### htk2kaldi
* https://github.com/jpuigcerver/htk2kaldi
#### parallel-kaldi
* https://github.com/jpuigcerver/parallel-kaldi
#### Building an online Chinese ASR system with Kaldi
* https://blog.csdn.net/shichaog/article/details/73655628
#### kaldi-docker
* https://github.com/golbin/kaldi-docker
#### CSLT-Sparse-DNN-Toolkit
* https://github.com/wyq730/CSLT-Sparse-DNN-Toolkit
#### featxtra
* https://github.com/mvansegbroeck/featxtra
#### Sphinx
* https://cmusphinx.github.io/
* https://github.com/cmusphinx
* https://github.com/cmusphinx/pocketsphinx
#### OpenFst
* http://www.openfst.org/twiki/bin/view/FST/WebHome
* https://github.com/UFAL-DSG/openfst
* https://github.com/benob/openfst-utils
* https://github.com/vchahun/pyfst
#### MIT Spoken Language Systems
* https://groups.csail.mit.edu/sls/downloads/
#### Julius
* http://julius.osdn.jp/en_index.php
* https://github.com/julius-speech/julius
#### Bavieca
* http://www.bavieca.org/
#### Simon
* https://simon.kde.org/
#### SIDEKIT
* http://www-lium.univ-lemans.fr/sidekit/
#### SRILM
* https://www.sri.com/engage/products-solutions/sri-language-modeling-toolkit
* http://www.speech.sri.com/projects/srilm/
* https://github.com/nuance1979/srilm-python
* https://github.com/njsmith/pysrilm
#### awd-lstm-lm
* https://github.com/salesforce/awd-lstm-lm
#### ISIP
* https://www.isip.piconepress.com/projects/speech/

#### MIT Finite-State Transducer (FST) Toolkit
* http://groups.csail.mit.edu/sls/downloads/
#### MIT Language Modeling (MITLM) Toolkit
* http://groups.csail.mit.edu/sls/downloads/
#### OpenGrm
* http://www.openfst.org/twiki/bin/view/GRM/WebHome
#### RNNLM
* http://www.fit.vutbr.cz/~imikolov/rnnlm/
* https://github.com/IntelLabs/rnnlm
* https://github.com/glecorve/rnnlm2wfst
#### faster-rnnlm
* https://github.com/yandex/faster-rnnlm
#### CUED-RNNLM Toolkit
* http://mi.eng.cam.ac.uk/projects/cued-rnnlm/
#### Using an RNNLM to rescore sentences in a Chinese ASR system
* https://github.com/Sundy1219/RNNLM
#### KenLM
* https://github.com/kpu/kenlm
* https://kheafield.com/code/kenlm/
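A quick sketch of how KenLM's Python bindings are typically used for n-gram scoring (assumes the `kenlm` module is installed; `model.arpa` is a placeholder for a model you trained, e.g. with KenLM's `lmplz`):

```python
# Sketch: score sentences with a trained KenLM n-gram model.
import kenlm

model = kenlm.Model('model.arpa')  # model.arpa is hypothetical
# Total log10 probability, including begin/end-of-sentence symbols.
print(model.score('this is a sentence', bos=True, eos=True))
# Per-sentence perplexity is also exposed directly.
print(model.perplexity('this is a sentence'))
```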
#### rwthlm
* https://www-i6.informatik.rwth-aachen.de/web/Software/rwthlm.php
#### word-rnn-tensorflow
* https://github.com/hunkim/word-rnn-tensorflow
#### tensorlm
* https://github.com/batzner/tensorlm
#### SpeechRecognition
* https://github.com/Uberi/speech_recognition
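A minimal usage sketch for this library (`test.wav` is a placeholder; the Google Web Speech API backend used here requires network access):

```python
# Sketch: transcribe a WAV file with the SpeechRecognition library.
import speech_recognition as sr

r = sr.Recognizer()
with sr.AudioFile('test.wav') as source:  # test.wav is hypothetical
    audio = r.record(source)              # read the entire file
try:
    print(r.recognize_google(audio))      # Google Web Speech API backend
except sr.UnknownValueError:
    print('speech was unintelligible')
```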
#### SpeechPy
* https://github.com/astorfi/speechpy
#### Aalto
* https://github.com/aalto-speech/AaltoASR
#### google-cloud-speech
* https://pypi.org/project/google-cloud-speech/
#### apiai
* https://pypi.org/project/apiai/
#### wit
* https://github.com/wit-ai/pywit
#### Nabu
* https://github.com/vrenkens/nabu
#### asr-study
* https://github.com/igormq/asr-study
#### dejavu
* https://github.com/worldveil/dejavu
#### uSpeech
* https://github.com/arjo129/uSpeech
#### Juicer
* https://github.com/idiap/juicer
#### PMLS
* http://pmls.readthedocs.io/en/latest/dnn-speech.html
#### dragonfly
* https://github.com/t4ngo/dragonfly
#### SPTK
* https://github.com/r9y9/SPTK
* https://github.com/sp-nitech/SPTK
* http://sp-tk.sourceforge.net/
#### pysptk
* https://github.com/r9y9/pysptk
#### RWTH ASR
* https://www-i6.informatik.rwth-aachen.de/rwth-asr/
#### Palaver
* https://github.com/JamezQ/Palaver
#### Praat
* http://www.fon.hum.uva.nl/praat/
* https://github.com/kylebgorman/textgrid
#### Speech Recognition Grammar Specification
* https://www.w3.org/TR/speech-grammar/
#### Automatic_Speech_Recognition
* https://github.com/zzw922cn/Automatic_Speech_Recognition
#### speech-to-text-wavenet
* https://github.com/buriburisuri/speech-to-text-wavenet
#### tensorflow-speech-recognition
* https://github.com/pannous/tensorflow-speech-recognition
#### tensorflow_end2end_speech_recognition
* https://github.com/hirofumi0810/tensorflow_end2end_speech_recognition
#### tensorflow_speech_recognition_demo
* https://github.com/llSourcell/tensorflow_speech_recognition_demo
#### AVSR-Deep-Speech
* https://github.com/pandeydivesh15/AVSR-Deep-Speech
#### TTS and ASR
* https://github.com/roboticslab-uc3m/speech
#### CTC + Tensorflow Example for ASR
* https://github.com/igormq/ctc_tensorflow_example
#### tensorflow-ctc-speech-recognition
* https://github.com/philipperemy/tensorflow-ctc-speech-recognition
#### speechT
* https://github.com/timediv/speechT
#### end2endASR
* https://github.com/cdyangbo/end2endASR
#### DTW (Dynamic Time Warping) python module
* https://github.com/pierre-rouanet/dtw
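Since this package's exact call signature has varied across releases, here is a self-contained textbook DTW sketch (NumPy only, not tied to the package's API) showing the alignment cost it computes:

```python
# Self-contained dynamic time warping sketch: fill the classic
# cumulative-cost matrix and return the optimal alignment cost.
import numpy as np

def dtw_distance(x, y):
    n, m = len(x), len(y)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])            # local frame distance
            acc[i, j] = cost + min(acc[i - 1, j],      # insertion
                                   acc[i, j - 1],      # deletion
                                   acc[i - 1, j - 1])  # match
    return acc[n, m]

print(dtw_distance([0, 1, 2, 3], [0, 1, 1, 2, 3]))  # small toy example
```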
#### Various scripts and tools for speech recognition model building
* https://github.com/gooofy/speech
#### Deep-learning-based Chinese speech recognition system using CNN, LSTM, and CTC
* https://github.com/nl8590687/ASRT_SpeechRecognition
#### tacotron_asr
* https://github.com/Kyubyong/tacotron_asr
#### ASR_Keras
* https://github.com/Chriskamphuis/ASR
#### Kaggle Tensorflow Speech Recognition Challenge
* https://dinantdatascientist.blogspot.dk/2018/02/kaggle-tensorflow-speech-recognition.html
#### Speech recognition script for Asterisk that uses Google's speech engine
* https://github.com/zaf/asterisk-speech-recog
#### Libraries and scripts for manipulating and handling ASR output, n-best lists, etc.
* https://github.com/belambert/asr-tools
#### Some scripts and commands for working with ASR
* https://github.com/JRMeyer/asr
#### PySpeechGrammar
* https://github.com/ynop/pyspeechgrammar
#### Python module for evaluating ASR hypotheses
* https://github.com/belambert/asr-evaluation
#### edit-distance
* https://github.com/belambert/edit-distance
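Both evaluation tools above boil down to Levenshtein alignment over word sequences. A minimal WER sketch in plain Python, independent of either package:

```python
# Minimal word error rate (WER) sketch via Levenshtein distance over words.
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

print(wer('the cat sat', 'the cat sat down'))  # 1 insertion / 3 ref words
```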


### corpus/dataset
#### VoxForge
* http://www.voxforge.org/home
* http://www.voxforge.org/zh
* http://www.repository.voxforge1.org/downloads/SpeechCorpus/Trunk/Audio/Main/16kHz_16bit/
#### ASR Audio Data Links
* https://github.com/robmsmt/ASR_Audio_Data_Links
#### The CMU Pronouncing Dictionary
* http://www.speech.cs.cmu.edu/cgi-bin/cmudict
#### TIMIT
* https://catalog.ldc.upenn.edu/LDC93S1
* https://github.com/syhw/timit_tools
* https://github.com/philipperemy/timit
#### GlobalPhone Language Models
* http://www.csl.uni-bremen.de/GlobalPhone/
#### 1 Billion Word Language Model Benchmark
* https://github.com/ciprian-chelba/1-billion-word-language-modeling-benchmark
* http://www.statmt.org/lm-benchmark/
#### DaCiDian-Develop
* https://github.com/dophist/DaCiDian-Develop
#### CC-CEDICT
* https://www.mdbg.net/chinese/dictionary?page=cc-cedict
#### TED-LIUM
* https://lium.univ-lemans.fr/ted-lium3/
#### open-asr-lexicon
* https://github.com/dophist/open-asr-lexicon

### Tutorial
#### University of Edinburgh ASR 2017-18
* http://www.inf.ed.ac.uk/teaching/courses/asr/
#### Stanford CS224s
* https://web.stanford.edu/class/cs224s/syllabus.html
#### NYU ASR12
* https://cs.nyu.edu/~mohri/asr12/
#### Speech Recognition with Neural Networks
* http://andrew.gibiansky.com/blog/machine-learning/speech-recognition-neural-networks/


## Speech Synthesis (TTS)
### page
#### CSTR-Edinburgh
* https://github.com/CSTR-Edinburgh

### open source library/toolbox
#### WORLD
* https://github.com/mmorise/World
#### HTS
* http://hts.sp.nitech.ac.jp/
* http://hts-engine.sourceforge.net/
* https://github.com/shamidreza/HTS-demo_CMU-ARCTIC-SLT-Formant
* https://github.com/MattShannon/HTS-demo_CMU-ARCTIC-SLT-STRAIGHT-AR-decision-tree
#### Tacotron
* https://github.com/Kyubyong/tacotron
* https://github.com/Kyubyong/expressive_tacotron
* https://github.com/keithito/tacotron
* https://github.com/GSByeon/multi-speaker-tacotron-tensorflow
* https://github.com/r9y9/tacotron_pytorch
* https://github.com/soobinseo/Tacotron-pytorch
#### Tacotron2
* https://github.com/NVIDIA/tacotron2
* https://github.com/riverphoenix/tacotron2
* https://github.com/A-Jacobson/tacotron2
* https://github.com/selap91/Tacotron2
* https://github.com/LGizkde/Tacotron2_Tao_Shujie
* https://github.com/rlawns1016/Tacotron2
* https://github.com/CapstoneInha/Tacotron2-rehearsal
#### Merlin
* https://github.com/CSTR-Edinburgh/merlin
#### Mozilla TTS
* https://github.com/mozilla/TTS
#### Flite
* http://www.speech.cs.cmu.edu/flite/
* https://github.com/festvox/flite
#### Speect
* http://speect.sourceforge.net/
#### Festival
* https://github.com/festvox/festival
#### eSpeak
* http://espeak.sourceforge.net/
* https://github.com/gooofy/py-espeak-ng
#### nnmnkwii
* https://github.com/r9y9/nnmnkwii
#### Ossian
* https://github.com/CSTR-Edinburgh/Ossian
#### gTTS
* https://github.com/pndurette/gTTS
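A short sketch of typical gTTS usage (requires network access to Google's TTS endpoint):

```python
# Sketch: synthesize speech to an MP3 file with gTTS (pip install gTTS).
from gtts import gTTS

tts = gTTS('hello world', lang='en')  # text and language code
tts.save('hello.mp3')                 # writes the synthesized audio
```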
#### gnuspeech
* http://git.savannah.gnu.org/cgit/gnuspeech.git
#### supercollider
* https://github.com/supercollider/supercollider
#### sc3-plugins
* https://github.com/supercollider/sc3-plugins
#### Neural_Network_Voices
* https://github.com/llSourcell/Neural_Network_Voices
#### pggan-pytorch
* https://github.com/deepsound-project/pggan-pytorch
#### cainteoir-engine
* https://github.com/rhdunn/cainteoir-engine
#### loop
* https://github.com/facebookresearch/loop
#### TTS and ASR
* https://github.com/roboticslab-uc3m/speech
#### musa_tts
* https://github.com/santi-pdp/musa_tts
#### marytts (Java)
* https://github.com/marytts/marytts


## Speaker Recognition
### open source library/toolbox
#### Alize
* http://mistral.univ-avignon.fr/
#### speaker-recognition-py3
* https://github.com/crouchred/speaker-recognition-py3
#### openVP
* https://github.com/dake/openVP
#### Suwon Shon
* https://github.com/swshon?tab=repositories
#### Gender recognition by voice and speech analysis
* http://www.primaryobjects.com/2016/06/22/identifying-the-gender-of-a-voice-using-machine-learning/
* https://github.com/primaryobjects/voice-gender

## Dialogue Systems
### pages
#### NTU
* http://miulab.tw/
* https://github.com/MiuLab
* https://www.csie.ntu.edu.tw/~yvchen/publication.html
#### Tsung-Hsien Wen
* https://shawnwun.github.io/

### open source library/toolbox
#### PyDial
* http://www.camdial.org/pydial/
#### alex
* https://github.com/UFAL-DSG/alex
#### ROS speech interaction system
* https://github.com/hntea/ros-speech
#### Chinese speech interaction system built on the ROS framework
* https://github.com/hntea/speech-system-zh

## Front End
### Speech Processing
#### madmom
* https://github.com/CPJKU/madmom
#### pydub
* https://github.com/jiaaro/pydub
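A sketch of common pydub operations (`input.wav` is a placeholder; ffmpeg is needed for non-WAV formats):

```python
# Sketch: slice, gain-adjust, and export audio with pydub (pip install pydub).
from pydub import AudioSegment

sound = AudioSegment.from_wav('input.wav')  # input.wav is hypothetical
first_5s = sound[:5000]                     # slicing is in milliseconds
louder = first_5s + 6                       # +6 dB gain
louder.export('clip.mp3', format='mp3')     # needs ffmpeg for MP3 output
```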
#### kapre: Keras Audio Preprocessors
* https://github.com/keunwoochoi/kapre
#### BTK
* http://distantspeechrecognition.sourceforge.net/
#### ESPnet
* https://github.com/espnet/espnet
#### Signal-Processing
* https://github.com/mathEnthusaistCodes/Signal-Processing
#### pyroomacoustics
* https://github.com/LCAV/pyroomacoustics
#### librosa
* https://github.com/librosa/librosa
* https://github.com/librosa/librosa_gallery
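A minimal librosa sketch of the usual load-and-extract workflow (`audio.wav` is a placeholder file):

```python
# Sketch: load audio and compute MFCCs with librosa (pip install librosa).
import librosa

y, sr = librosa.load('audio.wav', sr=16000)         # audio.wav is hypothetical
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)  # (13, num_frames)
print(mfcc.shape)
```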
#### REAPER
* https://github.com/google/REAPER
#### MSD_split_for_tagging
* https://github.com/keunwoochoi/MSD_split_for_tagging
#### VOICEBOX
* http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/voicebox.html
#### liquid-dsp
* https://github.com/jgaeddert/liquid-dsp
#### ffts
* https://github.com/anthonix/ffts
#### mir_eval
* https://github.com/craffel/mir_eval
#### aupyom
* https://github.com/pierre-rouanet/aupyom
#### Pitch Detection
* http://note.sonots.com/SciSoftware/Pitch.html
#### TFTB
* http://tftb.nongnu.org/
#### maracas
* https://github.com/jfsantos/maracas
#### SRMRpy
* https://github.com/jfsantos/SRMRpy
#### ssp
* https://github.com/idiap/ssp
* https://github.com/idiap/libssp
#### iss
* https://github.com/idiap/iss
* https://github.com/idiap/iss-dicts
#### asr_preprocessing
* https://github.com/hirofumi0810/asr_preprocessing
#### asrt
* https://github.com/idiap/asrt
#### Audio super-resolution using neural networks
* https://github.com/kuleshov/audio-super-res
#### RNN training for noise reduction in robust ASR
* https://github.com/amaas/rnn-speech-denoising
#### RNN for audio noise reduction
* https://github.com/xiph/rnnoise
#### muda
* https://github.com/bmcfee/muda
#### Efficient sample rate conversion in Python
* https://github.com/bmcfee/resampy
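A sketch of resampy's core call, band-limited resampling of a NumPy signal:

```python
# Sketch: high-quality sample-rate conversion with resampy.
import numpy as np
import resampy

sr_orig, sr_new = 44100, 16000
x = np.random.randn(sr_orig)              # one second of noise as a stand-in
y = resampy.resample(x, sr_orig, sr_new)  # band-limited resampling
print(len(y))                             # ~16000 samples
```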
#### Smarc audio rate converter
* http://audio-smarc.sourceforge.net/
#### Python scripts to compute F0s of a wave file
* https://github.com/t13m/pyPitchCom

### Audio I/O
#### PortAudio
* http://www.portaudio.com/
#### audiolab
* https://github.com/cournape/audiolab
#### pytorch audio
* https://github.com/pytorch/audio
#### Digital Speech Decoder
* https://github.com/szechyjs/dsd
#### audioread
* https://github.com/beetbox/audioread
#### audacity.py
* https://github.com/davidavdav/audacity.py

### Sound Source Separation
#### HARK
* https://www.hark.jp/wiki.cgi?page=HARK+Installation+Instructions
#### Deep RNN for Source Separation
* https://github.com/posenhuang/deeplearningsourceseparation
#### nussl
* https://github.com/interactiveaudiolab/nussl
#### DNN for Music Source Separation in Tensorflow
* https://andabi.github.io/music-source-separation/
#### Alexey Ozerov
* http://www.irisa.fr/metiss/ozerov/
#### University of Surrey CVSSP
* https://github.com/CVSSP
#### Source separation using CNN
* https://github.com/emma-mens/ASR

### Feature Extraction
#### openSMILE
* https://audeering.com/technology/opensmile/
* https://github.com/naxingyu/opensmile
#### veles.sound_feature_extraction
* https://github.com/Samsung/veles.sound_feature_extraction
#### vamp-plugin-sdk
* https://github.com/c4dm/vamp-plugin-sdk
#### Yaafe
* http://yaafe.sourceforge.net/
#### py_bank
* https://github.com/wil-j-wil/py_bank
#### AuditoryFilterbanks
* https://github.com/jfsantos/AuditoryFilterbanks
#### python_speech_features
* https://github.com/jameslyons/python_speech_features
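A minimal MFCC/filterbank sketch with this package, following the pattern in its README (`file.wav` is a placeholder):

```python
# Sketch: MFCC and log filterbank features with python_speech_features.
import scipy.io.wavfile as wav
from python_speech_features import mfcc, logfbank

rate, sig = wav.read('file.wav')  # file.wav is hypothetical
mfcc_feat = mfcc(sig, rate)       # (num_frames, 13) by default
fbank_feat = logfbank(sig, rate)  # (num_frames, 26) by default
print(mfcc_feat.shape, fbank_feat.shape)
```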

### VAD
* https://github.com/jtkim-kaist/VAD
* https://github.com/jtkim-kaist/VAD_DNN
* https://github.com/marsbroshok/VAD-python
* https://github.com/shiweixingcn/vad
* https://github.com/fedden/RenderMan
#### rVAD
* http://kom.aau.dk/~zt/online/readme.htm
#### Aurora 2 VAD
* http://kom.aau.dk/~zt/online/readme.htm
#### IsraelCohen
* http://webee.technion.ac.il/people/IsraelCohen/Info/Software.html
#### Python interface to the WebRTC Voice Activity Detector
* https://github.com/wiseman/py-webrtcvad
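A sketch of py-webrtcvad's frame-level interface (it accepts 10/20/30 ms frames of 16-bit mono PCM at 8/16/32/48 kHz):

```python
# Sketch: classify one 30 ms frame of 16 kHz, 16-bit mono PCM as speech or not.
import webrtcvad

vad = webrtcvad.Vad(2)  # aggressiveness 0 (least) to 3 (most)
sample_rate = 16000
frame = b'\x00\x00' * int(0.03 * sample_rate)  # 30 ms of silence as a stand-in
print(vad.is_speech(frame, sample_rate))       # False for pure silence
```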


## Resources
### code/tool/data
#### cmusphinx
* https://github.com/cmusphinx
#### julius-speech
* https://github.com/julius-speech
#### OpenSLR
* http://www.openslr.org/
#### List of speech recognition software
* https://en.wikipedia.org/wiki/List_of_speech_recognition_software
#### KTH
* http://www.speech.kth.se/software/
#### VERBIO
* http://www.verbio.com/webverbiotm/html/productes.php?id=2
#### timeview
* https://github.com/lxkain/timeview
#### Speech at CMU Web Page
* http://www.speech.cs.cmu.edu/
#### CMU Robust Speech Group
* http://www.cs.cmu.edu/~robust/code.html
#### Speech Software at CMU
* http://www.speech.cs.cmu.edu/hephaestus.html
#### Aalto Speech Research
* https://github.com/aalto-speech
#### CMU Festvox Project
* https://github.com/festvox?tab=repositories
* http://www.festvox.org/
#### CSTR
* http://www.cstr.ed.ac.uk/research/
* http://www.cstr.ed.ac.uk/downloads/
#### Xiph
* https://github.com/xiph
#### Brno University of Technology Speech Processing Group
* http://speech.fit.vutbr.cz/software
#### SoX
* http://sox.sourceforge.net/
#### STRAIGHT
* https://github.com/shuaijiang/STRAIGHT
* http://www.wakayama-u.ac.jp/~kawahara/STRAIGHTadv/index_e.html
#### Idiap Research Institute
* https://github.com/idiap
#### Transcriber
* http://trans.sourceforge.net/en/presentation.php
#### Amirsina Torfi
* https://github.com/astorfi?tab=repositories
#### The Speech Recognition Virtual Kitchen
* https://github.com/srvk
* http://www.clsp.jhu.edu/~sriram/software/soft.html
#### Sparse Representation & Dictionary Learning Algorithms with Applications in Denoising, Separation, Localisation and Tracking
* http://personal.ee.surrey.ac.uk/Personal/W.Wang/codes.html
#### Audacity
* https://www.audacityteam.org/
#### beetbox
* https://github.com/beetbox
#### CAQE
* https://github.com/interactiveaudiolab/CAQE
#### UCL Speech Filing System
* http://www.phon.ucl.ac.uk/resource/sfs/
#### Ryuichi Yamamoto
* https://github.com/r9y9?tab=repositories
#### Kyubyong Park
* https://github.com/Kyubyong?tab=repositories
#### Hideyuki Tachibana
* https://github.com/tachi-hi?tab=repositories
#### Colin Raffel
* https://github.com/craffel?tab=repositories
#### Paul Dixon
* https://github.com/edobashira?tab=repositories
#### smacpy
* https://github.com/danstowell/smacpy
#### c4dm
* http://c4dm.eecs.qmul.ac.uk/software_data.html
#### Matt Shannon
* https://github.com/MattShannon?tab=repositories
#### Keunwoo Choi
* https://github.com/keunwoochoi?tab=repositories
#### ADASP
* http://www.tsi.telecom-paristech.fr/aao/en/software-and-database/
#### UChicago Speech and Language @ TTIC
* http://ttic.uchicago.edu/~klivescu/SLATTIC/resources.htm
#### Justin Salamon
* http://www.justinsalamon.com/codedata.html
#### COLEA
* http://ecs.utdallas.edu/loizou/speech/colea.htm
#### openAUDIO
* http://www.openaudio.eu/
#### Praat
* http://www.fon.hum.uva.nl/praat/
* https://github.com/timmahrt/praatIO
#### librosa
* https://github.com/librosa
#### Essentia
* https://github.com/MTG/essentia
#### timmahrt
* https://github.com/timmahrt?tab=repositories
#### Lefteris Zafiris
* https://github.com/zaf?tab=repositories
#### audio-to-audio and audio-to-midi alignment
* https://github.com/cataska/scorealign
#### DNN based hotword and wake word detection toolkit
* https://github.com/Kitt-AI/snowboy
#### free-spoken-digit-dataset
* https://github.com/Jakobovski/free-spoken-digit-dataset
#### Chinese Linguistic Data Consortium (中文语言资源联盟)
* http://www.chineseldc.org/resource_list.php?begin=0&count=20
#### Institute of Formal and Applied Linguistics – Dialogue Systems Group
* https://github.com/UFAL-DSG

* https://github.com/edobashira/speech-language-processing
* https://github.com/andabi?tab=repositories
* https://code.soundsoftware.ac.uk/projects

### Tutorial
#### DL for Computer Vision, Speech, and Language
* http://llcao.net/cu-deeplearning17/resource.html
#### NTU Introduction to Digital Speech Processing (臺大數位語音處理概論)
* http://speech.ee.ntu.edu.tw/courses.html
* http://ocw.aca.ntu.edu.tw/ntu-ocw/ocw/cou/104S204
#### IISc Speech Information Processing
* http://www.ee.iisc.ac.in/new/people/faculty/prasantg/e9261_speech_jan2018.html
#### Practical Cryptography machine learning tutorials
* http://www.practicalcryptography.com/miscellaneous/machine-learning/

### paper
* https://arxiv.org/search/?query=speech&searchtype=all&source=header
* https://www.isca-speech.org/iscaweb/index.php/archive/online-archive
* https://www.aclweb.org/anthology/
* https://github.com/zzw922cn/awesome-speech-recognition-speech-synthesis-papers
#### State of the art and recent results (bibliography) in speech recognition
* https://github.com/syhw/wer_are_we



## Pages
#### Dan Povey
* http://www.danielpovey.com/publications.html
#### cmusphinx
* https://github.com/cmusphinx
#### CMU Language Technologies Institute
* https://www.lti.cs.cmu.edu/work
#### CMU SPEECH@SV
* http://speech.sv.cmu.edu/publications.html
#### Mitsubishi Electric Research Laboratories
* http://www.merl.com/publications/
#### MIT Spoken Language Systems
* https://groups.csail.mit.edu/sls/downloads/
#### Brno University of Technology Speech Processing Group
* http://speech.fit.vutbr.cz/software
#### IISc
* https://spire.ee.iisc.ac.in/spire/allPublications.php
#### UChicago Speech and Language @ TTIC
* http://ttic.uchicago.edu/~klivescu/SLATTIC/resources.htm
#### RWTH Aachen University
* https://www-i6.informatik.rwth-aachen.de/web/Software/index.html
#### TOKUDA and NANKAKU LABORATORY
* http://www.sp.nitech.ac.jp/index.php?HOME%2FSOFTWARE
#### Institute of Formal and Applied Linguistics – Dialogue Systems Group
* https://github.com/UFAL-DSG
#### Ohio State University speech separation
* http://web.cse.ohio-state.edu/pnl/software.html
#### LEAP Laboratory
* http://www.leap.ee.iisc.ac.in/publications/
#### Hainan Xu
* https://www.cs.jhu.edu/~hxu/
#### Mark Gales
* http://mi.eng.cam.ac.uk/~mjfg/
#### Karen Livescu
* http://ttic.uchicago.edu/~klivescu/
#### Shubham Toshniwal
* http://ttic.uchicago.edu/~shtoshni/#pubs
* https://github.com/shtoshni92?tab=repositories
#### Adrien Ycart
* http://www.eecs.qmul.ac.uk/~ay304/code.html
#### Ron Weiss
* https://ronw.github.io/
#### Yajie Miao
* https://www.cs.cmu.edu/~ymiao/
#### Scott T Wisdom
* https://sites.google.com/site/scottwisdomhomepage/publications
#### Alan W Black
* https://www.cs.cmu.edu/~awb/
#### Amirsina Torfi
* https://www.amirsinatorfi.com/publications
#### Liang Lu
* http://ttic.uchicago.edu/~llu/
#### Zhizheng WU
* http://www.zhizheng.org/
#### Justin Salamon
* http://www.justinsalamon.com/codedata.html
#### Keith Vertanen
* http://www.keithv.com/software/
#### Aviv Gabbay
* http://www.cs.huji.ac.il/~avivga/
#### Mehryar Mohri
* https://cs.nyu.edu/~mohri/
#### Jonathan Le Roux
* http://www.jonathanleroux.org/
#### Suyoun Kim
* https://synetkim.github.io/
#### DeepSound
* http://deepsound.io/
#### Lei Xie
* http://lxie.npu-aslp.org/