├── LICENSE
└── README.md

--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------

                                 Apache License
                           Version 2.0, January 2004
                        http://www.apache.org/licenses/

   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION

   1. Definitions.

      "License" shall mean the terms and conditions for use, reproduction,
      and distribution as defined by Sections 1 through 9 of this document.

      "Licensor" shall mean the copyright owner or entity authorized by
      the copyright owner that is granting the License.

      "Legal Entity" shall mean the union of the acting entity and all
      other entities that control, are controlled by, or are under common
      control with that entity. For the purposes of this definition,
      "control" means (i) the power, direct or indirect, to cause the
      direction or management of such entity, whether by contract or
      otherwise, or (ii) ownership of fifty percent (50%) or more of the
      outstanding shares, or (iii) beneficial ownership of such entity.

      "You" (or "Your") shall mean an individual or Legal Entity
      exercising permissions granted by this License.

      "Source" form shall mean the preferred form for making modifications,
      including but not limited to software source code, documentation
      source, and configuration files.

      "Object" form shall mean any form resulting from mechanical
      transformation or translation of a Source form, including but
      not limited to compiled object code, generated documentation,
      and conversions to other media types.

      "Work" shall mean the work of authorship, whether in Source or
      Object form, made available under the License, as indicated by a
      copyright notice that is included in or attached to the work
      (an example is provided in the Appendix below).

      "Derivative Works" shall mean any work, whether in Source or Object
      form, that is based on (or derived from) the Work and for which the
      editorial revisions, annotations, elaborations, or other modifications
      represent, as a whole, an original work of authorship. For the purposes
      of this License, Derivative Works shall not include works that remain
      separable from, or merely link (or bind by name) to the interfaces of,
      the Work and Derivative Works thereof.

      "Contribution" shall mean any work of authorship, including
      the original version of the Work and any modifications or additions
      to that Work or Derivative Works thereof, that is intentionally
      submitted to Licensor for inclusion in the Work by the copyright owner
      or by an individual or Legal Entity authorized to submit on behalf of
      the copyright owner. For the purposes of this definition, "submitted"
      means any form of electronic, verbal, or written communication sent
      to the Licensor or its representatives, including but not limited to
      communication on electronic mailing lists, source code control systems,
      and issue tracking systems that are managed by, or on behalf of, the
      Licensor for the purpose of discussing and improving the Work, but
      excluding communication that is conspicuously marked or otherwise
      designated in writing by the copyright owner as "Not a Contribution."

      "Contributor" shall mean Licensor and any individual or Legal Entity
      on behalf of whom a Contribution has been received by Licensor and
      subsequently incorporated within the Work.

   2. Grant of Copyright License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      copyright license to reproduce, prepare Derivative Works of,
      publicly display, publicly perform, sublicense, and distribute the
      Work and such Derivative Works in Source or Object form.

   3. Grant of Patent License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      (except as stated in this section) patent license to make, have made,
      use, offer to sell, sell, import, and otherwise transfer the Work,
      where such license applies only to those patent claims licensable
      by such Contributor that are necessarily infringed by their
      Contribution(s) alone or by combination of their Contribution(s)
      with the Work to which such Contribution(s) was submitted. If You
      institute patent litigation against any entity (including a
      cross-claim or counterclaim in a lawsuit) alleging that the Work
      or a Contribution incorporated within the Work constitutes direct
      or contributory patent infringement, then any patent licenses
      granted to You under this License for that Work shall terminate
      as of the date such litigation is filed.

   4. Redistribution. You may reproduce and distribute copies of the
      Work or Derivative Works thereof in any medium, with or without
      modifications, and in Source or Object form, provided that You
      meet the following conditions:

      (a) You must give any other recipients of the Work or
          Derivative Works a copy of this License; and

      (b) You must cause any modified files to carry prominent notices
          stating that You changed the files; and

      (c) You must retain, in the Source form of any Derivative Works
          that You distribute, all copyright, patent, trademark, and
          attribution notices from the Source form of the Work,
          excluding those notices that do not pertain to any part of
          the Derivative Works; and

      (d) If the Work includes a "NOTICE" text file as part of its
          distribution, then any Derivative Works that You distribute must
          include a readable copy of the attribution notices contained
          within such NOTICE file, excluding those notices that do not
          pertain to any part of the Derivative Works, in at least one
          of the following places: within a NOTICE text file distributed
          as part of the Derivative Works; within the Source form or
          documentation, if provided along with the Derivative Works; or,
          within a display generated by the Derivative Works, if and
          wherever such third-party notices normally appear. The contents
          of the NOTICE file are for informational purposes only and
          do not modify the License. You may add Your own attribution
          notices within Derivative Works that You distribute, alongside
          or as an addendum to the NOTICE text from the Work, provided
          that such additional attribution notices cannot be construed
          as modifying the License.

      You may add Your own copyright statement to Your modifications and
      may provide additional or different license terms and conditions
      for use, reproduction, or distribution of Your modifications, or
      for any such Derivative Works as a whole, provided Your use,
      reproduction, and distribution of the Work otherwise complies with
      the conditions stated in this License.

   5. Submission of Contributions. Unless You explicitly state otherwise,
      any Contribution intentionally submitted for inclusion in the Work
      by You to the Licensor shall be under the terms and conditions of
      this License, without any additional terms or conditions.
      Notwithstanding the above, nothing herein shall supersede or modify
      the terms of any separate license agreement you may have executed
      with Licensor regarding such Contributions.

   6. Trademarks. This License does not grant permission to use the trade
      names, trademarks, service marks, or product names of the Licensor,
      except as required for reasonable and customary use in describing the
      origin of the Work and reproducing the content of the NOTICE file.

   7. Disclaimer of Warranty. Unless required by applicable law or
      agreed to in writing, Licensor provides the Work (and each
      Contributor provides its Contributions) on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
      implied, including, without limitation, any warranties or conditions
      of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
      PARTICULAR PURPOSE. You are solely responsible for determining the
      appropriateness of using or redistributing the Work and assume any
      risks associated with Your exercise of permissions under this License.

   8. Limitation of Liability. In no event and under no legal theory,
      whether in tort (including negligence), contract, or otherwise,
      unless required by applicable law (such as deliberate and grossly
      negligent acts) or agreed to in writing, shall any Contributor be
      liable to You for damages, including any direct, indirect, special,
      incidental, or consequential damages of any character arising as a
      result of this License or out of the use or inability to use the
      Work (including but not limited to damages for loss of goodwill,
      work stoppage, computer failure or malfunction, or any and all
      other commercial damages or losses), even if such Contributor
      has been advised of the possibility of such damages.

   9. Accepting Warranty or Additional Liability. While redistributing
      the Work or Derivative Works thereof, You may choose to offer,
      and charge a fee for, acceptance of support, warranty, indemnity,
      or other liability obligations and/or rights consistent with this
      License. However, in accepting such obligations, You may act only
      on Your own behalf and on Your sole responsibility, not on behalf
      of any other Contributor, and only if You agree to indemnify,
      defend, and hold each Contributor harmless for any liability
      incurred by, or claims asserted against, such Contributor by reason
      of your accepting any such warranty or additional liability.

   END OF TERMS AND CONDITIONS

   APPENDIX: How to apply the Apache License to your work.
      To apply the Apache License to your work, attach the following
      boilerplate notice, with the fields enclosed by brackets "{}"
      replaced with your own identifying information. (Don't include
      the brackets!) The text should be enclosed in the appropriate
      comment syntax for the file format. We also recommend that a
      file or class name and description of purpose be included on the
      same "printed page" as the copyright notice for easier
      identification within third-party archives.

   Copyright {yyyy} {name of copyright owner}

   Licensed under the Apache License, Version 2.0 (the "License");
   you may not use this file except in compliance with the License.
   You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------

# Natural Language Processing Tasks and Selected References

I've been working on several natural language processing tasks for a long time. One day, I felt like drawing a map of the NLP field in which I earn a living. I'm sure I'm not the only person who wants to see at a glance which tasks there are in NLP.

I did my best to cover as many NLP tasks as possible, but this list is admittedly far from exhaustive, purely due to my lack of knowledge. The selected references are biased towards recent deep learning accomplishments; I expect them to serve as a starting point when you're about to dig into a task. I'll keep updating this repo myself, but what I really hope is that you will collaborate on this work. Don't hesitate to send me a pull request!

Oct. 13, 2017.
by Kyubyong

Reviewed and updated by [YJ Choe](https://github.com/yjchoe) on Oct. 18, 2017.

## Anaphora Resolution
* See [Coreference Resolution](#coreference-resolution)

## Automated Essay Scoring
* ****`PAPER`**** [Automatic Text Scoring Using Neural Networks](https://arxiv.org/abs/1606.04289)
* ****`PAPER`**** [A Neural Approach to Automated Essay Scoring](http://www.aclweb.org/old_anthology/D/D16/D16-1193.pdf)
* ****`CHALLENGE`**** [Kaggle: The Hewlett Foundation: Automated Essay Scoring](https://www.kaggle.com/c/asap-aes)
* ****`PROJECT`**** [EASE (Enhanced AI Scoring Engine)](https://github.com/edx/ease)

## Automatic Speech Recognition
* ****`WIKI`**** [Speech recognition](https://en.wikipedia.org/wiki/Speech_recognition)
* ****`PAPER`**** [Deep Speech 2: End-to-End Speech Recognition in English and Mandarin](https://arxiv.org/abs/1512.02595)
* ****`PAPER`**** [WaveNet: A Generative Model for Raw Audio](https://arxiv.org/abs/1609.03499)
* ****`PROJECT`**** [A TensorFlow implementation of Baidu's DeepSpeech architecture](https://github.com/mozilla/DeepSpeech)
* ****`PROJECT`**** [Speech-to-Text-WaveNet : End-to-end sentence level English speech recognition using DeepMind's WaveNet](https://github.com/buriburisuri/speech-to-text-wavenet)
* ****`CHALLENGE`**** [The 5th CHiME Speech Separation and Recognition Challenge](http://spandh.dcs.shef.ac.uk/chime_challenge/)
* ****`DATA`**** [The 5th CHiME Speech Separation and Recognition Challenge](http://spandh.dcs.shef.ac.uk/chime_challenge/download.html)
* ****`DATA`**** [CSTR VCTK Corpus](http://homepages.inf.ed.ac.uk/jyamagis/page3/page58/page58.html)
* ****`DATA`**** [LibriSpeech ASR corpus](http://www.openslr.org/12/)
* ****`DATA`**** [Switchboard-1 Telephone Speech Corpus](https://catalog.ldc.upenn.edu/ldc97s62)
* ****`DATA`**** [TED-LIUM Corpus](http://www-lium.univ-lemans.fr/en/content/ted-lium-corpus)
* ****`DATA`**** [Open Speech and Language Resources](http://www.openslr.org/)
* ****`DATA`**** [Common Voice](https://voice.mozilla.org/en/data)

## Automatic Summarisation
* ****`WIKI`**** [Automatic summarization](https://en.wikipedia.org/wiki/Automatic_summarization)
* ****`BOOK`**** [Automatic Text Summarization](https://www.amazon.com/Automatic-Text-Summarization-Juan-Manuel-Torres-Moreno/dp/1848216688/ref=sr_1_1?s=books&ie=UTF8&qid=1507782304&sr=1-1&keywords=Automatic+Text+Summarization)
* ****`PAPER`**** [Text Summarization Using Neural Networks](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.823.8025&rep=rep1&type=pdf)
* ****`PAPER`**** [Ranking with Recursive Neural Networks and Its Application to Multi-Document Summarization](https://www.aaai.org/ocs/index.php/AAAI/AAAI15/paper/viewFile/9414/9520)
* ****`DATA`**** [Text Analytics Conferences (TAC)](https://tac.nist.gov/data/index.html)
* ****`DATA`**** [Document Understanding Conferences (DUC)](http://www-nlpir.nist.gov/projects/duc/data.html)
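
As a point of reference for the neural systems above, extractive summarization has a very simple frequency-based baseline: score each sentence by how frequent its words are in the document and keep the top-scoring ones. The sketch below is only a toy illustration (naive regex tokenization, no stopword handling) and is not taken from any of the referenced papers.

```python
import re
from collections import Counter

def summarize(text, max_sentences=2):
    """Toy extractive summarizer: rank sentences by average word frequency."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(re.findall(r"[a-z']+", text.lower()))
    def score(sentence):
        tokens = re.findall(r"[a-z']+", sentence.lower())
        return sum(freq[t] for t in tokens) / (len(tokens) or 1)
    top = sorted(sentences, key=score, reverse=True)[:max_sentences]
    # Keep the selected sentences in their original order.
    return " ".join(s for s in sentences if s in top)

print(summarize("Cats sleep a lot. Cats also purr. Dogs bark loudly. Cats are common pets."))
```
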
## Coreference Resolution
* ****`INFO`**** [Coreference Resolution](https://nlp.stanford.edu/projects/coref.shtml)
* ****`PAPER`**** [Deep Reinforcement Learning for Mention-Ranking Coreference Models](https://arxiv.org/abs/1609.08667)
* ****`PAPER`**** [Improving Coreference Resolution by Learning Entity-Level Distributed Representations](https://arxiv.org/abs/1606.01323)
* ****`CHALLENGE`**** [CoNLL 2012 Shared Task: Modeling Multilingual Unrestricted Coreference in OntoNotes](http://conll.cemantix.org/2012/task-description.html)
* ****`CHALLENGE`**** [CoNLL 2011 Shared Task: Modeling Unrestricted Coreference in OntoNotes](http://conll.cemantix.org/2011/task-description.html)
* ****`CHALLENGE`**** [SemEval 2018 Task 4: Character Identification on Multiparty Dialogues](https://competitions.codalab.org/competitions/17310)

## Entity Linking
* See [Named Entity Disambiguation](#named-entity-disambiguation)

## Grammatical Error Correction
* ****`PAPER`**** [A Multilayer Convolutional Encoder-Decoder Neural Network for Grammatical Error Correction](https://arxiv.org/abs/1801.08831)
* ****`PAPER`**** [Neural Network Translation Models for Grammatical Error Correction](https://arxiv.org/abs/1606.00189)
* ****`PAPER`**** [Adapting Sequence Models for Sentence Correction](http://aclweb.org/anthology/D17-1297)
* ****`CHALLENGE`**** [CoNLL-2013 Shared Task: Grammatical Error Correction](http://www.comp.nus.edu.sg/~nlp/conll13st.html)
* ****`CHALLENGE`**** [CoNLL-2014 Shared Task: Grammatical Error Correction](http://www.comp.nus.edu.sg/~nlp/conll14st.html)
* ****`DATA`**** [NUS Non-commercial research/trial corpus license](http://www.comp.nus.edu.sg/~nlp/conll14st/nucle_license.pdf)
* ****`DATA`**** [Lang-8 Learner Corpora](http://cl.naist.jp/nldata/lang-8/)
* ****`DATA`**** [Cornell Movie--Dialogs Corpus](http://www.cs.cornell.edu/%7Ecristian/Cornell_Movie-Dialogs_Corpus.html)
* ****`PROJECT`**** [Deep Text Corrector](https://github.com/atpaino/deep-text-corrector)
* ****`PRODUCT`**** [deep grammar](http://deepgrammar.com/)

## Grapheme To Phoneme Conversion
* ****`PAPER`**** [Grapheme-to-Phoneme Models for (Almost) Any Language](https://pdfs.semanticscholar.org/b9c8/fef9b6f16b92c6859f6106524fdb053e9577.pdf)
* ****`PAPER`**** [Polyglot Neural Language Models: A Case Study in Cross-Lingual Phonetic Representation Learning](https://arxiv.org/pdf/1605.03832.pdf)
* ****`PAPER`**** [Multitask Sequence-to-Sequence Models for Grapheme-to-Phoneme Conversion](https://pdfs.semanticscholar.org/26d0/09959fa2b2e18cddb5783493738a1c1ede2f.pdf)
* ****`PROJECT`**** [Sequence-to-Sequence G2P toolkit](https://github.com/cmusphinx/g2p-seq2seq)
* ****`PROJECT`**** [g2p_en: A Simple Python Module for English Grapheme To Phoneme Conversion](https://github.com/kyubyong/g2p)
* ****`DATA`**** [Multilingual Pronunciation Data](https://drive.google.com/drive/folders/0B7R_gATfZJ2aWkpSWHpXUklWUmM)
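
If you just want to convert English text to phonemes, the g2p_en project listed above exposes a small Python interface; the snippet below follows the usage shown in that project's README (assuming `pip install g2p_en`), so treat the exact API and output format as that project's, not a general standard.

```python
from g2p_en import G2p

g2p = G2p()
# Known words are looked up in a pronouncing dictionary;
# out-of-vocabulary words are predicted by a neural model.
print(g2p("I refuse to collect the refuse."))  # a list of ARPAbet phonemes such as ['AY1', ' ', 'R', 'IH0', ...]
```
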
## Humor and Sarcasm Detection
* ****`PAPER`**** [Automatic Sarcasm Detection: A Survey](https://arxiv.org/abs/1602.03426)
* ****`PAPER`**** [Magnets for Sarcasm: Making Sarcasm Detection Timely, Contextual and Very Personal](http://aclweb.org/anthology/D17-1051)
* ****`PAPER`**** [Sarcasm Detection on Twitter: A Behavioral Modeling Approach](http://ai2-s2-pdfs.s3.amazonaws.com/67b5/9db00c29152d8e738f693f153e1ab9b43466.pdf)
* ****`CHALLENGE`**** [SemEval-2017 Task 6: #HashtagWars: Learning a Sense of Humor](http://alt.qcri.org/semeval2017/task6/)
* ****`CHALLENGE`**** [SemEval-2017 Task 7: Detection and Interpretation of English Puns](http://alt.qcri.org/semeval2017/task7/)
* ****`DATA`**** [Sarcastic comments from Reddit](https://www.kaggle.com/danofer/sarcasm/)
* ****`DATA`**** [Sarcasm Corpus V2](https://nlds.soe.ucsc.edu/sarcasm2)
* ****`DATA`**** [Sarcasm Amazon Reviews Corpus](https://github.com/ef2020/SarcasmAmazonReviewsCorpus)

## Language Grounding
* ****`WIKI`**** [Symbol grounding problem](https://en.wikipedia.org/wiki/Symbol_grounding_problem)
* ****`PAPER`**** [The Symbol Grounding Problem](http://courses.media.mit.edu/2004spring/mas966/Harnad%20symbol%20grounding.pdf)
* ****`PAPER`**** [From phonemes to images: levels of representation in a recurrent neural model of visually-grounded language learning](https://arxiv.org/abs/1610.03342)
* ****`PAPER`**** [Encoding of phonology in a recurrent neural model of grounded speech](https://arxiv.org/abs/1706.03815)
* ****`PAPER`**** [Gated-Attention Architectures for Task-Oriented Language Grounding](https://arxiv.org/abs/1706.07230)
* ****`PAPER`**** [Sound-Word2Vec: Learning Word Representations Grounded in Sounds](https://arxiv.org/abs/1703.01720)
* ****`COURSE`**** [Language Grounding to Vision and Control](https://www.cs.cmu.edu/~katef/808/)
* ****`WORKSHOP`**** [Language Grounding for Robotics](https://robonlp2017.github.io/)

## Language Guessing
* See [Language Identification](#language-identification)

## Language Identification
* ****`WIKI`**** [Language identification](https://en.wikipedia.org/wiki/Language_identification)
* ****`PAPER`**** [Automatic Language Identification Using Deep Neural Networks](https://repositorio.uam.es/bitstream/handle/10486/666848/automatic_lopez-moreno_ICASSP_2014_ps.pdf?sequence=1)
* ****`PAPER`**** [Natural Language Processing with Small Feed-Forward Networks](http://aclweb.org/anthology/D17-1308)
* ****`CHALLENGE`**** [2015 Language Recognition Evaluation](https://www.nist.gov/itl/iad/mig/2015-language-recognition-evaluation)

## Language Modeling
* ****`WIKI`**** [Language model](https://en.wikipedia.org/wiki/Language_model)
* ****`TOOLKIT`**** [KenLM Language Model Toolkit](http://kheafield.com/code/kenlm/)
* ****`PAPER`**** [Distributed Representations of Words and Phrases and their Compositionality](http://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf)
* ****`PAPER`**** [Generating Sequences with Recurrent Neural Networks](https://arxiv.org/pdf/1308.0850.pdf)
* ****`PAPER`**** [Character-Aware Neural Language Models](https://www.aaai.org/ocs/index.php/AAAI/AAAI16/paper/viewFile/12489/12017)
* ****`THESIS`**** [Statistical Language Models Based on Neural Networks](http://www.fit.vutbr.cz/~imikolov/rnnlm/thesis.pdf)
* ****`DATA`**** [Penn Treebank](https://github.com/townie/PTB-dataset-from-Tomas-Mikolov-s-webpage/tree/master/data)
* ****`TUTORIAL`**** [TensorFlow Tutorial on Language Modeling with Recurrent Neural Networks](https://www.tensorflow.org/tutorials/recurrent#language_modeling)
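
The references above are all neural, but the core job of a language model — estimate the probability of the next token given its history — is easiest to see with a count-based bigram model. A minimal add-one-smoothed sketch over a toy corpus (nothing like the referenced systems, just the underlying idea):

```python
from collections import Counter, defaultdict

corpus = ["<s> the cat sat on the mat </s>", "<s> the dog sat on the rug </s>"]
bigrams, unigrams = defaultdict(Counter), Counter()
for line in corpus:
    tokens = line.split()
    unigrams.update(tokens)
    for prev, cur in zip(tokens, tokens[1:]):
        bigrams[prev][cur] += 1

V = len(unigrams)  # vocabulary size, used for add-one smoothing

def prob(cur, prev):
    """P(cur | prev) with add-one (Laplace) smoothing."""
    return (bigrams[prev][cur] + 1) / (unigrams[prev] + V)

print(prob("cat", "the"), prob("unicorn", "the"))  # seen bigram vs. unseen bigram
```
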
## Language Recognition
* See [Language Identification](#language-identification)

## Lemmatisation
* ****`WIKI`**** [Lemmatisation](https://en.wikipedia.org/wiki/Lemmatisation)
* ****`PAPER`**** [Joint Lemmatization and Morphological Tagging with LEMMING](http://www.cis.lmu.de/~muellets/pdf/emnlp_2015.pdf)
* ****`TOOLKIT`**** [WordNet Lemmatizer](http://www.nltk.org/api/nltk.stem.html#nltk.stem.wordnet.WordNetLemmatizer.lemmatize)
* ****`DATA`**** [Treebank-3](https://catalog.ldc.upenn.edu/ldc99t42)

## Lip-reading
* ****`WIKI`**** [Lip reading](https://en.wikipedia.org/wiki/Lip_reading)
* ****`PAPER`**** [LipNet: End-to-End Sentence-level Lipreading](https://arxiv.org/abs/1611.01599)
* ****`PAPER`**** [Lip Reading Sentences in the Wild](https://arxiv.org/abs/1611.05358)
* ****`PAPER`**** [Large-Scale Visual Speech Recognition](https://arxiv.org/abs/1807.05162)
* ****`PROJECT`**** [Lip Reading - Cross Audio-Visual Recognition using 3D Convolutional Neural Networks](https://github.com/astorfi/lip-reading-deeplearning)
* ****`PRODUCT`**** [Liopa](http://www.liopa.co.uk/)
* ****`DATA`**** [The GRID audiovisual sentence corpus](http://spandh.dcs.shef.ac.uk/gridcorpus/)
* ****`DATA`**** [The BBC-Oxford 'Multi-View Lip Reading Sentences' (MV-LRS) Dataset](http://www.robots.ox.ac.uk/~vgg/data/lip_reading_sentences/)

## Machine Translation
* ****`PAPER`**** [Neural Machine Translation by Jointly Learning to Align and Translate](https://arxiv.org/abs/1409.0473)
* ****`PAPER`**** [Neural Machine Translation in Linear Time](https://arxiv.org/abs/1610.10099)
* ****`PAPER`**** [Attention Is All You Need](https://arxiv.org/abs/1706.03762)
* ****`PAPER`**** [Six Challenges for Neural Machine Translation](http://aclweb.org/anthology/W/W17/W17-3204.pdf)
* ****`PAPER`**** [Phrase-Based & Neural Unsupervised Machine Translation](https://arxiv.org/abs/1804.07755)
* ****`CHALLENGE`**** [ACL 2014 Ninth Workshop on Statistical Machine Translation](http://www.statmt.org/wmt14/translation-task.html#download)
* ****`CHALLENGE`**** [EMNLP 2017 Second Conference on Machine Translation (WMT17)](http://www.statmt.org/wmt17/translation-task.html)
* ****`DATA`**** [OpenSubtitles2016](http://opus.lingfil.uu.se/OpenSubtitles2016.php)
* ****`DATA`**** [WIT3: Web Inventory of Transcribed and Translated Talks](https://wit3.fbk.eu/)
* ****`DATA`**** [The QCRI Educational Domain (QED) Corpus](http://alt.qcri.org/resources/qedcorpus/)
* ****`PAPER`**** [Multi-task Sequence to Sequence Learning](https://arxiv.org/abs/1511.06114)
* ****`PAPER`**** [Unsupervised Pretraining for Sequence to Sequence Learning](http://aclweb.org/anthology/D17-1039)
* ****`PAPER`**** [Google’s Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation](https://arxiv.org/abs/1611.04558)
* ****`TOOLKIT`**** [Subword Neural Machine Translation with Byte Pair Encoding (BPE)](https://github.com/rsennrich/subword-nmt)
* ****`TOOLKIT`**** [Multi-Way Neural Machine Translation](https://github.com/nyu-dl/dl4mt-multi)
* ****`TOOLKIT`**** [OpenNMT: Open-Source Toolkit for Neural Machine Translation](http://opennmt.net/)
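
Most modern NMT systems, including several of the toolkits above, operate on subword units learned with byte-pair encoding rather than on whole words. The learning step popularized by the subword-nmt toolkit listed above boils down to "repeatedly merge the most frequent adjacent symbol pair"; here is a toy re-implementation of that loop (not the toolkit's actual code) on the classic `low / lower / newest / widest` example.

```python
import re
from collections import Counter

def learn_bpe(word_freqs, num_merges=10):
    """Toy BPE learner: repeatedly merge the most frequent adjacent symbol pair."""
    vocab = {" ".join(word) + " </w>": freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for word, freq in vocab.items():
            symbols = word.split()
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Merge the pair only where both parts are complete, adjacent symbols.
        pattern = re.compile(r"(?<!\S)" + re.escape(" ".join(best)) + r"(?!\S)")
        vocab = {pattern.sub("".join(best), word): freq for word, freq in vocab.items()}
    return merges

print(learn_bpe({"low": 5, "lower": 2, "newest": 6, "widest": 3}, num_merges=5))
```
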
## Morphological Inflection Generation
* ****`WIKI`**** [Inflection](https://en.wikipedia.org/wiki/Inflection)
* ****`PAPER`**** [Morphological Inflection Generation Using Character Sequence to Sequence Learning](https://arxiv.org/abs/1512.06110)
* ****`CHALLENGE`**** [SIGMORPHON 2016 Shared Task: Morphological Reinflection](http://ryancotterell.github.io/sigmorphon2016/)
* ****`DATA`**** [sigmorphon2016](https://github.com/ryancotterell/sigmorphon2016)

## Named Entity Disambiguation
* ****`WIKI`**** [Entity linking](https://en.wikipedia.org/wiki/Entity_linking)
* ****`PAPER`**** [Robust and Collective Entity Disambiguation through Semantic Embeddings](http://www.stefanzwicklbauer.info/pdf/Sigir_2016.pdf)

## Named Entity Recognition
* ****`WIKI`**** [Named-entity recognition](https://en.wikipedia.org/wiki/Named-entity_recognition)
* ****`PAPER`**** [Neural Architectures for Named Entity Recognition](https://arxiv.org/abs/1603.01360)
* ****`PROJECT`**** [OSU Twitter NLP Tools](https://github.com/aritter/twitter_nlp)
* ****`CHALLENGE`**** [Named Entity Recognition in Twitter](https://noisy-text.github.io/2016/ner-shared-task.html)
* ****`CHALLENGE`**** [CoNLL 2002 Language-Independent Named Entity Recognition](https://www.clips.uantwerpen.be/conll2002/ner/)
* ****`CHALLENGE`**** [Introduction to the CoNLL-2003 Shared Task: Language-Independent Named Entity Recognition](http://aclweb.org/anthology/W03-0419)
* ****`DATA`**** [CoNLL-2002 NER corpus](https://github.com/teropa/nlp/tree/master/resources/corpora/conll2002)
* ****`DATA`**** [CoNLL-2003 NER corpus](https://github.com/synalp/NER/tree/master/corpus/CoNLL-2003)
* ****`DATA`**** [NUT Named Entity Recognition in Twitter Shared task](https://github.com/aritter/twitter_nlp/tree/master/data/annotated/wnut16)
* ****`TOOLKIT`**** [Stanford Named Entity Recognizer](https://nlp.stanford.edu/software/CRF-NER.shtml)
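
To see what a named-entity recognizer produces before digging into the architectures above, an off-the-shelf tagger is the quickest route. The sketch below uses spaCy's small English model — my own choice for illustration, not one of the tools listed in this section — and prints entity spans with their types.

```python
# pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Tim Cook visited London in October to meet the BBC and spent $2 million.")
for ent in doc.ents:
    print(ent.text, ent.label_)  # spans typed as PERSON, GPE, DATE, ORG, MONEY, ...
```
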
## Paraphrase Detection
* ****`PAPER`**** [Dynamic Pooling and Unfolding Recursive Autoencoders for Paraphrase Detection](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.650.7199&rep=rep1&type=pdf)
* ****`PROJECT`**** [Paralex: Paraphrase-Driven Learning for Open Question Answering](http://knowitall.cs.washington.edu/paralex/)
* ****`CHALLENGE`**** [SemEval-2015 Task 1: Paraphrase and Semantic Similarity in Twitter](http://alt.qcri.org/semeval2015/task1/)
* ****`DATA`**** [Microsoft Research Paraphrase Corpus](https://www.microsoft.com/en-us/download/details.aspx?id=52398)
* ****`DATA`**** [Microsoft Research Video Description Corpus](https://www.microsoft.com/en-us/download/details.aspx?id=52422&from=http%3A%2F%2Fresearch.microsoft.com%2Fen-us%2Fdownloads%2F38cf15fd-b8df-477e-a4e4-a4680caa75af%2F)
* ****`DATA`**** [Pascal Dataset](http://nlp.cs.illinois.edu/HockenmaierGroup/pascal-sentences/index.html)
* ****`DATA`**** [Flickr Dataset](http://nlp.cs.illinois.edu/HockenmaierGroup/8k-pictures.html)
* ****`DATA`**** [The SICK data set](http://clic.cimec.unitn.it/composes/sick.html)
* ****`DATA`**** [PPDB: The Paraphrase Database](http://www.cis.upenn.edu/%7Eccb/ppdb/)
* ****`DATA`**** [WikiAnswers Paraphrase Corpus](http://knowitall.cs.washington.edu/paralex/wikianswers-paraphrases-1.0.tar.gz)

## Paraphrase Generation
* ****`PAPER`**** [Neural Paraphrase Generation with Stacked Residual LSTM Networks](https://arxiv.org/pdf/1610.03098.pdf)
* ****`DATA`**** [Neural Paraphrase Generation with Stacked Residual LSTM Networks](https://github.com/iamaaditya/neural-paraphrase-generation/tree/master/data)
* ****`CODE`**** [Neural Paraphrase Generation with Stacked Residual LSTM Networks](https://github.com/iamaaditya/neural-paraphrase-generation)
* ****`PAPER`**** [A Deep Generative Framework for Paraphrase Generation](https://arxiv.org/pdf/1709.05074.pdf)
* ****`PAPER`**** [Paraphrasing Revisited with Neural Machine Translation](http://www.research.ed.ac.uk/portal/files/34902784/document.pdf)

## Parsing
* ****`WIKI`**** [Parsing](https://en.wikipedia.org/wiki/Parsing)
* ****`TOOLKIT`**** [The Stanford Parser: A statistical parser](https://nlp.stanford.edu/software/lex-parser.shtml)
* ****`TOOLKIT`**** [spaCy parser](https://spacy.io/docs/usage/dependency-parse)
* ****`PAPER`**** [Grammar as a Foreign Language](https://papers.nips.cc/paper/5635-grammar-as-a-foreign-language.pdf)
* ****`PAPER`**** [A fast and accurate dependency parser using neural networks](http://www.aclweb.org/anthology/D14-1082)
* ****`PAPER`**** [Universal Semantic Parsing](https://aclanthology.info/pdf/D/D17/D17-1009.pdf)
* ****`CHALLENGE`**** [CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies](http://universaldependencies.org/conll17/)
* ****`CHALLENGE`**** [CoNLL 2016 Shared Task: Multilingual Shallow Discourse Parsing](http://www.cs.brandeis.edu/~clp/conll16st/)
* ****`CHALLENGE`**** [CoNLL 2015 Shared Task: Shallow Discourse Parsing](http://www.cs.brandeis.edu/~clp/conll15st/)
* ****`CHALLENGE`**** [SemEval-2016 Task 8: The meaning representations may be abstract, but this task is concrete!](http://alt.qcri.org/semeval2016/task8/)
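
The spaCy parser linked above gives a quick feel for dependency parsing: every token is attached to a syntactic head with a typed relation. A minimal sketch, assuming the small English model is installed:

```python
# pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("The quick brown fox jumps over the lazy dog.")
for token in doc:
    # token --dependency-label--> head
    print(f"{token.text:<6} --{token.dep_}--> {token.head.text}")
```
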
## Part-of-speech Tagging
* ****`WIKI`**** [Part-of-speech tagging](https://en.wikipedia.org/wiki/Part-of-speech_tagging)
* ****`PAPER`**** [Multilingual Part-of-Speech Tagging with Bidirectional Long Short-Term Memory Models and Auxiliary Loss](https://arxiv.org/pdf/1604.05529.pdf)
* ****`PAPER`**** [Unsupervised Part-Of-Speech Tagging with Anchor Hidden Markov Models](https://transacl.org/ojs/index.php/tacl/article/viewFile/837/192)
* ****`DATA`**** [Treebank-3](https://catalog.ldc.upenn.edu/ldc99t42)
* ****`TOOLKIT`**** [nltk.tag package](http://www.nltk.org/api/nltk.tag.html)

## Pinyin-To-Chinese Conversion
* ****`WIKI`**** [Pinyin input method](https://en.wikipedia.org/wiki/Pinyin_input_method)
* ****`PAPER`**** [Neural Network Language Model for Chinese Pinyin Input Method Engine](http://aclweb.org/anthology/Y15-1052)
* ****`PROJECT`**** [Neural Chinese Transliterator](https://github.com/Kyubyong/neural_chinese_transliterator)

## Question Answering
* ****`WIKI`**** [Question answering](https://en.wikipedia.org/wiki/Question_answering)
* ****`PAPER`**** [Ask Me Anything: Dynamic Memory Networks for Natural Language Processing](http://www.thespermwhale.com/jaseweston/ram/papers/paper_21.pdf)
* ****`PAPER`**** [Dynamic Memory Networks for Visual and Textual Question Answering](http://proceedings.mlr.press/v48/xiong16.pdf)
* ****`CHALLENGE`**** [TREC Question Answering Task](http://trec.nist.gov/data/qamain.html)
* ****`CHALLENGE`**** [NTCIR-8: Advanced Cross-lingual Information Access (ACLIA)](http://aclia.lti.cs.cmu.edu/ntcir8/Home)
* ****`CHALLENGE`**** [CLEF Question Answering Track](http://nlp.uned.es/clef-qa/)
* ****`CHALLENGE`**** [SemEval-2017 Task 3: Community Question Answering](http://alt.qcri.org/semeval2017/task3/)
* ****`CHALLENGE`**** [SemEval-2018 Task 11: Machine Comprehension using Commonsense Knowledge](https://competitions.codalab.org/competitions/17184)
* ****`DATA`**** [MS MARCO: Microsoft MAchine Reading COmprehension Dataset](http://www.msmarco.org/)
* ****`DATA`**** [Maluuba NewsQA](https://github.com/Maluuba/newsqa)
* ****`DATA`**** [SQuAD: 100,000+ Questions for Machine Comprehension of Text](https://rajpurkar.github.io/SQuAD-explorer/)
* ****`DATA`**** [GraphQuestions: A Characteristic-rich Question Answering Dataset](https://github.com/ysu1989/GraphQuestions)
* ****`DATA`**** [Story Cloze Test and ROCStories Corpora](http://cs.rochester.edu/nlp/rocstories/)
* ****`DATA`**** [Microsoft Research WikiQA Corpus](https://www.microsoft.com/en-us/download/details.aspx?id=52419&from=http%3A%2F%2Fresearch.microsoft.com%2Fen-us%2Fdownloads%2F4495da01-db8c-4041-a7f6-7984a4f6a905%2Fdefault.aspx)
* ****`DATA`**** [DeepMind Q&A Dataset](http://cs.nyu.edu/%7Ekcho/DMQA/)
* ****`DATA`**** [QASent](http://cs.stanford.edu/people/mengqiu/data/qg-emnlp07-data.tgz)
* ****`DATA`**** [Textbook Question Answering](http://textbookqa.org/)

## Relationship Extraction
* ****`WIKI`**** [Relationship extraction](https://en.wikipedia.org/wiki/Relationship_extraction)
* ****`PAPER`**** [A deep learning approach for relationship extraction from interaction context in social manufacturing paradigm](http://www.sciencedirect.com/science/article/pii/S0950705116001210)
* ****`CHALLENGE`**** [SemEval-2018 Task 7: Semantic Relation Extraction and Classification in Scientific Papers](https://competitions.codalab.org/competitions/17422)

## Semantic Role Labeling
* ****`WIKI`**** [Semantic role labeling](https://en.wikipedia.org/wiki/Semantic_role_labeling)
* ****`BOOK`**** [Semantic Role Labeling](https://www.amazon.com/Semantic-Labeling-Synthesis-Lectures-Technologies/dp/1598298313/ref=sr_1_1?s=books&ie=UTF8&qid=1507776173&sr=1-1&keywords=Semantic+Role+Labeling)
* ****`PAPER`**** [End-to-end Learning of Semantic Role Labeling Using Recurrent Neural Networks](http://www.aclweb.org/anthology/P/P15/P15-1109.pdf)
* ****`PAPER`**** [Neural Semantic Role Labeling with Dependency Path Embeddings](https://arxiv.org/abs/1605.07515)
* ****`PAPER`**** [Deep Semantic Role Labeling: What Works and What's Next](https://homes.cs.washington.edu/~luheng/files/acl2017_hllz.pdf)
* ****`CHALLENGE`**** [CoNLL-2005 Shared Task: Semantic Role Labeling](http://www.cs.upc.edu/~srlconll/st05/st05.html)
* ****`CHALLENGE`**** [CoNLL-2004 Shared Task: Semantic Role Labeling](http://www.cs.upc.edu/~srlconll/st04/st04.html)
* ****`TOOLKIT`**** [Illinois Semantic Role Labeler (SRL)](http://cogcomp.org/page/software_view/SRL)
* ****`DATA`**** [CoNLL-2005 Shared Task: Semantic Role Labeling](http://www.cs.upc.edu/~srlconll/soft.html)

## Sentence Boundary Disambiguation
* ****`WIKI`**** [Sentence boundary disambiguation](https://en.wikipedia.org/wiki/Sentence_boundary_disambiguation)
* ****`PAPER`**** [A Quantitative and Qualitative Evaluation of Sentence Boundary Detection for the Clinical Domain](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5001746/)
* ****`TOOLKIT`**** [NLTK Tokenizers](http://www.nltk.org/_modules/nltk/tokenize.html)
* ****`DATA`**** [The British National Corpus](http://www.natcorp.ox.ac.uk/)
* ****`DATA`**** [Switchboard-1 Telephone Speech Corpus](https://catalog.ldc.upenn.edu/ldc97s62)
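
In practice this task is usually handled by a pretrained model such as the Punkt tokenizer shipped with the NLTK tokenizers listed above. A minimal sketch (the exact resource name to download differs slightly across NLTK versions):

```python
# pip install nltk
import nltk
nltk.download("punkt", quiet=True)  # newer NLTK releases may require "punkt_tab" instead
from nltk.tokenize import sent_tokenize

text = "Dr. Smith arrived at 9 a.m. and brought approx. 20 samples. Were they useful? Yes."
print(sent_tokenize(text))
# Punkt has learned that abbreviations such as 'Dr.' and 'approx.' usually do not end a sentence.
```
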
## Sentiment Analysis
* ****`WIKI`**** [Sentiment analysis](https://en.wikipedia.org/wiki/Sentiment_analysis)
* ****`INFO`**** [Awesome Sentiment Analysis](https://github.com/xiamx/awesome-sentiment-analysis)
* ****`CHALLENGE`**** [Kaggle: UMICH SI650 - Sentiment Classification](https://www.kaggle.com/c/si650winter11#description)
* ****`CHALLENGE`**** [SemEval-2017 Task 4: Sentiment Analysis in Twitter](http://alt.qcri.org/semeval2017/task4/)
* ****`CHALLENGE`**** [SemEval-2017 Task 5: Fine-Grained Sentiment Analysis on Financial Microblogs and News](http://alt.qcri.org/semeval2017/task5/)
* ****`PROJECT`**** [SenticNet](http://sentic.net/about/)
* ****`PROJECT`**** [Stanford NLP Group Sentiment Analysis](https://nlp.stanford.edu/sentiment/)
* ****`DATA`**** [Multi-Domain Sentiment Dataset (version 2.0)](http://www.cs.jhu.edu/%7Emdredze/datasets/sentiment/)
* ****`DATA`**** [Stanford Sentiment Treebank](https://nlp.stanford.edu/sentiment/code.html)
* ****`DATA`**** [Twitter Sentiment Corpus](http://www.sananalytics.com/lab/twitter-sentiment/)
* ****`DATA`**** [Twitter Sentiment Analysis Training Corpus](http://thinknook.com/twitter-sentiment-analysis-training-corpus-dataset-2012-09-22/)
* ****`DATA`**** [AFINN: List of English words rated for valence](http://www2.imm.dtu.dk/pubdb/views/publication_details.php?id=6010)
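
The simplest baseline for this task is lexicon-based scoring in the spirit of the AFINN word list above: sum the valence scores of the words that appear in the text. The snippet below uses a made-up miniature lexicon purely to illustrate the idea — the values are not the real AFINN scores.

```python
import re

# Hypothetical miniature lexicon; the real AFINN list linked above scores ~2,500 words.
LEXICON = {"good": 3, "great": 4, "awesome": 4, "love": 3, "bad": -3, "terrible": -4, "boring": -2}

def polarity(text):
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(LEXICON.get(tok, 0) for tok in tokens)

print(polarity("The film was great and the cast was awesome."))  # > 0: positive
print(polarity("A boring, terrible waste of time."))             # < 0: negative
```
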
## Sign Language Recognition/Translation
* ****`PAPER`**** [Video-based Sign Language Recognition without Temporal Segmentation](https://arxiv.org/pdf/1801.10111.pdf)
* ****`PAPER`**** [SubUNets: End-to-end Hand Shape and Continuous Sign Language Recognition](http://openaccess.thecvf.com/content_ICCV_2017/papers/Camgoz_SubUNets_End-To-End_Hand_ICCV_2017_paper.pdf)
* ****`DATA`**** [RWTH-PHOENIX-Weather](https://www-i6.informatik.rwth-aachen.de/~forster/database-rwth-phoenix.php)
* ****`DATA`**** [ASLLRP](http://www.bu.edu/asllrp/)
* ****`PROJECT`**** [SignAll](http://www.signall.us/)

## Singing Voice Synthesis
* ****`PAPER`**** [Singing voice synthesis based on deep neural networks](https://pdfs.semanticscholar.org/9a8e/b69480eead85f32ee4b92fa2563dd5f83401.pdf)
* ****`PAPER`**** [A Neural Parametric Singing Synthesizer Modeling Timbre and Expression from Natural Songs](http://www.mdpi.com/2076-3417/7/12/1313)
* ****`PRODUCT`**** [VOCALOID: voice synthesis technology and software developed by Yamaha](https://www.vocaloid.com/en)
* ****`CHALLENGE`**** [Special Session Interspeech 2016 Singing synthesis challenge "Fill-in the Gap"](https://chanter.limsi.fr/doku.php?id=description:start)

## Social Science Applications
* ****`WORKSHOP`**** [NLP+CSS: Workshops on Natural Language Processing and Computational Social Science](https://sites.google.com/site/nlpandcss/)
* ****`TOOLKIT`**** [Men Also Like Shopping: Reducing Gender Bias Amplification using Corpus-level Constraints](https://github.com/uclanlp/reducingbias)
* ****`TOOLKIT`**** [Online Variational Bayes for Latent Dirichlet Allocation (LDA)](https://github.com/blei-lab/onlineldavb)
* ****`GROUP`**** [The University of Chicago Knowledge Lab](http://www.knowledgelab.org/)

## Source Separation
* ****`WIKI`**** [Source separation](https://en.wikipedia.org/wiki/Source_separation)
* ****`PAPER`**** [From Blind to Guided Audio Source Separation](https://hal-univ-rennes1.archives-ouvertes.fr/hal-00922378/document)
* ****`PAPER`**** [Joint Optimization of Masks and Deep Recurrent Neural Networks for Monaural Source Separation](https://arxiv.org/abs/1502.04149)
* ****`CHALLENGE`**** [Signal Separation Evaluation Campaign (SiSEC)](https://sisec.inria.fr/)
* ****`CHALLENGE`**** [CHiME Speech Separation and Recognition Challenge](http://spandh.dcs.shef.ac.uk/chime_challenge/)

## Speaker Authentication
* See [Speaker Recognition](#speaker-recognition)

## Speaker Diarisation
* ****`WIKI`**** [Speaker diarisation](https://en.wikipedia.org/wiki/Speaker_diarisation)
* ****`PAPER`**** [DNN-based speaker clustering for speaker diarisation](http://eprints.whiterose.ac.uk/109281/1/milner_is16.pdf)
* ****`PAPER`**** [Unsupervised Methods for Speaker Diarization: An Integrated and Iterative Approach](http://groups.csail.mit.edu/sls/publications/2013/Shum_IEEE_Oct-2013.pdf)
* ****`PAPER`**** [Audio-Visual Speaker Diarization Based on Spatiotemporal Bayesian Fusion](https://arxiv.org/pdf/1603.09725.pdf)
* ****`CHALLENGE`**** [Rich Transcription Evaluation](https://www.nist.gov/itl/iad/mig/rich-transcription-evaluation)

## Speaker Recognition
* ****`WIKI`**** [Speaker recognition](https://en.wikipedia.org/wiki/Speaker_recognition)
* ****`PAPER`**** [A Novel Scheme for Speaker Recognition Using a Phonetically-Aware Deep Neural Network](https://pdfs.semanticscholar.org/204a/ff8e21791c0a4113a3f75d0e6424a003c321.pdf)
* ****`PAPER`**** [Deep Neural Networks for Small Footprint Text-Dependent Speaker Verification](https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/41939.pdf)
* ****`PAPER`**** [Deep Speaker: an End-to-End Neural Speaker Embedding System](https://arxiv.org/abs/1705.02304)
* ****`PROJECT`**** [Voice Vector: which of the Hollywood stars is most similar to my voice?](https://github.com/andabi/voice-vector)
* ****`CHALLENGE`**** [NIST Speaker Recognition Evaluation (SRE)](https://www.nist.gov/itl/iad/mig/speaker-recognition)
* ****`INFO`**** [Are there any suggestions for free databases for speaker recognition?](https://www.researchgate.net/post/Are_there_any_suggestions_for_free_databases_for_speaker_recognition)
* ****`DATA`**** [VoxCeleb2: Deep Speaker Recognition](http://www.robots.ox.ac.uk/~vgg/data/voxceleb2/)
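
The papers above learn speaker embeddings with deep networks; as a deliberately crude illustration of the underlying idea — map each utterance to a fixed-length vector and compare vectors — here is a toy baseline that averages MFCC frames with librosa. The file names are placeholders, and this is nowhere near a real speaker-verification system.

```python
# pip install librosa numpy
import librosa
import numpy as np

def utterance_vector(path):
    """Very rough 'speaker embedding': the mean MFCC frame of an utterance."""
    y, sr = librosa.load(path, sr=16000)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)  # shape (20, n_frames)
    return mfcc.mean(axis=1)

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Placeholder file names: two utterances by one speaker, one by another.
same = cosine(utterance_vector("alice_1.wav"), utterance_vector("alice_2.wav"))
diff = cosine(utterance_vector("alice_1.wav"), utterance_vector("bob_1.wav"))
print(same, diff)  # same-speaker pairs tend to score higher, though far from reliably
```
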
## Speech Reading
* See [Lip-reading](#lip-reading)

## Speech Recognition
* See [Automatic Speech Recognition](#automatic-speech-recognition)

## Speech Segmentation
* ****`WIKI`**** [Speech segmentation](https://en.wikipedia.org/wiki/Speech_segmentation)
* ****`PAPER`**** [Word Segmentation by 8-Month-Olds: When Speech Cues Count More Than Statistics](http://www.utm.toronto.edu/infant-child-centre/sites/files/infant-child-centre/public/shared/elizabeth-johnson/Johnson_Jusczyk.pdf)
* ****`PAPER`**** [Unsupervised Word Segmentation and Lexicon Discovery Using Acoustic Word Embeddings](https://arxiv.org/abs/1603.02845)
* ****`PAPER`**** [Unsupervised Lexicon Discovery from Acoustic Input](http://www.aclweb.org/old_anthology/Q/Q15/Q15-1028.pdf)
* ****`PAPER`**** [Weakly supervised spoken term discovery using cross-lingual side information](http://www.research.ed.ac.uk/portal/files/29957958/1609.06530v1.pdf)
* ****`DATA`**** [CALLHOME Spanish Speech](https://catalog.ldc.upenn.edu/ldc96s35)

## Speech Synthesis
* ****`WIKI`**** [Speech synthesis](https://en.wikipedia.org/wiki/Speech_synthesis)
* ****`PAPER`**** [Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions](https://arxiv.org/abs/1712.05884)
* ****`PAPER`**** [WaveNet: A Generative Model for Raw Audio](https://arxiv.org/abs/1609.03499)
* ****`PAPER`**** [Tacotron: Towards End-to-End Speech Synthesis](https://arxiv.org/abs/1703.10135)
* ****`PAPER`**** [Deep Voice 3: 2000-Speaker Neural Text-to-Speech](https://arxiv.org/abs/1710.07654)
* ****`PAPER`**** [Efficiently Trainable Text-to-Speech System Based on Deep Convolutional Networks with Guided Attention](https://arxiv.org/abs/1710.08969)
* ****`DATA`**** [The World English Bible](https://github.com/Kyubyong/tacotron)
* ****`DATA`**** [LJ Speech Dataset](https://github.com/keithito/tacotron)
* ****`DATA`**** [Lessac Data](http://www.cstr.ed.ac.uk/projects/blizzard/2011/lessac_blizzard2011/)
* ****`CHALLENGE`**** [Blizzard Challenge 2017](https://synsig.org/index.php/Blizzard_Challenge_2017)
* ****`PRODUCT`**** [Lyrebird](https://lyrebird.ai/)
* ****`PROJECT`**** [The Festvox project](http://www.festvox.org/index.html)
* ****`TOOLKIT`**** [Merlin: The Neural Network (NN) based Speech Synthesis System](https://github.com/CSTR-Edinburgh/merlin)

## Speech Enhancement
* ****`WIKI`**** [Speech enhancement](https://en.wikipedia.org/wiki/Speech_enhancement)
* ****`BOOK`**** [Speech Enhancement: Theory and Practice](https://www.amazon.com/Speech-Enhancement-Theory-Practice-Second/dp/1466504218/ref=sr_1_1?ie=UTF8&qid=1507874199&sr=8-1&keywords=Speech+enhancement%3A+theory+and+practice)
* ****`PAPER`**** [An Experimental Study on Speech Enhancement Based on Deep Neural Networks](http://staff.ustc.edu.cn/~jundu/Speech%20signal%20processing/publications/SPL2014_Xu.pdf)
* ****`PAPER`**** [A Regression Approach to Speech Enhancement Based on Deep Neural Networks](https://www.researchgate.net/profile/Yong_Xu63/publication/272436458_A_Regression_Approach_to_Speech_Enhancement_Based_on_Deep_Neural_Networks/links/57fdfdda08aeaf819a5bdd97.pdf)
* ****`PAPER`**** [Speech Enhancement Based on Deep Denoising Autoencoder](https://www.researchgate.net/profile/Yu_Tsao/publication/283600839_Speech_enhancement_based_on_deep_denoising_Auto-Encoder/links/577b486108ae213761c9c7f8/Speech-enhancement-based-on-deep-denoising-Auto-Encoder.pdf)

## Speech-To-Text
* See [Automatic Speech Recognition](#automatic-speech-recognition)

## Spoken Term Detection
* See [Speech Segmentation](#speech-segmentation)

## Stemming
* ****`WIKI`**** [Stemming](https://en.wikipedia.org/wiki/Stemming)
* ****`PAPER`**** [A Backpropagation Neural Network to Improve Arabic Stemming](http://www.jatit.org/volumes/Vol82No3/7Vol82No3.pdf)
* ****`TOOLKIT`**** [NLTK Stemmers](http://www.nltk.org/howto/stem.html)

## Term Extraction
* ****`WIKI`**** [Terminology extraction](https://en.wikipedia.org/wiki/Terminology_extraction)
* ****`PAPER`**** [Neural Attention Models for Sequence Classification: Analysis and Application to Key Term Extraction and Dialogue Act Detection](https://arxiv.org/pdf/1604.00077.pdf)

## Text Similarity
* ****`WIKI`**** [Semantic similarity](https://en.wikipedia.org/wiki/Semantic_similarity)
* ****`PAPER`**** [A Survey of Text Similarity Approaches](https://pdfs.semanticscholar.org/5b5c/a878c534aee3882a038ef9e82f46e102131b.pdf)
* ****`PAPER`**** [Learning to Rank Short Text Pairs with Convolutional Deep Neural Networks](http://casa.disi.unitn.it/~moschitt/since2013/2015_SIGIR_Severyn_LearningRankShort.pdf)
* ****`PAPER`**** [Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks](https://nlp.stanford.edu/pubs/tai-socher-manning-acl2015.pdf)
* ****`CHALLENGE`**** [SemEval-2014 Task 3: Cross-Level Semantic Similarity](http://alt.qcri.org/semeval2014/task3/)
* ****`CHALLENGE`**** [SemEval-2014 Task 10: Multilingual Semantic Textual Similarity](http://alt.qcri.org/semeval2014/task10/)
* ****`CHALLENGE`**** [SemEval-2017 Task 1: Semantic Textual Similarity](http://alt.qcri.org/semeval2017/task1/)
* ****`WIKI`**** [Semantic Textual Similarity Wiki](http://ixa2.si.ehu.es/stswiki/index.php/Main_Page)
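
A useful non-neural reference point for the papers above is plain TF-IDF cosine similarity. The sketch below uses scikit-learn (my own choice here, not a toolkit listed in this section):

```python
# pip install scikit-learn
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "A man is playing a guitar.",
    "Someone is playing an acoustic guitar.",
    "A cat is chasing a mouse.",
]
tfidf = TfidfVectorizer().fit_transform(docs)
print(cosine_similarity(tfidf[0], tfidf[1]))  # lexically similar pair: relatively high
print(cosine_similarity(tfidf[0], tfidf[2]))  # unrelated pair: close to zero
```
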
## Text Simplification
* ****`WIKI`**** [Text simplification](https://en.wikipedia.org/wiki/Text_simplification)
* ****`PAPER`**** [Aligning Sentences from Standard Wikipedia to Simple Wikipedia](https://ssli.ee.washington.edu/~hannaneh/papers/simplification.pdf)
* ****`PAPER`**** [Problems in Current Text Simplification Research: New Data Can Help](https://pdfs.semanticscholar.org/2b8d/a013966c0c5e020ebc842d49d8ed166c8783.pdf)
* ****`DATA`**** [Newsela Data](https://newsela.com/data/)

## Text-To-Speech
* See [Speech Synthesis](#speech-synthesis)

## Textual Entailment
* ****`WIKI`**** [Textual entailment](https://en.wikipedia.org/wiki/Textual_entailment)
* ****`PROJECT`**** [Textual Entailment with TensorFlow](https://github.com/Steven-Hewitt/Entailment-with-Tensorflow)
* ****`PAPER`**** [Textual Entailment with Structured Attentions and Composition](https://arxiv.org/pdf/1701.01126.pdf)
* ****`CHALLENGE`**** [SemEval-2014 Task 1: Evaluation of compositional distributional semantic models on full sentences through semantic relatedness and textual entailment](http://alt.qcri.org/semeval2014/task1/)
* ****`CHALLENGE`**** [SemEval-2013 Task 7: The Joint Student Response Analysis and 8th Recognizing Textual Entailment Challenge](https://www.cs.york.ac.uk/semeval-2013/task7.html)

## Transliteration
* ****`WIKI`**** [Transliteration](https://en.wikipedia.org/wiki/Transliteration)
* ****`INFO`**** [Transliteration of Non-Latin scripts](http://transliteration.eki.ee/)
* ****`PAPER`**** [A Deep Learning Approach to Machine Transliteration](https://pdfs.semanticscholar.org/54f1/23122b8dd1f1d3067cf348cfea1276914377.pdf)
* ****`CHALLENGE`**** [NEWS 2016 Shared Task on Transliteration of Named Entities](http://workshop.colips.org/news2016/index.html)
* ****`PROJECT`**** [Neural Japanese Transliteration—can you do better than SwiftKey™ Keyboard?](https://github.com/Kyubyong/neural_japanese_transliterator)

## Voice Conversion
* ****`PAPER`**** [Phonetic Posteriorgrams for Many-to-One Voice Conversion without Parallel Data Training](http://www1.se.cuhk.edu.hk/~hccl/publications/pub/2016_paper_297.pdf)
* ****`PROJECT`**** [Deep neural networks for voice conversion (voice style transfer) in Tensorflow](https://github.com/andabi/deep-voice-conversion)
* ****`PROJECT`**** [An implementation of voice conversion system utilizing phonetic posteriorgrams](https://github.com/sesenosannko/ppg_vc)
* ****`CHALLENGE`**** [Voice Conversion Challenge 2016](http://www.vc-challenge.org/vcc2016/index.html)
* ****`CHALLENGE`**** [Voice Conversion Challenge 2018](http://www.vc-challenge.org/)
* ****`DATA`**** [CMU_ARCTIC speech synthesis databases](http://festvox.org/cmu_arctic/)
* ****`DATA`**** [TIMIT Acoustic-Phonetic Continuous Speech Corpus](https://catalog.ldc.upenn.edu/ldc93s1)

## Voice Recognition
* See [Speaker Recognition](#speaker-recognition)

## Word Embeddings
* ****`WIKI`**** [Word embedding](https://en.wikipedia.org/wiki/Word_embedding)
* ****`TOOLKIT`**** [Gensim: word2vec](https://radimrehurek.com/gensim/models/word2vec.html)
* ****`TOOLKIT`**** [fastText](https://github.com/facebookresearch/fastText)
* ****`TOOLKIT`**** [GloVe: Global Vectors for Word Representation](https://nlp.stanford.edu/projects/glove/)
* ****`INFO`**** [Where to get a pretrained model](https://github.com/3Top/word2vec-api)
* ****`PROJECT`**** [Pre-trained word vectors](https://github.com/facebookresearch/fastText/blob/master/pretrained-vectors.md)
* ****`PROJECT`**** [Pre-trained word vectors of 30+ languages](https://github.com/Kyubyong/wordvectors)
* ****`PROJECT`**** [Polyglot: Distributed word representations for multilingual NLP](https://sites.google.com/site/rmyeid/projects/polyglot)
* ****`PROJECT`**** [BPEmb: a collection of pre-trained subword embeddings in 275 languages](https://github.com/bheinzerling/bpemb)
* ****`CHALLENGE`**** [SemEval 2018 Task 10: Capturing Discriminative Attributes](https://competitions.codalab.org/competitions/17326)
* ****`PAPER`**** [Bilingual Word Embeddings for Phrase-Based Machine Translation](https://ai.stanford.edu/~wzou/emnlp2013_ZouSocherCerManning.pdf)
* ****`PAPER`**** [A Survey of Cross-Lingual Embedding Models](https://arxiv.org/abs/1706.04902)
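
Training word vectors yourself takes only a few lines with the Gensim word2vec implementation listed above. The sketch below runs on a toy corpus, so the neighbours it finds are noisy — for real work, start from the pre-trained vectors linked above. Parameter names follow Gensim 4.x, where `vector_size` replaced the older `size`.

```python
# pip install gensim
from gensim.models import Word2Vec

sentences = [
    ["the", "king", "rules", "the", "kingdom"],
    ["the", "queen", "rules", "the", "kingdom"],
    ["the", "dog", "chases", "the", "cat"],
    ["the", "cat", "chases", "the", "mouse"],
]
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, epochs=200, seed=1)
print(model.wv.similarity("king", "queen"))   # shared contexts push these together
print(model.wv.most_similar("king", topn=3))
```
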
## Word Prediction
* ****`INFO`**** [What is Word Prediction?](http://www2.edc.org/ncip/library/wp/what_is.htm)
* ****`PAPER`**** [The prediction of character based on recurrent neural network language model](http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=7960065)
* ****`PAPER`**** [An Embedded Deep Learning based Word Prediction](https://arxiv.org/abs/1707.01662)
* ****`PAPER`**** [Evaluating Word Prediction: Framing Keystroke Savings](http://aclweb.org/anthology/P08-2066)
* ****`DATA`**** [An Embedded Deep Learning based Word Prediction](https://github.com/Meinwerk/WordPrediction/master.zip)
* ****`PROJECT`**** [Word Prediction using Convolutional Neural Networks—can you do better than iPhone™ Keyboard?](https://github.com/Kyubyong/word_prediction)
* ****`CHALLENGE`**** [SemEval-2018 Task 2: Multilingual Emoji Prediction](https://competitions.codalab.org/competitions/17344)

## Word Segmentation
* ****`WIKI`**** [Word segmentation](https://en.wikipedia.org/wiki/Text_segmentation#Segmentation_problems)
* ****`PAPER`**** [Neural Word Segmentation Learning for Chinese](https://arxiv.org/abs/1606.04300)
* ****`PROJECT`**** [Convolutional neural network for Chinese word segmentation](https://github.com/chqiwang/convseg)
* ****`TOOLKIT`**** [Stanford Word Segmenter](https://nlp.stanford.edu/software/segmenter.html)
* ****`TOOLKIT`**** [NLTK Tokenizers](http://www.nltk.org/_modules/nltk/tokenize.html)

## Word Sense Disambiguation
* ****`WIKI`**** [Word-sense disambiguation](https://en.wikipedia.org/wiki/Word-sense_disambiguation)
* ****`PAPER`**** [Train-O-Matic: Large-Scale Supervised Word Sense Disambiguation in Multiple Languages without Manual Training Data](http://www.aclweb.org/anthology/D17-1008)
* ****`DATA`**** [Train-O-Matic Data](http://trainomatic.org/data/train-o-matic-data.zip)
* ****`DATA`**** [BabelNet](http://babelnet.org/)

--------------------------------------------------------------------------------