# Natural language understanding datasets in 2018

This page collects NLU datasets proposed in 2018.

| Dataset    | Task | Style                | Size | Source             | Venue     | Web                                                | Misc                              | Similar datasets     |
|------------|------|----------------------|------|--------------------|-----------|----------------------------------------------------|-----------------------------------|----------------------|
| CoQA       | RC   | free form (+no ans)  | 127k | various articles   | TACL?     | [url](https://stanfordnlp.github.io/coqa/)         | conversational questions          | QuAC                 |
| QuAC       | RC   | extraction (+no ans) | 100k | Wikipedia          | EMNLP2018 | [url](http://quac.ai/)                             | conversational questions          | CoQA                 |
| HotpotQA   | RC   | extraction           | 113k | Wikipedia          | EMNLP2018 | [url](http://hotpotqa.github.io/)                  | multi-hop reasoning               | QAngaroo             |
| SWAG       | QA   | multiple choice      | 113k | video caption      | EMNLP2018 | [url](http://rowanzellers.com/swag/)               | situational commonsense reasoning |                      |
| DNC        | NLI  | textual entailment   | 570k | NLP tasks          | EMNLP2018 | [url](http://decomp.io/)                           | diverse NLI                       | SNLI, MultiNLI       |
| OpenBookQA | QA   | multiple choice      | 6k   | science facts      | EMNLP2018 | [url](http://data.allenai.org/OpenBookQA)          | external knowledge                | ARC                  |
| RecipeQA   | RC+  | various              | 36k  | recipe             | EMNLP2018 | [url](https://hucvl.github.io/recipeqa/)           | multimodal comprehension          | TextbookQA, FigureQA |
| CLOTH      | RC   | cloze                | 99k  | English exams      | EMNLP2018 | [url](http://www.cs.cmu.edu/~glai1/data/cloth/)    |                                   | RACE                 |
| DuoRC      | RC   | extraction           | 186k | movie plot         | ACL2018   | [url](https://duorc.github.io/)                    |                                   | NarrativeQA          |
| SQuAD2.0   | RC   | extraction (+no ans) | 150k | Wikipedia          | ACL2018   | [url](https://rajpurkar.github.io/SQuAD-explorer/) | no answer: 50k                    | NewsQA               |
| CliCR      | RC   | cloze                | 100k | clinical case text | NAACL2018 | [url](https://github.com/clips/clicr)              |                                   |                      |
| FEVER      | NLI? | fact verification    | 185k | Wikipedia          | NAACL2018 | [url](http://fever.ai/)                            |                                   |                      |
| MultiRC    | RC   | multiple choice      | 6k+  | various articles   | NAACL2018 | [url](http://cogcomp.org/multirc/)                 | multiple sentence reasoning       | MCTest               |
| ProPara    | RC   | various              | 2k   | procedural text    | NAACL2018 | [url](https://github.com/allenai/propara)          |                                   | bAbI, SCoNE          |
| ARC        | RC   | multiple choice      | 8k   | science exam       | ?         | [url](http://data.allenai.org/arc/)                | easy 5197, challenge 2590         |                      |

TODO:
* [Interpretation of Natural Language Rules in Conversational Machine Reading](https://arxiv.org/abs/1809.01494) (Saeidi+ 2018, EMNLP)
* [Multi-Relational Question Answering from Narratives: Machine Reading and Reasoning in Simulated Worlds](http://aclweb.org/anthology/P18-1077) (Labutov+ 2018, ACL)
* [Event2Mind: Commonsense Inference on Events, Intents, and Reactions](http://aclweb.org/anthology/P18-1043) (Rashkin+ 2018, ACL)
* [Modeling Naive Psychology of Characters in Simple Commonsense Stories](http://aclweb.org/anthology/P18-1213) (Rashkin+ 2018, ACL)
* [emrQA: A Large Corpus for Question Answering on Electronic Medical Records](http://aclweb.org/anthology/D18-1258) (Pampari+ 2018, EMNLP)

Note:
* QA = question answering; RC = reading comprehension (question answering over a given context); NLI = natural language inference, also known as recognizing textual entailment (RTE)
* (+no ans) = the dataset also contains questions that cannot be answered from the context