└── README.md


/README.md:
--------------------------------------------------------------------------------
 1 | # Natural language understanding datasets in 2018
 2 | 
 3 | This page collects NLU datasets proposed in 2018.
 4 | 
 5 | | Dataset    | task | style                | size | source             | venue     | web                                                | misc                              | similar datasets     |
 6 | |------------|------|----------------------|------|--------------------|-----------|----------------------------------------------------|-----------------------------------|----------------------|
 7 | | CoQA       | RC   | free form (+no ans)  | 127k | various articles   | TACL?     | [url](https://stanfordnlp.github.io/coqa/)         | conversational questions          | QuAC                 |
 8 | | QuAC       | RC   | extraction (+no ans) | 100k | Wikipedia          | EMNLP2018 | [url](http://quac.ai/)                             | conversational questions          | CoQA                 |
 9 | | HotpotQA   | RC   | extraction           | 113k | Wikipedia          | EMNLP2018 | [url](http://hotpotqa.github.io/)                  | multi-hop reasoning               | QAngaroo             |
10 | | SWAG       | QA   | multiple choice      | 113k | video caption      | EMNLP2018 | [url](http://rowanzellers.com/swag/)               | situational commonsense reasoning |                      |
11 | | DNC        | NLI  | textual entailment   | 570k | NLP tasks          | EMNLP2018 | [url](http://decomp.io/)                           | diverse NLI                       | SNLI, MultiNLI       |
12 | | OpenBookQA | QA   | multiple choice      | 6k   | science facts      | EMNLP2018 | [url](http://data.allenai.org/OpenBookQA)          | external knowledge                | ARC                  |
13 | | RecipeQA   | RC+  | various              | 36k  | recipe             | EMNLP2018 | [url](https://hucvl.github.io/recipeqa/)           | multimodal comprehension          | TextbookQA, FigureQA |
14 | | CLOTH      | RC   | cloze                | 99k  | English exams      | EMNLP2018 | [url](http://www.cs.cmu.edu/~glai1/data/cloth/)    |                                   | RACE                 |
15 | | DuoRC      | RC   | extraction           | 186k | movie plot         | ACL2018   | [url](https://duorc.github.io/)                    |                                   | NarrativeQA          |
16 | | SQuAD2.0   | RC   | extraction (+no ans) | 150k | Wikipedia          | ACL2018   | [url](https://rajpurkar.github.io/SQuAD-explorer/) | no answer: 50k                    | NewsQA               |
17 | | CliCR      | RC   | cloze                | 100k | clinical case text | NAACL2018 | [url](https://github.com/clips/clicr)              |                                   |                      |
18 | | FEVER      | NLI? | fact verification    | 185k | Wikipedia          | NAACL2018 | [url](http://fever.ai/)                            |                                   |                      |
19 | | MultiRC    | RC   | multiple choice      | 6k+  | various articles   | NAACL2018 | [url](http://cogcomp.org/multirc/)                 | multiple sentence reasoning       | MCTest               |
20 | | ProPara    | RC   | various              | 2k   | procedural text    | NAACL2018 | [url](https://github.com/allenai/propara)          |                                   | bAbI, SCoNE          |
21 | | ARC        | RC   | multiple choice      | 8k   | science exam       | ?         | [url](http://data.allenai.org/arc/)                | easy 5197, challenge 2590         |                      |
22 | 
23 | TODO:
24 | * [Interpretation of Natural Language Rules in Conversational Machine Reading](https://arxiv.org/abs/1809.01494) (Saeidi+ 2018, EMNLP)
25 | * [Multi-Relational Question Answering from Narratives: Machine Reading and Reasoning in Simulated Worlds](http://aclweb.org/anthology/P18-1077) (Labutov+ 2018, ACL)
26 | * [Event2Mind: Commonsense Inference on Events, Intents, and Reactions](http://aclweb.org/anthology/P18-1043) (Rashkin+ 2018, ACL)
27 | * [Modeling Naive Psychology of Characters in Simple Commonsense Stories](http://aclweb.org/anthology/P18-1213) (Rashkin+ 2018, ACL)
28 | * [emrQA: A Large Corpus for Question Answering on Electronic Medical Records](http://aclweb.org/anthology/D18-1258) (Pampari+ 2018, EMNLP)
29 | 
30 | Note:
31 | * QA = question answering, RC = reading comprehension (question answering over a given context), NLI = natural language inference, also known as recognizing textual entailment
32 | 


--------------------------------------------------------------------------------