└── README.md


/README.md:
--------------------------------------------------------------------------------
 1 | # Natural Perturbations for BoolQ 
 2 | This repository contains the resources relevant to the following publication:
 3 | 
 4 | ```bibtex 
 5 | @article{khashabi2020naturalperturbations,
 6 |   title={Natural Perturbation for Robust Question Answering},
 7 |   author={D. Khashabi and T. Khot and A. Sabharwal},
 8 |   journal={arXiv preprint},
 9 |   year={2020}
10 | }
11 | ```
12 | 
13 | The data here, BoolQ-NQ, is an extension of BoolQ, a dataset originally released by [Clark et al., 2019](https://github.com/google-research-datasets/boolean-questions).
14 | 
15 | 
16 | ## What does the data look like?
17 | 
18 | The dataset is organized as a `.jsonl` file, i.e. each line is a JSON object. Here is an example:
19 | ```json
20 | {"cluster-id": "25938", "question_id": "267", "is_seed_question": 0, "split": "train", "passage": "(Thanksgiving (United States)) Thanksgiving, or Thanksgiving Day, is a public holiday celebrated on the fourth Thursday of November in the United States. It originated as a harvest festival. Thanksgiving has been celebrated nationally on and off since 1789, after Congress requested a proclamation by George Washington. It has been celebrated as a federal holiday every year since 1863, when, during the American Civil War, President Abraham Lincoln proclaimed a national day of ``Thanksgiving and Praise to our beneficent Father who dwelleth in the Heavens,'' to be celebrated on the last Thursday in November. Together with Christmas and the New Year, Thanksgiving is a part of the broader fall/winter holiday season in the U.S.", "question": "is thanksgiving sometimes the last thursday of the month?", "hard_label": "True", "soft_label": 0.75, "roberta_hard": true, "ind_human_label": "?"}
21 | ```
22 | 
23 | 
24 | Here the keys are: 
25 |  - The question and its relevant passage are given in the `question` and `passage` fields.
26 |  - The gold yes/no answer is given in the `hard_label` field. I have also included a "soft" label in `soft_label`, in case you want to know how many annotators agreed on this label.
27 |  - The dataset split is indicated with `split`. 
28 |  - `roberta_hard` indicates whether it was a difficult question for a RoBERTa trained on BoolQ. 
29 |  - `ind_human_label` is an additional human label elicited for the instances included in the `test` split.
30 |  - `is_seed_question` indicates whether this question is borrowed from the original BoolQ dataset. 
31 |  - `cluster-id`: the data comes as clusters of questions. Questions that belong to the same cluster share the same `cluster-id`.
32 | 
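The fields above can be read with a few lines of Python. The following is a minimal sketch, not part of the released code: the helper name `load_clusters` and the two sample records are made up for illustration, and it assumes only the key names shown in the example line above.

```python
import json
from collections import defaultdict


def load_clusters(lines):
    """Parse JSONL lines and group the records by their `cluster-id`."""
    clusters = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:  # skip blank lines
            continue
        record = json.loads(line)
        clusters[record["cluster-id"]].append(record)
    return dict(clusters)


# Two made-up records standing in for lines of the data file.
sample = [
    '{"cluster-id": "1", "question_id": "10", "is_seed_question": 1, '
    '"split": "train", "question": "q1?", "hard_label": "True"}',
    '{"cluster-id": "1", "question_id": "11", "is_seed_question": 0, '
    '"split": "train", "question": "q2?", "hard_label": "False"}',
]

clusters = load_clusters(sample)
for cluster_id, records in clusters.items():
    # Seed questions are the ones borrowed from the original BoolQ.
    seeds = [r for r in records if r["is_seed_question"] == 1]
    print(cluster_id, len(records), len(seeds))
```

To run this on the real data, pass an open file handle instead of `sample`, for example `load_clusters(open(path))`, where `path` points to wherever you saved the `.jsonl` file.
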
33 | ## How big is the data? 
34 | 
35 | Here are some statistics on the data:
36 | 
37 | | Measure                 | Full  | Train | Dev  | Test |
38 | |-------------------------|-------|-------|------|------|
39 | | \# of questions         | 17323 | 9727  | 4434 | 3162 |
40 | | \# of "yes" questions   | 9724  | 5733  | 2263 | 1728 |
41 | | \# of "no" questions    | 7599  | 3994  | 2171 | 1434 |
42 | | \# of clusters          | 4064  | 2408  | 919  | 737  |
43 | | average cluster size    | 4.3   | 4.1   | 4.8  | 4.3  |
44 | | median cluster size     | 3.0   | 3.0   | 3.0  | 3.0  |
45 | 
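If you want to compute the same kind of statistics yourself, here is a rough sketch. It is not the script used to produce the table: the function name `split_stats` and the toy records are hypothetical, and it assumes that a `hard_label` of `"True"` corresponds to a "yes" question, so the output may not match the table exactly.

```python
from collections import Counter
from statistics import mean, median


def split_stats(records):
    """Summarize a non-empty list of records (e.g. all records of one split)."""
    # Number of questions per cluster.
    sizes = Counter(r["cluster-id"] for r in records)
    # Assumption: hard_label "True" means the answer is "yes".
    yes = sum(1 for r in records if str(r["hard_label"]) == "True")
    return {
        "questions": len(records),
        "yes": yes,
        "no": len(records) - yes,
        "clusters": len(sizes),
        "avg_cluster_size": round(mean(sizes.values()), 1),
        "median_cluster_size": median(sizes.values()),
    }


# Toy records with only the fields this function needs.
toy = [
    {"cluster-id": "a", "hard_label": "True", "split": "train"},
    {"cluster-id": "a", "hard_label": "False", "split": "train"},
    {"cluster-id": "b", "hard_label": "True", "split": "train"},
]
stats = split_stats(toy)
print(stats)
```

To get per-split numbers, filter the records on the `split` field first and call `split_stats` on each subset.
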
46 | ## What does the annotation interface look like?
47 | 
48 | [Here](https://www.youtube.com/watch?v=MWbCRwanbOA&feature=youtu.be) is a not-so-polished screencast of the annotation interface.
49 | 


--------------------------------------------------------------------------------