# Natural Perturbations for BoolQ
This repository contains the resources relevant to the following publication:

```bibtex
@article{khashabi2020naturalperturbations,
  title={Natural Perturbation for Robust Question Answering},
  author={D. Khashabi and T. Khot and A. Sabharwal},
  journal={arXiv preprint},
  year={2020}
}
```

The data here, BoolQ-NQ, is an extension of BoolQ, a dataset originally released by [Clark et al., 2019](https://github.com/google-research-datasets/boolean-questions).


## What does the data look like?

The dataset is organized as a `.jsonl` file, i.e. each line is a JSON object. Here is an example:
```json
{"cluster-id": "25938", "question_id": "267", "is_seed_question": 0, "split": "train", "passage": "(Thanksgiving (United States)) Thanksgiving, or Thanksgiving Day, is a public holiday celebrated on the fourth Thursday of November in the United States. It originated as a harvest festival. Thanksgiving has been celebrated nationally on and off since 1789, after Congress requested a proclamation by George Washington. It has been celebrated as a federal holiday every year since 1863, when, during the American Civil War, President Abraham Lincoln proclaimed a national day of ``Thanksgiving and Praise to our beneficent Father who dwelleth in the Heavens,'' to be celebrated on the last Thursday in November. Together with Christmas and the New Year, Thanksgiving is a part of the broader fall/winter holiday season in the U.S.", "question": "is thanksgiving sometimes the last thursday of the month?", "hard_label": "True", "soft_label": 0.75, "roberta_hard": true, "ind_human_label": "?"}
```


Here the keys are:
- The question and its relevant passage are given in the `question` and `passage` fields.
- The gold yes/no answer is given in the `hard_label` field. I have also included a "soft" label in `soft_label`, in case you want to know how many annotators agreed on this label.
- The dataset split is indicated with `split`.
- `roberta_hard` indicates whether it was a difficult question for a RoBERTa model trained on BoolQ.
- `ind_human_label` is an additional human label elicited for the instances included in the `test` split.
- `is_seed_question` indicates whether this question is borrowed from the original BoolQ dataset.
- `cluster-id`: the data comes as clusters of questions. Questions that belong to the same cluster share the same `cluster-id`.

## How big is the data?

Here are some statistics on the data:

| Measure                 | Full  | Train | Dev  | Test |
|-------------------------|-------|-------|------|------|
| \# of questions         | 17323 | 9727  | 4434 | 3162 |
| \# of "yes" questions   | 9724  | 5733  | 2263 | 1728 |
| \# of "no" questions    | 7599  | 3994  | 2171 | 1434 |
| \# of clusters          | 4064  | 2408  | 919  | 737  |
| average cluster size    | 4.3   | 4.1   | 4.8  | 4.3  |
| median cluster size     | 3.0   | 3.0   | 3.0  | 3.0  |

## What does the annotation interface look like?

[Here](https://www.youtube.com/watch?v=MWbCRwanbOA&feature=youtu.be) is a not-so-polished screencast of the annotation interface.
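
## How can I load the data?

The field descriptions and statistics above suggest a simple way to explore the file: parse each line as JSON, group questions by `cluster-id`, and count labels via `hard_label`. The following is a minimal sketch, not an official loader. The function names and the toy records in `SAMPLE_LINES` are made up for illustration, and it assumes `hard_label` is always the string `"True"` or `"False"`, as in the example record.

```python
import json
from collections import defaultdict

# Toy records for illustration only (NOT real dataset entries); they use a
# subset of the field names from the example record above.
SAMPLE_LINES = [
    '{"cluster-id": "c1", "question": "q1", "hard_label": "True", "split": "train"}',
    '{"cluster-id": "c1", "question": "q2", "hard_label": "False", "split": "train"}',
    "",  # blank lines are skipped by load_examples
    '{"cluster-id": "c2", "question": "q3", "hard_label": "True", "split": "dev"}',
]


def load_examples(lines):
    """Parse an iterable of JSON Lines strings into a list of dicts."""
    return [json.loads(line) for line in lines if line.strip()]


def group_by_cluster(examples):
    """Map each `cluster-id` to the list of questions that share it."""
    clusters = defaultdict(list)
    for example in examples:
        clusters[example["cluster-id"]].append(example)
    return dict(clusters)


def label_counts(examples):
    """Count "yes" and "no" questions using the `hard_label` field."""
    counts = {"yes": 0, "no": 0}
    for example in examples:
        # Assumption: `hard_label` is the string "True" or "False".
        key = "yes" if str(example["hard_label"]).lower() == "true" else "no"
        counts[key] += 1
    return counts


examples = load_examples(SAMPLE_LINES)
clusters = group_by_cluster(examples)
print(len(examples), "questions in", len(clusters), "clusters")
print(label_counts(examples))
```

To run this on the real data, pass an open file handle instead of `SAMPLE_LINES`, e.g. `load_examples(open(path, encoding="utf-8"))`, where `path` points to wherever you saved the `.jsonl` file. Filtering on `example["split"]` before grouping gives per-split counts that can be compared with the table above.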