├── images
│   ├── vq2a_examples.png
│   └── gif_vq2a_approach.gif
├── LICENSE
└── README.md

--------------------------------------------------------------------------------
/images/vq2a_examples.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/google-research-datasets/maverics/HEAD/images/vq2a_examples.png

--------------------------------------------------------------------------------
/images/gif_vq2a_approach.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/google-research-datasets/maverics/HEAD/images/gif_vq2a_approach.gif

--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
The dataset may be freely used for any purpose, although acknowledgement of
Google LLC ("Google") as the data source would be appreciated. The dataset is
provided "AS IS" without any warranty, express or implied. Google disclaims all
liability for any damages, direct or indirect, resulting from the use of the
dataset.

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# MAVERICS

![MAVERICS examples](images/vq2a_examples.png)

We introduce Manually vAlidated Vq2a Examples fRom Image/Caption datasetS (MAVERICS), a suite of test-only visual question answering datasets.
The datasets are created from image captions by Visual Question Generation with Question Answering validation, or VQ^2A (see the figure below), followed by manual verification.
Check our [paper](https://arxiv.org/abs/2205.01883) for further details.

![The VQ^2A approach](images/gif_vq2a_approach.gif)

## Download

[COCO minival2014](https://storage.googleapis.com/maverics/maverics_coco.json) (193KB), generated from [COCO Captions](https://cocodataset.org/#captions-2015).

[CC3M dev](https://storage.googleapis.com/maverics/maverics_cc3m.json) (177KB), generated from [Conceptual Captions](https://github.com/google-research-datasets/conceptual-captions).

**Format (.json)** (a loading sketch follows the schema below)

```
dataset               str: dataset name
split                 str: dataset split
annotations           List of image-question-answers triplets, each of which is
-- image_id           str: image ID
-- caption            str: image caption
-- qa_pairs           List of question-answer pairs, each of which is
---- question_id      str: question ID
---- raw_question     str: raw question
---- question         str: processed question
---- answers          List of str: 10 ground-truth answers
```
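
To make the schema concrete, here is a minimal loading sketch in Python. It uses the COCO download link and the field names listed above; everything else (variable names, what gets printed) is illustrative and not part of the official release.

```python
# Minimal sketch: download one MAVERICS split and walk the schema above.
# URL and field names come from this README; the rest is illustrative.
import json
import urllib.request

URL = "https://storage.googleapis.com/maverics/maverics_coco.json"

with urllib.request.urlopen(URL) as response:
    data = json.load(response)

print(data["dataset"], data["split"], len(data["annotations"]))

# Each annotation pairs one image/caption with its validated QA pairs.
for annotation in data["annotations"][:3]:
    print(annotation["image_id"], "-", annotation["caption"])
    for qa in annotation["qa_pairs"]:
        # `answers` holds 10 ground-truth answers, VQA-style.
        print("  Q:", qa["question"], "| A:", qa["answers"][0])
```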

## Cite

If you use this dataset in your research, please cite the original image caption datasets and our paper:

Soravit Changpinyo*, Doron Kukliansky*, Idan Szpektor, Xi Chen, Nan Ding, Radu Soricut.
[All You May Need for VQA are Image Captions](https://arxiv.org/abs/2205.01883).
NAACL 2022.

```
@inproceedings{changpinyo2022vq2a,
  title = {All You May Need for VQA are Image Captions},
  author = {Changpinyo, Soravit and Kukliansky, Doron and Szpektor, Idan and Chen, Xi and Ding, Nan and Soricut, Radu},
  booktitle = {NAACL},
  year = {2022},
}
```

## Related Datasets

A multilingual extension of this approach and its accompanying dataset, MaXM, can be found on [this page](https://github.com/google-research-datasets/maxm/).

## Contact Us

Please create an issue in this repository. If you would like to share feedback or report concerns, please email schangpi@google.com.

--------------------------------------------------------------------------------