├── .gitignore
├── README.rst
└── push

--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
.vscode

--------------------------------------------------------------------------------
/README.rst:
--------------------------------------------------------------------------------
VQA-CP Leaderboard
==================

A collection of papers about the VQA-CP dataset, and a benchmark / leaderboard of their results.
VQA-CP_ is an out-of-distribution dataset for Visual Question Answering,
designed to penalize models that rely on question biases to give an answer.
You can download the VQA-CP annotations here: https://computing.ece.vt.edu/~aish/vqacp/

Notes:

- The reported papers do not all use the same baseline architectures,
  so the scores might not be directly comparable. This leaderboard
  is only meant as a reference for the bias-reduction methods that
  were tested on VQA-CP.

- We mention the presence or absence of a validation set because,
  for out-of-distribution datasets, it is very important to tune hyperparameters
  and do early stopping on a validation set that has the same distribution as
  the training set. Otherwise, there is a risk of overfitting the test set
  and its biases, which defeats the point of the VQA-CP dataset. This is why we
  **highly recommend** that future work build a **validation set**
  from a part of the training set, as sketched below.
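For instance, such a held-out split can be built by randomly partitioning the official
training annotations (a minimal sketch, assuming the downloaded annotation file is a plain
JSON list; the file names and the 80/20 ratio are illustrative assumptions):

.. code-block:: python

    import json
    import random

    random.seed(0)

    # Load the official VQA-CP v2 training annotations (a JSON list of dicts).
    with open("vqacp_v2_train_annotations.json") as f:
        annotations = json.load(f)

    # Hold out 20% of the *training* questions for validation, so that
    # hyperparameter search and early stopping never touch the OOD test split.
    random.shuffle(annotations)
    n_val = len(annotations) // 5
    val_set, train_set = annotations[:n_val], annotations[n_val:]

    with open("my_train_annotations.json", "w") as f:
        json.dump(train_set, f)
    with open("my_val_annotations.json", "w") as f:
        json.dump(val_set, f)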
You can read an overview of some of these bias-reduction methods here: https://cdancette.fr/2020/11/21/overview-bias-reductions-vqa/


VQA-CP v2
*********

The best results on architectures without pre-training are highlighted in bold.
+-----------------+----------------------+-------------------------+-----------+------------+------------+------------+------------+
| Name            | Base Arch.           | Conference              | All       | Yes/No     | Numbers    | Other      | Validation |
+=================+======================+=========================+===========+============+============+============+============+
| AttReg_ [2]_    | LMH_                 | Preprint                | 59.92     | 87.28      | 52.39      | 47.65      |            |
+-----------------+----------------------+-------------------------+-----------+------------+------------+------------+------------+
| GGE-DQ_         | UpDown               | ICCV 2021               | 57.32     | 87.04      | 27.75      | 49.59      |            |
+-----------------+----------------------+-------------------------+-----------+------------+------------+------------+------------+
| AdaVQA_         | UpDown               | IJCAI 2021              | 54.67     | 72.47      | 53.81      | 45.58      | No valset  |
+-----------------+----------------------+-------------------------+-----------+------------+------------+------------+------------+
| DecompLR_       | UpDown               | AAAI 2020               | 48.87     | 70.99      | 18.72      | 45.57      | No valset  |
+-----------------+----------------------+-------------------------+-----------+------------+------------+------------+------------+
| MUTANT_         | LXMERT               | EMNLP 2020              | 69.52     | 93.15      | 67.17      | 57.78      | No valset  |
+-----------------+----------------------+-------------------------+-----------+------------+------------+------------+------------+
| MUTANT_         | UpDown               | EMNLP 2020              | **61.72** | **88.90**  | **49.68**  | **50.78**  | No valset  |
+-----------------+----------------------+-------------------------+-----------+------------+------------+------------+------------+
| CL_             | UpDown + LMH_ + CSS_ | EMNLP 2020              | 59.18     | 86.99      | 49.89      | 47.16      | No valset  |
+-----------------+----------------------+-------------------------+-----------+------------+------------+------------+------------+
| RMFE_           | UpDown + LMH_        | NeurIPS 2020            | 54.55     | 74.03      | 49.16      | 45.82      | No valset  |
+-----------------+----------------------+-------------------------+-----------+------------+------------+------------+------------+
| RandImg_        | UpDown               | NeurIPS 2020            | 55.37     | 83.89      | 41.60      | 44.20      | **Valset** |
+-----------------+----------------------+-------------------------+-----------+------------+------------+------------+------------+
| Loss-Rescaling_ | UpDown + LMH_        | Preprint 2020           | 53.26     | 72.82      | 48.00      | 44.46      |            |
+-----------------+----------------------+-------------------------+-----------+------------+------------+------------+------------+
| ESR_            | UpDown               | ACL 2020                | 48.9      | 69.8       | 11.3       | 47.8       |            |
+-----------------+----------------------+-------------------------+-----------+------------+------------+------------+------------+
| GradSup_        | Unshuffling_         | ECCV 2020               | 46.8      | 64.5       | 15.3       | 45.9       | **Valset** |
+-----------------+----------------------+-------------------------+-----------+------------+------------+------------+------------+
| VGQE_           | S-MRL                | ECCV 2020               | 50.11     | 66.35      | 27.08      | 46.77      | No valset  |
+-----------------+----------------------+-------------------------+-----------+------------+------------+------------+------------+
| CSS_            | UpDown + LMH_        | CVPR 2020               | 58.95     | 84.37      | 49.42      | 48.21      | No valset  |
+-----------------+----------------------+-------------------------+-----------+------------+------------+------------+------------+
| Semantic_       | UpDown + RUBi_       | Preprint 2020           | 47.5      |            |            |            |            |
+-----------------+----------------------+-------------------------+-----------+------------+------------+------------+------------+
| Unshuffling_    | UpDown               | Preprint 2020           | 42.39     | 47.72      | 14.43      | 47.24      | **Valset** |
+-----------------+----------------------+-------------------------+-----------+------------+------------+------------+------------+
| CF-VQA_         | UpDown + LMH_        | Preprint 2020           | 57.18     | 80.18      | 45.62      | 48.31      | No valset  |
+-----------------+----------------------+-------------------------+-----------+------------+------------+------------+------------+
| LMH_            | UpDown               | EMNLP 2019              | 52.05     | 69.81 [1]_ | 44.46 [1]_ | 45.54 [1]_ | No valset  |
+-----------------+----------------------+-------------------------+-----------+------------+------------+------------+------------+
| RUBi_           | S-MRL [3]_           | NeurIPS 2019            | 47.11     | 68.65      | 20.28      | 43.18      | No valset  |
+-----------------+----------------------+-------------------------+-----------+------------+------------+------------+------------+
| SCR_ [2]_       | UpDown               | NeurIPS 2019            | 49.45     | 72.36      | 10.93      | 48.02      | No valset  |
+-----------------+----------------------+-------------------------+-----------+------------+------------+------------+------------+
| NSM_            |                      | NeurIPS 2019            | 45.80     |            |            |            |            |
+-----------------+----------------------+-------------------------+-----------+------------+------------+------------+------------+
| HINT_ [2]_      | UpDown               | ICCV 2019               | 46.73     | 67.27      | 10.61      | 45.88      | No valset  |
+-----------------+----------------------+-------------------------+-----------+------------+------------+------------+------------+
| ActSeek_        | UpDown               | CVPR 2019               | 46.00     | 58.24      | 29.49      | 44.33      | **Valset** |
+-----------------+----------------------+-------------------------+-----------+------------+------------+------------+------------+
| GRL_            | UpDown               | NAACL-HLT 2019 Workshop | 42.33     | 59.74      | 14.78      | 40.76      | **Valset** |
+-----------------+----------------------+-------------------------+-----------+------------+------------+------------+------------+
| AdvReg_         | UpDown               | NeurIPS 2018            | 41.17     | 65.49      | 15.48      | 35.48      | No valset  |
+-----------------+----------------------+-------------------------+-----------+------------+------------+------------+------------+
| GVQA_           |                      | CVPR 2018               | 31.30     | 57.99      | 13.68      | 22.14      | No valset  |
+-----------------+----------------------+-------------------------+-----------+------------+------------+------------+------------+

.. [1] Retrained by CSS_
.. [2] Using additional information
.. [3] S-MRL stands for Simplified-MUREL. The architecture was proposed in RUBi_.

.. VQA-CP v1
.. *********

Papers
******
_`GGE-DQ`
  | Greedy Gradient Ensemble for Robust Visual Question Answering - **ICCV 2021**
  | Xinzhe Han, Shuhui Wang, Chi Su, Qingming Huang, Qi Tian
  | https://arxiv.org/pdf/2107.12651.pdf
_`DecompLR`
  | Overcoming Language Priors in VQA via Decomposed Linguistic Representations - **AAAI 2020**
  | Chenchen Jing, Yuwei Wu, Xiaoxun Zhang, Yunde Jia, Qi Wu
  | https://ojs.aaai.org/index.php/AAAI/article/view/6776
_`AdaVQA`
  | AdaVQA: Overcoming Language Priors with Adapted Margin Cosine Loss - **IJCAI 2021**
  | Yangyang Guo, Liqiang Nie, Zhiyong Cheng, Feng Ji, Ji Zhang, Alberto Del Bimbo
  | https://arxiv.org/pdf/2105.01993.pdf
_`MUTANT`
  | MUTANT: A Training Paradigm for Out-of-Distribution Generalization in Visual Question Answering - **EMNLP 2020**
  | Tejas Gokhale, Pratyay Banerjee, Chitta Baral, Yezhou Yang
  | https://www.aclweb.org/anthology/2020.emnlp-main.63/
  | code: https://github.com/tejas-gokhale/vqa_mutant
_`CL`
  | Learning to Contrast the Counterfactual Samples for Robust Visual Question Answering - **EMNLP 2020**
  | Zujie Liang, Weitao Jiang, Haifeng Hu, Jiaying Zhu
  | https://www.aclweb.org/anthology/2020.emnlp-main.265.pdf
_`RMFE`
  | Removing Bias in Multi-modal Classifiers: Regularization by Maximizing Functional Entropies - **NeurIPS 2020**
  | Itai Gat, Idan Schwartz, Alexander Schwing, Tamir Hazan
  | https://proceedings.neurips.cc/paper/2020/hash/20d749bc05f47d2bd3026ce457dcfd8e-Abstract.html
  | code: https://github.com/itaigat/removing-bias-in-multi-modal-classifiers
_`RandImg`
  | On the Value of Out-of-Distribution Testing: An Example of Goodhart’s Law - **NeurIPS 2020**
  | Damien Teney, Kushal Kafle, Robik Shrestha, Ehsan Abbasnejad, Christopher Kanan, Anton van den Hengel
  | https://arxiv.org/abs/2005.09241
_`Loss-Rescaling`
  | Loss-rescaling VQA: Revisiting Language Prior Problem from a Class-imbalance View - **Preprint 2020**
  | Yangyang Guo, Liqiang Nie, Zhiyong Cheng, Qi Tian
  | https://arxiv.org/abs/2010.16010
_`ESR` (Embarrassingly Simple Regularizer)
  | A Negative Case Analysis of Visual Grounding Methods for VQA - **ACL 2020**
  | Robik Shrestha, Kushal Kafle, Christopher Kanan
  | https://www.aclweb.org/anthology/2020.acl-main.727.pdf
_`GradSup`
  | Learning What Makes a Difference from Counterfactual Examples and Gradient Supervision - **ECCV 2020**
  | Damien Teney, Ehsan Abbasnejad, Anton van den Hengel
  | https://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123550579.pdf
_`VGQE`
  | Reducing Language Biases in Visual Question Answering with Visually-Grounded Question Encoder - **ECCV 2020**
  | Gouthaman KV, Anurag Mittal
  | https://arxiv.org/abs/2007.06198
_`CSS`
  | Counterfactual Samples Synthesizing for Robust Visual Question Answering - **CVPR 2020**
  | Long Chen, Xin Yan, Jun Xiao, Hanwang Zhang, Shiliang Pu, Yueting Zhuang
  | https://arxiv.org/abs/2003.06576
  | code: https://github.com/yanxinzju/CSS-VQA
_`Semantic`
  | Estimating Semantic Structure for the VQA Answer Space - **Preprint 2020**
  | Corentin Kervadec, Grigory Antipov, Moez Baccouche, Christian Wolf
  | https://arxiv.org/abs/2006.05726
_`Unshuffling`
  | Unshuffling Data for Improved Generalization - **Preprint 2020**
  | Damien Teney, Ehsan Abbasnejad, Anton van den Hengel
  | https://arxiv.org/abs/2002.11894

.. raw:: html

   <details>
   <summary>Summary</summary>

Inspired by Invariant Risk Minimization (Arjovsky et al.).
They make use of two training sets with different
biases to learn a more robust classifier (that will perform
better on OOD data); see the sketch below.

.. raw:: html

   </details>
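A minimal IRMv1-style sketch of this idea (following Arjovsky et al., not the paper's exact
procedure; the model interface and the penalty weight are assumptions):

.. code-block:: python

    import torch
    import torch.nn.functional as F

    def irm_penalty(logits, labels):
        # IRMv1 penalty: squared gradient of the risk w.r.t. a dummy scale on the
        # classifier; it is small when the classifier is optimal in this environment.
        scale = torch.ones(1, device=logits.device, requires_grad=True)
        loss = F.cross_entropy(logits * scale, labels)
        grad = torch.autograd.grad(loss, [scale], create_graph=True)[0]
        return (grad ** 2).sum()

    def unshuffled_loss(model, env_batches, penalty_weight=1.0):
        # env_batches: one batch per training environment (sets with different biases).
        risk, penalty = 0.0, 0.0
        for images, questions, labels in env_batches:
            logits = model(images, questions)
            risk = risk + F.cross_entropy(logits, labels)
            penalty = penalty + irm_penalty(logits, labels)
        return risk + penalty_weight * penalty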
_`CF-VQA`
  | Counterfactual VQA: A Cause-Effect Look at Language Bias - **Preprint 2020**
  | Yulei Niu, Kaihua Tang, Hanwang Zhang, Zhiwu Lu, Xian-Sheng Hua, Ji-Rong Wen
  | https://arxiv.org/abs/2006.04315v2

.. raw:: html

   <details>
   <summary>Summary</summary>

They formalize the ensembling framework from RUBi_ and LMH_ using
the causality framework.

.. raw:: html

   </details>
_`LMH`
  | Don’t Take the Easy Way Out: Ensemble Based Methods for Avoiding Known Dataset Biases - **EMNLP 2019**
  | Christopher Clark, Mark Yatskar, Luke Zettlemoyer
  | https://arxiv.org/abs/1909.03683
  | code: https://github.com/chrisc36/bottom-up-attention-vqa
_`RUBi`
  | RUBi: Reducing Unimodal Biases in Visual Question Answering - **NeurIPS 2019**
  | Remi Cadene, Corentin Dancette, Hedi Ben-younes, Matthieu Cord, Devi Parikh
  | https://arxiv.org/abs/1906.10169
.. raw:: html

   <details>
   <summary>Summary</summary>
   <p>During training: ensembling with a question-only model that will learn the biases,
   letting the main VQA model learn useful behaviours.</p>
   <p>During testing: we remove the question-only model and keep only the VQA model.</p>
   </details>
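A minimal sketch of this question-only ensembling (the sigmoid masking of the logits follows
the RUBi paper; the backbone interface, layer sizes, and loss weighting are assumptions):

.. code-block:: python

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class RUBiWrapper(nn.Module):
        def __init__(self, vqa_model, q_dim, n_answers):
            super().__init__()
            self.vqa_model = vqa_model              # any (image, question) -> logits model
            self.q_branch = nn.Sequential(          # question-only branch
                nn.Linear(q_dim, q_dim), nn.ReLU(), nn.Linear(q_dim, n_answers))

        def forward(self, image, question_emb):
            logits = self.vqa_model(image, question_emb)
            if not self.training:
                return logits                       # testing: question-only branch removed
            q_logits = self.q_branch(question_emb)
            fused = logits * torch.sigmoid(q_logits)  # mask the logits with question biases
            return fused, q_logits

    def rubi_loss(fused, q_logits, labels):
        # Main VQA loss on the masked logits + a question-only loss on the branch.
        return F.cross_entropy(fused, labels) + F.cross_entropy(q_logits, labels)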
| code: https://github.com/cdancette/rubi.bootstrap.pytorch

_`NSM`
  | Learning by Abstraction: The Neural State Machine - **NeurIPS 2019**
  | Drew A. Hudson, Christopher D. Manning
  | https://arxiv.org/abs/1907.03950
_`SCR`
  | Self-Critical Reasoning for Robust Visual Question Answering - **NeurIPS 2019**
  | Jialin Wu, Raymond J. Mooney
  | https://arxiv.org/abs/1905.09998
  | code: https://github.com/jialinwu17/self_critical_vqa
_`HINT`
  | Taking a HINT: Leveraging Explanations to Make Vision and Language Models More Grounded - **ICCV 2019**
  | Ramprasaath R. Selvaraju, Stefan Lee, Yilin Shen, Hongxia Jin, Shalini Ghosh, Larry Heck, Dhruv Batra, Devi Parikh
  | https://arxiv.org/abs/1902.03751
_`ActSeek`
  | Actively Seeking and Learning from Live Data - **CVPR 2019**
  | Damien Teney, Anton van den Hengel
  | https://arxiv.org/abs/1904.02865
_`GRL`
  | Adversarial Regularization for Visual Question Answering: Strengths, Shortcomings, and Side Effects - **NAACL-HLT 2019 Workshop on Shortcomings in Vision and Language (SiVL)**
  | Gabriel Grand, Yonatan Belinkov
  | https://arxiv.org/pdf/1906.08430.pdf
  | code: https://github.com/gabegrand/adversarial-vqa
_`AdvReg`
  | Overcoming Language Priors in Visual Question Answering with Adversarial Regularization - **NeurIPS 2018**
  | Sainandan Ramakrishnan, Aishwarya Agrawal, Stefan Lee
  | https://papers.nips.cc/paper/7427-overcoming-language-priors-in-visual-question-answering-with-adversarial-regularization.pdf
_`GVQA`
  | Don’t Just Assume; Look and Answer: Overcoming Priors for Visual Question Answering - **CVPR 2018**
  | Aishwarya Agrawal, Dhruv Batra, Devi Parikh, Aniruddha Kembhavi
  | https://arxiv.org/abs/1712.00377
  | code: https://github.com/AishwaryaAgrawal/GVQA


.. _VQA-CP: https://arxiv.org/abs/1712.00377

--------------------------------------------------------------------------------
/push:
--------------------------------------------------------------------------------
#!/bin/bash
git add README.rst
git commit -m "readme"
git push origin master
--------------------------------------------------------------------------------