├── README.md
└── img
└── VQA.jpg
/README.md:
--------------------------------------------------------------------------------
1 | # Awesome-VQA
2 | A reading list of papers about Visual Question Answering.
3 |
4 | ![VQA](img/VQA.jpg)
5 |
6 |
7 |
8 | ## Table of Contents
9 | * [Image QA Papers](#image-qa-papers)
10 |   * [Datasets](#datasets)
11 |   * [2021 Papers](#2021-papers)
12 |   * [2020 Papers](#2020-papers)
13 |   * [2019 Papers](#2019-papers)
14 |   * [2018 Papers](#2018-papers)
15 |   * [2017 Papers](#2017-papers)
16 |   * [2016 Papers](#2016-papers)
17 |   * [2015 Papers](#2015-papers)
18 |   * [2014 Papers](#2014-papers)
19 |
20 | * [Embodied QA Papers](#embodied-qa-papers)
21 | * [Knowledge-based VQA Papers](#knowledge-based-vqa-papers)
22 |   * [Fact-based VQA Papers](#fact-based-vqa-papers)
23 |   * [Open-domain Knowledge-based VQA Papers](#open-domain-knowledge-based-vqa-papers)
24 | * [Interactive QA Papers](#interactive-qa-papers)
25 | * [Image-Set VQA Papers](#image-set-vqa-papers)
26 | * [Inverse VQA Papers](#inverse-vqa-papers)
27 | * [Text-based VQA Papers](#text-based-vqa-papers)
28 |   * [Data Visualization QA Papers](#data-visualization-qa-papers)
29 |   * [Textbook QA Papers](#textbook-qa-papers)
30 |   * [TextVQA Papers](#textvqa-papers)
31 | * [Visual Reasoning Papers](#visual-reasoning-papers)
32 |
33 | * [Video QA Papers](#video-qa-papers)
34 |   * [Datasets](#datasets-1)
35 |   * [2021 Papers](#2021-papers-1)
36 |   * [2020 Papers](#2020-papers-1)
37 |   * [2019 Papers](#2019-papers-1)
38 |   * [2018 Papers](#2018-papers-1)
39 |   * [2017 Papers](#2017-papers-1)
40 |   * [2016 Papers](#2016-papers-1)
41 |   * [2015 Papers](#2015-papers-1)
42 |   * [2014 Papers](#2014-papers-1)
43 |
44 |
45 |
46 | ## Image QA Papers
47 | ### Datasets
48 | 1. **GQA** [2019][CVPR] GQA: A New Dataset for Real-World Visual Reasoning and Compositional Question Answering.[[paper](https://openaccess.thecvf.com/content_CVPR_2019/papers/Hudson_GQA_A_New_Dataset_for_Real-World_Visual_Reasoning_and_Compositional_CVPR_2019_paper.pdf)][[dataset](https://cs.stanford.edu/people/dorarad/gqa/evaluate.html)][[notes in Chinese](https://blog.csdn.net/ms961516792/article/details/105073506)]
49 | 2. **VQA-CP** [2018][CVPR] Don’t Just Assume; Look and Answer: Overcoming Priors for Visual Question Answering.[[paper](http://openaccess.thecvf.com/content_cvpr_2018/papers/Agrawal_Dont_Just_Assume_CVPR_2018_paper.pdf)][[dataset](https://www.cc.gatech.edu/grads/a/aagrawal307/vqa-cp/)][[notes in Chinese](https://blog.csdn.net/ms961516792/article/details/105073506)]
50 | 3. **VQA v2.0** [2017][CVPR] Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering.[[paper](https://arxiv.org/pdf/1612.00837.pdf)][[dataset](https://visualqa.org/)]
51 | 4. **Visual7W** [2016][CVPR] Visual7W: Grounded Question Answering in Images.[[paper](http://openaccess.thecvf.com/content_cvpr_2016/papers/Zhu_Visual7W_Grounded_Question_CVPR_2016_paper.pdf)][[dataset](http://ai.stanford.edu/~yukez/visual7w/)][[notes in Chinese](https://blog.csdn.net/ms961516792/article/details/113609614)]
52 | 5. **SHAPES** [2016][CVPR] Neural Module Networks.[[paper](http://openaccess.thecvf.com/content_cvpr_2016/papers/Andreas_Neural_Module_Networks_CVPR_2016_paper.pdf)][[dataset](https://github.com/jacobandreas/nmn2)][[notes in Chinese](https://blog.csdn.net/ms961516792/article/details/113535076)]
53 | 6. **FM-IQA** [2015][NIPS] Are You Talking to a Machine? Dataset and Methods for Multilingual Image Question Answering.[[paper](https://arxiv.org/pdf/1505.05612)][[dataset](http://research.baidu.com/Downloads)][[notes in Chinese](https://blog.csdn.net/ms961516792/article/details/113496153)]
54 | 7. **VQA v1.0** [2015][ICCV] VQA: Visual Question Answering.[[paper](http://openaccess.thecvf.com/content_iccv_2015/papers/Antol_VQA_Visual_Question_ICCV_2015_paper.pdf)][[dataset](https://visualqa.org/)][[notes in Chinese](https://blog.csdn.net/ms961516792/article/details/113420551)] (the dataset's accuracy metric is sketched after this list)
55 | 8. **Visual Madlibs** [2015][ICCV] Visual Madlibs: Fill in the Blank Description Generation and Question Answering.[[paper](http://openaccess.thecvf.com/content_iccv_2015/papers/Yu_Visual_Madlibs_Fill_ICCV_2015_paper.pdf)][[dataset](http://tamaraberg.com/visualmadlibs/)][[notes in Chinese](https://blog.csdn.net/ms961516792/article/details/113420551)]
56 | 9. **DAQUAR-Consensus** [2015][ICCV] Ask Your Neurons: A Neural-Based Approach to Answering Questions About Images.[[paper](http://openaccess.thecvf.com/content_iccv_2015/papers/Malinowski_Ask_Your_Neurons_ICCV_2015_paper.pdf)][[dataset](https://www.mpi-inf.mpg.de/departments/computer-vision-and-machine-learning/research/vision-and-language/visual-turing-challenge/)][[notes in Chinese](https://blog.csdn.net/ms961516792/article/details/113420551)]
57 | 10. **DAQUAR** [2014][NIPS] A Multi-World Approach to Question Answering about Real-World Scenes based on Uncertain Input.[[paper](https://arxiv.org/pdf/1410.0210)][[dataset](https://www.mpi-inf.mpg.de/departments/computer-vision-and-machine-learning/research/vision-and-language/visual-turing-challenge/)][[notes in Chinese](https://blog.csdn.net/ms961516792/article/details/113369883)]
58 |
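The VQA v1.0/v2.0 entries above are scored with a soft consensus metric rather than exact-match accuracy: a predicted answer earns credit in proportion to how many of the 10 human annotators gave it, with full credit once 3 agree. A minimal sketch of that formula (function names are illustrative; the official evaluator additionally normalizes answers and averages over all 9-annotator subsets, which is omitted here):

```python
def vqa_accuracy(predicted: str, human_answers: list[str]) -> float:
    """Soft VQA accuracy: min(#annotators who gave the answer / 3, 1)."""
    matches = sum(1 for answer in human_answers if answer == predicted)
    return min(matches / 3.0, 1.0)

# 4 of 10 annotators said "red": full credit; 2 said "maroon": 2/3 credit.
answers = ["red"] * 4 + ["maroon"] * 2 + ["dark red"] * 4
assert vqa_accuracy("red", answers) == 1.0
assert abs(vqa_accuracy("maroon", answers) - 2 / 3) < 1e-9
```
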
59 | ### 2021 Papers
60 | 1. [2021][AAAI] Regularizing Attention Networks for Anomaly Detection in Visual Question Answering.[[paper](https://arxiv.org/pdf/2009.10054)][[notes in Chinese](https://blog.csdn.net/ms961516792/article/details/114974963?spm=1001.2014.3001.5501)]
61 | 2. [2021][CVPR] Causal Attention for Vision-Language Tasks.[[paper](https://arxiv.org/pdf/2103.03493)][[notes in Chinese](https://blog.csdn.net/ms961516792/article/details/116641619?spm=1001.2014.3001.5501)]
62 | 3. [2021][CVPR] Counterfactual VQA: A Cause-Effect Look at Language Bias.[[paper](https://arxiv.org/pdf/2006.04315)]
63 | 4. [2021][CVPR] Domain-robust VQA with diverse datasets and methods but no target labels.[[paper](https://arxiv.org/pdf/2103.15974)][[notes in Chinese](https://blog.csdn.net/ms961516792/article/details/118001142)]
64 | 5. [2021][CVPR] Found a Reason for me? Weakly-supervised Grounded Visual Question Answering using Capsules.[[paper](https://arxiv.org/pdf/2105.04836)]
65 | 6. [2021][CVPR] Separating Skills and Concepts for Novel Visual Question Answering.[[paper](https://blender.cs.illinois.edu/paper/cvpr2021.pdf)]
66 | 7. [2021][CVPR] Transformation Driven Visual Reasoning.[[paper](https://openaccess.thecvf.com/content/CVPR2021/papers/Hong_Transformation_Driven_Visual_Reasoning_CVPR_2021_paper.pdf)]
67 | 8. [2021][CVPR] Predicting Human Scanpaths in Visual Question Answering.[[paper](https://openaccess.thecvf.com/content/CVPR2021/papers/Chen_Predicting_Human_Scanpaths_in_Visual_Question_Answering_CVPR_2021_paper.pdf)]
68 | 9. [2021][CVPR] Perception Matters: Detecting Perception Failures of VQA Models Using Metamorphic Testing.[[paper](https://openaccess.thecvf.com/content/CVPR2021/papers/Yuan_Perception_Matters_Detecting_Perception_Failures_of_VQA_Models_Using_Metamorphic_CVPR_2021_paper.pdf)]
69 | 10. [2021][CVPR] Roses are Red, Violets are Blue... But Should VQA expect Them To?.[[paper](https://openaccess.thecvf.com/content/CVPR2021/papers/Kervadec_Roses_Are_Red_Violets_Are_Blue..._but_Should_VQA_Expect_CVPR_2021_paper.pdf)]
70 | 11. [2021][CVPR] How Transferable are Reasoning Patterns in VQA?.[[paper](https://openaccess.thecvf.com/content/CVPR2021/papers/Kervadec_How_Transferable_Are_Reasoning_Patterns_in_VQA_CVPR_2021_paper.pdf)]
71 | 12. [2021][CVPR] KRISP: Integrating Implicit and Symbolic Knowledge for Open-Domain Knowledge-Based VQA.[[paper](https://openaccess.thecvf.com/content/CVPR2021/papers/Marino_KRISP_Integrating_Implicit_and_Symbolic_Knowledge_for_Open-Domain_Knowledge-Based_VQA_CVPR_2021_paper.pdf)]
72 | 13. [2021][CVPR] TAP: Text-Aware Pre-training for Text-VQA and Text-Caption.[[paper](https://openaccess.thecvf.com/content/CVPR2021/papers/Yang_TAP_Text-Aware_Pre-Training_for_Text-VQA_and_Text-Caption_CVPR_2021_paper.pdf)]
73 | 14. [2021][ICCV] Greedy Gradient Ensemble for Robust Visual Question Answering.[[paper](https://arxiv.org/pdf/2107.12651.pdf)]
74 | 15. [2021][ICCV] Auto-Parsing Network for Image Captioning and Visual Question Answering.[[paper](https://openaccess.thecvf.com/content/ICCV2021/papers/Yang_Auto-Parsing_Network_for_Image_Captioning_and_Visual_Question_Answering_ICCV_2021_paper.pdf)]
75 | 16. [2021][ICCV] Beyond Question-Based Biases: Assessing Multimodal Shortcut Learning in Visual Question Answering.[[paper](https://openaccess.thecvf.com/content/ICCV2021/papers/Dancette_Beyond_Question-Based_Biases_Assessing_Multimodal_Shortcut_Learning_in_Visual_Question_ICCV_2021_paper.pdf)]
76 | 17. [2021][ICCV] Linguistically Routing Capsule Network for Out-of-distribution Visual Question Answering.[[paper](https://openaccess.thecvf.com/content/ICCV2021/papers/Cao_Linguistically_Routing_Capsule_Network_for_Out-of-Distribution_Visual_Question_Answering_ICCV_2021_paper.pdf)]
77 | 18. [2021][ICCV] Weakly Supervised Relative Spatial Reasoning for Visual Question Answering.[[paper](https://openaccess.thecvf.com/content/ICCV2021/papers/Banerjee_Weakly_Supervised_Relative_Spatial_Reasoning_for_Visual_Question_Answering_ICCV_2021_paper.pdf)]
78 | 19. [2021][ICCV] Unshuffling Data for Improved Generalization in Visual Question Answering.[[paper](https://openaccess.thecvf.com/content/ICCV2021/papers/Teney_Unshuffling_Data_for_Improved_Generalization_in_Visual_Question_Answering_ICCV_2021_paper.pdf)]
79 | 20. [2021][TIP] Re-Attention for Visual Question Answering.[[paper](https://ojs.aaai.org/index.php/AAAI/article/view/5338/5194)]
80 | 21. [2021][SIGIR] LPF: A Language-Prior Feedback Objective Function for De-biased Visual Question Answering.[[paper](https://arxiv.org/pdf/2105.14300)]
82 |
83 |
84 | ### 2020 Papers
85 | 1. [2020][arXiv] KRISP: Integrating Implicit and Symbolic Knowledge for Open-Domain Knowledge-Based VQA.[[paper](https://arxiv.org/pdf/2012.11014)]
86 | 2. [2020][AAAI] Overcoming Language Priors in VQA via Decomposed Linguistic Representations.[[paper](https://ojs.aaai.org/index.php/AAAI/article/view/6776/6630)]
87 | 3. [2020][ACL] Cross-Modality Relevance for Reasoning on Language and Vision.[[paper](https://arxiv.org/pdf/2005.06035)][[notes in Chinese](https://blog.csdn.net/ms961516792/article/details/108254786)]
88 | 4. [2020][CVPR] Counterfactual Samples Synthesizing for Robust Visual Question Answering.[[paper](https://openaccess.thecvf.com/content_CVPR_2020/papers/Chen_Counterfactual_Samples_Synthesizing_for_Robust_Visual_Question_Answering_CVPR_2020_paper.pdf)]
89 | 5. [2020][CVPR] Counterfactual Vision and Language Learning.[[paper](http://openaccess.thecvf.com/content_CVPR_2020/papers/Abbasnejad_Counterfactual_Vision_and_Language_Learning_CVPR_2020_paper.pdf)]
90 | 6. [2020][CVPR] Fantastic Answers and Where to Find Them Immersive Question-Directed Visual Attention.[[paper](http://openaccess.thecvf.com/content_CVPR_2020/papers/Jiang_Fantastic_Answers_and_Where_to_Find_Them_Immersive_Question-Directed_Visual_CVPR_2020_paper.pdf)]
91 | 7. [2020][CVPR] Hypergraph Attention Networks for Multimodal Learning.[[paper](https://openaccess.thecvf.com/content_CVPR_2020/papers/Kim_Hypergraph_Attention_Networks_for_Multimodal_Learning_CVPR_2020_paper.pdf)]
92 | 8. [2020][CVPR] In Defense of Grid Features for Visual Question Answering.[[paper](http://openaccess.thecvf.com/content_CVPR_2020/papers/Jiang_In_Defense_of_Grid_Features_for_Visual_Question_Answering_CVPR_2020_paper.pdf)]
93 | 9. [2020][CVPR] Multi-Modal Graph Neural Network for Joint Reasoning on Vision and Scene Text.[[paper](http://openaccess.thecvf.com/content_CVPR_2020/papers/Gao_Multi-Modal_Graph_Neural_Network_for_Joint_Reasoning_on_Vision_and_CVPR_2020_paper.pdf)][[notes in Chinese](https://blog.csdn.net/ms961516792/article/details/113843893)]
94 | 10. [2020][CVPR] On the General Value of Evidence, and Bilingual Scene-Text Visual Question Answering.[[paper](https://openaccess.thecvf.com/content_CVPR_2020/papers/Wang_On_the_General_Value_of_Evidence_and_Bilingual_Scene-Text_Visual_CVPR_2020_paper.pdf)]
95 | 11. [2020][CVPR] SQuINTing at VQA Models: Introspecting VQA Models With Sub-Questions.[[paper](https://openaccess.thecvf.com/content_CVPR_2020/papers/Selvaraju_SQuINTing_at_VQA_Models_Introspecting_VQA_Models_With_Sub-Questions_CVPR_2020_paper.pdf)]
96 | 12. [2020][CVPR] TA-Student VQA: Multi-Agents Training by Self-Questioning.[[paper](http://openaccess.thecvf.com/content_CVPR_2020/papers/Xiong_TA-Student_VQA_Multi-Agents_Training_by_Self-Questioning_CVPR_2020_paper.pdf)]
97 | 13. [2020][CVPR] Towards Causal VQA: Revealing and Reducing Spurious Correlations by Invariant and Covariant Semantic Editing.[[paper](http://openaccess.thecvf.com/content_CVPR_2020/papers/Agarwal_Towards_Causal_VQA_Revealing_and_Reducing_Spurious_Correlations_by_Invariant_CVPR_2020_paper.pdf)]
98 | 14. [2020][CVPR] Visual Commonsense R-CNN.[[paper](https://openaccess.thecvf.com/content_CVPR_2020/papers/Wang_Visual_Commonsense_R-CNN_CVPR_2020_paper.pdf)]
99 | 15. [2020][CVPR] VQA with No Questions-Answers Training.[[paper](http://openaccess.thecvf.com/content_CVPR_2020/papers/Vatashsky_VQA_With_No_Questions-Answers_Training_CVPR_2020_paper.pdf)]
100 | 16. [2020][ECCV][oral] A Competence-aware Curriculum for Visual Concepts Learning via Question Answering.[[paper](https://arxiv.org/pdf/2007.01499)]
101 | 17. [2020][ECCV][poster] Interpretable Visual Reasoning via Probabilistic Formulation under Natural Supervision.[[paper](https://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123540528.pdf)]
102 | 18. [2020][ECCV][poster] Multi-Agent Embodied Question Answering in Interactive Environments.[[paper](https://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123580647.pdf)]
103 | 19. [2020][ECCV][poster] Reducing Language Biases in Visual Question Answering with Visually-Grounded Question Encoder.[[paper](https://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123580018.pdf)]
104 | 20. [2020][ECCV][poster] Semantic Equivalent Adversarial Data Augmentation for Visual Question Answering.[[paper](https://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123640426.pdf)]
105 | 21. [2020][ECCV][poster] TRRNet: Tiered Relation Reasoning for Compositional Visual Question Answering.[[paper](https://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123660409.pdf)][[notes in Chinese](https://blog.csdn.net/ms961516792/article/details/112299186)]
106 | 22. [2020][ECCV][poster] Visual Question Answering on Image Sets.[[paper](https://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123660052.pdf)]
107 | 23. [2020][ECCV][poster] VQA-LOL: Visual Question Answering under the Lens of Logic.[[paper](https://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123660375.pdf)]
108 | 24. [2020][EMNLP] MUTANT: A Training Paradigm for Out-of-Distribution Generalization in VQA.[[paper](https://arxiv.org/pdf/2009.08566.pdf)][[notes in Chinese](https://blog.csdn.net/ms961516792/article/details/114951597)]
109 | 25. [2020][IJCAI] Mucko: Multi-Layer Cross-Modal Knowledge Reasoning for Fact-based Visual Question Answering.[[paper](https://arxiv.org/pdf/2006.09073)]
110 | 26. [2020][NeurIPS] Multimodal Graph Networks for Compositional Generalization in Visual Question Answering.[[paper](https://proceedings.neurips.cc/paper/2020/file/1fd6c4e41e2c6a6b092eb13ee72bce95-Paper.pdf)][[notes in Chinese](https://blog.csdn.net/ms961516792/article/details/113974139)]
111 | 27. [2020][TMM] Self-Adaptive Neural Module Transformer for Visual Question Answering.[[paper](https://ieeexplore.ieee.org/abstract/document/9095237)]
112 |
113 | ### 2019 Papers
114 | 1. [2019][AAAI] BLOCK: Bilinear Superdiagonal Fusion for Visual Question Answering and Visual Relationship Detection.[[paper](https://ojs.aaai.org/index.php/AAAI/article/view/4818/4691)]
115 | 2. [2019][AAAI] Lattice CNNs for Matching Based Chinese Question Answering.[[paper](https://ojs.aaai.org/index.php/AAAI/article/download/4633/4511)]
116 | 3. [2019][AAAI] TallyQA: Answering Complex Counting Questions.[[paper](https://ojs.aaai.org/index.php/AAAI/article/download/4815/4688)]
117 | 4. [2019][ACMMM] Perceptual Visual Reasoning with Knowledge Propagation.[[paper](https://www.researchgate.net/profile/Xin_Wang274/publication/336710651_Perceptual_Visual_Reasoning_with_Knowledge_Propagation/links/5f4f9ebb299bf13a31978287/Perceptual-Visual-Reasoning-with-Knowledge-Propagation.pdf)]
118 | 5. [2019][CVPR] Cycle-Consistency for Robust Visual Question Answering.[[paper](http://openaccess.thecvf.com/content_CVPR_2019/papers/Shah_Cycle-Consistency_for_Robust_Visual_Question_Answering_CVPR_2019_paper.pdf)]
119 | 6. [2019][CVPR] Embodied Question Answering in Photorealistic Environments with Point Cloud Perception.[[paper](http://openaccess.thecvf.com/content_CVPR_2019/papers/Wijmans_Embodied_Question_Answering_in_Photorealistic_Environments_With_Point_Cloud_Perception_CVPR_2019_paper.pdf)]
120 | 7. [2019][CVPR] Explainable and Explicit Visual Reasoning over Scene Graphs.[[paper](https://openaccess.thecvf.com/content_CVPR_2019/papers/Shi_Explainable_and_Explicit_Visual_Reasoning_Over_Scene_Graphs_CVPR_2019_paper.pdf)]
121 | 8. [2019][CVPR] GQA: A New Dataset for Real-World Visual Reasoning and Compositional Question Answering.[[paper](https://openaccess.thecvf.com/content_CVPR_2019/papers/Hudson_GQA_A_New_Dataset_for_Real-World_Visual_Reasoning_and_Compositional_CVPR_2019_paper.pdf)][[notes in Chinese](https://blog.csdn.net/ms961516792/article/details/105073506)]
122 | 9. [2019][CVPR] It’s not about the Journey; It’s about the Destination: Following Soft Paths under Question-Guidance for Visual Reasoning.[[paper](https://openaccess.thecvf.com/content_CVPR_2019/papers/Haurilet_Its_Not_About_the_Journey_Its_About_the_Destination_Following_CVPR_2019_paper.pdf)]
123 | 10. [2019][CVPR] MUREL: Multimodal Relational Reasoning for Visual Question Answering.[[paper](https://openaccess.thecvf.com/content_CVPR_2019/papers/Cadene_MUREL_Multimodal_Relational_Reasoning_for_Visual_Question_Answering_CVPR_2019_paper.pdf)]
124 | 11. [2019][CVPR] Towards VQA Models That Can Read.[[paper](https://openaccess.thecvf.com/content_CVPR_2019/papers/Singh_Towards_VQA_Models_That_Can_Read_CVPR_2019_paper.pdf)]
125 | 12. [2019][CVPR] Transfer Learning via Unsupervised Task Discovery for Visual Question Answering.[[paper](http://openaccess.thecvf.com/content_CVPR_2019/papers/Noh_Transfer_Learning_via_Unsupervised_Task_Discovery_for_Visual_Question_Answering_CVPR_2019_paper.pdf)]
126 | 13. [2019][CVPR] Visual Question Answering as Reading Comprehension.[[paper](https://openaccess.thecvf.com/content_CVPR_2019/papers/Li_Visual_Question_Answering_as_Reading_Comprehension_CVPR_2019_paper.pdf)]
127 | 14. [2019][ICCV] Compact Trilinear Interaction for Visual Question Answering.[[paper](https://openaccess.thecvf.com/content_ICCV_2019/papers/Do_Compact_Trilinear_Interaction_for_Visual_Question_Answering_ICCV_2019_paper.pdf)]
128 | 15. [2019][ICCV] Language-Conditioned Graph Networks for Relational Reasoning.[[paper](https://openaccess.thecvf.com/content_ICCV_2019/papers/Hu_Language-Conditioned_Graph_Networks_for_Relational_Reasoning_ICCV_2019_paper.pdf)][[notes in Chinese](https://blog.csdn.net/ms961516792/article/details/113843893)]
129 | 16. [2019][ICCV] Multi-modality Latent Interaction Network for Visual Question Answering.[[paper](https://openaccess.thecvf.com/content_ICCV_2019/papers/Gao_Multi-Modality_Latent_Interaction_Network_for_Visual_Question_Answering_ICCV_2019_paper.pdf)]
130 | 17. [2019][ICCV] Relation-Aware Graph Attention Network for Visual Question Answering.[[paper](http://openaccess.thecvf.com/content_ICCV_2019/papers/Li_Relation-Aware_Graph_Attention_Network_for_Visual_Question_Answering_ICCV_2019_paper.pdf)][[notes in Chinese](https://blog.csdn.net/ms961516792/article/details/113843893)]
131 | 18. [2019][ICCV] Scene Text Visual Question Answering.[[paper](http://openaccess.thecvf.com/content_ICCV_2019/papers/Biten_Scene_Text_Visual_Question_Answering_ICCV_2019_paper.pdf)]
132 | 19. [2019][ICCV] Taking a HINT: Leveraging Explanations to Make Vision and Language Models More Grounded.[[paper](https://openaccess.thecvf.com/content_ICCV_2019/papers/Selvaraju_Taking_a_HINT_Leveraging_Explanations_to_Make_Vision_and_Language_ICCV_2019_paper.pdf)]
133 | 20. [2019][ICCV] U-CAM: Visual Explanation using Uncertainty based Class Activation Maps.[[paper](https://openaccess.thecvf.com/content_ICCV_2019/papers/Patro_U-CAM_Visual_Explanation_Using_Uncertainty_Based_Class_Activation_Maps_ICCV_2019_paper.pdf)]
134 | 21. [2019][ICCV] Why Does a Visual Question Have Different Answers?.[[paper](https://openaccess.thecvf.com/content_ICCV_2019/papers/Bhattacharya_Why_Does_a_Visual_Question_Have_Different_Answers_ICCV_2019_paper.pdf)]
135 | 22. [2019][ICLR] The Neuro-Symbolic Concept Learner: Interpreting Scenes, Words, and Sentences From Natural Supervision.[[paper](https://arxiv.org/pdf/1904.12584)]
136 | 23. [2019][ICLR] Coarse-grain Fine-grain Coattention Network for Multi-evidence Question Answering.[[paper](https://arxiv.org/pdf/1901.00603)]
137 | 24. [2019][ICLR] Multi-step Retriever-Reader Interaction for Scalable Open-domain Question Answering.[[paper](https://arxiv.org/pdf/1905.05733)]
138 | 25. [2019][ICLR] Visual Reasoning by Progressive Module Networks.[[paper](https://arxiv.org/pdf/1806.02453)]
139 | 26. [2019][ICML] Probabilistic Neural-symbolic Models for Interpretable Visual Question Answering.[[paper](http://proceedings.mlr.press/v97/vedantam19a/vedantam19a.pdf)]
140 | 27. [2019][NeurIPS] Analyzing Compositionality of Visual Question Answering. (paper not found)
141 | 28. [2019][NeurIPS] Heterogeneous Graph Learning for Visual Commonsense Reasoning.[[paper](https://arxiv.org/pdf/1910.11475)]
142 | 29. [2019][NeurIPS] Learning by Abstraction: The Neural State Machine.[[paper](https://arxiv.org/pdf/1907.03950)]
143 | 30. [2019][NeurIPS] Learning Dynamics of Attention: Human Prior for Interpretable Machine Reasoning.[[paper](https://arxiv.org/pdf/1905.11666)]
144 | 31. [2019][NeurIPS] RUBi: Reducing Unimodal Biases in Visual Question Answering.[[paper](https://arxiv.org/pdf/1906.10169)]
145 | 32. [2019][NeurIPS] Self-Critical Reasoning for Robust Visual Question Answering.[[paper](https://arxiv.org/pdf/1905.09998)]
146 | 33. [2019][NeurIPS] Visual Concept-Metaconcept Learning.[[paper](https://arxiv.org/pdf/2002.01464)]
147 |
148 | ### 2018 Papers
149 | 1. [2018][AAAI] Co-attending Free-form Regions and Detections with Multi-modal Multiplicative Feature Embedding for Visual Question Answering.[[paper](https://ojs.aaai.org/index.php/AAAI/article/download/12240/12099)]
150 | 2. [2018][AAAI] Explicit Reasoning over End-to-End Neural Architectures for Visual Question Answering.[[paper](https://ojs.aaai.org/index.php/AAAI/article/download/11324/11183)]
151 | 3. [2018][AAAI] Exploring Human-like Attention Supervision in Visual Question Answering.[[paper](https://ojs.aaai.org/index.php/AAAI/article/download/12272/12131)]
152 | 4. [2018][AAAI] Movie Question Answering: Remembering the Textual Cues for Layered Visual Contents.[[paper](https://ojs.aaai.org/index.php/AAAI/article/download/12253/12112)]
153 | 5. [2018][CVPR] Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering.[[paper](https://openaccess.thecvf.com/content_cvpr_2018/papers/Anderson_Bottom-Up_and_Top-Down_CVPR_2018_paper.pdf)]
154 | 6. [2018][CVPR] Cross-Dataset Adaptation for Visual Question Answering.[[paper](http://openaccess.thecvf.com/content_cvpr_2018/papers/Chao_Cross-Dataset_Adaptation_for_CVPR_2018_paper.pdf)]
155 | 7. [2018][CVPR] Customized Image Narrative Generation via Interactive Visual Question Generation and Answering.[[paper](http://openaccess.thecvf.com/content_cvpr_2018/papers/Shin_Customized_Image_Narrative_CVPR_2018_paper.pdf)]
156 | 8. [2018][CVPR] Differential Attention for Visual Question Answering.[[paper](https://openaccess.thecvf.com/content_cvpr_2018/papers/Patro_Differential_Attention_for_CVPR_2018_paper.pdf)]
157 | 9. [2018][CVPR] Don’t Just Assume; Look and Answer: Overcoming Priors for Visual Question Answering.[[paper](http://openaccess.thecvf.com/content_cvpr_2018/papers/Agrawal_Dont_Just_Assume_CVPR_2018_paper.pdf)]
158 | 10. [2018][CVPR] DVQA: Understanding Data Visualizations via Question Answering.[[paper](https://openaccess.thecvf.com/content_cvpr_2018/papers/Kafle_DVQA_Understanding_Data_CVPR_2018_paper.pdf)]
159 | 11. [2018][CVPR] Embodied Question Answering.[[paper](https://openaccess.thecvf.com/content_cvpr_2018/papers/Das_Embodied_Question_Answering_CVPR_2018_paper.pdf)]
160 | 12. [2018][CVPR] Focal Visual-Text Attention for Visual Question Answering.[[paper](https://openaccess.thecvf.com/content_cvpr_2018/papers/Liang_Focal_Visual-Text_Attention_CVPR_2018_paper.pdf)]
161 | 13. [2018][CVPR] Improved Fusion of Visual and Language Representations by Dense Symmetric Co-Attention for Visual Question Answering.[[paper](http://openaccess.thecvf.com/content_cvpr_2018/papers/Nguyen_Improved_Fusion_of_CVPR_2018_paper.pdf)]
162 | 14. [2018][CVPR] IQA: Visual Question Answering in Interactive Environments.[[paper](http://openaccess.thecvf.com/content_cvpr_2018/papers/Gordon_IQA_Visual_Question_CVPR_2018_paper.pdf)]
163 | 15. [2018][CVPR] iVQA: Inverse Visual Question Answering.[[paper](http://openaccess.thecvf.com/content_cvpr_2018/papers/Liu_IVQA_Inverse_Visual_CVPR_2018_paper.pdf)]
164 | 16. [2018][CVPR] Learning Answer Embeddings for Visual Question Answering.[[paper](http://openaccess.thecvf.com/content_cvpr_2018/papers/Hu_Learning_Answer_Embeddings_CVPR_2018_paper.pdf)]
165 | 17. [2018][CVPR] Learning by Asking Questions.[[paper](http://openaccess.thecvf.com/content_cvpr_2018/papers/Misra_Learning_by_Asking_CVPR_2018_paper.pdf)]
166 | 18. [2018][CVPR] Learning Visual Knowledge Memory Networks for Visual Question Answering.[[paper](http://openaccess.thecvf.com/content_cvpr_2018/papers/Su_Learning_Visual_Knowledge_CVPR_2018_paper.pdf)]
167 | 19. [2018][CVPR] Multimodal Explanations: Justifying Decisions and Pointing to the Evidence.[[paper](http://openaccess.thecvf.com/content_cvpr_2018/papers/Park_Multimodal_Explanations_Justifying_CVPR_2018_paper.pdf)]
168 | 20. [2018][CVPR] Textbook Question Answering under Instructor Guidance with Memory Networks.[[paper](https://openaccess.thecvf.com/content_cvpr_2018/papers/Li_Textbook_Question_Answering_CVPR_2018_paper.pdf)]
169 | 21. [2018][CVPR] Tips and Tricks for Visual Question Answering: Learnings from the 2017 Challenge.[[paper](http://openaccess.thecvf.com/content_cvpr_2018/papers/Teney_Tips_and_Tricks_CVPR_2018_paper.pdf)]
170 | 22. [2018][CVPR] Transparency by Design: Closing the Gap Between Performance and Interpretability in Visual Reasoning.[[paper](https://openaccess.thecvf.com/content_cvpr_2018/papers/Mascharka_Transparency_by_Design_CVPR_2018_paper.pdf)]
171 | 23. [2018][CVPR] Two Can Play This Game: Visual Dialog with Discriminative Question Generation and Answering.[[paper](https://openaccess.thecvf.com/content_cvpr_2018/papers/Jain_Two_Can_Play_CVPR_2018_paper.pdf)]
172 | 24. [2018][CVPR] Visual Question Answering with Memory-Augmented Networks.[[paper](https://openaccess.thecvf.com/content_cvpr_2018/papers/Ma_Visual_Question_Answering_CVPR_2018_paper.pdf)]
173 | 25. [2018][CVPR] Visual Question Generation as Dual Task of Visual Question Answering.[[paper](http://openaccess.thecvf.com/content_cvpr_2018/papers/Li_Visual_Question_Generation_CVPR_2018_paper.pdf)]
174 | 26. [2018][CVPR] Visual Question Reasoning on General Dependency Tree.[[paper](https://openaccess.thecvf.com/content_cvpr_2018/papers/Cao_Visual_Question_Reasoning_CVPR_2018_paper.pdf)]
175 | 27. [2018][CVPR] VizWiz Grand Challenge: Answering Visual Questions from Blind People.[[paper](http://openaccess.thecvf.com/content_cvpr_2018/papers/Gurari_VizWiz_Grand_Challenge_CVPR_2018_paper.pdf)]
176 | 28. [2018][ECCV] A Dataset and Architecture for Visual Reasoning with a Working Memory.[[paper](http://openaccess.thecvf.com/content_ECCV_2018/papers/Guangyu_Robert_Yang_A_dataset_and_ECCV_2018_paper.pdf)]
177 | 29. [2018][ECCV] Deep Attention Neural Tensor Network for Visual Question Answering.[[paper](http://openaccess.thecvf.com/content_ECCV_2018/papers/Yalong_Bai_Deep_Attention_Neural_ECCV_2018_paper.pdf)]
178 | 30. [2018][ECCV] Explainable Neural Computation via Stack Neural Module Networks.[[paper](http://openaccess.thecvf.com/content_ECCV_2018/papers/Ronghang_Hu_Explainable_Neural_Computation_ECCV_2018_paper.pdf)]
179 | 31. [2018][ECCV] Goal-Oriented Visual Question Generation via Intermediate Rewards.[[paper](https://openaccess.thecvf.com/content_ECCV_2018/papers/Junjie_Zhang_Goal-Oriented_Visual_Question_ECCV_2018_paper.pdf)]
180 | 32. [2018][ECCV] Grounding Visual Explanations.[[paper](https://openaccess.thecvf.com/content_ECCV_2018/papers/Lisa_Anne_Hendricks_Grounding_Visual_Explanations_ECCV_2018_paper.pdf)]
181 | 33. [2018][ECCV] Learning Visual Question Answering by Bootstrapping Hard Attention.[[paper](http://openaccess.thecvf.com/content_ECCV_2018/papers/Mateusz_Malinowski_Learning_Visual_Question_ECCV_2018_paper.pdf)]
182 | 34. [2018][ECCV] Question Type Guided Attention in Visual Question Answering.[[paper](http://openaccess.thecvf.com/content_ECCV_2018/papers/Yang_Shi_Question_Type_Guided_ECCV_2018_paper.pdf)]
183 | 35. [2018][ECCV] Question-Guided Hybrid Convolution for Visual Question Answering.[[paper](http://openaccess.thecvf.com/content_ECCV_2018/papers/gao_peng_Question-Guided_Hybrid_Convolution_ECCV_2018_paper.pdf)]
184 | 36. [2018][ECCV] Straight to the Facts: Learning Knowledge Base Retrieval for Factual Visual Question Answering.[[paper](https://openaccess.thecvf.com/content_ECCV_2018/papers/Medhini_Gulganjalli_Narasimhan_Straight_to_the_ECCV_2018_paper.pdf)]
185 | 37. [2018][ECCV] Visual Question Answering as a Meta Learning Task.[[paper](http://openaccess.thecvf.com/content_ECCV_2018/papers/Damien_Teney_Visual_Question_Answering_ECCV_2018_paper.pdf)]
186 | 38. [2018][ECCV] Visual Question Generation for Class Acquisition of Unknown Objects.[[paper](https://openaccess.thecvf.com/content_ECCV_2018/papers/Kohei_Uehara_Visual_Question_Generation_ECCV_2018_paper.pdf)]
187 | 39. [2018][ECCV] VQA-E: Explaining, Elaborating, and Enhancing Your Answers for Visual Questions.[[paper](https://openaccess.thecvf.com/content_ECCV_2018/papers/Qing_Li_VQA-E_Explaining_Elaborating_ECCV_2018_paper.pdf)]
188 | 40. [2018][ICLR] Compositional Attention Networks for Machine Reasoning.[[paper](https://arxiv.org/pdf/1803.03067)]
189 | 41. [2018][ICLR] Interpretable Counting for Visual Question Answering.[[paper](https://arxiv.org/pdf/1712.08697)]
190 | 42. [2018][ICLR] Learning to Count Objects in Natural Images for Visual Question Answering.[[paper](https://arxiv.org/pdf/1802.05766)]
191 | 43. [2018][IJCAI] A Question Type Driven Framework to Diversify Visual Question Generation.[[paper](http://www.sdspeople.fudan.edu.cn/zywei/paper/fan-ijcai2018.pdf)]
192 | 44. [2018][IJCAI] Feature Enhancement in Attention for Visual Question Answering.[[paper](https://www.ijcai.org/Proceedings/2018/0586.pdf)]
193 | 45. [2018][IJCAI] From Pixels to Objects: Cubic Visual Attention for Visual Question Answering.[[paper](https://www.ijcai.org/Proceedings/2018/0126.pdf)]
194 | 46. [2018][NIPS] Answerer in Questioner’s Mind: Information Theoretic Approach to Goal-Oriented Visual Dialog.[[paper](https://arxiv.org/pdf/1802.03881)]
195 | 47. [2018][NIPS] Bilinear Attention Networks.[[paper](https://arxiv.org/pdf/1805.07932)]
196 | 48. [2018][NIPS] Chain of Reasoning for Visual Question Answering.[[paper](http://papers.neurips.cc/paper/7311-chain-of-reasoning-for-visual-question-answering.pdf)]
197 | 49. [2018][NIPS] Dialog-to-Action: Conversational Question Answering Over a Large-Scale Knowledge Base.[[paper](http://papers.neurips.cc/paper/7558-dialog-to-action-conversational-question-answering-over-a-large-scale-knowledge-base.pdf)]
198 | 50. [2018][NIPS] Learning Conditioned Graph Structures for Interpretable Visual Question Answering.[[paper](https://arxiv.org/pdf/1806.07243)]
199 | 51. [2018][NIPS] Learning to Specialize with Knowledge Distillation for Visual Question Answering.[[paper](http://alinlab.kaist.ac.kr/resource/2018_NIPS_KD_MCL.pdf)]
200 | 52. [2018][NIPS] Neural-Symbolic VQA: Disentangling Reasoning from Vision and Language Understanding.[[paper](https://arxiv.org/pdf/1810.02338)]
201 | 53. [2018][NIPS] Out of the Box: Reasoning with Graph Convolution Nets for Factual Visual Question Answering.[[paper](https://arxiv.org/pdf/1811.00538)]
202 | 54. [2018][NIPS] Overcoming Language Priors in Visual Question Answering with Adversarial Regularization.[[paper](https://arxiv.org/pdf/1810.03649)]
203 |
204 | ### 2017 Papers
205 | 1. [2017][CVPR] An Empirical Evaluation of Visual Question Answering for Novel Objects.[[paper](https://openaccess.thecvf.com/content_cvpr_2017/papers/Ramakrishnan_An_Empirical_Evaluation_CVPR_2017_paper.pdf)]
206 | 2. [2017][CVPR] Are You Smarter Than A Sixth Grader? Textbook Question Answering for Multimodal Machine Comprehension.[[paper](https://openaccess.thecvf.com/content_cvpr_2017/papers/Kembhavi_Are_You_Smarter_CVPR_2017_paper.pdf)]
207 | 3. [2017][CVPR] CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning.[[paper](https://openaccess.thecvf.com/content_cvpr_2017/papers/Johnson_CLEVR_A_Diagnostic_CVPR_2017_paper.pdf)]
208 | 4. [2017][CVPR] Creativity: Generating Diverse Questions using Variational Autoencoders.[[paper](https://openaccess.thecvf.com/content_cvpr_2017/papers/Jain_Creativity_Generating_Diverse_CVPR_2017_paper.pdf)]
209 | 5. [2017][CVPR] Graph-Structured Representations for Visual Question Answering.[[paper](http://openaccess.thecvf.com/content_cvpr_2017/papers/Teney_Graph-Structured_Representations_for_CVPR_2017_paper.pdf)][[notes in Chinese](https://blog.csdn.net/ms961516792/article/details/113843893)]
210 | 6. [2017][CVPR] Knowledge Acquisition for Visual Question Answering via Iterative Querying.[[paper](https://openaccess.thecvf.com/content_cvpr_2017/papers/Zhu_Knowledge_Acquisition_for_CVPR_2017_paper.pdf)]
211 | 7. [2017][CVPR] Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering.[[paper](http://openaccess.thecvf.com/content_cvpr_2017/papers/Goyal_Making_the_v_CVPR_2017_paper.pdf)]
212 | 8. [2017][CVPR] Mining Object Parts from CNNs via Active Question-Answering.[[paper](http://openaccess.thecvf.com/content_cvpr_2017/papers/Zhang_Mining_Object_Parts_CVPR_2017_paper.pdf)]
213 | 9. [2017][CVPR] Multi-level Attention Networks for Visual Question Answering.[[paper](http://openaccess.thecvf.com/content_cvpr_2017/papers/Yu_Multi-Level_Attention_Networks_CVPR_2017_paper.pdf)]
214 | 10. [2017][CVPR] The VQA-Machine: Learning How to Use Existing Vision Algorithms to Answer New Questions.[[paper](http://openaccess.thecvf.com/content_cvpr_2017/papers/Wang_The_VQA-Machine_Learning_CVPR_2017_paper.pdf)]
215 | 11. [2017][CVPR] What’s in a Question: Using Visual Questions as a Form of Supervision.[[paper](https://openaccess.thecvf.com/content_cvpr_2017/papers/Ganju_Whats_in_a_CVPR_2017_paper.pdf)]
216 | 12. [2017][ICCV] An Analysis of Visual Question Answering Algorithms.[[paper](http://openaccess.thecvf.com/content_ICCV_2017/papers/Kafle_An_Analysis_of_ICCV_2017_paper.pdf)]
217 | 13. [2017][ICCV] Inferring and Executing Programs for Visual Reasoning.[[paper](https://openaccess.thecvf.com/content_ICCV_2017/papers/Johnson_Inferring_and_Executing_ICCV_2017_paper.pdf)]
218 | 14. [2017][ICCV] Learning to Disambiguate by Asking Discriminative Questions.[[paper](http://openaccess.thecvf.com/content_ICCV_2017/papers/Li_Learning_to_Disambiguate_ICCV_2017_paper.pdf)]
219 | 15. [2017][ICCV] Learning to Reason: End-to-End Module Networks for Visual Question Answering.[[paper](http://openaccess.thecvf.com/content_ICCV_2017/papers/Hu_Learning_to_Reason_ICCV_2017_paper.pdf)]
220 | 16. [2017][ICCV] Multi-modal Factorized Bilinear Pooling with Co-Attention Learning for Visual Question Answering.[[paper](http://openaccess.thecvf.com/content_ICCV_2017/papers/Yu_Multi-Modal_Factorized_Bilinear_ICCV_2017_paper.pdf)]
221 | 17. [2017][ICCV] MUTAN: Multimodal Tucker Fusion for Visual Question Answering.[[paper](http://openaccess.thecvf.com/content_ICCV_2017/papers/Ben-younes_MUTAN_Multimodal_Tucker_ICCV_2017_paper.pdf)]
222 | 18. [2017][ICCV] Structured Attentions for Visual Question Answering.[[paper](http://openaccess.thecvf.com/content_ICCV_2017/papers/Zhu_Structured_Attentions_for_ICCV_2017_paper.pdf)]
223 | 19. [2017][ICCV] VQS: Linking Segmentations to Questions and Answers for Supervised Attention in VQA and Question-Focused Semantic Segmentation.[[paper](http://openaccess.thecvf.com/content_ICCV_2017/papers/Gan_VQS_Linking_Segmentations_ICCV_2017_paper.pdf)]
224 | 20. [2017][IJCAI] Automatic Generation of Grounded Visual Questions.[[paper](https://arxiv.org/pdf/1612.06530)]
225 | 21. [2017][IJCAI] Explicit Knowledge-based Reasoning for Visual Question Answering.[[paper](https://arxiv.org/pdf/1511.02570)]
226 | 22. [2017][INLG] Data Augmentation for Visual Question Answering.[[paper](https://www.aclweb.org/anthology/W17-3529.pdf)]
227 | 23. [2017][NIPS] High-Order Attention Models for Visual Question Answering.[[paper](https://arxiv.org/pdf/1711.04323)]
228 | 24. [2017][NIPS] Multimodal Learning and Reasoning for Visual Question Answering.[[paper](http://papers.neurips.cc/paper/6658-multimodal-learning-and-reasoning-for-visual-question-answering.pdf)]
229 | 25. [2017][NIPS] Question Asking as Program Generation.[[paper](https://arxiv.org/pdf/1711.06351)]
230 |
231 | ### 2016 Papers
232 | 1. [2016][AAAI] Learning to answer questions from image using convolutional neural network.[[paper](https://ojs.aaai.org/index.php/AAAI/article/download/10442/10301)][[notes in Chinese](https://blog.csdn.net/ms961516792/article/details/113496153)]
233 | 2. [2016][CVPR] Answer-Type Prediction for Visual Question Answering.[[paper](https://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Kafle_Answer-Type_Prediction_for_CVPR_2016_paper.pdf)][[notes in Chinese](https://blog.csdn.net/ms961516792/article/details/113496153)]
234 | 3. [2016][CVPR] Ask Me Anything: Free-Form Visual Question Answering Based on Knowledge From External Sources.[[paper](https://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Wu_Ask_Me_Anything_CVPR_2016_paper.pdf)][[notes in Chinese](https://blog.csdn.net/ms961516792/article/details/113496153)]
235 | 4. [2016][CVPR] Image Question Answering using Convolutional Neural Network with Dynamic Parameter Prediction.[[paper](https://openaccess.thecvf.com/content_cvpr_2016/papers/Noh_Image_Question_Answering_CVPR_2016_paper.pdf)][[notes in Chinese](https://blog.csdn.net/ms961516792/article/details/113535076)]
236 | 5. [2016][CVPR] Neural Module Networks.[[paper](http://openaccess.thecvf.com/content_cvpr_2016/papers/Andreas_Neural_Module_Networks_CVPR_2016_paper.pdf)][[notes in Chinese](https://blog.csdn.net/ms961516792/article/details/113535076)]
237 | 6. [2016][CVPR] Stacked Attention Networks for Image Question Answering.[[paper](http://openaccess.thecvf.com/content_cvpr_2016/papers/Yang_Stacked_Attention_Networks_CVPR_2016_paper.pdf)][[notes in Chinese](https://blog.csdn.net/ms961516792/article/details/113609614)]
238 | 7. [2016][CVPR] Visual7W: Grounded Question Answering in Images.[[paper](http://openaccess.thecvf.com/content_cvpr_2016/papers/Zhu_Visual7W_Grounded_Question_CVPR_2016_paper.pdf)][[notes in Chinese](https://blog.csdn.net/ms961516792/article/details/113609614)]
239 | 8. [2016][CVPR] Where to Look: Focus Regions for Visual Question Answering.[[paper](http://openaccess.thecvf.com/content_cvpr_2016/papers/Shih_Where_to_Look_CVPR_2016_paper.pdf)][[notes in Chinese](https://blog.csdn.net/ms961516792/article/details/113649487)]
240 | 9. [2016][CVPR] Yin and Yang: Balancing and Answering Binary Visual Questions.[[paper](http://openaccess.thecvf.com/content_cvpr_2016/papers/Zhang_Yin_and_Yang_CVPR_2016_paper.pdf)][[notes in Chinese](https://blog.csdn.net/ms961516792/article/details/113649487)]
241 | 10. [2016][ECCV] Ask, Attend and Answer: Exploring Question-Guided Spatial Attention for Visual Question Answering.[[paper](https://arxiv.org/pdf/1511.05234)][[notes in Chinese](https://blog.csdn.net/ms961516792/article/details/113696813)]
242 | 11. [2016][ECCV] Learning Models for Actions and Person-Object Interactions with Transfer to Question Answering.[[paper](https://arxiv.org/pdf/1604.04808)][[notes in Chinese](https://blog.csdn.net/ms961516792/article/details/113696813)]
243 | 12. [2016][ECCV] Leveraging Visual Question Answering for Image-Caption Ranking.[[paper](https://arxiv.org/pdf/1605.01379)]
244 | 13. [2016][ECCV] Revisiting Visual Question Answering Baselines.[[paper](https://arxiv.org/pdf/1606.08390)]
245 | 14. [2016][ICML] Dynamic Memory Networks for Visual and Textual Question Answering.[[paper](http://proceedings.mlr.press/v48/xiong16.pdf)]
246 | 15. [2016][NIPS] Hierarchical Question-Image Co-Attention for Visual Question Answering.[[paper](https://arxiv.org/pdf/1606.00061.pdf)]
247 | 16. [2016][NIPS] Multimodal Residual Learning for Visual QA.[[paper](https://arxiv.org/pdf/1606.01455)]
248 |
249 | ### 2015 Papers
250 | 1. [2015][CVPR] VisKE: Visual Knowledge Extraction and Question Answering by Visual Verification of Relation Phrases.[[paper](https://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Sadeghi_VisKE_Visual_Knowledge_2015_CVPR_paper.pdf)][[notes in Chinese](https://blog.csdn.net/ms961516792/article/details/113369883)]
251 | 2. [2015][ICCV] Ask Your Neurons: A Neural-Based Approach to Answering Questions About Images.[[paper](http://openaccess.thecvf.com/content_iccv_2015/papers/Malinowski_Ask_Your_Neurons_ICCV_2015_paper.pdf)][[notes in Chinese](https://blog.csdn.net/ms961516792/article/details/113420551)]
252 | 3. [2015][ICCV] Visual Madlibs: Fill in the Blank Description Generation and Question Answering.[[paper](http://openaccess.thecvf.com/content_iccv_2015/papers/Yu_Visual_Madlibs_Fill_ICCV_2015_paper.pdf)][[notes in Chinese](https://blog.csdn.net/ms961516792/article/details/113420551)]
253 | 4. [2015][ICCV] VQA: Visual Question Answering.[[paper](http://openaccess.thecvf.com/content_iccv_2015/papers/Antol_VQA_Visual_Question_ICCV_2015_paper.pdf)][[notes in Chinese](https://blog.csdn.net/ms961516792/article/details/113420551)]
254 | 5. [2015][NIPS] Are You Talking to a Machine? Dataset and Methods for Multilingual Image Question Answering.[[paper](https://arxiv.org/pdf/1505.05612)][[notes in Chinese](https://blog.csdn.net/ms961516792/article/details/113496153)]
255 | 6. [2015][NIPS] Exploring Models and Data for Image Question Answering.[[paper](https://arxiv.org/pdf/1505.02074)][[notes in Chinese](https://blog.csdn.net/ms961516792/article/details/113496153)]
256 |
257 | ### 2014 Papers
258 | 1. [2014][NIPS] A Multi-World Approach to Question Answering about Real-World Scenes based on Uncertain Input.[[paper](https://arxiv.org/pdf/1410.0210)][[notes in Chinese](https://blog.csdn.net/ms961516792/article/details/113369883)]
259 |
260 |
261 |
262 |
263 | ## Embodied QA Papers
264 | -
265 |
266 |
267 | ## Knowledge-based VQA Papers
268 | ### Fact-based VQA Papers
269 | -
270 | ### Open-domain Knowledge-based VQA Papers
271 | -
272 |
273 |
274 | ## Interactive QA Papers
275 | -
276 |
277 |
278 | ## Image-Set VQA Papers
279 | -
280 |
281 |
282 | ## Inverse VQA Papers
283 | -
284 |
285 |
286 | ## Text-based VQA Papers
287 | ### Data Visualization QA Papers
288 | -
289 | ### Textbook QA Papers
290 | -
291 | ### TextVQA Papers
292 | -
293 |
294 |
295 | ## Visual Reasoning Papers
296 | -
297 |
298 |
299 | ## Video QA Papers
300 | ### Datasets
301 | -
302 |
303 | ### 2021 Papers
304 | 1. [2021][CVPR] SUTD-TrafficQA: A Question Answering Benchmark and an Efficient Network for Video Reasoning over Traffic Events.[[paper](https://openaccess.thecvf.com/content/CVPR2021/papers/Xu_SUTD-TrafficQA_A_Question_Answering_Benchmark_and_an_Efficient_Network_for_CVPR_2021_paper.pdf)]
305 | 2. [2021][CVPR] Bridge to Answer: Structure-aware Graph Interaction Network for Video Question Answering.[[paper](https://openaccess.thecvf.com/content/CVPR2021/papers/Park_Bridge_To_Answer_Structure-Aware_Graph_Interaction_Network_for_Video_Question_CVPR_2021_paper.pdf)]
306 | 3. [2021][CVPR] NExT-QA: Next Phase of Question-Answering to Explaining Temporal Actions.[[paper](https://openaccess.thecvf.com/content/CVPR2021/papers/Xiao_NExT-QA_Next_Phase_of_Question-Answering_to_Explaining_Temporal_Actions_CVPR_2021_paper.pdf)]
307 | 4. [2021][ICCV] On the hidden treasure of dialog in video question answering.[[paper](https://arxiv.org/pdf/2103.14517.pdf)]
308 | 5. [2021][ICCV] Just Ask: Learning to Answer Questions from Millions of Narrated Videos.[[paper](https://arxiv.org/pdf/2012.00451.pdf)]
309 | 6. [2021][ICCV] Env-QA: A Video Question Answering Benchmark for Comprehensive Understanding of Dynamic Environments.[[paper](https://openaccess.thecvf.com/content/ICCV2021/papers/Gao_Env-QA_A_Video_Question_Answering_Benchmark_for_Comprehensive_Understanding_of_ICCV_2021_paper.pdf)]
310 | 7. [2021][ICCV] HAIR: Hierarchical Visual-Semantic Relational Reasoning for Video Question Answering.[[paper](https://openaccess.thecvf.com/content/ICCV2021/papers/Liu_HAIR_Hierarchical_Visual-Semantic_Relational_Reasoning_for_Video_Question_Answering_ICCV_2021_paper.pdf)]
311 | 8. [2021][ICCV] Pano-AVQA: Grounded Audio-Visual Question Answering on 360° Videos.[[paper](https://openaccess.thecvf.com/content/ICCV2021/papers/Yun_Pano-AVQA_Grounded_Audio-Visual_Question_Answering_on_360deg_Videos_ICCV_2021_paper.pdf)]
312 | 9. [2021][ICCV] Video Question Answering Using Language-Guided Deep Compressed-Domain Video Feature.[[paper](https://openaccess.thecvf.com/content/ICCV2021/papers/Kim_Video_Question_Answering_Using_Language-Guided_Deep_Compressed-Domain_Video_Feature_ICCV_2021_paper.pdf)]
314 |
315 | ### 2020 Papers
316 | 1. [2020][CVPR] Hierarchical Conditional Relation Networks for Video Question Answering.[[paper](http://openaccess.thecvf.com/content_CVPR_2020/papers/Le_Hierarchical_Conditional_Relation_Networks_for_Video_Question_Answering_CVPR_2020_paper.pdf)]
317 | 2. [2020][CVPR] Modality Shifting Attention Network for Multi-Modal Video Question Answering.[[paper](http://openaccess.thecvf.com/content_CVPR_2020/papers/Kim_Modality_Shifting_Attention_Network_for_Multi-Modal_Video_Question_Answering_CVPR_2020_paper.pdf)]
318 | 3. [2020][ECCV][poster] Knowledge-Based Video Question Answering with Unsupervised Scene Descriptions.[[paper](https://arxiv.org/pdf/2007.08751)]
319 | 4. [2020][TIP] Open-Ended Video Question Answering via Multi-Modal Conditional Adversarial Networks.[[paper](https://ieeexplore.ieee.org/abstract/document/8974594/)]
320 | 5. [2020][WACV] BERT Representations for Video Question Answering.[[paper](http://openaccess.thecvf.com/content_WACV_2020/papers/Yang_BERT_representations_for_Video_Question_Answering_WACV_2020_paper.pdf)]
321 |
322 | ### 2019 Papers
323 | 1. [2019][AAAI] Beyond RNNs: Positional Self-Attention with Co-Attention for Video Question Answering.[[paper](https://ojs.aaai.org/index.php/AAAI/article/download/4887/4760)]
324 | 2. [2019][AAAI] Structured Two-stream Attention Network for Video Question Answering.[[paper](https://ojs.aaai.org/index.php/AAAI/article/download/4602/4480)]
325 | 3. [2019][ACMMM] Learnable Aggregating Net with Divergent Loss for VideoQA.[[paper](https://dl.acm.org/doi/10.1145/3343031.3350971)]
326 | 4. [2019][ACMMM] Multi-interaction Network with Object Relation for VideoQA.[[paper](https://dl.acm.org/citation.cfm?id=3351065)]
327 | 5. [2019][ACMMM] Question-Aware Tube-Switch Network for VideoQA.[[paper](https://dl.acm.org/citation.cfm?id=3350969)]
328 | 6. [2019][CVPR] Heterogeneous Memory Enhanced Multimodal Attention Model for VideoQA.[[paper](https://openaccess.thecvf.com/content_CVPR_2019/papers/Fan_Heterogeneous_Memory_Enhanced_Multimodal_Attention_Model_for_Video_Question_Answering_CVPR_2019_paper.pdf)]
329 | 7. [2019][CVPR] Progressive Attention Memory Network for Movie Story Question Answering.[[paper](https://openaccess.thecvf.com/content_CVPR_2019/papers/Kim_Progressive_Attention_Memory_Network_for_Movie_Story_Question_Answering_CVPR_2019_paper.pdf)]
330 | 8. [2019][ICCV] SegEQA: Video Segmentation based Visual Attention for Embodied Question Answering.[[paper](https://openaccess.thecvf.com/content_ICCV_2019/papers/Luo_SegEQA_Video_Segmentation_Based_Visual_Attention_for_Embodied_Question_Answering_ICCV_2019_paper.pdf)]
331 | 9. [2019][IJCAI] Open-Ended Long-Form Video Question Answering via Hierarchical Convolutional Self-Attention Networks.[[paper](https://arxiv.org/pdf/1906.12158)]
332 | 10. [2019][IJCNN] Gaining Extra Supervision via Multi-task learning for Multi-Modal Video Question Answering.[[paper](https://arxiv.org/pdf/1905.13540)]
333 | 11. [2019][TIP] Compositional Attention Networks With Two-Stream Fusion for Video Question Answering.[[paper](https://ieeexplore.ieee.org/abstract/document/8839734/)]
334 | 12. [2019][TIP] Holistic Multi-modal Memory Network for Movie Question Answering.[[paper](https://arxiv.org/pdf/1811.04595)]
335 |
336 | ### 2018 Papers
337 | 1. [2018][ACMMM] Explore Multi-Step Reasoning in Video Question Answering.[[paper](https://dl.acm.org/doi/abs/10.1145/3240508.3240563)]
338 | 2. [2018][CVPR] Motion-Appearance Co-Memory Networks for Video Question Answering.[[paper](https://openaccess.thecvf.com/content_cvpr_2018/papers/Gao_Motion-Appearance_Co-Memory_Networks_CVPR_2018_paper.pdf)]
339 | 3. [2018][ECCV] A Joint Sequence Fusion Model for Video Question Answering and Retrieval.[[paper](http://openaccess.thecvf.com/content_ECCV_2018/papers/Youngjae_Yu_A_Joint_Sequence_ECCV_2018_paper.pdf)]
340 | 4. [2018][ECCV] Multimodal Dual Attention Memory for Video Story Question Answering.[[paper](https://openaccess.thecvf.com/content_ECCV_2018/papers/Kyungmin_Kim_Multimodal_Dual_Attention_ECCV_2018_paper.pdf)]
341 | 5. [2018][EMNLP] TVQA: Localized, Compositional Video Question Answering.[[paper](https://arxiv.org/pdf/1809.01696)]
342 |
343 | ### 2017 Papers
344 | 1. [2017][AAAI] Leveraging Video Descriptions to Learn Video Question Answering.[[paper](https://ojs.aaai.org/index.php/AAAI/article/download/11238/11097)]
345 | 2. [2017][ACMMM] VideoQA via Gradually Refined Attention over Appearance and Motion.[[paper](https://www.comp.nus.edu.sg/~xiangnan/papers/mm17-videoQA.pdf)]
346 | 3. [2017][ACMMM] VideoQA via Hierarchical Dual-Level Attention Network Learning.[[paper](https://dl.acm.org/doi/abs/10.1145/3123266.3123364)]
347 | 4. [2017][CVPR] A Dataset and Exploration of Models for Understanding Video Data Through Fill-In-The-Blank Question-Answering.[[paper](http://openaccess.thecvf.com/content_cvpr_2017/papers/Maharaj_A_Dataset_and_CVPR_2017_paper.pdf)]
348 | 5. [2017][CVPR] End-to-end Concept Word Detection for Video Captioning, Retrieval, and Question Answering.[[paper](http://openaccess.thecvf.com/content_cvpr_2017/papers/Yu_End-To-End_Concept_Word_CVPR_2017_paper.pdf)]
349 | 6. [2017][CVPR] TGIF-QA: Toward Spatio-Temporal Reasoning in Visual Question Answering.[[paper](http://openaccess.thecvf.com/content_cvpr_2017/papers/Jang_TGIF-QA_Toward_Spatio-Temporal_CVPR_2017_paper.pdf)]
350 | 7. [2017][ICCV] MarioQA: Answering Questions by Watching Gameplay Videos.[[paper](https://openaccess.thecvf.com/content_ICCV_2017/papers/Mun_MarioQA_Answering_Questions_ICCV_2017_paper.pdf)]
351 | 8. [2017][ICCV] Video Fill In the Blank using LRRL LSTMs with Spatial-Temporal Attentions.[[paper](http://openaccess.thecvf.com/content_ICCV_2017/papers/Mazaheri_Video_Fill_in_ICCV_2017_paper.pdf)]
352 | 9. [2017][IJCAI] Video Question Answering via Hierarchical Spatio-Temporal Attention Networks.[[paper](https://www.ijcai.org/Proceedings/2017/0492.pdf)]
353 | 10. [2017][SIGIR] Video Question Answering via Attributed-Augmented Attention Network Learning.[[paper](https://arxiv.org/pdf/1707.06355)]
354 |
355 | ### 2016 Papers
356 | 1. [2016][CVPR] MovieQA: Understanding Stories in Movies through Question-Answering.[[paper](https://openaccess.thecvf.com/content_cvpr_2016/papers/Tapaswi_MovieQA_Understanding_Stories_CVPR_2016_paper.pdf)]
357 |
358 | ### 2015 Papers
359 | 1. [2015][arXiv] Uncovering the temporal context for video question and answering.[[paper](https://arxiv.org/pdf/1511.04670)]
360 |
361 | ### 2014 Papers
362 | 1. [2014][ACMMM] Joint video and text parsing for understanding events and answering queries.[[paper](https://arxiv.org/pdf/1308.6628)]
363 |
--------------------------------------------------------------------------------
/img/VQA.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/NeverMoreLCH/Awesome-VQA/abae36c652e2463c5aa91f7812d1966172ad7cb8/img/VQA.jpg
--------------------------------------------------------------------------------