# Awesome-VQA
A reading list of papers about Visual Question Answering.

![VQA](./img/VQA.jpg)

## Table of Contents
* [Image QA Papers](#image-qa-papers)
  * [Datasets](#datasets)
  * [2021 Papers](#2021-papers)
  * [2020 Papers](#2020-papers)
  * [2019 Papers](#2019-papers)
  * [2018 Papers](#2018-papers)
  * [2017 Papers](#2017-papers)
  * [2016 Papers](#2016-papers)
  * [2015 Papers](#2015-papers)
  * [2014 Papers](#2014-papers)
* [Embodied QA Papers](#embodied-qa-papers)
* [Knowledge-based VQA Papers](#knowledge-based-vqa-papers)
  * [Fact-based VQA Papers](#fact-based-vqa-papers)
  * [Open-domain Knowledge-based VQA Papers](#open-domain-knowledge-based-vqa-papers)
* [Interactive QA Papers](#interactive-qa-papers)
* [Image-Set VQA Papers](#image-set-vqa-papers)
* [Inverse VQA Papers](#inverse-vqa-papers)
* [Text-based VQA Papers](#text-based-vqa-papers)
  * [Data Visualization QA Papers](#data-visualization-qa-papers)
  * [Textbook QA Papers](#textbook-qa-papers)
  * [TextVQA Papers](#textvqa-papers)
* [Visual Reasoning Papers](#visual-reasoning-papers)
* [Video QA Papers](#video-qa-papers)
  * [Datasets](#datasets-1)
  * [2021 Papers](#2021-papers-1)
  * [2020 Papers](#2020-papers-1)
  * [2019 Papers](#2019-papers-1)
  * [2018 Papers](#2018-papers-1)
  * [2017 Papers](#2017-papers-1)
  * [2016 Papers](#2016-papers-1)
  * [2015 Papers](#2015-papers-1)
  * [2014 Papers](#2014-papers-1)

## Image QA Papers
### Datasets
1. **GQA** [2019][CVPR] GQA: A New Dataset for Real-World Visual Reasoning and Compositional Question Answering.[[paper](https://openaccess.thecvf.com/content_CVPR_2019/papers/Hudson_GQA_A_New_Dataset_for_Real-World_Visual_Reasoning_and_Compositional_CVPR_2019_paper.pdf)][[dataset](https://cs.stanford.edu/people/dorarad/gqa/evaluate.html)][[Chinese notes](https://blog.csdn.net/ms961516792/article/details/105073506)]
2. **VQA-CP** [2018][CVPR] Don’t Just Assume; Look and Answer: Overcoming Priors for Visual Question Answering.[[paper](http://openaccess.thecvf.com/content_cvpr_2018/papers/Agrawal_Dont_Just_Assume_CVPR_2018_paper.pdf)][[dataset](https://www.cc.gatech.edu/grads/a/aagrawal307/vqa-cp/)][[Chinese notes](https://blog.csdn.net/ms961516792/article/details/105073506)]
3. **VQA v2.0** [2017][CVPR] Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering.[[paper](https://arxiv.org/pdf/1612.00837.pdf)][[dataset](https://visualqa.org/)]
4. **Visual7W** [2016][CVPR] Visual7W: Grounded Question Answering in Images.[[paper](http://openaccess.thecvf.com/content_cvpr_2016/papers/Zhu_Visual7W_Grounded_Question_CVPR_2016_paper.pdf)][[dataset](http://ai.stanford.edu/~yukez/visual7w/)][[Chinese notes](https://blog.csdn.net/ms961516792/article/details/113609614)]
5. **SHAPES** [2016][CVPR] Neural Module Networks.[[paper](http://openaccess.thecvf.com/content_cvpr_2016/papers/Andreas_Neural_Module_Networks_CVPR_2016_paper.pdf)][[dataset](https://github.com/jacobandreas/nmn2)][[Chinese notes](https://blog.csdn.net/ms961516792/article/details/113535076)]
6. **FM-IQA** [2015][NIPS] Are You Talking to a Machine? Dataset and Methods for Multilingual Image Question Answering.[[paper](https://arxiv.org/pdf/1505.05612)][[dataset](http://research.baidu.com/Downloads)][[Chinese notes](https://blog.csdn.net/ms961516792/article/details/113496153)]
7. **VQA v1.0** [2015][ICCV] VQA: Visual Question Answering.[[paper](http://openaccess.thecvf.com/content_iccv_2015/papers/Antol_VQA_Visual_Question_ICCV_2015_paper.pdf)][[dataset](https://visualqa.org/)][[Chinese notes](https://blog.csdn.net/ms961516792/article/details/113420551)]
8. **Visual Madlibs** [2015][ICCV] Visual Madlibs: Fill in the Blank Description Generation and Question Answering.[[paper](http://openaccess.thecvf.com/content_iccv_2015/papers/Yu_Visual_Madlibs_Fill_ICCV_2015_paper.pdf)][[dataset](http://tamaraberg.com/visualmadlibs/)][[Chinese notes](https://blog.csdn.net/ms961516792/article/details/113420551)]
9. **DAQUAR-Consensus** [2015][ICCV] Ask Your Neurons: A Neural-Based Approach to Answering Questions About Images.[[paper](http://openaccess.thecvf.com/content_iccv_2015/papers/Malinowski_Ask_Your_Neurons_ICCV_2015_paper.pdf)][[dataset](https://www.mpi-inf.mpg.de/departments/computer-vision-and-machine-learning/research/vision-and-language/visual-turing-challenge/)][[Chinese notes](https://blog.csdn.net/ms961516792/article/details/113420551)]
10. **DAQUAR** [2014][NIPS] A Multi-World Approach to Question Answering about Real-World Scenes based on Uncertain Input.[[paper](https://arxiv.org/pdf/1410.0210)][[dataset](https://www.mpi-inf.mpg.de/departments/computer-vision-and-machine-learning/research/vision-and-language/visual-turing-challenge/)][[Chinese notes](https://blog.csdn.net/ms961516792/article/details/113369883)]

### 2021 Papers
1. [2021][AAAI] Regularizing Attention Networks for Anomaly Detection in Visual Question Answering.[[paper](https://arxiv.org/pdf/2009.10054)][[Chinese notes](https://blog.csdn.net/ms961516792/article/details/114974963?spm=1001.2014.3001.5501)]
2. [2021][CVPR] Causal Attention for Vision-Language Tasks.[[paper](https://arxiv.org/pdf/2103.03493)][[Chinese notes](https://blog.csdn.net/ms961516792/article/details/116641619?spm=1001.2014.3001.5501)]
3. [2021][CVPR] Counterfactual VQA: A Cause-Effect Look at Language Bias.[[paper](https://arxiv.org/pdf/2006.04315)]
4. [2021][CVPR] Domain-robust VQA with diverse datasets and methods but no target labels.[[paper](https://arxiv.org/pdf/2103.15974)][[Chinese notes](https://blog.csdn.net/ms961516792/article/details/118001142)]
5. [2021][CVPR] Found a Reason for me? Weakly-supervised Grounded Visual Question Answering using Capsules.[[paper](https://arxiv.org/pdf/2105.04836)]
6. [2021][CVPR] Separating Skills and Concepts for Novel Visual Question Answering.[[paper](https://blender.cs.illinois.edu/paper/cvpr2021.pdf)]
7. [2021][CVPR] Transformation Driven Visual Reasoning.[[paper](https://openaccess.thecvf.com/content/CVPR2021/papers/Hong_Transformation_Driven_Visual_Reasoning_CVPR_2021_paper.pdf)]
8. [2021][CVPR] Predicting Human Scanpaths in Visual Question Answering.[[paper](https://openaccess.thecvf.com/content/CVPR2021/papers/Chen_Predicting_Human_Scanpaths_in_Visual_Question_Answering_CVPR_2021_paper.pdf)]
9. [2021][CVPR] Perception Matters: Detecting Perception Failures of VQA Models Using Metamorphic Testing.[[paper](https://openaccess.thecvf.com/content/CVPR2021/papers/Yuan_Perception_Matters_Detecting_Perception_Failures_of_VQA_Models_Using_Metamorphic_CVPR_2021_paper.pdf)]
10. [2021][CVPR] Roses are Red, Violets are Blue... But Should VQA Expect Them To?[[paper](https://openaccess.thecvf.com/content/CVPR2021/papers/Kervadec_Roses_Are_Red_Violets_Are_Blue..._but_Should_VQA_Expect_CVPR_2021_paper.pdf)]
11. [2021][CVPR] How Transferable are Reasoning Patterns in VQA?[[paper](https://openaccess.thecvf.com/content/CVPR2021/papers/Kervadec_How_Transferable_Are_Reasoning_Patterns_in_VQA_CVPR_2021_paper.pdf)]
12. [2021][CVPR] KRISP: Integrating Implicit and Symbolic Knowledge for Open-Domain Knowledge-Based VQA.[[paper](https://openaccess.thecvf.com/content/CVPR2021/papers/Marino_KRISP_Integrating_Implicit_and_Symbolic_Knowledge_for_Open-Domain_Knowledge-Based_VQA_CVPR_2021_paper.pdf)]
13. [2021][CVPR] TAP: Text-Aware Pre-training for Text-VQA and Text-Caption.[[paper](https://openaccess.thecvf.com/content/CVPR2021/papers/Yang_TAP_Text-Aware_Pre-Training_for_Text-VQA_and_Text-Caption_CVPR_2021_paper.pdf)]
14. [2021][ICCV] Greedy Gradient Ensemble for Robust Visual Question Answering.[[paper](https://arxiv.org/pdf/2107.12651.pdf)]
15. [2021][ICCV] Auto-Parsing Network for Image Captioning and Visual Question Answering.[[paper](https://openaccess.thecvf.com/content/ICCV2021/papers/Yang_Auto-Parsing_Network_for_Image_Captioning_and_Visual_Question_Answering_ICCV_2021_paper.pdf)]
16. [2021][ICCV] Beyond Question-Based Biases: Assessing Multimodal Shortcut Learning in Visual Question Answering.[[paper](https://openaccess.thecvf.com/content/ICCV2021/papers/Dancette_Beyond_Question-Based_Biases_Assessing_Multimodal_Shortcut_Learning_in_Visual_Question_ICCV_2021_paper.pdf)]
17. [2021][ICCV] Linguistically Routing Capsule Network for Out-of-Distribution Visual Question Answering.[[paper](https://openaccess.thecvf.com/content/ICCV2021/papers/Cao_Linguistically_Routing_Capsule_Network_for_Out-of-Distribution_Visual_Question_Answering_ICCV_2021_paper.pdf)]
18. [2021][ICCV] Weakly Supervised Relative Spatial Reasoning for Visual Question Answering.[[paper](https://openaccess.thecvf.com/content/ICCV2021/papers/Banerjee_Weakly_Supervised_Relative_Spatial_Reasoning_for_Visual_Question_Answering_ICCV_2021_paper.pdf)]
19. [2021][ICCV] Unshuffling Data for Improved Generalization in Visual Question Answering.[[paper](https://openaccess.thecvf.com/content/ICCV2021/papers/Teney_Unshuffling_Data_for_Improved_Generalization_in_Visual_Question_Answering_ICCV_2021_paper.pdf)]
20. [2021][TIP] Re-Attention for Visual Question Answering.[[paper](https://ojs.aaai.org/index.php/AAAI/article/view/5338/5194)]
21. [2021][SIGIR] LPF: A Language-Prior Feedback Objective Function for De-biased Visual Question Answering.[[paper](https://arxiv.org/pdf/2105.14300)]

### 2020 Papers
1. [2020][arXiv] KRISP: Integrating Implicit and Symbolic Knowledge for Open-Domain Knowledge-Based VQA.[[paper](https://arxiv.org/pdf/2012.11014)]
2. [2020][AAAI] Overcoming Language Priors in VQA via Decomposed Linguistic Representations.[[paper](https://ojs.aaai.org/index.php/AAAI/article/view/6776/6630)]
3. [2020][ACL] Cross-Modality Relevance for Reasoning on Language and Vision.[[paper](https://arxiv.org/pdf/2005.06035)][[Chinese notes](https://blog.csdn.net/ms961516792/article/details/108254786)]
4. [2020][CVPR] Counterfactual Samples Synthesizing for Robust Visual Question Answering.[[paper](https://openaccess.thecvf.com/content_CVPR_2020/papers/Chen_Counterfactual_Samples_Synthesizing_for_Robust_Visual_Question_Answering_CVPR_2020_paper.pdf)]
5. [2020][CVPR] Counterfactual Vision and Language Learning.[[paper](http://openaccess.thecvf.com/content_CVPR_2020/papers/Abbasnejad_Counterfactual_Vision_and_Language_Learning_CVPR_2020_paper.pdf)]
6. [2020][CVPR] Fantastic Answers and Where to Find Them: Immersive Question-Directed Visual Attention.[[paper](http://openaccess.thecvf.com/content_CVPR_2020/papers/Jiang_Fantastic_Answers_and_Where_to_Find_Them_Immersive_Question-Directed_Visual_CVPR_2020_paper.pdf)]
7. [2020][CVPR] Hypergraph Attention Networks for Multimodal Learning.[[paper](https://openaccess.thecvf.com/content_CVPR_2020/papers/Kim_Hypergraph_Attention_Networks_for_Multimodal_Learning_CVPR_2020_paper.pdf)]
8. [2020][CVPR] In Defense of Grid Features for Visual Question Answering.[[paper](http://openaccess.thecvf.com/content_CVPR_2020/papers/Jiang_In_Defense_of_Grid_Features_for_Visual_Question_Answering_CVPR_2020_paper.pdf)]
9. [2020][CVPR] Multi-Modal Graph Neural Network for Joint Reasoning on Vision and Scene Text.[[paper](http://openaccess.thecvf.com/content_CVPR_2020/papers/Gao_Multi-Modal_Graph_Neural_Network_for_Joint_Reasoning_on_Vision_and_CVPR_2020_paper.pdf)][[Chinese notes](https://blog.csdn.net/ms961516792/article/details/113843893)]
10. [2020][CVPR] On the General Value of Evidence, and Bilingual Scene-Text Visual Question Answering.[[paper](https://openaccess.thecvf.com/content_CVPR_2020/papers/Wang_On_the_General_Value_of_Evidence_and_Bilingual_Scene-Text_Visual_CVPR_2020_paper.pdf)]
11. [2020][CVPR] SQuINTing at VQA Models: Introspecting VQA Models With Sub-Questions.[[paper](https://openaccess.thecvf.com/content_CVPR_2020/papers/Selvaraju_SQuINTing_at_VQA_Models_Introspecting_VQA_Models_With_Sub-Questions_CVPR_2020_paper.pdf)]
12. [2020][CVPR] TA-Student VQA: Multi-Agents Training by Self-Questioning.[[paper](http://openaccess.thecvf.com/content_CVPR_2020/papers/Xiong_TA-Student_VQA_Multi-Agents_Training_by_Self-Questioning_CVPR_2020_paper.pdf)]
13. [2020][CVPR] Towards Causal VQA: Revealing and Reducing Spurious Correlations by Invariant and Covariant Semantic Editing.[[paper](http://openaccess.thecvf.com/content_CVPR_2020/papers/Agarwal_Towards_Causal_VQA_Revealing_and_Reducing_Spurious_Correlations_by_Invariant_CVPR_2020_paper.pdf)]
14. [2020][CVPR] Visual Commonsense R-CNN.[[paper](https://openaccess.thecvf.com/content_CVPR_2020/papers/Wang_Visual_Commonsense_R-CNN_CVPR_2020_paper.pdf)]
15. [2020][CVPR] VQA with No Questions-Answers Training.[[paper](http://openaccess.thecvf.com/content_CVPR_2020/papers/Vatashsky_VQA_With_No_Questions-Answers_Training_CVPR_2020_paper.pdf)]
16. [2020][ECCV][oral] A Competence-aware Curriculum for Visual Concepts Learning via Question Answering.[[paper](https://arxiv.org/pdf/2007.01499)]
17. [2020][ECCV][poster] Interpretable Visual Reasoning via Probabilistic Formulation under Natural Supervision.[[paper](https://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123540528.pdf)]
18. [2020][ECCV][poster] Multi-Agent Embodied Question Answering in Interactive Environments.[[paper](https://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123580647.pdf)]
19. [2020][ECCV][poster] Reducing Language Biases in Visual Question Answering with Visually-Grounded Question Encoder.[[paper](https://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123580018.pdf)]
20. [2020][ECCV][poster] Semantic Equivalent Adversarial Data Augmentation for Visual Question Answering.[[paper](https://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123640426.pdf)]
21. [2020][ECCV][poster] TRRNet: Tiered Relation Reasoning for Compositional Visual Question Answering.[[paper](https://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123660409.pdf)][[Chinese notes](https://blog.csdn.net/ms961516792/article/details/112299186)]
22. [2020][ECCV][poster] Visual Question Answering on Image Sets.[[paper](https://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123660052.pdf)]
23. [2020][ECCV][poster] VQA-LOL: Visual Question Answering under the Lens of Logic.[[paper](https://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123660375.pdf)]
24. [2020][EMNLP] MUTANT: A Training Paradigm for Out-of-Distribution Generalization in VQA.[[paper](https://arxiv.org/pdf/2009.08566.pdf)][[Chinese notes](https://blog.csdn.net/ms961516792/article/details/114951597)]
25. [2020][IJCAI] Mucko: Multi-Layer Cross-Modal Knowledge Reasoning for Fact-based Visual Question Answering.[[paper](https://arxiv.org/pdf/2006.09073)]
26. [2020][NeurIPS] Multimodal Graph Networks for Compositional Generalization in Visual Question Answering.[[paper](https://proceedings.neurips.cc/paper/2020/file/1fd6c4e41e2c6a6b092eb13ee72bce95-Paper.pdf)][[Chinese notes](https://blog.csdn.net/ms961516792/article/details/113974139)]
27. [2020][TMM] Self-Adaptive Neural Module Transformer for Visual Question Answering.[[paper](https://ieeexplore.ieee.org/abstract/document/9095237)]

### 2019 Papers
1. [2019][AAAI] BLOCK: Bilinear Superdiagonal Fusion for Visual Question Answering and Visual Relationship Detection.[[paper](https://ojs.aaai.org/index.php/AAAI/article/view/4818/4691)]
2. [2019][AAAI] Lattice CNNs for Matching Based Chinese Question Answering.[[paper](https://ojs.aaai.org/index.php/AAAI/article/download/4633/4511)]
3. [2019][AAAI] TallyQA: Answering Complex Counting Questions.[[paper](https://ojs.aaai.org/index.php/AAAI/article/download/4815/4688)]
4. [2019][ACMMM] Perceptual Visual Reasoning with Knowledge Propagation.[[paper](https://www.researchgate.net/profile/Xin_Wang274/publication/336710651_Perceptual_Visual_Reasoning_with_Knowledge_Propagation/links/5f4f9ebb299bf13a31978287/Perceptual-Visual-Reasoning-with-Knowledge-Propagation.pdf)]
5. [2019][CVPR] Cycle-Consistency for Robust Visual Question Answering.[[paper](http://openaccess.thecvf.com/content_CVPR_2019/papers/Shah_Cycle-Consistency_for_Robust_Visual_Question_Answering_CVPR_2019_paper.pdf)]
6. [2019][CVPR] Embodied Question Answering in Photorealistic Environments with Point Cloud Perception.[[paper](http://openaccess.thecvf.com/content_CVPR_2019/papers/Wijmans_Embodied_Question_Answering_in_Photorealistic_Environments_With_Point_Cloud_Perception_CVPR_2019_paper.pdf)]
7. [2019][CVPR] Explainable and Explicit Visual Reasoning over Scene Graphs.[[paper](https://openaccess.thecvf.com/content_CVPR_2019/papers/Shi_Explainable_and_Explicit_Visual_Reasoning_Over_Scene_Graphs_CVPR_2019_paper.pdf)]
8. [2019][CVPR] GQA: A New Dataset for Real-World Visual Reasoning and Compositional Question Answering.[[paper](https://openaccess.thecvf.com/content_CVPR_2019/papers/Hudson_GQA_A_New_Dataset_for_Real-World_Visual_Reasoning_and_Compositional_CVPR_2019_paper.pdf)][[Chinese notes](https://blog.csdn.net/ms961516792/article/details/105073506)]
9. [2019][CVPR] It’s Not About the Journey; It’s About the Destination: Following Soft Paths under Question-Guidance for Visual Reasoning.[[paper](https://openaccess.thecvf.com/content_CVPR_2019/papers/Haurilet_Its_Not_About_the_Journey_Its_About_the_Destination_Following_CVPR_2019_paper.pdf)]
10. [2019][CVPR] MUREL: Multimodal Relational Reasoning for Visual Question Answering.[[paper](https://openaccess.thecvf.com/content_CVPR_2019/papers/Cadene_MUREL_Multimodal_Relational_Reasoning_for_Visual_Question_Answering_CVPR_2019_paper.pdf)]
11. [2019][CVPR] Towards VQA Models That Can Read.[[paper](https://openaccess.thecvf.com/content_CVPR_2019/papers/Singh_Towards_VQA_Models_That_Can_Read_CVPR_2019_paper.pdf)]
12. [2019][CVPR] Transfer Learning via Unsupervised Task Discovery for Visual Question Answering.[[paper](http://openaccess.thecvf.com/content_CVPR_2019/papers/Noh_Transfer_Learning_via_Unsupervised_Task_Discovery_for_Visual_Question_Answering_CVPR_2019_paper.pdf)]
13. [2019][CVPR] Visual Question Answering as Reading Comprehension.[[paper](https://openaccess.thecvf.com/content_CVPR_2019/papers/Li_Visual_Question_Answering_as_Reading_Comprehension_CVPR_2019_paper.pdf)]
14. [2019][ICCV] Compact Trilinear Interaction for Visual Question Answering.[[paper](https://openaccess.thecvf.com/content_ICCV_2019/papers/Do_Compact_Trilinear_Interaction_for_Visual_Question_Answering_ICCV_2019_paper.pdf)]
15. [2019][ICCV] Language-Conditioned Graph Networks for Relational Reasoning.[[paper](https://openaccess.thecvf.com/content_ICCV_2019/papers/Hu_Language-Conditioned_Graph_Networks_for_Relational_Reasoning_ICCV_2019_paper.pdf)][[Chinese notes](https://blog.csdn.net/ms961516792/article/details/113843893)]
16. [2019][ICCV] Multi-modality Latent Interaction Network for Visual Question Answering.[[paper](https://openaccess.thecvf.com/content_ICCV_2019/papers/Gao_Multi-Modality_Latent_Interaction_Network_for_Visual_Question_Answering_ICCV_2019_paper.pdf)]
17. [2019][ICCV] Relation-Aware Graph Attention Network for Visual Question Answering.[[paper](http://openaccess.thecvf.com/content_ICCV_2019/papers/Li_Relation-Aware_Graph_Attention_Network_for_Visual_Question_Answering_ICCV_2019_paper.pdf)][[Chinese notes](https://blog.csdn.net/ms961516792/article/details/113843893)]
18. [2019][ICCV] Scene Text Visual Question Answering.[[paper](http://openaccess.thecvf.com/content_ICCV_2019/papers/Biten_Scene_Text_Visual_Question_Answering_ICCV_2019_paper.pdf)]
19. [2019][ICCV] Taking a HINT: Leveraging Explanations to Make Vision and Language Models More Grounded.[[paper](https://openaccess.thecvf.com/content_ICCV_2019/papers/Selvaraju_Taking_a_HINT_Leveraging_Explanations_to_Make_Vision_and_Language_ICCV_2019_paper.pdf)]
20. [2019][ICCV] U-CAM: Visual Explanation using Uncertainty based Class Activation Maps.[[paper](https://openaccess.thecvf.com/content_ICCV_2019/papers/Patro_U-CAM_Visual_Explanation_Using_Uncertainty_Based_Class_Activation_Maps_ICCV_2019_paper.pdf)]
21. [2019][ICCV] Why Does a Visual Question Have Different Answers?[[paper](https://openaccess.thecvf.com/content_ICCV_2019/papers/Bhattacharya_Why_Does_a_Visual_Question_Have_Different_Answers_ICCV_2019_paper.pdf)]
22. [2019][ICLR] The Neuro-Symbolic Concept Learner: Interpreting Scenes, Words, and Sentences from Natural Supervision.[[paper](https://arxiv.org/pdf/1904.12584)]
23. [2019][ICLR] Coarse-grain Fine-grain Coattention Network for Multi-evidence Question Answering.[[paper](https://arxiv.org/pdf/1901.00603)]
24. [2019][ICLR] Multi-step Retriever-Reader Interaction for Scalable Open-domain Question Answering.[[paper](https://arxiv.org/pdf/1905.05733)]
25. [2019][ICLR] Visual Reasoning by Progressive Module Networks.[[paper](https://arxiv.org/pdf/1806.02453)]
26. [2019][ICML] Probabilistic Neural-symbolic Models for Interpretable Visual Question Answering.[[paper](http://proceedings.mlr.press/v97/vedantam19a/vedantam19a.pdf)]
27. [2019][NeurIPS] Analyzing Compositionality of Visual Question Answering. (paper not found)
28. [2019][NeurIPS] Heterogeneous Graph Learning for Visual Commonsense Reasoning.[[paper](https://arxiv.org/pdf/1910.11475)]
29. [2019][NeurIPS] Learning by Abstraction: The Neural State Machine.[[paper](https://arxiv.org/pdf/1907.03950)]
30. [2019][NeurIPS] Learning Dynamics of Attention: Human Prior for Interpretable Machine Reasoning.[[paper](https://arxiv.org/pdf/1905.11666)]
31. [2019][NeurIPS] RUBi: Reducing Unimodal Biases in Visual Question Answering.[[paper](https://arxiv.org/pdf/1906.10169)]
32. [2019][NeurIPS] Self-Critical Reasoning for Robust Visual Question Answering.[[paper](https://arxiv.org/pdf/1905.09998)]
33. [2019][NeurIPS] Visual Concept-Metaconcept Learning.[[paper](https://arxiv.org/pdf/2002.01464)]

### 2018 Papers
1. [2018][AAAI] Co-attending Free-form Regions and Detections with Multi-modal Multiplicative Feature Embedding for Visual Question Answering.[[paper](https://ojs.aaai.org/index.php/AAAI/article/download/12240/12099)]
2. [2018][AAAI] Explicit Reasoning over End-to-End Neural Architectures for Visual Question Answering.[[paper](https://ojs.aaai.org/index.php/AAAI/article/download/11324/11183)]
3. [2018][AAAI] Exploring Human-like Attention Supervision in Visual Question Answering.[[paper](https://ojs.aaai.org/index.php/AAAI/article/download/12272/12131)]
4. [2018][AAAI] Movie Question Answering: Remembering the Textual Cues for Layered Visual Contents.[[paper](https://ojs.aaai.org/index.php/AAAI/article/download/12253/12112)]
5. [2018][CVPR] Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering.[[paper](https://openaccess.thecvf.com/content_cvpr_2018/papers/Anderson_Bottom-Up_and_Top-Down_CVPR_2018_paper.pdf)]
6. [2018][CVPR] Cross-Dataset Adaptation for Visual Question Answering.[[paper](http://openaccess.thecvf.com/content_cvpr_2018/papers/Chao_Cross-Dataset_Adaptation_for_CVPR_2018_paper.pdf)]
7. [2018][CVPR] Customized Image Narrative Generation via Interactive Visual Question Generation and Answering.[[paper](http://openaccess.thecvf.com/content_cvpr_2018/papers/Shin_Customized_Image_Narrative_CVPR_2018_paper.pdf)]
8. [2018][CVPR] Differential Attention for Visual Question Answering.[[paper](https://openaccess.thecvf.com/content_cvpr_2018/papers/Patro_Differential_Attention_for_CVPR_2018_paper.pdf)]
9. [2018][CVPR] Don’t Just Assume; Look and Answer: Overcoming Priors for Visual Question Answering.[[paper](http://openaccess.thecvf.com/content_cvpr_2018/papers/Agrawal_Dont_Just_Assume_CVPR_2018_paper.pdf)]
10. [2018][CVPR] DVQA: Understanding Data Visualizations via Question Answering.[[paper](https://openaccess.thecvf.com/content_cvpr_2018/papers/Kafle_DVQA_Understanding_Data_CVPR_2018_paper.pdf)]
11. [2018][CVPR] Embodied Question Answering.[[paper](https://openaccess.thecvf.com/content_cvpr_2018/papers/Das_Embodied_Question_Answering_CVPR_2018_paper.pdf)]
12. [2018][CVPR] Focal Visual-Text Attention for Visual Question Answering.[[paper](https://openaccess.thecvf.com/content_cvpr_2018/papers/Liang_Focal_Visual-Text_Attention_CVPR_2018_paper.pdf)]
13. [2018][CVPR] Improved Fusion of Visual and Language Representations by Dense Symmetric Co-Attention for Visual Question Answering.[[paper](http://openaccess.thecvf.com/content_cvpr_2018/papers/Nguyen_Improved_Fusion_of_CVPR_2018_paper.pdf)]
14. [2018][CVPR] IQA: Visual Question Answering in Interactive Environments.[[paper](http://openaccess.thecvf.com/content_cvpr_2018/papers/Gordon_IQA_Visual_Question_CVPR_2018_paper.pdf)]
15. [2018][CVPR] iVQA: Inverse Visual Question Answering.[[paper](http://openaccess.thecvf.com/content_cvpr_2018/papers/Liu_IVQA_Inverse_Visual_CVPR_2018_paper.pdf)]
16. [2018][CVPR] Learning Answer Embeddings for Visual Question Answering.[[paper](http://openaccess.thecvf.com/content_cvpr_2018/papers/Hu_Learning_Answer_Embeddings_CVPR_2018_paper.pdf)]
17. [2018][CVPR] Learning by Asking Questions.[[paper](http://openaccess.thecvf.com/content_cvpr_2018/papers/Misra_Learning_by_Asking_CVPR_2018_paper.pdf)]
18. [2018][CVPR] Learning Visual Knowledge Memory Networks for Visual Question Answering.[[paper](http://openaccess.thecvf.com/content_cvpr_2018/papers/Su_Learning_Visual_Knowledge_CVPR_2018_paper.pdf)]
19. [2018][CVPR] Multimodal Explanations: Justifying Decisions and Pointing to the Evidence.[[paper](http://openaccess.thecvf.com/content_cvpr_2018/papers/Park_Multimodal_Explanations_Justifying_CVPR_2018_paper.pdf)]
20. [2018][CVPR] Textbook Question Answering under Instructor Guidance with Memory Networks.[[paper](https://openaccess.thecvf.com/content_cvpr_2018/papers/Li_Textbook_Question_Answering_CVPR_2018_paper.pdf)]
21. [2018][CVPR] Tips and Tricks for Visual Question Answering: Learnings from the 2017 Challenge.[[paper](http://openaccess.thecvf.com/content_cvpr_2018/papers/Teney_Tips_and_Tricks_CVPR_2018_paper.pdf)]
22. [2018][CVPR] Transparency by Design: Closing the Gap Between Performance and Interpretability in Visual Reasoning.[[paper](https://openaccess.thecvf.com/content_cvpr_2018/papers/Mascharka_Transparency_by_Design_CVPR_2018_paper.pdf)]
23. [2018][CVPR] Two Can Play This Game: Visual Dialog with Discriminative Question Generation and Answering.[[paper](https://openaccess.thecvf.com/content_cvpr_2018/papers/Jain_Two_Can_Play_CVPR_2018_paper.pdf)]
24. [2018][CVPR] Visual Question Answering with Memory-Augmented Networks.[[paper](https://openaccess.thecvf.com/content_cvpr_2018/papers/Ma_Visual_Question_Answering_CVPR_2018_paper.pdf)]
25. [2018][CVPR] Visual Question Generation as Dual Task of Visual Question Answering.[[paper](http://openaccess.thecvf.com/content_cvpr_2018/papers/Li_Visual_Question_Generation_CVPR_2018_paper.pdf)]
26. [2018][CVPR] Visual Question Reasoning on General Dependency Tree.[[paper](https://openaccess.thecvf.com/content_cvpr_2018/papers/Cao_Visual_Question_Reasoning_CVPR_2018_paper.pdf)]
27. [2018][CVPR] VizWiz Grand Challenge: Answering Visual Questions from Blind People.[[paper](http://openaccess.thecvf.com/content_cvpr_2018/papers/Gurari_VizWiz_Grand_Challenge_CVPR_2018_paper.pdf)]
28. [2018][ECCV] A Dataset and Architecture for Visual Reasoning with a Working Memory.[[paper](http://openaccess.thecvf.com/content_ECCV_2018/papers/Guangyu_Robert_Yang_A_dataset_and_ECCV_2018_paper.pdf)]
29. [2018][ECCV] Deep Attention Neural Tensor Network for Visual Question Answering.[[paper](http://openaccess.thecvf.com/content_ECCV_2018/papers/Yalong_Bai_Deep_Attention_Neural_ECCV_2018_paper.pdf)]
30. [2018][ECCV] Explainable Neural Computation via Stack Neural Module Networks.[[paper](http://openaccess.thecvf.com/content_ECCV_2018/papers/Ronghang_Hu_Explainable_Neural_Computation_ECCV_2018_paper.pdf)]
31. [2018][ECCV] Goal-Oriented Visual Question Generation via Intermediate Rewards.[[paper](https://openaccess.thecvf.com/content_ECCV_2018/papers/Junjie_Zhang_Goal-Oriented_Visual_Question_ECCV_2018_paper.pdf)]
32. [2018][ECCV] Grounding Visual Explanations.[[paper](https://openaccess.thecvf.com/content_ECCV_2018/papers/Lisa_Anne_Hendricks_Grounding_Visual_Explanations_ECCV_2018_paper.pdf)]
33. [2018][ECCV] Learning Visual Question Answering by Bootstrapping Hard Attention.[[paper](http://openaccess.thecvf.com/content_ECCV_2018/papers/Mateusz_Malinowski_Learning_Visual_Question_ECCV_2018_paper.pdf)]
34. [2018][ECCV] Question Type Guided Attention in Visual Question Answering.[[paper](http://openaccess.thecvf.com/content_ECCV_2018/papers/Yang_Shi_Question_Type_Guided_ECCV_2018_paper.pdf)]
35. [2018][ECCV] Question-Guided Hybrid Convolution for Visual Question Answering.[[paper](http://openaccess.thecvf.com/content_ECCV_2018/papers/gao_peng_Question-Guided_Hybrid_Convolution_ECCV_2018_paper.pdf)]
36. [2018][ECCV] Straight to the Facts: Learning Knowledge Base Retrieval for Factual Visual Question Answering.[[paper](https://openaccess.thecvf.com/content_ECCV_2018/papers/Medhini_Gulganjalli_Narasimhan_Straight_to_the_ECCV_2018_paper.pdf)]
37. [2018][ECCV] Visual Question Answering as a Meta Learning Task.[[paper](http://openaccess.thecvf.com/content_ECCV_2018/papers/Damien_Teney_Visual_Question_Answering_ECCV_2018_paper.pdf)]
38. [2018][ECCV] Visual Question Generation for Class Acquisition of Unknown Objects.[[paper](https://openaccess.thecvf.com/content_ECCV_2018/papers/Kohei_Uehara_Visual_Question_Generation_ECCV_2018_paper.pdf)]
39. [2018][ECCV] VQA-E: Explaining, Elaborating, and Enhancing Your Answers for Visual Questions.[[paper](https://openaccess.thecvf.com/content_ECCV_2018/papers/Qing_Li_VQA-E_Explaining_Elaborating_ECCV_2018_paper.pdf)]
40. [2018][ICLR] Compositional Attention Networks for Machine Reasoning.[[paper](https://arxiv.org/pdf/1803.03067)]
41. [2018][ICLR] Interpretable Counting for Visual Question Answering.[[paper](https://arxiv.org/pdf/1712.08697)]
42. [2018][ICLR] Learning to Count Objects in Natural Images for Visual Question Answering.[[paper](https://arxiv.org/pdf/1802.05766)]
43. [2018][IJCAI] A Question Type Driven Framework to Diversify Visual Question Generation.[[paper](http://www.sdspeople.fudan.edu.cn/zywei/paper/fan-ijcai2018.pdf)]
44. [2018][IJCAI] Feature Enhancement in Attention for Visual Question Answering.[[paper](https://www.ijcai.org/Proceedings/2018/0586.pdf)]
45. [2018][IJCAI] From Pixels to Objects: Cubic Visual Attention for Visual Question Answering.[[paper](https://www.ijcai.org/Proceedings/2018/0126.pdf)]
46. [2018][NIPS] Answerer in Questioner’s Mind: Information Theoretic Approach to Goal-Oriented Visual Dialog.[[paper](https://arxiv.org/pdf/1802.03881)]
47. [2018][NIPS] Bilinear Attention Networks.[[paper](https://arxiv.org/pdf/1805.07932)]
48. [2018][NIPS] Chain of Reasoning for Visual Question Answering.[[paper](http://papers.neurips.cc/paper/7311-chain-of-reasoning-for-visual-question-answering.pdf)]
49. [2018][NIPS] Dialog-to-Action: Conversational Question Answering Over a Large-Scale Knowledge Base.[[paper](http://papers.neurips.cc/paper/7558-dialog-to-action-conversational-question-answering-over-a-large-scale-knowledge-base.pdf)]
50. [2018][NIPS] Learning Conditioned Graph Structures for Interpretable Visual Question Answering.[[paper](https://arxiv.org/pdf/1806.07243)]
51. [2018][NIPS] Learning to Specialize with Knowledge Distillation for Visual Question Answering.[[paper](http://alinlab.kaist.ac.kr/resource/2018_NIPS_KD_MCL.pdf)]
52. [2018][NIPS] Neural-Symbolic VQA: Disentangling Reasoning from Vision and Language Understanding.[[paper](https://arxiv.org/pdf/1810.02338)]
53. [2018][NIPS] Out of the Box: Reasoning with Graph Convolution Nets for Factual Visual Question Answering.[[paper](https://arxiv.org/pdf/1811.00538)]
54. [2018][NIPS] Overcoming Language Priors in Visual Question Answering with Adversarial Regularization.[[paper](https://arxiv.org/pdf/1810.03649)]

### 2017 Papers
1. [2017][CVPR] An Empirical Evaluation of Visual Question Answering for Novel Objects.[[paper](https://openaccess.thecvf.com/content_cvpr_2017/papers/Ramakrishnan_An_Empirical_Evaluation_CVPR_2017_paper.pdf)]
2. [2017][CVPR] Are You Smarter Than a Sixth Grader? Textbook Question Answering for Multimodal Machine Comprehension.[[paper](https://openaccess.thecvf.com/content_cvpr_2017/papers/Kembhavi_Are_You_Smarter_CVPR_2017_paper.pdf)]
3. [2017][CVPR] CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning.[[paper](https://openaccess.thecvf.com/content_cvpr_2017/papers/Johnson_CLEVR_A_Diagnostic_CVPR_2017_paper.pdf)]
4. [2017][CVPR] Creativity: Generating Diverse Questions using Variational Autoencoders.[[paper](https://openaccess.thecvf.com/content_cvpr_2017/papers/Jain_Creativity_Generating_Diverse_CVPR_2017_paper.pdf)]
5. [2017][CVPR] Graph-Structured Representations for Visual Question Answering.[[paper](http://openaccess.thecvf.com/content_cvpr_2017/papers/Teney_Graph-Structured_Representations_for_CVPR_2017_paper.pdf)][[Chinese notes](https://blog.csdn.net/ms961516792/article/details/113843893)]
6. [2017][CVPR] Knowledge Acquisition for Visual Question Answering via Iterative Querying.[[paper](https://openaccess.thecvf.com/content_cvpr_2017/papers/Zhu_Knowledge_Acquisition_for_CVPR_2017_paper.pdf)]
7. [2017][CVPR] Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering.[[paper](http://openaccess.thecvf.com/content_cvpr_2017/papers/Goyal_Making_the_v_CVPR_2017_paper.pdf)]
8. [2017][CVPR] Mining Object Parts from CNNs via Active Question-Answering.[[paper](http://openaccess.thecvf.com/content_cvpr_2017/papers/Zhang_Mining_Object_Parts_CVPR_2017_paper.pdf)]
9. [2017][CVPR] Multi-level Attention Networks for Visual Question Answering.[[paper](http://openaccess.thecvf.com/content_cvpr_2017/papers/Yu_Multi-Level_Attention_Networks_CVPR_2017_paper.pdf)]
10. [2017][CVPR] The VQA-Machine: Learning How to Use Existing Vision Algorithms to Answer New Questions.[[paper](http://openaccess.thecvf.com/content_cvpr_2017/papers/Wang_The_VQA-Machine_Learning_CVPR_2017_paper.pdf)]
11. [2017][CVPR] What’s in a Question? Using Visual Questions as a Form of Supervision.[[paper](https://openaccess.thecvf.com/content_cvpr_2017/papers/Ganju_Whats_in_a_CVPR_2017_paper.pdf)]
12. [2017][ICCV] An Analysis of Visual Question Answering Algorithms.[[paper](http://openaccess.thecvf.com/content_ICCV_2017/papers/Kafle_An_Analysis_of_ICCV_2017_paper.pdf)]
13. [2017][ICCV] Inferring and Executing Programs for Visual Reasoning.[[paper](https://openaccess.thecvf.com/content_ICCV_2017/papers/Johnson_Inferring_and_Executing_ICCV_2017_paper.pdf)]
14. [2017][ICCV] Learning to Disambiguate by Asking Discriminative Questions.[[paper](http://openaccess.thecvf.com/content_ICCV_2017/papers/Li_Learning_to_Disambiguate_ICCV_2017_paper.pdf)]
15. [2017][ICCV] Learning to Reason: End-to-End Module Networks for Visual Question Answering.[[paper](http://openaccess.thecvf.com/content_ICCV_2017/papers/Hu_Learning_to_Reason_ICCV_2017_paper.pdf)]
16. [2017][ICCV] Multi-modal Factorized Bilinear Pooling with Co-Attention Learning for Visual Question Answering.[[paper](http://openaccess.thecvf.com/content_ICCV_2017/papers/Yu_Multi-Modal_Factorized_Bilinear_ICCV_2017_paper.pdf)]
17. [2017][ICCV] MUTAN: Multimodal Tucker Fusion for Visual Question Answering.[[paper](http://openaccess.thecvf.com/content_ICCV_2017/papers/Ben-younes_MUTAN_Multimodal_Tucker_ICCV_2017_paper.pdf)]
18. [2017][ICCV] Structured Attentions for Visual Question Answering.[[paper](http://openaccess.thecvf.com/content_ICCV_2017/papers/Zhu_Structured_Attentions_for_ICCV_2017_paper.pdf)]
19. [2017][ICCV] VQS: Linking Segmentations to Questions and Answers for Supervised Attention in VQA and Question-Focused Semantic Segmentation.[[paper](http://openaccess.thecvf.com/content_ICCV_2017/papers/Gan_VQS_Linking_Segmentations_ICCV_2017_paper.pdf)]
20. [2017][IJCAI] Automatic Generation of Grounded Visual Questions.[[paper](https://arxiv.org/pdf/1612.06530)]
21. [2017][IJCAI] Explicit Knowledge-based Reasoning for Visual Question Answering.[[paper](https://arxiv.org/pdf/1511.02570)]
22. [2017][INLG] Data Augmentation for Visual Question Answering.[[paper](https://www.aclweb.org/anthology/W17-3529.pdf)]
23. [2017][NIPS] High-Order Attention Models for Visual Question Answering.[[paper](https://arxiv.org/pdf/1711.04323)]
24. [2017][NIPS] Multimodal Learning and Reasoning for Visual Question Answering.[[paper](http://papers.neurips.cc/paper/6658-multimodal-learning-and-reasoning-for-visual-question-answering.pdf)]
25. [2017][NIPS] Question Asking as Program Generation.[[paper](https://arxiv.org/pdf/1711.06351)]

### 2016 Papers
1. [2016][AAAI] Learning to answer questions from image using convolutional neural network.[[paper](https://ojs.aaai.org/index.php/AAAI/article/download/10442/10301)][[Chinese notes](https://blog.csdn.net/ms961516792/article/details/113496153)]
2. [2016][CVPR] Answer-Type Prediction for Visual Question Answering.[[paper](https://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Kafle_Answer-Type_Prediction_for_CVPR_2016_paper.pdf)][[Chinese notes](https://blog.csdn.net/ms961516792/article/details/113496153)]
3. [2016][CVPR] Ask Me Anything: Free-Form Visual Question Answering Based on Knowledge From External Sources.[[paper](https://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Wu_Ask_Me_Anything_CVPR_2016_paper.pdf)][[Chinese notes](https://blog.csdn.net/ms961516792/article/details/113496153)]
4. [2016][CVPR] Image Question Answering using Convolutional Neural Network with Dynamic Parameter Prediction.[[paper](https://openaccess.thecvf.com/content_cvpr_2016/papers/Noh_Image_Question_Answering_CVPR_2016_paper.pdf)][[Chinese notes](https://blog.csdn.net/ms961516792/article/details/113535076)]
5. [2016][CVPR] Neural Module Networks.[[paper](http://openaccess.thecvf.com/content_cvpr_2016/papers/Andreas_Neural_Module_Networks_CVPR_2016_paper.pdf)][[Chinese notes](https://blog.csdn.net/ms961516792/article/details/113535076)]
6. [2016][CVPR] Stacked Attention Networks for Image Question Answering.[[paper](http://openaccess.thecvf.com/content_cvpr_2016/papers/Yang_Stacked_Attention_Networks_CVPR_2016_paper.pdf)][[Chinese notes](https://blog.csdn.net/ms961516792/article/details/113609614)]
7. [2016][CVPR] Visual7W: Grounded Question Answering in Images.[[paper](http://openaccess.thecvf.com/content_cvpr_2016/papers/Zhu_Visual7W_Grounded_Question_CVPR_2016_paper.pdf)][[Chinese notes](https://blog.csdn.net/ms961516792/article/details/113609614)]
8. [2016][CVPR] Where to Look: Focus Regions for Visual Question Answering.[[paper](http://openaccess.thecvf.com/content_cvpr_2016/papers/Shih_Where_to_Look_CVPR_2016_paper.pdf)][[Chinese notes](https://blog.csdn.net/ms961516792/article/details/113649487)]
9. [2016][CVPR] Yin and Yang: Balancing and Answering Binary Visual Questions.[[paper](http://openaccess.thecvf.com/content_cvpr_2016/papers/Zhang_Yin_and_Yang_CVPR_2016_paper.pdf)][[Chinese notes](https://blog.csdn.net/ms961516792/article/details/113649487)]
10. [2016][ECCV] Ask, Attend and Answer: Exploring Question-Guided Spatial Attention for Visual Question Answering.[[paper](https://arxiv.org/pdf/1511.05234)][[Chinese notes](https://blog.csdn.net/ms961516792/article/details/113696813)]
11. [2016][ECCV] Learning Models for Actions and Person-Object Interactions with Transfer to Question Answering.[[paper](https://arxiv.org/pdf/1604.04808)][[Chinese notes](https://blog.csdn.net/ms961516792/article/details/113696813)]
12. [2016][ECCV] Leveraging Visual Question Answering for Image-Caption Ranking.[[paper](https://arxiv.org/pdf/1605.01379)]
13. [2016][ECCV] Revisiting Visual Question Answering Baselines.[[paper](https://arxiv.org/pdf/1606.08390)]
14. [2016][ICML] Dynamic Memory Networks for Visual and Textual Question Answering.[[paper](http://proceedings.mlr.press/v48/xiong16.pdf)]
15. [2016][NIPS] Hierarchical Question-Image Co-Attention for Visual Question Answering.[[paper](https://arxiv.org/pdf/1606.00061.pdf)]
16. [2016][NIPS] Multimodal Residual Learning for Visual QA.[[paper](https://arxiv.org/pdf/1606.01455)]

### 2015 Papers
1. [2015][CVPR] VisKE: Visual Knowledge Extraction and Question Answering by Visual Verification of Relation Phrases.[[paper](https://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Sadeghi_VisKE_Visual_Knowledge_2015_CVPR_paper.pdf)][[Chinese notes](https://blog.csdn.net/ms961516792/article/details/113369883)]
2. [2015][ICCV] Ask Your Neurons: A Neural-Based Approach to Answering Questions About Images.[[paper](http://openaccess.thecvf.com/content_iccv_2015/papers/Malinowski_Ask_Your_Neurons_ICCV_2015_paper.pdf)][[Chinese notes](https://blog.csdn.net/ms961516792/article/details/113420551)]
3. [2015][ICCV] Visual Madlibs: Fill in the Blank Description Generation and Question Answering.[[paper](http://openaccess.thecvf.com/content_iccv_2015/papers/Yu_Visual_Madlibs_Fill_ICCV_2015_paper.pdf)][[Chinese notes](https://blog.csdn.net/ms961516792/article/details/113420551)]
4. [2015][ICCV] VQA: Visual Question Answering.[[paper](http://openaccess.thecvf.com/content_iccv_2015/papers/Antol_VQA_Visual_Question_ICCV_2015_paper.pdf)][[Chinese notes](https://blog.csdn.net/ms961516792/article/details/113420551)]
5. [2015][NIPS] Are You Talking to a Machine? Dataset and Methods for Multilingual Image Question Answering.[[paper](https://arxiv.org/pdf/1505.05612)][[Chinese notes](https://blog.csdn.net/ms961516792/article/details/113496153)]
6. [2015][NIPS] Exploring Models and Data for Image Question Answering.[[paper](https://arxiv.org/pdf/1505.02074)][[Chinese notes](https://blog.csdn.net/ms961516792/article/details/113496153)]

### 2014 Papers
1. [2014][NIPS] A Multi-World Approach to Question Answering about Real-World Scenes based on Uncertain Input.[[paper](https://arxiv.org/pdf/1410.0210)][[Chinese notes](https://blog.csdn.net/ms961516792/article/details/113369883)]

## Embodied QA Papers
-

## Knowledge-based VQA Papers
### Fact-based VQA Papers
-
### Open-domain Knowledge-based VQA Papers
-

## Interactive QA Papers
-

## Image-Set VQA Papers
-

## Inverse VQA Papers
-

## Text-based VQA Papers
### Data Visualization QA Papers
-
### Textbook QA Papers
-
### TextVQA Papers
-

## Visual Reasoning Papers
-

## Video QA Papers
### Datasets
-

### 2021 Papers
1. [2021][CVPR] SUTD-TrafficQA: A Question Answering Benchmark and an Efficient Network for Video Reasoning over Traffic Events.[[paper](https://openaccess.thecvf.com/content/CVPR2021/papers/Xu_SUTD-TrafficQA_A_Question_Answering_Benchmark_and_an_Efficient_Network_for_CVPR_2021_paper.pdf)]
2. [2021][CVPR] Bridge to Answer: Structure-aware Graph Interaction Network for Video Question Answering.[[paper](https://openaccess.thecvf.com/content/CVPR2021/papers/Park_Bridge_To_Answer_Structure-Aware_Graph_Interaction_Network_for_Video_Question_CVPR_2021_paper.pdf)]
3. [2021][CVPR] NExT-QA: Next Phase of Question-Answering to Explaining Temporal Actions.[[paper](https://openaccess.thecvf.com/content/CVPR2021/papers/Xiao_NExT-QA_Next_Phase_of_Question-Answering_to_Explaining_Temporal_Actions_CVPR_2021_paper.pdf)]
4. [2021][ICCV] On the hidden treasure of dialog in video question answering.[[paper](https://arxiv.org/pdf/2103.14517.pdf)]
5. [2021][ICCV] Just Ask: Learning to Answer Questions from Millions of Narrated Videos.[[paper](https://arxiv.org/pdf/2012.00451.pdf)]
6. [2021][ICCV] Env-QA: A Video Question Answering Benchmark for Comprehensive Understanding of Dynamic Environments.[[paper](https://openaccess.thecvf.com/content/ICCV2021/papers/Gao_Env-QA_A_Video_Question_Answering_Benchmark_for_Comprehensive_Understanding_of_ICCV_2021_paper.pdf)]
7. [2021][ICCV] HAIR: Hierarchical Visual-Semantic Relational Reasoning for Video Question Answering.[[paper](https://openaccess.thecvf.com/content/ICCV2021/papers/Liu_HAIR_Hierarchical_Visual-Semantic_Relational_Reasoning_for_Video_Question_Answering_ICCV_2021_paper.pdf)]
8. [2021][ICCV] Pano-AVQA: Grounded Audio-Visual Question Answering on 360° Videos.[[paper](https://openaccess.thecvf.com/content/ICCV2021/papers/Yun_Pano-AVQA_Grounded_Audio-Visual_Question_Answering_on_360deg_Videos_ICCV_2021_paper.pdf)]
9. [2021][ICCV] Video Question Answering Using Language-Guided Deep Compressed-Domain Video Feature.[[paper](https://openaccess.thecvf.com/content/ICCV2021/papers/Kim_Video_Question_Answering_Using_Language-Guided_Deep_Compressed-Domain_Video_Feature_ICCV_2021_paper.pdf)]

### 2020 Papers
1. [2020][CVPR] Hierarchical Conditional Relation Networks for Video Question Answering.[[paper](http://openaccess.thecvf.com/content_CVPR_2020/papers/Le_Hierarchical_Conditional_Relation_Networks_for_Video_Question_Answering_CVPR_2020_paper.pdf)]
2. [2020][CVPR] Modality Shifting Attention Network for Multi-Modal Video Question Answering.[[paper](http://openaccess.thecvf.com/content_CVPR_2020/papers/Kim_Modality_Shifting_Attention_Network_for_Multi-Modal_Video_Question_Answering_CVPR_2020_paper.pdf)]
3. [2020][ECCV][poster] Knowledge-Based Video Question Answering with Unsupervised Scene Descriptions.[[paper](https://arxiv.org/pdf/2007.08751)]
4. [2020][TIP] Open-Ended Video Question Answering via Multi-Modal Conditional Adversarial Networks.[[paper](https://ieeexplore.ieee.org/abstract/document/8974594/)]
5. [2020][WACV] BERT Representations for Video Question Answering.[[paper](http://openaccess.thecvf.com/content_WACV_2020/papers/Yang_BERT_representations_for_Video_Question_Answering_WACV_2020_paper.pdf)]

### 2019 Papers
1. [2019][AAAI] Beyond RNNs: Positional Self-Attention with Co-Attention for Video Question Answering.[[paper](https://ojs.aaai.org/index.php/AAAI/article/download/4887/4760)]
2. [2019][AAAI] Structured Two-stream Attention Network for Video Question Answering.[[paper](https://ojs.aaai.org/index.php/AAAI/article/download/4602/4480)]
3. [2019][ACMMM] Learnable Aggregating Net with Divergent Loss for VideoQA.[[paper](https://dl.acm.org/doi/10.1145/3343031.3350971)]
4. [2019][ACMMM] Multi-interaction Network with Object Relation for VideoQA.[[paper](https://dl.acm.org/citation.cfm?id=3351065)]
5. [2019][ACMMM] Question-Aware Tube-Switch Network for VideoQA.[[paper](https://dl.acm.org/citation.cfm?id=3350969)]
6. [2019][CVPR] Heterogeneous Memory Enhanced Multimodal Attention Model for VideoQA.[[paper](https://openaccess.thecvf.com/content_CVPR_2019/papers/Fan_Heterogeneous_Memory_Enhanced_Multimodal_Attention_Model_for_Video_Question_Answering_CVPR_2019_paper.pdf)]
7. [2019][CVPR] Progressive Attention Memory Network for Movie Story Question Answering.[[paper](https://openaccess.thecvf.com/content_CVPR_2019/papers/Kim_Progressive_Attention_Memory_Network_for_Movie_Story_Question_Answering_CVPR_2019_paper.pdf)]
8. [2019][ICCV] SegEQA: Video Segmentation Based Visual Attention for Embodied Question Answering.[[paper](https://openaccess.thecvf.com/content_ICCV_2019/papers/Luo_SegEQA_Video_Segmentation_Based_Visual_Attention_for_Embodied_Question_Answering_ICCV_2019_paper.pdf)]
9. [2019][IJCAI] Open-Ended Long-Form Video Question Answering via Hierarchical Convolutional Self-Attention Networks.[[paper](https://arxiv.org/pdf/1906.12158)]
10. [2019][IJCNN] Gaining Extra Supervision via Multi-task learning for Multi-Modal Video Question Answering.[[paper](https://arxiv.org/pdf/1905.13540)]
11. [2019][TIP] Compositional Attention Networks With Two-Stream Fusion for Video Question Answering.[[paper](https://ieeexplore.ieee.org/abstract/document/8839734/)]
12. [2019][TIP] Holistic Multi-modal Memory Network for Movie Question Answering.[[paper](https://arxiv.org/pdf/1811.04595)]

### 2018 Papers
1. [2018][ACMMM] Explore Multi-Step Reasoning in Video Question Answering.[[paper](https://dl.acm.org/doi/abs/10.1145/3240508.3240563)]
2. [2018][CVPR] Motion-Appearance Co-Memory Networks for Video Question Answering.[[paper](https://openaccess.thecvf.com/content_cvpr_2018/papers/Gao_Motion-Appearance_Co-Memory_Networks_CVPR_2018_paper.pdf)]
3. [2018][ECCV] A Joint Sequence Fusion Model for Video Question Answering and Retrieval.[[paper](http://openaccess.thecvf.com/content_ECCV_2018/papers/Youngjae_Yu_A_Joint_Sequence_ECCV_2018_paper.pdf)]
4. [2018][ECCV] Multimodal Dual Attention Memory for Video Story Question Answering.[[paper](https://openaccess.thecvf.com/content_ECCV_2018/papers/Kyungmin_Kim_Multimodal_Dual_Attention_ECCV_2018_paper.pdf)]
5. [2018][EMNLP] TVQA: Localized, Compositional Video Question Answering.[[paper](https://arxiv.org/pdf/1809.01696)]

### 2017 Papers
1. [2017][AAAI] Leveraging Video Descriptions to Learn Video Question Answering.[[paper](https://ojs.aaai.org/index.php/AAAI/article/download/11238/11097)]
2. [2017][ACMMM] VideoQA via Gradually Refined Attention over Appearance and Motion.[[paper](https://www.comp.nus.edu.sg/~xiangnan/papers/mm17-videoQA.pdf)]
3. [2017][ACMMM] VideoQA via Hierarchical Dual-Level Attention Network Learning.[[paper](https://dl.acm.org/doi/abs/10.1145/3123266.3123364)]
4. [2017][CVPR] A Dataset and Exploration of Models for Understanding Video Data Through Fill-In-The-Blank Question-Answering.[[paper](http://openaccess.thecvf.com/content_cvpr_2017/papers/Maharaj_A_Dataset_and_CVPR_2017_paper.pdf)]
5. [2017][CVPR] End-to-end Concept Word Detection for Video Captioning, Retrieval, and Question Answering.[[paper](http://openaccess.thecvf.com/content_cvpr_2017/papers/Yu_End-To-End_Concept_Word_CVPR_2017_paper.pdf)]
6. [2017][CVPR] TGIF-QA: Toward Spatio-Temporal Reasoning in Visual Question Answering.[[paper](http://openaccess.thecvf.com/content_cvpr_2017/papers/Jang_TGIF-QA_Toward_Spatio-Temporal_CVPR_2017_paper.pdf)]
7. [2017][ICCV] MarioQA: Answering Questions by Watching Gameplay Videos.[[paper](https://openaccess.thecvf.com/content_ICCV_2017/papers/Mun_MarioQA_Answering_Questions_ICCV_2017_paper.pdf)]
8. [2017][ICCV] Video Fill In the Blank using LRRL LSTMs with Spatial-Temporal Attentions.[[paper](http://openaccess.thecvf.com/content_ICCV_2017/papers/Mazaheri_Video_Fill_in_ICCV_2017_paper.pdf)]
9. [2017][IJCAI] Video Question Answering via Hierarchical Spatio-Temporal Attention Networks.[[paper](https://www.ijcai.org/Proceedings/2017/0492.pdf)]
10. [2017][SIGIR] Video Question Answering via Attribute-Augmented Attention Network Learning.[[paper](https://arxiv.org/pdf/1707.06355)]

### 2016 Papers
1. [2016][CVPR] MovieQA: Understanding Stories in Movies through Question-Answering.[[paper](https://openaccess.thecvf.com/content_cvpr_2016/papers/Tapaswi_MovieQA_Understanding_Stories_CVPR_2016_paper.pdf)]

### 2015 Papers
1. [2015][arXiv] Uncovering the temporal context for video question and answering.[[paper](https://arxiv.org/pdf/1511.04670)]

### 2014 Papers
1. [2014][ACMMM] Joint video and text parsing for understanding events and answering queries.[[paper](https://arxiv.org/pdf/1308.6628)]