# Multimodality
Papers, Datasets, and Codes about Multimodality

## Papers

### Vision-Language Pre-Training
1. **VLMO: Unified Vision-Language Pre-Training with Mixture-of-Modality-Experts** *Wenhui Wang, Hangbo Bao, Li Dong, Furu Wei* [[pdf]](https://arxiv.org/pdf/2111.02358.pdf)
2. **Multi-Grained Vision Language Pre-Training: Aligning Texts with Visual Concepts** *Yan Zeng, Xinsong Zhang, Hang Li* [[pdf]](https://arxiv.org/pdf/2111.08276.pdf)
3. **Masked Autoencoders Are Scalable Vision Learners** *Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, Ross Girshick* [[pdf]](https://arxiv.org/pdf/2111.06377.pdf)

### Vision-and-Language Navigation
1.

### Dialogue System
1. **Multi-Modal Open-Domain Dialogue** *Kurt Shuster, Eric Michael Smith, Da Ju, Jason Weston* [[pdf]](https://arxiv.org/pdf/2010.01082.pdf)
2. **Multimodal Dialogue Response Generation** *Qingfeng Sun, Yujing Wang, Can Xu, Kai Zheng, Yaming Yang, Huang Hu, Fei Xu, Jessica Zhang, Xiubo Geng, Daxin Jiang* [[pdf]](https://arxiv.org/pdf/2110.08515.pdf)
3. **Reason first, then respond: Modular Generation for Knowledge-infused Dialogue** *Leonard Adolphs, Kurt Shuster, Jack Urbanek, Arthur Szlam, Jason Weston* [[pdf]](https://arxiv.org/pdf/2111.05204.pdf)

### Prompt
1. **CPT: Colorful Prompt Tuning for Pre-Trained Vision-Language Models** *Yuan Yao, Ao Zhang, Zhengyan Zhang, Zhiyuan Liu, Tat-Seng Chua, Maosong Sun* [[pdf]](https://arxiv.org/pdf/2109.11797.pdf)
2. **Multimodal Few-Shot Learning with Frozen Language Models** *Maria Tsimpoukelli, Jacob Menick, Serkan Cabi, S. M. Ali Eslami, Oriol Vinyals, Felix Hill* [[pdf]](https://papers.nips.cc/paper/2021/file/01b7575c38dac42f3cfb7d500438b875-Paper.pdf)

### Top Conference Papers on Multi-Modal Dialogue
#### ACL
|ID|Paper|Authors|Conference|
|-|-|-|-|
|1|[Mind Your Outliers! Investigating the Negative Impact of Outliers on Active Learning for Visual Question Answering](https://arxiv.org/pdf/2107.02331.pdf)|Siddharth Karamcheti, Ranjay Krishna, Li Fei-Fei and Christopher Manning|ACL 2021|
|2|[TicketTalk: Toward human-level performance with end-to-end, transaction-based dialog systems](https://arxiv.org/pdf/2012.12458.pdf)|Bill Byrne, Karthik Krishnamoorthi, Saravanan Ganesh and Mihir Kale|ACL 2021|
|3|[PhotoChat: A Human-Human Dialogue Dataset With Photo Sharing Behavior For Joint Image-Text Modeling](https://arxiv.org/pdf/2108.01453.pdf)|Xiaoxue Zang, Lijuan Liu, Maria Wang, Yang Song, Hao Zhang and Jindong Chen|ACL 2021|
|4|[Maria: A Visual Experience Powered Conversational Agent](https://arxiv.org/pdf/2105.13073.pdf)|Zujie Liang, Huang Hu, Can Xu, Chongyang Tao, Xiubo Geng, Yining Chen, Fan Liang and Daxin Jiang|ACL 2021|
|5|[MMGCN: Multimodal Fusion via Deep Graph Convolution Network for Emotion Recognition in Conversation](https://arxiv.org/pdf/2107.06779.pdf)|Jingwen Hu, Yuchen Liu, Jinming Zhao and Qin Jin|ACL 2021|
|6|[How do people talk about images? A study on open-domain conversation on images](https://openreview.net/pdf?id=bRVvxrjkLM)|Anonymous ACL submission|-|
|7|[Zero-Shot Visual Grounding of Referring Utterances in Dialogue](https://openreview.net/pdf?id=JcxhaCjSlGz)|Anonymous ACL submission|-|
|8|[When did you become so smart, oh wise one?! Sarcasm Explanation in Multi-modal Multi-party Dialogues](https://openreview.net/pdf?id=eJUGH5CaCJK)|Anonymous ACL submission|-|
|9|[Tackling Situated Multi-Modal Task-Oriented Dialogs with a Single Transformer Model](https://openreview.net/pdf?id=NajekV9uBas)|Anonymous ACL submission|-|
|10|[Co-VQA: Answering by Interactive Sub Question Sequence](https://openreview.net/pdf?id=8s9M2_HIF-j)|Anonymous ACL submission|-|


## Datasets
|ID|Name|Description|Paper|Conference|
|:---:|:---:|:---:|:---:|:---:|
| 1 | [LAION-400M](https://laion.ai/laion-400-open-dataset/) | Multimodal image-text pairs | | |
| 2 | [IEMOCAP](https://sail.usc.edu/iemocap/) | Multimodal emotion recognition | [IEMOCAP: interactive emotional dyadic motion capture database](https://sail.usc.edu/publications/files/bussolre2008.pdf) | Language Resources & Evaluation |
| 3 | [MELD](https://affective-meld.github.io/) | Multimodal emotion recognition | [MELD: A Multimodal Multi-Party Dataset for Emotion Recognition in Conversations](https://arxiv.org/pdf/1810.02508.pdf) | ACL 2019 |
| 4 | [CH-SIMS](https://drive.google.com/drive/folders/1E5kojBirtd5VbfHsFp6FYWkQunk73Nsv) | Multimodal emotion recognition | [CH-SIMS: A Chinese Multimodal Sentiment Analysis Dataset with Fine-grained Annotations of Modality](https://aclanthology.org/2020.acl-main.343.pdf) | ACL 2020 |
| 5 | [SEMAINE](https://semaine-db.eu/DailyDialog) | Multimodal emotion recognition | [The SEMAINE Database: Annotated Multimodal Records of Emotionally Colored Conversations between a Person and a Limited Agent](https://ieeexplore.ieee.org/abstract/document/5959155) | IEEE Trans. Affective Computing |
| 6 | [COCO](https://cocodataset.org/#download) | Multimodal retrieval | [Microsoft COCO Captions: Data Collection and Evaluation Server](https://arxiv.org/pdf/1504.00325.pdf) | |
| 7 | [IAPR TC-12](https://www.imageclef.org/photodata) | Multimodal retrieval | [The IAPR TC-12 Benchmark: A New Evaluation Resource for Visual Information Systems](https://www.cs.brandeis.edu/~marc/misc/proceedings/lrec-2006/workshops/W02/RealFinalOntoImage2006-2.pdf#page=13) | LREC 2006 |
| 8 | [Conceptual Captions](https://github.com/google-research-datasets/conceptual-captions) | Multimodal retrieval | [Conceptual Captions: A Cleaned, Hypernymed, Image Alt-text Dataset for Automatic Image Captioning](https://aclanthology.org/P18-1238.pdf) | ACL 2018 |
| 9 | [OpenViDial](https://github.com/ShannonAI/OpenViDial) | Multimodal dialogue | [OpenViDial: A Large-Scale, Open-Domain Dialogue Dataset with Visual Contexts](https://arxiv.org/pdf/2012.15015.pdf) | |



## Codes