# Multimodal datasets

This repository is built in association with our position paper "Multimodality for NLP-Centered Applications: Resources, Advances and Frontiers".

As part of this release, we share information about recent multimodal datasets that are available for research purposes.

We found that although 100+ multimodal language resources are available in the literature for various NLP tasks, publicly available multimodal datasets remain under-explored for reuse in subsequent problem domains.

# Multimodal datasets for NLP Applications

1. **Sentiment Analysis**

| **Dataset** | **Title of the Paper** | **Link of the Paper** | **Link of the Dataset** |
| ----------------- | ---------------------- |---------------------- |------------------------ |
| EmoDB | A Database of German Emotional Speech | [Paper](https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.130.8506&rep=rep1&type=pdf) | [Dataset](https://www.kaggle.com/piyushagni5/berlin-database-of-emotional-speech-emodb) |
| VAM | The Vera am Mittag German Audio-Visual Emotional Speech Database | [Paper](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=4607572) | [Dataset](https://sail.usc.edu/VAM/vam_release.htm) |
| IEMOCAP | IEMOCAP: interactive emotional dyadic motion capture database | [Paper](https://link.springer.com/content/pdf/10.1007/s10579-008-9076-6.pdf) | [Dataset](https://sail.usc.edu/software/databases/) |
| Mimicry | A Multimodal Database for Mimicry Analysis | [Paper](https://ibug.doc.ic.ac.uk/media/uploads/documents/sun2011multimodal.pdf) | [Dataset](http://www.mahnob-db.eu/mimicry) |
| YouTube | Towards Multimodal Sentiment Analysis: Harvesting Opinions from the Web | [Paper](https://ict.usc.edu/pubs/Towards%20Multimodal%20Sentiment%20Analysis-%20Harvesting%20Opinions%20from%20The%20Web.pdf) | [Dataset](https://github.com/A2Zadeh/CMU-MultimodalSDK) |
| HUMAINE | The HUMAINE Database | [Paper](https://ibug.doc.ic.ac.uk/media/uploads/documents/sun2011multimodal.pdf) | [Dataset](http://www.emotion-research.net) |
| Large Movie Review | Learning Word Vectors for Sentiment Analysis | [Paper](https://ai.stanford.edu/~amaas/papers/wvSent_acl2011.pdf) | [Dataset](https://ai.stanford.edu/~amaas/data/sentiment/) |
| SEMAINE | The SEMAINE Database: Annotated Multimodal Records of Emotionally Colored Conversations between a Person and a Limited Agent | [Paper](https://ieeexplore.ieee.org/document/5959155) | [Dataset](https://semaine-db.eu/) |
| AFEW | Collecting Large, Richly Annotated Facial-Expression Databases from Movies | [Paper](http://users.cecs.anu.edu.au/~adhall/Dhall_Goecke_Lucey_Gedeon_M_2012.pdf) | [Dataset](https://cs.anu.edu.au/few/AFEW.html) |
| SST | Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank | [Paper](https://aclanthology.org/D13-1170.pdf) | [Dataset](https://metatext.io/datasets/the-stanford-sentiment-treebank-%28sst%29) |
| ICT-MMMO | YouTube Movie Reviews: Sentiment Analysis in an Audio-Visual Context | [Paper](http://multicomp.cs.cmu.edu/wp-content/uploads/2017/09/2013_IEEEIS_wollmer_youtube.pdf) | [Dataset](https://github.com/A2Zadeh/CMU-MultimodalSDK) |
| RECOLA | Introducing the RECOLA multimodal corpus of remote collaborative and affective interactions | [Paper](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6553805) | [Dataset](https://diuf.unifr.ch/main/diva/recola/download.html) |
| MOUD | Utterance-Level Multimodal Sentiment Analysis | [Paper](https://aclanthology.org/P13-1096.pdf) | - |
| CMU-MOSI | MOSI: Multimodal Corpus of Sentiment Intensity and Subjectivity Analysis in Online Opinion Videos | [Paper](https://arxiv.org/ftp/arxiv/papers/1606/1606.06259.pdf) | [Dataset](https://github.com/A2Zadeh/CMU-MultimodalSDK) |
| POM | Multimodal Analysis and Prediction of Persuasiveness in Online Social Multimedia | [Paper](https://dl.acm.org/doi/pdf/10.1145/2897739) | [Dataset](https://github.com/eusip/POM) |
| MELD | MELD: A Multimodal Multi-Party Dataset for Emotion Recognition in Conversations | [Paper](https://arxiv.org/pdf/1810.02508.pdf) | [Dataset](https://affective-meld.github.io/) |
| CMU-MOSEI | Multimodal Language Analysis in the Wild: CMU-MOSEI Dataset and Interpretable Dynamic Fusion Graph | [Paper](https://aclanthology.org/P18-1208.pdf) | [Dataset](https://github.com/A2Zadeh/CMU-MultimodalSDK) |
| AMMER | Towards Multimodal Emotion Recognition in German Speech Events in Cars using Transfer Learning | [Paper](https://arxiv.org/pdf/1909.02764.pdf) | On Request |
| SEWA | SEWA DB: A Rich Database for Audio-Visual Emotion and Sentiment Research in the Wild | [Paper](https://arxiv.org/pdf/1901.02839.pdf) | [Dataset](http://www.sewaproject.eu/resources) |
| Fakeddit | r/Fakeddit: A New Multimodal Benchmark Dataset for Fine-grained Fake News Detection | [Paper](https://arxiv.org/pdf/1911.03854.pdf) | [Dataset](https://fakeddit.netlify.app/) |
| CMU-MOSEAS | CMU-MOSEAS: A Multimodal Language Dataset for Spanish, Portuguese, German and French | [Paper](https://aclanthology.org/2020.emnlp-main.141.pdf) | [Dataset](https://bit.ly/2Svbg9f) |
| MultiOFF | Multimodal Meme Dataset (MultiOFF) for Identifying Offensive Content in Image and Text | [Paper](https://aclanthology.org/2020.trac-1.6.pdf) | [Dataset](https://github.com/bharathichezhiyan/Multimodal-Meme-Classification-Identifying-Offensive-Content-in-Image-and-Text) |
| MEISD | MEISD: A Multimodal Multi-Label Emotion, Intensity and Sentiment Dialogue Dataset for Emotion Recognition and Sentiment Analysis in Conversations | [Paper](https://aclanthology.org/2020.coling-main.393.pdf) | [Dataset](https://github.com/declare-lab/MELD) |
| TASS | Overview of TASS 2020: Introducing Emotion | [Paper](http://ceur-ws.org/Vol-2664/tass_overview.pdf) | [Dataset](http://www.sepln.org/workshops/tass/tass_data/download.php) |
| CH-SIMS | CH-SIMS: A Chinese Multimodal Sentiment Analysis Dataset with Fine-grained Annotation of Modality | [Paper](https://aclanthology.org/2020.acl-main.343.pdf) | [Dataset](https://github.com/thuiar/MMSA) |
| Creep-Image | A Multimodal Dataset of Images and Text | [Paper](http://ceur-ws.org/Vol-2769/paper_11.pdf) | [Dataset](https://github.com/dhfbk/creep-image-dataset) |
| Entheos | Entheos: A Multimodal Dataset for Studying Enthusiasm | [Paper](https://aclanthology.org/2021.findings-acl.180.pdf) | [Dataset](https://github.com/clviegas/Entheos-Dataset) |
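
Several of the sentiment corpora above (YouTube, ICT-MMMO, CMU-MOSI, CMU-MOSEI) are distributed through the CMU-MultimodalSDK linked in their dataset column. The following minimal Python sketch shows one way to fetch and word-align CMU-MOSI's high-level features; it assumes `pip install mmsdk` and that the recipe names below still match the SDK's README, so treat it as illustrative rather than an official loader.

```python
# Sketch: download CMU-MOSI "highlevel" computational sequences via the
# CMU-MultimodalSDK (https://github.com/A2Zadeh/CMU-MultimodalSDK).
# Assumption: recipe names follow the SDK README at the time of writing.
from mmsdk import mmdatasdk

# A recipe maps feature names to download URLs; the SDK stores the
# fetched .csd files in the given folder.
recipe = mmdatasdk.cmu_mosi.highlevel
dataset = mmdatasdk.mmdataset(recipe, "cmumosi/")

# Align all modalities to the word-level GloVe sequence and list
# the loaded computational sequences.
dataset.align("glove_vectors")
print(list(dataset.computational_sequences.keys()))
```
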
2. **Machine Translation**

| **Dataset** | **Title of the Paper** | **Link of the Paper** | **Link of the Dataset** |
| ----------------- | ---------------------- |---------------------- |------------------------ |
| Multi30K | Multi30K: Multilingual English-German Image Descriptions | [Paper](https://arxiv.org/pdf/1605.00459.pdf) | [Dataset](https://github.com/multi30k/dataset) |
| How2 | How2: A Large-scale Dataset for Multimodal Language Understanding | [Paper](https://arxiv.org/pdf/1811.00347.pdf) | [Dataset](https://github.com/srvk/how2-dataset) |
| MLT | Multimodal Lexical Translation | [Paper](https://aclanthology.org/L18-1602.pdf) | [Dataset](https://github.com/sheffieldnlp/mlt) |
| IKEA | A Visual Attention Grounding Neural Model for Multimodal Machine Translation | [Paper](https://arxiv.org/pdf/1808.08266.pdf) | [Dataset](https://github.com/sampalomad/IKEA-Dataset) |
| Flickr30K (EN-HI) | Multimodal Neural Machine Translation for Low-resource Language Pairs using Synthetic Data | [Paper](https://aclanthology.org/W18-3405.pdf) | On Request |
| Hindi Visual Genome | Hindi Visual Genome: A Dataset for Multimodal English-to-Hindi Machine Translation | [Paper](https://arxiv.org/pdf/1907.08948.pdf) | [Dataset](https://lindat.mff.cuni.cz/repository/xmlui/handle/11234/1-2997) |
| HowTo100M | Multilingual Multimodal Pre-training for Zero-Shot Cross-Lingual Transfer of Vision-Language Models | [Paper](https://arxiv.org/pdf/2103.08849.pdf) | [Dataset](https://github.com/berniebear/Multi-HT100M) |
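
The text side of these corpora is typically released as line-aligned parallel files. As a small illustration, here is a Python sketch for reading Multi30K-style parallel caption files; the `train.en`/`train.de` file names follow the Task 1 layout of the Multi30K repository and may differ in the release you download.

```python
# Sketch: read line-aligned parallel caption files (Multi30K Task 1 style).
# Assumption: train.en / train.de were downloaded and decompressed from
# https://github.com/multi30k/dataset; adjust names to the actual release.
from pathlib import Path

def load_pairs(en_path: str, de_path: str):
    en = Path(en_path).read_text(encoding="utf-8").splitlines()
    de = Path(de_path).read_text(encoding="utf-8").splitlines()
    assert len(en) == len(de), "parallel files must align line by line"
    return list(zip(en, de))

pairs = load_pairs("train.en", "train.de")
print(len(pairs), pairs[0])
```
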
3. **Information Retrieval**

| **Dataset** | **Title of the Paper** | **Link of the Paper** | **Link of the Dataset** |
| ----------------- | ---------------------- |---------------------- |------------------------ |
| MUSICLEF | MusiCLEF: a Benchmark Activity in Multimodal Music Information Retrieval | [Paper](https://ismir2011.ismir.net/papers/OS6-3.pdf) | [Dataset](http://www.cp.jku.at/datasets/musiclef/index.html) |
| Moodo | The Moodo dataset: Integrating user context with emotional and color perception of music for affective music information retrieval | [Paper](https://www.tandfonline.com/doi/pdf/10.1080/09298215.2017.1333518) | [Dataset](http://moodo.musiclab.si) |
| ALF-200k | ALF-200k: Towards Extensive Multimodal Analyses of Music Tracks and Playlists | [Paper](https://dbis-informatik.uibk.ac.at/sites/default/files/2018-04/ecir-2018-alf.pdf) | [Dataset](https://github.com/dbis-uibk/ALF200k) |
| MQA | Can Image Captioning Help Passage Retrieval in Multimodal Question Answering? | [Paper](https://www.springerprofessional.de/en/can-image-captioning-help-passage-retrieval-in-multimodal-questi/16626696) | [Dataset](https://huggingface.co/datasets/clips/mqa) |
| WAT2019 | WAT2019: English-Hindi Translation on Hindi Visual Genome Dataset | [Paper](https://github.com/sheffieldnlp/mlt) | [Dataset](https://aclanthology.org/L18-1602.pdf) |
| ViTT | Multimodal Pretraining for Dense Video Captioning | [Paper](https://arxiv.org/pdf/2011.11760.pdf) | [Dataset](https://github.com/google-research-datasets/Video-Timeline-Tags-ViTT) |
| MTD | MTD: A Multimodal Dataset of Musical Themes for MIR Research | [Paper](https://transactions.ismir.net/articles/10.5334/tismir.68/) | [Dataset](https://www.audiolabs-erlangen.de/resources/MIR/MTD) |
| MusiClef | A professionally annotated and enriched multimodal data set on popular music | [Paper](https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.302.2718&rep=rep1&type=pdf) | [Dataset](http://www.cp.jku.at/datasets/musiclef/index.html) |
| Schubert Winterreise | Schubert Winterreise dataset: A multimodal scenario for music analysis | [Paper](https://dl.acm.org/doi/pdf/10.1145/3429743) | [Dataset](https://zenodo.org/record/3968389#.YcQrk2hBxPY) |
| WIT | WIT: Wikipedia-based Image Text Dataset for Multimodal Multilingual Machine Learning | [Paper](https://arxiv.org/pdf/2103.01913.pdf) | [Dataset](https://github.com/google-research-datasets/wit) |
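
WIT is the largest resource in this table and ships as sharded, gzipped TSV files. The sketch below streams one shard with pandas; the shard file name is illustrative and the column names are taken from the WIT repository's documentation, so verify both against the copy you download.

```python
# Sketch: stream one WIT shard and keep English rows that carry a
# reference description.
# Assumptions: a local .tsv.gz shard from
# https://github.com/google-research-datasets/wit; the file name below
# is illustrative and the column names follow the repo's documentation.
import pandas as pd

shard = "wit_v1.train.all-00000-of-00010.tsv.gz"  # hypothetical local path
for chunk in pd.read_csv(shard, sep="\t", compression="gzip", chunksize=10_000):
    en = chunk[(chunk["language"] == "en")
               & chunk["caption_reference_description"].notna()]
    print(en[["image_url", "caption_reference_description"]].head())
    break  # one chunk is enough for a quick look
```
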
4. **Question Answering**

| **Dataset** | **Title of the Paper** | **Link of the Paper** | **Link of the Dataset** |
| ----------------- | ---------------------- |---------------------- |------------------------ |
| MQA | A Dataset for Multimodal Question Answering in the Cultural Heritage Domain | [Paper](https://aclanthology.org/W16-4003.pdf) | - |
| MovieQA | MovieQA: Understanding Stories in Movies through Question-Answering | [Paper](https://arxiv.org/pdf/1512.02902.pdf) | [Dataset](https://github.com/makarandtapaswi/MovieQA_CVPR2016) |
| PororoQA | DeepStory: Video Story QA by Deep Embedded Memory Networks | [Paper](https://arxiv.org/ftp/arxiv/papers/1707/1707.00836.pdf) | [Dataset](https://github.com/Kyung-Min/Deep-Embedded-Memory-Networks) |
| MemexQA | MemexQA: Visual Memex Question Answering | [Paper](https://arxiv.org/pdf/1708.01336.pdf) | [Dataset](https://memexqa.cs.cmu.edu/) |
| VQA | Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering | [Paper](https://arxiv.org/pdf/1612.00837.pdf) | [Dataset](https://visualqa.org/) |
| TDIUC | An Analysis of Visual Question Answering Algorithms | [Paper](https://openaccess.thecvf.com/content_ICCV_2017/papers/Kafle_An_Analysis_of_ICCV_2017_paper.pdf) | [Dataset](https://kushalkafle.com/projects/tdiuc.html) |
| TGIF-QA | TGIF-QA: Toward Spatio-Temporal Reasoning in Visual Question Answering | [Paper](https://arxiv.org/pdf/1704.04497.pdf) | [Dataset](https://github.com/YunseokJANG/tgif-qa) |
| MSVD-QA, MSRVTT-QA | Video Question Answering via Attribute-Augmented Attention Network Learning | [Paper](https://arxiv.org/pdf/1707.06355.pdf) | [Dataset](https://github.com/xudejing/video-question-answering) |
| YouTube2Text | Video Question Answering via Gradually Refined Attention over Appearance and Motion | [Paper](http://staff.ustc.edu.cn/~hexn/papers/mm17-videoQA.pdf) | [Dataset](https://github.com/topics/youtube2text) |
| MovieFIB | A Dataset and Exploration of Models for Understanding Video Data through Fill-in-the-Blank Question-Answering | [Paper](https://arxiv.org/pdf/1611.07810.pdf) | [Dataset](https://github.com/teganmaharaj/MovieFIB/blob/master/README.md) |
| Video Context QA | Uncovering the Temporal Context for Video Question Answering | [Paper](https://arxiv.org/pdf/1511.04670.pdf) | [Dataset](https://github.com/ffmpbgrnn/VideoQA) |
| MarioQA | MarioQA: Answering Questions by Watching Gameplay Videos | [Paper](https://arxiv.org/pdf/1612.01669.pdf) | [Dataset](https://github.com/JonghwanMun/MarioQA) |
| TVQA | TVQA: Localized, Compositional Video Question Answering | [Paper](https://arxiv.org/pdf/1809.01696.pdf) | [Dataset](https://tvqa.cs.unc.edu/) |
| VQA-CP v2 | Don't Just Assume; Look and Answer: Overcoming Priors for Visual Question Answering | [Paper](https://arxiv.org/pdf/1712.00377.pdf) | [Dataset](https://github.com/cdancette/vqa-cp-leaderboard) |
| RecipeQA | RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes | [Paper](https://arxiv.org/pdf/1809.00812.pdf) | [Dataset](https://hucvl.github.io/recipeqa/) |
| GQA | GQA: A New Dataset for Real-World Visual Reasoning and Compositional Question Answering | [Paper](https://arxiv.org/pdf/1902.09506v3.pdf) | [Dataset](https://github.com/leaderj1001/Vision-Language) |
| Social IQ | Social-IQ: A Question Answering Benchmark for Artificial Social Intelligence | [Paper](https://openaccess.thecvf.com/content_CVPR_2019/papers/Zadeh_Social-IQ_A_Question_Answering_Benchmark_for_Artificial_Social_Intelligence_CVPR_2019_paper.pdf) | [Dataset](https://github.com/A2Zadeh/CMU-MultimodalSDK) |
| MIMOQA | MIMOQA: Multimodal Input Multimodal Output Question Answering | [Paper](https://aclanthology.org/2021.naacl-mai) | - |
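
For the VQA-family resources, questions and answers are distributed as separate JSON files keyed by `question_id`. Below is a short Python sketch that joins VQA v2 questions with their ground-truth answers, assuming the v2 train JSON files from the official download page at visualqa.org sit in the working directory.

```python
# Sketch: join VQA v2 questions with their annotated answers.
# Assumption: file names follow the official VQA v2 release
# (https://visualqa.org/download.html).
import json

with open("v2_OpenEnded_mscoco_train2014_questions.json") as f:
    questions = {q["question_id"]: q for q in json.load(f)["questions"]}
with open("v2_mscoco_train2014_annotations.json") as f:
    annotations = json.load(f)["annotations"]

for ann in annotations[:3]:
    q = questions[ann["question_id"]]
    print(q["question"], "->", ann["multiple_choice_answer"])
```
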
5. **Summarization**

| **Dataset** | **Title of the Paper** | **Link of the Paper** | **Link of the Dataset** |
| ----------------- | ---------------------- |---------------------- |------------------------ |
| SumMe | Creating Summaries from User Videos | [Paper](https://gyglim.github.io/me/papers/GygliECCV14_vsum.pdf) | [Dataset](https://gyglim.github.io/me/vsum/index.html) |
| TVSum | TVSum: Summarizing Web Videos Using Titles | [Paper](https://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Song_TVSum_Summarizing_Web_2015_CVPR_paper.pdf) | [Dataset](https://github.com/yalesong/tvsum) |
| QFVS | Query-Focused Video Summarization: Dataset, Evaluation, and a Memory Network Based Approach | [Paper](https://arxiv.org/abs/1707.04960) | [Dataset](https://www.aidean-sharghi.com/cvpr2017) |
| MMSS | Multi-modal Sentence Summarization with Modality Attention and Image Filtering | [Paper](https://www.ijcai.org/proceedings/2018/0577.pdf) | - |
| MSMO | MSMO: Multimodal Summarization with Multimodal Output | [Paper](https://aclanthology.org/D18-1448.pdf) | - |
| Screen2Words | Screen2Words: Automatic Mobile UI Summarization with Multimodal Learning | [Paper](https://arxiv.org/pdf/2108.03353.pdf) | [Dataset](https://github.com/google-research-datasets/screen2words) |
| AVIATE | IEMOCAP: interactive emotional dyadic motion capture database | [Paper](https://link.springer.com/content/pdf/10.1007/s10579-008-9076-6.pdf) | [Dataset](https://sail.usc.edu/software/databases/) |
| Multimodal Microblog Summarization | On Multimodal Microblog Summarization | [Paper](https://ieeexplore.ieee.org/document/9585070) | - |
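
Screen2Words is one of the few entries here with a simple text-only distribution: a CSV of crowdsourced summaries that pairs with RICO screenshots. A minimal sketch for grouping summaries per screen follows; the `screenId`/`summary` column names are assumptions based on the repository's CSV layout, so check them against the downloaded file.

```python
# Sketch: group Screen2Words summaries per screen.
# Assumptions: screen_summaries.csv comes from
# https://github.com/google-research-datasets/screen2words and its columns
# are named screenId and summary (verify against the actual file).
import csv
from collections import defaultdict

summaries = defaultdict(list)
with open("screen_summaries.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        summaries[row["screenId"]].append(row["summary"])

screen = next(iter(summaries))
print(screen, summaries[screen])
```
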
6. **Human Computer Interaction**

| **Dataset** | **Title of the Paper** | **Link of the Paper** | **Link of the Dataset** |
| ----------------- | ---------------------- |---------------------- |------------------------ |
| CUAVE | CUAVE: A new audio-visual database for multimodal human-computer interface research | [Paper](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5745028) | [Dataset](http://people.csail.mit.edu/siracusa/avdata/) |
| MHAD | Berkeley MHAD: A comprehensive multimodal human action database | [Paper](https://ieeexplore.ieee.org/document/6474999) | [Dataset](https://tele-immersion.citris-uc.org/berkeley_mhad) |
| Multi-party interactions | A Multi-party Multi-modal Dataset for Focus of Visual Attention in Human-human and Human-robot Interaction | [Paper](https://aclanthology.org/L16-1703.pdf) | - |
| MHHRI | Multimodal Human-Human-Robot Interactions (MHHRI) Dataset for Studying Personality and Engagement | [Paper](https://ieeexplore.ieee.org/document/8003432) | [Dataset](https://www.cl.cam.ac.uk/research/rainbow/projects/mhhri/) |
| Red Hen Lab | Red Hen Lab: Dataset and Tools for Multimodal Human Communication Research | [Paper](https://link.springer.com/content/pdf/10.1007/s13218-017-0505-9.pdf) | - |
| EMRE | Generating a Novel Dataset of Multimodal Referring Expressions | [Paper](https://aclanthology.org/W19-0507.pdf) | [Dataset](https://github.com/VoxML/public-data/tree/master/EMRE/HIT) |
| Chinese Whispers | Chinese Whispers: A Multimodal Dataset for Embodied Language Grounding | [Paper](https://www.researchgate.net/publication/341294259_Chinese_Whispers_A_Multimodal_Dataset_for_Embodied_Language_Grounding) | [Dataset](https://zenodo.org/record/4587308#.YbJEctBBxPZ) |
| uulmMAC | The uulmMAC database—A multimodal affective corpus for affective computing in human-computer interaction | [Paper](https://www.mdpi.com/1424-8220/20/8/2308) | [Dataset](https://neuro.informatik.uni-ulm.de/TC9/tools-and-data-sets/uulmmac-database/) |
7. **Semantic Analysis**

| **Dataset** | **Title of the Paper** | **Link of the Paper** | **Link of the Dataset** |
| ----------------- | ---------------------- |---------------------- |------------------------ |
| WN9-IMG | Image-embodied Knowledge Representation Learning | [Paper](https://www.ijcai.org/proceedings/2017/0438.pdf) | [Dataset](https://github.com/xrb92/IKRL) |
| Wikimedia Commons | A Dataset and Reranking Method for Multimodal MT of User-Generated Image Captions | [Paper](https://aclanthology.org/W18-1814.pdf) | [Dataset](https://commons.wikimedia.org/wiki/Main_Page) |
| Starsem18-multimodalKB | A Multimodal Translation-Based Approach for Knowledge Graph Representation Learning | [Paper](https://aclanthology.org/S18-2027.pdf) | [Dataset](https://github.com/UKPLab/starsem18-multimodalKB) |
| MUStARD | Towards Multimodal Sarcasm Detection | [Paper](https://arxiv.org/pdf/1906.01815.pdf) | [Dataset](https://github.com/soujanyaporia/MUStARD) |
| YouMakeup | YouMakeup: A Large-Scale Domain-Specific Multimodal Dataset for Fine-Grained Semantic Comprehension | [Paper](https://aclanthology.org/D19-1517.pdf) | [Dataset](https://github.com/AIM3-RUC/YouMakeup) |
| MDID | Integrating Text and Image: Determining Multimodal Document Intent in Instagram Posts | [Paper](https://arxiv.org/pdf/1904.09073.pdf) | [Dataset](https://github.com/karansikka1/documentIntent_emnlp19) |
| Social media posts from Flickr (Mental Health) | Inferring Social Media Users' Mental Health Status from Multimodal Information | [Paper](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7147779/) | [Dataset](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7147779/) |
| Twitter MEL | Building a Multimodal Entity Linking Dataset From Tweets | [Paper](aclanthology.org) | [Dataset](https://github.com/OA256864/MEL_Tweets) |
| MultiMET | MultiMET: A Multimodal Dataset for Metaphor Understanding | [Paper](https://aclanthology.org/2021.acl-long.249.pdf) | - |
| MSDS | Multimodal Sarcasm Detection in Spanish: a Dataset and a Baseline | [Paper](https://arxiv.org/pdf/2105.05542.pdf) | [Dataset](https://zenodo.org/record/4701383) |
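
MUStARD's annotations are a single JSON file mapping utterance IDs to the utterance, its conversational context, and a sarcasm label; the raw video clips are distributed separately. A small inspection sketch follows, with the file path and field names taken from the MUStARD repository as assumptions to verify against the copy you download.

```python
# Sketch: inspect MUStARD sarcasm annotations.
# Assumptions: data/sarcasm_data.json comes from
# https://github.com/soujanyaporia/MUStARD and uses the field names below.
import json

with open("data/sarcasm_data.json") as f:
    data = json.load(f)

sarcastic = [d for d in data.values() if d["sarcasm"]]
print(len(data), "utterances,", len(sarcastic), "labeled sarcastic")

example = next(iter(data.values()))
print(example["speaker"], ":", example["utterance"])
```
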
8. **Miscellaneous**

| **Dataset** | **Title of the Paper** | **Link of the Paper** | **Link of the Dataset** |
| ----------------- | ---------------------- |---------------------- |------------------------ |
| MS COCO | Microsoft COCO: Common Objects in Context | [Paper](https://link.springer.com/chapter/10.1007/978-3-319-10602-1_48) | [Dataset](https://github.com/topics/mscoco-dataset) |
| ILSVRC | ImageNet Large Scale Visual Recognition Challenge | [Paper](https://arxiv.org/pdf/1409.0575.pdf) | [Dataset](https://image-net.org/download.php) |
| YFCC100M | YFCC100M: The New Data in Multimedia Research | [Paper](https://arxiv.org/pdf/1503.01817.pdf) | [Dataset](https://github.com/chi0tzp/YFCC100M-Downloader) |
| COGNIMUSE | COGNIMUSE: a multimodal video database annotated with saliency, events, semantics and emotion with application to summarization | [Paper](https://jivp-eurasipjournals.springeropen.com/track/pdf/10.1186/s13640-017-0194-1.pdf) | [Dataset](https://cognimuse.cs.ntua.gr/research_datasets) |
| SNAG | SNAG: Spoken Narratives and Gaze Dataset | [Paper](https://aclanthology.org/P18-2022.pdf) | [Dataset](https://mvrl-clasp.github.io/SNAG/) |
| UR-FUNNY | UR-FUNNY: A Multimodal Language Dataset for Understanding Humor | [Paper](https://arxiv.org/pdf/1904.06618.pdf) | [Dataset](https://github.com/ROC-HCI/UR-FUNNY/blob/master/UR-FUNNY-V1.md) |
| Bag-of-Lies | Bag-of-Lies: A Multimodal Dataset for Deception Detection | [Paper](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9025340) | [Dataset](http://iab-rubric.org/resources/BagLies.html) |
| MARC | A Recipe for Creating Multimodal Aligned Datasets for Sequential Tasks | [Paper](https://aclanthology.org/2020.acl-main.440.pdf) | [Dataset](https://github.com/microsoft/multimodal-aligned-recipe-corpus) |
| MuSE | MuSE: a Multimodal Dataset of Stressed Emotion | [Paper](https://aclanthology.org/2020.lrec-1.187.pdf) | [Dataset](http://lit.eecs.umich.edu/downloads.html) |
| BabelPic | Fatality Killed the Cat or: BabelPic, a Multimodal Dataset for Non-Concrete Concepts | [Paper](https://aclanthology.org/2020.acl-main.425.pdf) | [Dataset](https://sapienzanlp.github.io/babelpic/) |
| Eye4Ref | Eye4Ref: A Multimodal Eye Movement Dataset of Referentially Complex Situations | [Paper](https://aclanthology.org/2020.lrec-1.292.pdf) | - |
| Troll Memes | A Dataset for Troll Classification of TamilMemes | [Paper](https://aclanthology.org/2020.wildre-1.2.pdf) | [Dataset](https://github.com/sharduls007/TamilMemes) |
| SEMD | EmoSen: Generating Sentiment and Emotion Controlled Responses in a Multimodal Dialogue System | [Paper](https://www.computer.org/csdl/journal/ta/5555/01/09165162/1mcQTrYsXbG) | - |
| Chat-talk Corpus | Construction and Analysis of a Multimodal Chat-talk Corpus for Dialog Systems Considering Interpersonal Closeness | [Paper](https://aclanthology.org/2020.lrec-1.56.pdf) | - |
| EMOTyDA | Towards Emotion-aided Multi-modal Dialogue Act Classification | [Paper](https://aclanthology.org/2020.acl-main.402.pdf) | [Dataset](https://github.com/sahatulika15/EMOTyDA) |
| MELINDA | MELINDA: A Multimodal Dataset for Biomedical Experiment Method Classification | [Paper](https://arxiv.org/pdf/2012.09216.pdf) | [Dataset](https://github.com/PlusLabNLP/melinda) |
| NewsCLIPpings | NewsCLIPpings: Automatic Generation of Out-of-Context Multimodal Media | [Paper](https://aclanthology.org/2021.emnlp-main.545.pdf) | [Dataset](https://github.com/g-luo/news_clippings) |
| R2VQ | Designing Multimodal Datasets for NLP Challenges | [Paper](https://arxiv.org/pdf/2105.05999.pdf) | [Dataset](https://competitions.codalab.org/competitions/34056) |
| M2H2 | M2H2: A Multimodal Multiparty Hindi Dataset For Humor Recognition in Conversations | [Paper](https://arxiv.org/pdf/2108.01260.pdf) | [Dataset](https://github.com/declare-lab/M2H2-dataset) |
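
Many of the vision-language entries above (MS COCO, and datasets built on top of it such as VQA) use the COCO annotation format, for which the `pycocotools` package provides a standard reader. A brief caption-browsing sketch, assuming the 2017 caption annotations have been downloaded from cocodataset.org:

```python
# Sketch: browse MS COCO captions with pycocotools (pip install pycocotools).
# Assumption: annotations/captions_train2017.json was downloaded from
# https://cocodataset.org/#download.
from pycocotools.coco import COCO

coco = COCO("annotations/captions_train2017.json")
img_id = coco.getImgIds()[0]
img = coco.loadImgs(img_id)[0]
anns = coco.loadAnns(coco.getAnnIds(imgIds=img_id))

print(img["file_name"])
for ann in anns:
    print("-", ann["caption"])
```
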