# Multimodal datasets

This repository is built in association with our position paper "Multimodality for NLP-Centered Applications: Resources, Advances and Frontiers".

As part of this release, we share information about recent multimodal datasets that are available for research purposes.

We found that although 100+ multimodal language resources are available in the literature for various NLP tasks, publicly available multimodal datasets remain under-explored for reuse in subsequent problem domains.

# Multimodal datasets for NLP Applications

1. **Sentiment Analysis**

| **Dataset** | **Title of the Paper** | **Link of the Paper** | **Link of the Dataset** |
| ----------------- | ---------------------- |---------------------- |------------------------ |
| EmoDB | A Database of German Emotional Speech | [Paper](https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.130.8506&rep=rep1&type=pdf) | [Dataset](https://www.kaggle.com/piyushagni5/berlin-database-of-emotional-speech-emodb) |
| VAM | The Vera am Mittag German Audio-Visual Emotional Speech Database | [Paper](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=4607572) | [Dataset](https://sail.usc.edu/VAM/vam_release.htm) |
| IEMOCAP | IEMOCAP: interactive emotional dyadic motion capture database | [Paper](https://link.springer.com/content/pdf/10.1007/s10579-008-9076-6.pdf) | [Dataset](https://sail.usc.edu/software/databases/) |
| Mimicry | A Multimodal Database for Mimicry Analysis | [Paper](https://ibug.doc.ic.ac.uk/media/uploads/documents/sun2011multimodal.pdf) | [Dataset](http://www.mahnob-db.eu/mimicry) |
| YouTube | Towards Multimodal Sentiment Analysis: Harvesting Opinions from the Web | [Paper](https://ict.usc.edu/pubs/Towards%20Multimodal%20Sentiment%20Analysis-%20Harvesting%20Opinions%20from%20The%20Web.pdf) | [Dataset](https://github.com/A2Zadeh/CMU-MultimodalSDK) |
| HUMAINE | The HUMAINE Database | [Paper](https://ibug.doc.ic.ac.uk/media/uploads/documents/sun2011multimodal.pdf) | [Dataset](http://www.emotion-research.net) |
| Large Movie Review | Learning Word Vectors for Sentiment Analysis | [Paper](https://ai.stanford.edu/~amaas/papers/wvSent_acl2011.pdf) | [Dataset](https://ai.stanford.edu/~amaas/data/sentiment/) |
| SEMAINE | The SEMAINE Database: Annotated Multimodal Records of Emotionally Colored Conversations between a Person and a Limited Agent | [Paper](https://ieeexplore.ieee.org/document/5959155) | [Dataset](https://semaine-db.eu/) |
| AFEW | Collecting Large, Richly Annotated Facial-Expression Databases from Movies | [Paper](http://users.cecs.anu.edu.au/~adhall/Dhall_Goecke_Lucey_Gedeon_M_2012.pdf) | [Dataset](https://cs.anu.edu.au/few/AFEW.html) |
| SST | Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank | [Paper](https://aclanthology.org/D13-1170.pdf) | [Dataset](https://metatext.io/datasets/the-stanford-sentiment-treebank-%28sst%29) |
| ICT-MMMO | YouTube Movie Reviews: Sentiment Analysis in an Audio-Visual Context | [Paper](http://multicomp.cs.cmu.edu/wp-content/uploads/2017/09/2013_IEEEIS_wollmer_youtube.pdf) | [Dataset](https://github.com/A2Zadeh/CMU-MultimodalSDK) |
| RECOLA | Introducing the RECOLA multimodal corpus of remote collaborative and affective interactions | [Paper](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6553805) | [Dataset](https://diuf.unifr.ch/main/diva/recola/download.html) |
| MOUD | Utterance-Level Multimodal Sentiment Analysis | [Paper](https://aclanthology.org/P13-1096.pdf) | - |
| CMU-MOSI | MOSI: Multimodal Corpus of Sentiment Intensity and Subjectivity Analysis in Online Opinion Videos | [Paper](https://arxiv.org/ftp/arxiv/papers/1606/1606.06259.pdf) | [Dataset](https://github.com/A2Zadeh/CMU-MultimodalSDK) |
| POM | Multimodal Analysis and Prediction of Persuasiveness in Online Social Multimedia | [Paper](https://dl.acm.org/doi/pdf/10.1145/2897739) | [Dataset](https://github.com/eusip/POM) |
| MELD | MELD: A Multimodal Multi-Party Dataset for Emotion Recognition in Conversations | [Paper](https://arxiv.org/pdf/1810.02508.pdf) | [Dataset](https://affective-meld.github.io/) |
| CMU-MOSEI | Multimodal Language Analysis in the Wild: CMU-MOSEI Dataset and Interpretable Dynamic Fusion Graph | [Paper](https://aclanthology.org/P18-1208.pdf) | [Dataset](https://github.com/A2Zadeh/CMU-MultimodalSDK) |
| AMMER | Towards Multimodal Emotion Recognition in German Speech Events in Cars using Transfer Learning | [Paper](https://arxiv.org/pdf/1909.02764.pdf) | On Request |
| SEWA | SEWA DB: A Rich Database for Audio-Visual Emotion and Sentiment Research in the Wild | [Paper](https://arxiv.org/pdf/1901.02839.pdf) | [Dataset](http://www.sewaproject.eu/resources) |
| Fakeddit | r/Fakeddit: A New Multimodal Benchmark Dataset for Fine-grained Fake News Detection | [Paper](https://arxiv.org/pdf/1911.03854.pdf) | [Dataset](https://fakeddit.netlify.app/) |
| CMU-MOSEAS | CMU-MOSEAS: A Multimodal Language Dataset for Spanish, Portuguese, German and French | [Paper](https://aclanthology.org/2020.emnlp-main.141.pdf) | [Dataset](https://bit.ly/2Svbg9f) |
| MultiOFF | Multimodal Meme Dataset (MultiOFF) for Identifying Offensive Content in Image and Text | [Paper](https://aclanthology.org/2020.trac-1.6.pdf) | [Dataset](https://github.com/bharathichezhiyan/Multimodal-Meme-Classification-Identifying-Offensive-Content-in-Image-and-Text) |
| MEISD | MEISD: A Multimodal Multi-Label Emotion, Intensity and Sentiment Dialogue Dataset for Emotion Recognition and Sentiment Analysis in Conversations | [Paper](https://aclanthology.org/2020.coling-main.393.pdf) | [Dataset](https://github.com/declare-lab/MELD) |
| TASS | Overview of TASS 2020: Introducing Emotion | [Paper](http://ceur-ws.org/Vol-2664/tass_overview.pdf) | [Dataset](http://www.sepln.org/workshops/tass/tass_data/download.php) |
| CH-SIMS | CH-SIMS: A Chinese Multimodal Sentiment Analysis Dataset with Fine-grained Annotation of Modality | [Paper](https://aclanthology.org/2020.acl-main.343.pdf) | [Dataset](https://github.com/thuiar/MMSA) |
| Creep-Image | A Multimodal Dataset of Images and Text | [Paper](http://ceur-ws.org/Vol-2769/paper_11.pdf) | [Dataset](https://github.com/dhfbk/creep-image-dataset) |
| Entheos | Entheos: A Multimodal Dataset for Studying Enthusiasm | [Paper](https://aclanthology.org/2021.findings-acl.180.pdf) | [Dataset](https://github.com/clviegas/Entheos-Dataset) |
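
Several of the sentiment corpora above (YouTube, ICT-MMMO, CMU-MOSI, CMU-MOSEI) are distributed through the CMU-MultimodalSDK linked in their dataset column. The following minimal Python sketch shows one way to fetch and word-align CMU-MOSI's high-level features; it assumes `pip install mmsdk` and that the recipe names below still match the SDK's README, so treat it as illustrative rather than an official loader.

```python
# Sketch: download CMU-MOSI "highlevel" computational sequences via the
# CMU-MultimodalSDK (https://github.com/A2Zadeh/CMU-MultimodalSDK).
# Assumption: recipe names follow the SDK README at the time of writing.
from mmsdk import mmdatasdk

# A recipe maps feature names to download URLs; the SDK stores the
# fetched .csd files in the given folder.
recipe = mmdatasdk.cmu_mosi.highlevel
dataset = mmdatasdk.mmdataset(recipe, "cmumosi/")

# Align all modalities to the word-level GloVe sequence and list
# the loaded computational sequences.
dataset.align("glove_vectors")
print(list(dataset.computational_sequences.keys()))
```
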
2. **Machine Translation**

| **Dataset** | **Title of the Paper** | **Link of the Paper** | **Link of the Dataset** |
| ----------------- | ---------------------- |---------------------- |------------------------ |
| Multi30K | Multi30K: Multilingual English-German Image Descriptions | [Paper](https://arxiv.org/pdf/1605.00459.pdf) | [Dataset](https://github.com/multi30k/dataset) |
| How2 | How2: A Large-scale Dataset for Multimodal Language Understanding | [Paper](https://arxiv.org/pdf/1811.00347.pdf) | [Dataset](https://github.com/srvk/how2-dataset) |
| MLT | Multimodal Lexical Translation | [Paper](https://aclanthology.org/L18-1602.pdf) | [Dataset](https://github.com/sheffieldnlp/mlt) |
| IKEA | A Visual Attention Grounding Neural Model for Multimodal Machine Translation | [Paper](https://arxiv.org/pdf/1808.08266.pdf) | [Dataset](https://github.com/sampalomad/IKEA-Dataset) |
| Flickr30K (EN-HI) | Multimodal Neural Machine Translation for Low-resource Language Pairs using Synthetic Data | [Paper](https://aclanthology.org/W18-3405.pdf) | On Request |
| Hindi Visual Genome | Hindi Visual Genome: A Dataset for Multimodal English-to-Hindi Machine Translation | [Paper](https://arxiv.org/pdf/1907.08948.pdf) | [Dataset](https://lindat.mff.cuni.cz/repository/xmlui/handle/11234/1-2997) |
| HowTo100M | Multilingual Multimodal Pre-training for Zero-Shot Cross-Lingual Transfer of Vision-Language Models | [Paper](https://arxiv.org/pdf/2103.08849.pdf) | [Dataset](https://github.com/berniebear/Multi-HT100M) |
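
The text side of these corpora is typically released as line-aligned parallel files. As a small illustration, here is a Python sketch for reading Multi30K-style parallel caption files; the `train.en`/`train.de` file names follow the Task 1 layout of the Multi30K repository and may differ in the release you download.

```python
# Sketch: read line-aligned parallel caption files (Multi30K Task 1 style).
# Assumption: train.en / train.de were downloaded and decompressed from
# https://github.com/multi30k/dataset; adjust names to the actual release.
from pathlib import Path

def load_pairs(en_path: str, de_path: str):
    en = Path(en_path).read_text(encoding="utf-8").splitlines()
    de = Path(de_path).read_text(encoding="utf-8").splitlines()
    assert len(en) == len(de), "parallel files must align line by line"
    return list(zip(en, de))

pairs = load_pairs("train.en", "train.de")
print(len(pairs), pairs[0])
```
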
3. **Information Retrieval**

| **Dataset** | **Title of the Paper** | **Link of the Paper** | **Link of the Dataset** |
| ----------------- | ---------------------- |---------------------- |------------------------ |
| MUSICLEF | MusiCLEF: a Benchmark Activity in Multimodal Music Information Retrieval | [Paper](https://ismir2011.ismir.net/papers/OS6-3.pdf) | [Dataset](http://www.cp.jku.at/datasets/musiclef/index.html) |
| Moodo | The Moodo dataset: Integrating user context with emotional and color perception of music for affective music information retrieval | [Paper](https://www.tandfonline.com/doi/pdf/10.1080/09298215.2017.1333518) | [Dataset](http://moodo.musiclab.si) |
| ALF-200k | ALF-200k: Towards Extensive Multimodal Analyses of Music Tracks and Playlists | [Paper](https://dbis-informatik.uibk.ac.at/sites/default/files/2018-04/ecir-2018-alf.pdf) | [Dataset](https://github.com/dbis-uibk/ALF200k) |
| MQA | Can Image Captioning Help Passage Retrieval in Multimodal Question Answering? | [Paper](https://www.springerprofessional.de/en/can-image-captioning-help-passage-retrieval-in-multimodal-questi/16626696) | [Dataset](https://huggingface.co/datasets/clips/mqa) |
| WAT2019 | WAT2019: English-Hindi Translation on Hindi Visual Genome Dataset | [Paper](https://github.com/sheffieldnlp/mlt) | [Dataset](https://aclanthology.org/L18-1602.pdf) |
| ViTT | Multimodal Pretraining for Dense Video Captioning | [Paper](https://arxiv.org/pdf/2011.11760.pdf) | [Dataset](https://github.com/google-research-datasets/Video-Timeline-Tags-ViTT) |
| MTD | MTD: A Multimodal Dataset of Musical Themes for MIR Research | [Paper](https://transactions.ismir.net/articles/10.5334/tismir.68/) | [Dataset](https://www.audiolabs-erlangen.de/resources/MIR/MTD) |
| MusiClef | A professionally annotated and enriched multimodal data set on popular music | [Paper](https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.302.2718&rep=rep1&type=pdf) | [Dataset](http://www.cp.jku.at/datasets/musiclef/index.html) |
| Schubert Winterreise | Schubert Winterreise dataset: A multimodal scenario for music analysis | [Paper](https://dl.acm.org/doi/pdf/10.1145/3429743) | [Dataset](https://zenodo.org/record/3968389#.YcQrk2hBxPY) |
| WIT | WIT: Wikipedia-based Image Text Dataset for Multimodal Multilingual Machine Learning | [Paper](https://arxiv.org/pdf/2103.01913.pdf) | [Dataset](https://github.com/google-research-datasets/wit) |
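
WIT is the largest resource in this table and ships as sharded, gzipped TSV files. The sketch below streams one shard with pandas; the shard file name is illustrative and the column names are taken from the WIT repository's documentation, so verify both against the copy you download.

```python
# Sketch: stream one WIT shard and keep English rows that carry a
# reference description.
# Assumptions: a local .tsv.gz shard from
# https://github.com/google-research-datasets/wit; the file name below
# is illustrative and the column names follow the repo's documentation.
import pandas as pd

shard = "wit_v1.train.all-00000-of-00010.tsv.gz"  # hypothetical local path
for chunk in pd.read_csv(shard, sep="\t", compression="gzip", chunksize=10_000):
    en = chunk[(chunk["language"] == "en")
               & chunk["caption_reference_description"].notna()]
    print(en[["image_url", "caption_reference_description"]].head())
    break  # one chunk is enough for a quick look
```
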
4. **Question Answering**

| **Dataset** | **Title of the Paper** | **Link of the Paper** | **Link of the Dataset** |
| ----------------- | ---------------------- |---------------------- |------------------------ |
| MQA | A Dataset for Multimodal Question Answering in the Cultural Heritage Domain | [Paper](https://aclanthology.org/W16-4003.pdf) | - |
| MovieQA | MovieQA: Understanding Stories in Movies through Question-Answering | [Paper](https://arxiv.org/pdf/1512.02902.pdf) | [Dataset](https://github.com/makarandtapaswi/MovieQA_CVPR2016) |
| PororoQA | DeepStory: Video Story QA by Deep Embedded Memory Networks | [Paper](https://arxiv.org/ftp/arxiv/papers/1707/1707.00836.pdf) | [Dataset](https://github.com/Kyung-Min/Deep-Embedded-Memory-Networks) |
| MemexQA | MemexQA: Visual Memex Question Answering | [Paper](https://arxiv.org/pdf/1708.01336.pdf) | [Dataset](https://memexqa.cs.cmu.edu/) |
| VQA | Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering | [Paper](https://arxiv.org/pdf/1612.00837.pdf) | [Dataset](https://visualqa.org/) |
| TDIUC | An Analysis of Visual Question Answering Algorithms | [Paper](https://openaccess.thecvf.com/content_ICCV_2017/papers/Kafle_An_Analysis_of_ICCV_2017_paper.pdf) | [Dataset](https://kushalkafle.com/projects/tdiuc.html) |
| TGIF-QA | TGIF-QA: Toward Spatio-Temporal Reasoning in Visual Question Answering | [Paper](https://arxiv.org/pdf/1704.04497.pdf) | [Dataset](https://github.com/YunseokJANG/tgif-qa) |
| MSVD-QA, MSRVTT-QA | Video Question Answering via Attribute-Augmented Attention Network Learning | [Paper](https://arxiv.org/pdf/1707.06355.pdf) | [Dataset](https://github.com/xudejing/video-question-answering) |
| YouTube2Text | Video Question Answering via Gradually Refined Attention over Appearance and Motion | [Paper](http://staff.ustc.edu.cn/~hexn/papers/mm17-videoQA.pdf) | [Dataset](https://github.com/topics/youtube2text) |
| MovieFIB | A Dataset and Exploration of Models for Understanding Video Data through Fill-in-the-Blank Question-Answering | [Paper](https://arxiv.org/pdf/1611.07810.pdf) | [Dataset](https://github.com/teganmaharaj/MovieFIB/blob/master/README.md) |
| Video Context QA | Uncovering the Temporal Context for Video Question Answering | [Paper](https://arxiv.org/pdf/1511.04670.pdf) | [Dataset](https://github.com/ffmpbgrnn/VideoQA) |
| MarioQA | MarioQA: Answering Questions by Watching Gameplay Videos | [Paper](https://arxiv.org/pdf/1612.01669.pdf) | [Dataset](https://github.com/JonghwanMun/MarioQA) |
| TVQA | TVQA: Localized, Compositional Video Question Answering | [Paper](https://arxiv.org/pdf/1809.01696.pdf) | [Dataset](https://tvqa.cs.unc.edu/) |
| VQA-CP v2 | Don't Just Assume; Look and Answer: Overcoming Priors for Visual Question Answering | [Paper](https://arxiv.org/pdf/1712.00377.pdf) | [Dataset](https://github.com/cdancette/vqa-cp-leaderboard) |
| RecipeQA | RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes | [Paper](https://arxiv.org/pdf/1809.00812.pdf) | [Dataset](https://hucvl.github.io/recipeqa/) |
| GQA | GQA: A New Dataset for Real-World Visual Reasoning and Compositional Question Answering | [Paper](https://arxiv.org/pdf/1902.09506v3.pdf) | [Dataset](https://github.com/leaderj1001/Vision-Language) |
| Social IQ | Social-IQ: A Question Answering Benchmark for Artificial Social Intelligence | [Paper](https://openaccess.thecvf.com/content_CVPR_2019/papers/Zadeh_Social-IQ_A_Question_Answering_Benchmark_for_Artificial_Social_Intelligence_CVPR_2019_paper.pdf) | [Dataset](https://github.com/A2Zadeh/CMU-MultimodalSDK) |
| MIMOQA | MIMOQA: Multimodal Input Multimodal Output Question Answering | [Paper](https://aclanthology.org/2021.naacl-mai) | - |
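
For the VQA-family resources, questions and answers are distributed as separate JSON files keyed by `question_id`. Below is a short Python sketch that joins VQA v2 questions with their ground-truth answers, assuming the v2 train JSON files from the official download page at visualqa.org sit in the working directory.

```python
# Sketch: join VQA v2 questions with their annotated answers.
# Assumption: file names follow the official VQA v2 release
# (https://visualqa.org/download.html).
import json

with open("v2_OpenEnded_mscoco_train2014_questions.json") as f:
    questions = {q["question_id"]: q for q in json.load(f)["questions"]}
with open("v2_mscoco_train2014_annotations.json") as f:
    annotations = json.load(f)["annotations"]

for ann in annotations[:3]:
    q = questions[ann["question_id"]]
    print(q["question"], "->", ann["multiple_choice_answer"])
```
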
5. **Summarization**

| **Dataset** | **Title of the Paper** | **Link of the Paper** | **Link of the Dataset** |
| ----------------- | ---------------------- |---------------------- |------------------------ |
| SumMe | Creating Summaries from User Videos | [Paper](https://gyglim.github.io/me/papers/GygliECCV14_vsum.pdf) | [Dataset](https://gyglim.github.io/me/vsum/index.html) |
| TVSum | TVSum: Summarizing Web Videos Using Titles | [Paper](https://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Song_TVSum_Summarizing_Web_2015_CVPR_paper.pdf) | [Dataset](https://github.com/yalesong/tvsum) |
| QFVS | Query-Focused Video Summarization: Dataset, Evaluation, and a Memory Network Based Approach | [Paper](https://arxiv.org/abs/1707.04960) | [Dataset](https://www.aidean-sharghi.com/cvpr2017) |
| MMSS | Multi-modal Sentence Summarization with Modality Attention and Image Filtering | [Paper](https://www.ijcai.org/proceedings/2018/0577.pdf) | - |
| MSMO | MSMO: Multimodal Summarization with Multimodal Output | [Paper](https://aclanthology.org/D18-1448.pdf) | - |
| Screen2Words | Screen2Words: Automatic Mobile UI Summarization with Multimodal Learning | [Paper](https://arxiv.org/pdf/2108.03353.pdf) | [Dataset](https://github.com/google-research-datasets/screen2words) |
| AVIATE | IEMOCAP: interactive emotional dyadic motion capture database | [Paper](https://link.springer.com/content/pdf/10.1007/s10579-008-9076-6.pdf) | [Dataset](https://sail.usc.edu/software/databases/) |
| Multimodal Microblog Summarization | On Multimodal Microblog Summarization | [Paper](https://ieeexplore.ieee.org/document/9585070) | - |
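
Screen2Words is one of the few entries here with a simple text-only distribution: a CSV of crowdsourced summaries that pairs with RICO screenshots. A minimal sketch for grouping summaries per screen follows; the `screenId`/`summary` column names are assumptions based on the repository's CSV layout, so check them against the downloaded file.

```python
# Sketch: group Screen2Words summaries per screen.
# Assumptions: screen_summaries.csv comes from
# https://github.com/google-research-datasets/screen2words and its columns
# are named screenId and summary (verify against the actual file).
import csv
from collections import defaultdict

summaries = defaultdict(list)
with open("screen_summaries.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        summaries[row["screenId"]].append(row["summary"])

screen = next(iter(summaries))
print(screen, summaries[screen])
```
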
6. **Human Computer Interaction**

| **Dataset** | **Title of the Paper** | **Link of the Paper** | **Link of the Dataset** |
| ----------------- | ---------------------- |---------------------- |------------------------ |
| CUAVE | CUAVE: A new audio-visual database for multimodal human-computer interface research | [Paper](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5745028) | [Dataset](http://people.csail.mit.edu/siracusa/avdata/) |
| MHAD | Berkeley MHAD: A comprehensive multimodal human action database | [Paper](https://ieeexplore.ieee.org/document/6474999) | [Dataset](https://tele-immersion.citris-uc.org/berkeley_mhad) |
| Multi-party interactions | A Multi-party Multi-modal Dataset for Focus of Visual Attention in Human-human and Human-robot Interaction | [Paper](https://aclanthology.org/L16-1703.pdf) | - |
| MHHRI | Multimodal Human-Human-Robot Interactions (MHHRI) Dataset for Studying Personality and Engagement | [Paper](https://ieeexplore.ieee.org/document/8003432) | [Dataset](https://www.cl.cam.ac.uk/research/rainbow/projects/mhhri/) |
| Red Hen Lab | Red Hen Lab: Dataset and Tools for Multimodal Human Communication Research | [Paper](https://link.springer.com/content/pdf/10.1007/s13218-017-0505-9.pdf) | - |
| EMRE | Generating a Novel Dataset of Multimodal Referring Expressions | [Paper](https://aclanthology.org/W19-0507.pdf) | [Dataset](https://github.com/VoxML/public-data/tree/master/EMRE/HIT) |
| Chinese Whispers | Chinese Whispers: A Multimodal Dataset for Embodied Language Grounding | [Paper](https://www.researchgate.net/publication/341294259_Chinese_Whispers_A_Multimodal_Dataset_for_Embodied_Language_Grounding) | [Dataset](https://zenodo.org/record/4587308#.YbJEctBBxPZ) |
| uulmMAC | The uulmMAC database—A multimodal affective corpus for affective computing in human-computer interaction | [Paper](https://www.mdpi.com/1424-8220/20/8/2308) | [Dataset](https://neuro.informatik.uni-ulm.de/TC9/tools-and-data-sets/uulmmac-database/) |
7. **Semantic Analysis**

| **Dataset** | **Title of the Paper** | **Link of the Paper** | **Link of the Dataset** |
| ----------------- | ---------------------- |---------------------- |------------------------ |
| WN9-IMG | Image-embodied Knowledge Representation Learning | [Paper](https://www.ijcai.org/proceedings/2017/0438.pdf) | [Dataset](https://github.com/xrb92/IKRL) |
| Wikimedia Commons | A Dataset and Reranking Method for Multimodal MT of User-Generated Image Captions | [Paper](https://aclanthology.org/W18-1814.pdf) | [Dataset](https://commons.wikimedia.org/wiki/Main_Page) |
| Starsem18-multimodalKB | A Multimodal Translation-Based Approach for Knowledge Graph Representation Learning | [Paper](https://aclanthology.org/S18-2027.pdf) | [Dataset](https://github.com/UKPLab/starsem18-multimodalKB) |
| MUStARD | Towards Multimodal Sarcasm Detection | [Paper](https://arxiv.org/pdf/1906.01815.pdf) | [Dataset](https://github.com/soujanyaporia/MUStARD) |
| YouMakeup | YouMakeup: A Large-Scale Domain-Specific Multimodal Dataset for Fine-Grained Semantic Comprehension | [Paper](https://aclanthology.org/D19-1517.pdf) | [Dataset](https://github.com/AIM3-RUC/YouMakeup) |
| MDID | Integrating Text and Image: Determining Multimodal Document Intent in Instagram Posts | [Paper](https://arxiv.org/pdf/1904.09073.pdf) | [Dataset](https://github.com/karansikka1/documentIntent_emnlp19) |
| Social media posts from Flickr (Mental Health) | Inferring Social Media Users' Mental Health Status from Multimodal Information | [Paper](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7147779/) | [Dataset](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7147779/) |
| Twitter MEL | Building a Multimodal Entity Linking Dataset From Tweets | [Paper](aclanthology.org) | [Dataset](https://github.com/OA256864/MEL_Tweets) |
| MultiMET | MultiMET: A Multimodal Dataset for Metaphor Understanding | [Paper](https://aclanthology.org/2021.acl-long.249.pdf) | - |
| MSDS | Multimodal Sarcasm Detection in Spanish: a Dataset and a Baseline | [Paper](https://arxiv.org/pdf/2105.05542.pdf) | [Dataset](https://zenodo.org/record/4701383) |
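
MUStARD's annotations are a single JSON file mapping utterance IDs to the utterance, its conversational context, and a sarcasm label; the raw video clips are distributed separately. A small inspection sketch follows, with the file path and field names taken from the MUStARD repository as assumptions to verify against the copy you download.

```python
# Sketch: inspect MUStARD sarcasm annotations.
# Assumptions: data/sarcasm_data.json comes from
# https://github.com/soujanyaporia/MUStARD and uses the field names below.
import json

with open("data/sarcasm_data.json") as f:
    data = json.load(f)

sarcastic = [d for d in data.values() if d["sarcasm"]]
print(len(data), "utterances,", len(sarcastic), "labeled sarcastic")

example = next(iter(data.values()))
print(example["speaker"], ":", example["utterance"])
```
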
8. **Miscellaneous**

| **Dataset** | **Title of the Paper** | **Link of the Paper** | **Link of the Dataset** |
| ----------------- | ---------------------- |---------------------- |------------------------ |
| MS COCO | Microsoft COCO: Common Objects in Context | [Paper](https://link.springer.com/chapter/10.1007/978-3-319-10602-1_48) | [Dataset](https://github.com/topics/mscoco-dataset) |
| ILSVRC | ImageNet Large Scale Visual Recognition Challenge | [Paper](https://arxiv.org/pdf/1409.0575.pdf) | [Dataset](https://image-net.org/download.php) |
| YFCC100M | YFCC100M: The New Data in Multimedia Research | [Paper](https://arxiv.org/pdf/1503.01817.pdf) | [Dataset](https://github.com/chi0tzp/YFCC100M-Downloader) |
| COGNIMUSE | COGNIMUSE: a multimodal video database annotated with saliency, events, semantics and emotion with application to summarization | [Paper](https://jivp-eurasipjournals.springeropen.com/track/pdf/10.1186/s13640-017-0194-1.pdf) | [Dataset](https://cognimuse.cs.ntua.gr/research_datasets) |
| SNAG | SNAG: Spoken Narratives and Gaze Dataset | [Paper](https://aclanthology.org/P18-2022.pdf) | [Dataset](https://mvrl-clasp.github.io/SNAG/) |
| UR-FUNNY | UR-FUNNY: A Multimodal Language Dataset for Understanding Humor | [Paper](https://arxiv.org/pdf/1904.06618.pdf) | [Dataset](https://github.com/ROC-HCI/UR-FUNNY/blob/master/UR-FUNNY-V1.md) |
| Bag-of-Lies | Bag-of-Lies: A Multimodal Dataset for Deception Detection | [Paper](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9025340) | [Dataset](http://iab-rubric.org/resources/BagLies.html) |
| MARC | A Recipe for Creating Multimodal Aligned Datasets for Sequential Tasks | [Paper](https://aclanthology.org/2020.acl-main.440.pdf) | [Dataset](https://github.com/microsoft/multimodal-aligned-recipe-corpus) |
| MuSE | MuSE: a Multimodal Dataset of Stressed Emotion | [Paper](https://aclanthology.org/2020.lrec-1.187.pdf) | [Dataset](http://lit.eecs.umich.edu/downloads.html) |
| BabelPic | Fatality Killed the Cat or: BabelPic, a Multimodal Dataset for Non-Concrete Concepts | [Paper](https://aclanthology.org/2020.acl-main.425.pdf) | [Dataset](https://sapienzanlp.github.io/babelpic/) |
| Eye4Ref | Eye4Ref: A Multimodal Eye Movement Dataset of Referentially Complex Situations | [Paper](https://aclanthology.org/2020.lrec-1.292.pdf) | - |
| Troll Memes | A Dataset for Troll Classification of TamilMemes | [Paper](https://aclanthology.org/2020.wildre-1.2.pdf) | [Dataset](https://github.com/sharduls007/TamilMemes) |
| SEMD | EmoSen: Generating Sentiment and Emotion Controlled Responses in a Multimodal Dialogue System | [Paper](https://www.computer.org/csdl/journal/ta/5555/01/09165162/1mcQTrYsXbG) | - |
| Chat-talk Corpus | Construction and Analysis of a Multimodal Chat-talk Corpus for Dialog Systems Considering Interpersonal Closeness | [Paper](https://aclanthology.org/2020.lrec-1.56.pdf) | - |
| EMOTyDA | Towards Emotion-aided Multi-modal Dialogue Act Classification | [Paper](https://aclanthology.org/2020.acl-main.402.pdf) | [Dataset](https://github.com/sahatulika15/EMOTyDA) |
| MELINDA | MELINDA: A Multimodal Dataset for Biomedical Experiment Method Classification | [Paper](https://arxiv.org/pdf/2012.09216.pdf) | [Dataset](https://github.com/PlusLabNLP/melinda) |
| NewsCLIPpings | NewsCLIPpings: Automatic Generation of Out-of-Context Multimodal Media | [Paper](https://aclanthology.org/2021.emnlp-main.545.pdf) | [Dataset](https://github.com/g-luo/news_clippings) |
| R2VQ | Designing Multimodal Datasets for NLP Challenges | [Paper](https://arxiv.org/pdf/2105.05999.pdf) | [Dataset](https://competitions.codalab.org/competitions/34056) |
| M2H2 | M2H2: A Multimodal Multiparty Hindi Dataset For Humor Recognition in Conversations | [Paper](https://arxiv.org/pdf/2108.01260.pdf) | [Dataset](https://github.com/declare-lab/M2H2-dataset) |
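
Many of the vision-language entries above (MS COCO, and datasets built on top of it such as VQA) use the COCO annotation format, for which the `pycocotools` package provides a standard reader. A brief caption-browsing sketch, assuming the 2017 caption annotations have been downloaded from cocodataset.org:

```python
# Sketch: browse MS COCO captions with pycocotools (pip install pycocotools).
# Assumption: annotations/captions_train2017.json was downloaded from
# https://cocodataset.org/#download.
from pycocotools.coco import COCO

coco = COCO("annotations/captions_train2017.json")
img_id = coco.getImgIds()[0]
img = coco.loadImgs(img_id)[0]
anns = coco.loadAnns(coco.getAnnIds(imgIds=img_id))

print(img["file_name"])
for ann in anns:
    print("-", ann["caption"])
```
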