# Data-centric Multimodal LLM

Survey on data-centric multimodal large language models

[Paper](https://arxiv.org/abs/2405.16640)

![Sources](Figures/figure1.png)

List of Sources

| Source Name | Source Link | Type |
| --- | --- | --- |
| CommonCrawl | https://commoncrawl.org/ | Common Webpages |
| Flickr | https://www.flickr.com/ | Common Webpages |
| Flickr Video | https://www.flickr.com/photos/tags/vídeo/ | Common Webpages |
| FreeSound | [https://freesound.org](https://freesound.org/) | Common Webpages |
| BBC Sound Effects | [https://sound-effects.bbcrewind.co.uk](https://sound-effects.bbcrewind.co.uk/) | Common Webpages |
| SoundBible | https://soundbible.com/ | Common Webpages |
| Wikipedia | https://www.wikipedia.org/ | Wikipedia |
| Wikimedia Commons | https://commons.wikimedia.org/ | Wikipedia |
| Stack Exchange | https://stackexchange.com/ | Social Media |
| Reddit | https://www.reddit.com/ | Social Media |
| Ubuntu IRC | https://ubuntu.com/ | Social Media |
| YouTube | [https://www.youtube.com](https://www.youtube.com/) | Social Media |
| X | [https://x.com](https://x.com/) | Social Media |
| S2ORC | https://github.com/allenai/s2orc | Academic Papers |
| arXiv | https://arxiv.org/ | Academic Papers |
| Project Gutenberg | [https://www.gutenberg.org](https://www.gutenberg.org/) | Books |
| Smashwords | https://www.smashwords.com/ | Books |
| Bibliotik | https://bibliotik.me/ | Books |
| National Diet Library | https://dl.ndl.go.jp/ja/photo | Books |
| BigQuery public dataset | https://cloud.google.com/bigquery/public-data | Code |
| GitHub | https://github.com/ | Code |
| FreeLaw | https://www.freelaw.in/ | Legal |
| Chinese legal documents | https://www.spp.gov.cn/spp/fl/ | Legal |
| Khan Academy exercises | [https://www.khanacademy.org](https://www.khanacademy.org/) | Maths |
| MEDLINE | [www.medline.com](http://www.medline.com/) | Medical |
| Patient | [https://patient.info](https://patient.info/) | Medical |
| WebMD | [https://www.webmd.com](https://www.webmd.com/) | Medical |
| NIH | https://www.nih.gov/ | Medical |
| 39 Ask Doctor | https://ask.39.net/ | Medical |
| Medical Exams | https://drive.google.com/file/d/1ImYUSLk9JbgHXOemfvyiDiirluZHPeQw/view | Medical |
| Baidu Doctor | https://muzhi.baidu.com/ | Medical |
| 120 Asks | https://www.120ask.com/ | Medical |
| BMJ Case Reports | [https://casereports.bmj.com](https://casereports.bmj.com/) | Medical |
| XYWY | [http://www.xywy.com](http://www.xywy.com/) | Medical |
| Qianwen Health | [https://51zyzy.com](https://51zyzy.com/) | Medical |
| PubMed | [https://pubmed.ncbi.nlm.nih.gov](https://pubmed.ncbi.nlm.nih.gov/) | Medical |
| EDGAR | https://www.sec.gov/edgar | Financial |
| SEC Financial Statement and Notes Data Sets | https://www.sec.gov/dera/data/financial-statement-and-notes-data-set | Financial |
| Sina Finance | https://finance.sina.com.cn/ | Financial |
| Tencent Finance | https://new.qq.com/ch/finance/ | Financial |
| Eastmoney | https://www.eastmoney.com/ | Financial |
| Guba | https://guba.eastmoney.com/ | Financial |
| Xueqiu | https://xueqiu.com/ | Financial |
| Phoenix Finance | https://finance.ifeng.com/ | Financial |
| 36Kr | https://36kr.com/ | Financial |
| Huxiu | https://www.huxiu.com/ | Financial |

## Commonly-used datasets

Textual-Pretraining Datasets:

| Datasets | Link |
| --- | --- |
| RedPajama-Data-1T | https://www.together.ai/blog/redpajama |
| RedPajama-Data-v2 | https://www.together.ai/blog/redpajama-data-v2 |
| SlimPajama | https://huggingface.co/datasets/cerebras/SlimPajama-627B |
| Falcon-RefinedWeb | https://huggingface.co/datasets/tiiuae/falcon-refinedweb |
| Pile | https://github.com/EleutherAI/the-pile |
| ROOTS | https://huggingface.co/bigscience-data |
| WuDaoCorpora | https://data.baai.ac.cn/details/WuDaoCorporaText |
| Common Crawl | https://commoncrawl.org/ |
| C4 | https://huggingface.co/datasets/c4 |
| mC4 | https://arxiv.org/pdf/2010.11934.pdf |
| Dolma Dataset | https://github.com/allenai/dolma |
| OSCAR-22.01 | https://oscar-project.github.io/documentation/versions/oscar-2201/ |
| OSCAR-23.01 | https://huggingface.co/datasets/oscar-corpus/OSCAR-2301 |
| colossal-oscar-1.0 | https://huggingface.co/datasets/oscar-corpus/colossal-oscar-1.0 |
| Wiki40b | https://www.tensorflow.org/datasets/catalog/wiki40b |
| Pushshift Reddit Dataset | https://paperswithcode.com/dataset/pushshift-reddit |
| OpenWebTextCorpus | https://paperswithcode.com/dataset/openwebtext |
| OpenWebText2 | https://openwebtext2.readthedocs.io/en/latest/ |
| BookCorpus | https://huggingface.co/datasets/bookcorpus |
| Gutenberg | https://shibamoulilahiri.github.io/gutenberg_dataset.html |
| CC-Stories-R | https://paperswithcode.com/dataset/cc-stories |
| CC-News | https://huggingface.co/datasets/cc_news |
| REALNEWS | https://paperswithcode.com/dataset/realnews |
| Reddit submission dataset | https://www.philippsinger.info/reddit/ |
| General Reddit Dataset | https://www.tensorflow.org/datasets/catalog/reddit |
| AMPS | https://drive.google.com/file/d/1hQsua3TkpEmcJD_UWQx8dmNdEZPyxw23/view |
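
Most of these corpora are mirrored on the Hugging Face Hub and can be inspected without downloading them in full. A minimal sketch with the `datasets` library, using SlimPajama from the table above (any other Hub-hosted corpus works the same way):

```python
from datasets import load_dataset

# streaming=True avoids downloading the full 627B-token corpus
ds = load_dataset("cerebras/SlimPajama-627B", split="train", streaming=True)

for i, example in enumerate(ds):
    print(example["text"][:200])   # raw document text
    print(example.get("meta"))     # provenance metadata, e.g. source subset
    if i == 2:
        break
```
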
MM-Pretraining Datasets:

| Dataset Name | Paper Title (with hyperlink) | Modality |
| --- | --- | --- |
| ALIGN | [Scaling up visual and vision-language representation learning with noisy text supervision](https://huggingface.co/docs/transformers/model_doc/align) | Image |
| LTIP | [Flamingo: a visual language model for few-shot learning](https://github.com/lucidrains/flamingo-pytorch) | Image |
| MS-COCO | [Microsoft coco: Common objects in context](https://cocodataset.org/#overview) | Image |
| Visual Genome | [Visual genome: Connecting language and vision using crowdsourced dense image annotations](https://link.springer.com/article/10.1007/S11263-016-0981-7) | Image |
| CC3M | [Conceptual captions: A cleaned, hypernymed, image alt-text dataset for automatic image captioning](https://aclanthology.org/P18-1238/) | Image |
| CC12M | [Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts](https://openaccess.thecvf.com/content/CVPR2021/html/Changpinyo_Conceptual_12M_Pushing_Web-Scale_Image-Text_Pre-Training_To_Recognize_Long-Tail_Visual_CVPR_2021_paper.html) | Image |
| SBU | [Im2text: Describing images using 1 million captioned photographs](https://proceedings.neurips.cc/paper_files/paper/2011/hash/5dd9db5e033da9c6fb5ba83c7a7ebea9-Abstract.html) | Image |
| LAION-5B | [Laion-5b: An open large-scale dataset for training next generation image-text models](https://laion.ai/blog/laion-5b/) | Image |
| LAION-400M | [Laion-400m: Open dataset of clip-filtered 400 million image-text pairs](https://arxiv.org/abs/2111.02114) | Image |
| LAION-COCO | [Laion-coco: In the style of MS COCO](https://laion.ai/blog/laion-coco/) | Image |
| Flickr30k | [From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions](https://direct.mit.edu/tacl/article/doi/10.1162/tacl_a_00166/43313/From-image-descriptions-to-visual-denotations-New) | Image |
| AI Challenger | [Ai challenger: A large-scale dataset for going deeper in image understanding](https://arxiv.org/abs/1711.06475) | Image |
| COYO | [COYO-700M: Image-Text Pair Dataset](https://github.com/kakaobrain/coyo-dataset) | Image |
| Wukong | [Wukong: A 100 million large-scale chinese cross-modal pre-training benchmark](https://proceedings.neurips.cc/paper_files/paper/2022/hash/a90b9a09a6ee43d6631cf42e225d73b4-Abstract-Datasets_and_Benchmarks.html) | Image |
| COCO Caption | [Microsoft coco captions: Data collection and evaluation server](https://arxiv.org/abs/1504.00325) | Image |
| WebLI | [Pali: A jointly-scaled multilingual language-image model](https://arxiv.org/abs/2209.06794) | Image |
| Episodic WebLI | [Pali-x: On scaling up a multilingual vision and language model](https://arxiv.org/abs/2305.18565) | Image |
| CC595k | [Visual instruction tuning](https://proceedings.neurips.cc/paper_files/paper/2023/hash/6dcf277ea32ce3288914faf369fe6de0-Abstract-Conference.html) | Image |
| ReferItGame | [Referitgame: Referring to objects in photographs of natural scenes](https://aclanthology.org/D14-1086/) | Image |
| RefCOCO&RefCOCO+ | [Modeling context in referring expressions](https://link.springer.com/chapter/10.1007/978-3-319-46475-6_5) | Image |
| Visual-7W | [Visual7w: Grounded question answering in images](https://openaccess.thecvf.com/content_cvpr_2016/html/Zhu_Visual7W_Grounded_Question_CVPR_2016_paper.html) | Image |
| OCR-VQA | [Ocr-vqa: Visual question answering by reading text in images](https://ieeexplore.ieee.org/abstract/document/8978122) | Image |
| ST-VQA | [Scene text visual question answering](https://openaccess.thecvf.com/content_ICCV_2019/html/Biten_Scene_Text_Visual_Question_Answering_ICCV_2019_paper.html) | Image |
| DocVQA | [Docvqa: A dataset for vqa on document images](https://openaccess.thecvf.com/content/WACV2021/html/Mathew_DocVQA_A_Dataset_for_VQA_on_Document_Images_WACV_2021_paper.html) | Image |
| TextVQA | [Towards vqa models that can read](https://openaccess.thecvf.com/content_CVPR_2019/html/Singh_Towards_VQA_Models_That_Can_Read_CVPR_2019_paper.html) | Image |
| DataComp | [Datacomp: In search of the next generation of multimodal datasets](https://proceedings.neurips.cc/paper_files/paper/2023/hash/56332d41d55ad7ad8024aac625881be7-Abstract-Datasets_and_Benchmarks.html) | Image |
| GQA | [Gqa: A new dataset for real-world visual reasoning and compositional question answering](https://openaccess.thecvf.com/content_CVPR_2019/html/Hudson_GQA_A_New_Dataset_for_Real-World_Visual_Reasoning_and_Compositional_CVPR_2019_paper.html) | Image |
| VQA | [VQA: Visual Question Answering](https://openaccess.thecvf.com/content_iccv_2015/html/Antol_VQA_Visual_Question_ICCV_2015_paper.html) | Image |
| VQAv2 | [Making the v in vqa matter: Elevating the role of image understanding in visual question answering](https://visualqa.org/) | Image |
| DVQA | [Dvqa: Understanding data visualizations via question answering](http://openaccess.thecvf.com/content_cvpr_2018/html/Kafle_DVQA_Understanding_Data_CVPR_2018_paper.html) | Image |
| A-OK-VQA | [A-okvqa: A benchmark for visual question answering using world knowledge](https://link.springer.com/chapter/10.1007/978-3-031-20074-8_9) | Image |
| Text Captions | [Textcaps: a dataset for image captioning with reading comprehension](https://link.springer.com/chapter/10.1007/978-3-030-58536-5_44) | Image |
| M3W | [Flamingo: a visual language model for few-shot learning](https://proceedings.neurips.cc/paper_files/paper/2022/hash/960a172bc7fbf0177ccccbb411a7d800-Abstract-Conference.html) | Image |
| MMC4 | [Multimodal c4: An open, billion-scale corpus of images interleaved with text](https://proceedings.neurips.cc/paper_files/paper/2023/hash/1c6bed78d3813886d3d72595dbecb80b-Abstract-Datasets_and_Benchmarks.html) | Image |
| MSRVTT | [Msr-vtt: A large video description dataset for bridging video and language](https://ieeexplore.ieee.org/document/7780940/) | Video |
| WebVid-2M | [Frozen in time: A joint video and image encoder for end-to-end retrieval](https://openaccess.thecvf.com/content/ICCV2021/html/Bain_Frozen_in_Time_A_Joint_Video_and_Image_Encoder_for_ICCV_2021_paper.html) | Video |
| VTP | [Flamingo: a visual language model for few-shot learning](https://github.com/lucidrains/flamingo-pytorch) | Video |
| AISHELL-1 | [Aishell-1: An open-source mandarin speech corpus and a speech recognition baseline](https://ieeexplore.ieee.org/abstract/document/8384449) | Audio |
| AISHELL-2 | [Aishell-2: Transforming mandarin asr research into industrial scale](https://www.aishelltech.com/aishell_2) | Audio |
| WavCaps | [Wavcaps: A chatgpt-assisted weakly-labelled audio captioning dataset for audio-language multimodal research](https://github.com/XinhaoMei/WavCaps) | Audio |
| VisDial | [Visual dialog](https://openaccess.thecvf.com/content_cvpr_2017/html/Das_Visual_Dialog_CVPR_2017_paper.html) | Image |
| VSDial-CN | [X-llm: Bootstrapping advanced large language models by treating multi-modalities as foreign languages](https://github.com/phellonchen/X-LLM/blob/main/README_DATA.md) | Image, Audio |
| MELON | [Audio Retrieval for Multimodal Design Documents: A New Dataset and Algorithms](https://arxiv.org/abs/2302.14757) | Image, Text, Audio |
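
For the image-text corpora, a common pattern is to stream URL-caption pairs and fetch images lazily. A hedged sketch using the Hub mirror of Conceptual Captions (the `conceptual_captions` dataset id is an assumption; large-scale pipelines typically use a dedicated downloader such as img2dataset instead):

```python
import io
import requests
from PIL import Image
from datasets import load_dataset

ds = load_dataset("conceptual_captions", split="train", streaming=True)

for i, ex in enumerate(ds):
    print(ex["caption"])
    try:
        img = Image.open(io.BytesIO(requests.get(ex["image_url"], timeout=10).content))
        print(img.size)
    except Exception:
        pass  # web images rot; expect a non-trivial failure rate
    if i == 2:
        break
```
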
Common Textual SFT Datasets (construction method: HG = human generated, MC = model constructed, CI = collection and improvement of existing datasets):

| Dataset Name | Language | Construction Method | Github Link | Paper Link | Dataset Link |
| --- | --- | --- | --- | --- | --- |
| databricks-dolly-15K | EN | HG | | | https://huggingface.co/datasets/databricks/databricks-dolly-15k |
| InstructionWild_v2 | EN & ZH | HG | https://github.com/XueFuzhao/InstructionWild | | |
| LCCC | ZH | HG | https://github.com/thu-coai/CDial-GPT | https://arxiv.org/pdf/2008.03946.pdf | |
| OASST1 | Multi (35) | HG | https://github.com/LAION-AI/Open-Assistant | https://arxiv.org/abs/2304.07327 | https://huggingface.co/datasets/OpenAssistant/oasst1 |
| OL-CC | ZH | HG | | | https://data.baai.ac.cn/details/OL-CC |
| Zhihu-KOL | ZH | HG | https://github.com/wangrui6/Zhihu-KOL | | https://huggingface.co/datasets/wangrui6/Zhihu-KOL |
| Aya Dataset | Multi (65) | HG | | https://arxiv.org/abs/2402.06619 | https://hf.co/datasets/CohereForAI/aya_dataset |
| InstructIE | EN & ZH | HG | https://github.com/zjunlp/KnowLM | https://arxiv.org/abs/2305.11527 | https://huggingface.co/datasets/zjunlp/InstructIE |
| Alpaca_data | EN | MC | https://github.com/tatsu-lab/stanford_alpaca#data-release | | |
| BELLE_Generated_Chat | ZH | MC | https://github.com/LianjiaTech/BELLE/tree/main/data/10M | | https://huggingface.co/datasets/BelleGroup/generated_chat_0.4M |
| BELLE_Multiturn_Chat | ZH | MC | https://github.com/LianjiaTech/BELLE/tree/main/data/10M | | https://huggingface.co/datasets/BelleGroup/multiturn_chat_0.8M |
| BELLE_train_0.5M_CN | ZH | MC | https://github.com/LianjiaTech/BELLE/tree/main/data/1.5M | | https://huggingface.co/datasets/BelleGroup/train_0.5M_CN |
| BELLE_train_1M_CN | ZH | MC | https://github.com/LianjiaTech/BELLE/tree/main/data/1.5M | | https://huggingface.co/datasets/BelleGroup/train_1M_CN |
| BELLE_train_2M_CN | ZH | MC | https://github.com/LianjiaTech/BELLE/tree/main/data/10M | | https://huggingface.co/datasets/BelleGroup/train_2M_CN |
| BELLE_train_3.5M_CN | ZH | MC | https://github.com/LianjiaTech/BELLE/tree/main/data/10M | | https://huggingface.co/datasets/BelleGroup/train_3.5M_CN |
| CAMEL | Multi & PL | MC | https://github.com/camel-ai/camel | https://arxiv.org/pdf/2303.17760.pdf | https://huggingface.co/camel-ai |
| Chatgpt_corpus | ZH | MC | https://github.com/PlexPt/chatgpt-corpus/releases/tag/3 | | |
| InstructionWild_v1 | EN & ZH | MC | https://github.com/XueFuzhao/InstructionWild | | |
| LMSYS-Chat-1M | Multi | MC | | https://arxiv.org/pdf/2309.11998.pdf | https://huggingface.co/datasets/lmsys/lmsys-chat-1m |
| MOSS_002_sft_data | EN & ZH | MC | https://github.com/OpenLMLab/MOSS | | https://huggingface.co/datasets/fnlp/moss-002-sft-data |
| MOSS_003_sft_data | EN & ZH | MC | https://github.com/OpenLMLab/MOSS | | |
| MOSS_003_sft_plugin_data | EN & ZH | MC | https://github.com/OpenLMLab/MOSS | | |
| OpenChat | EN | MC | https://github.com/imoneoi/openchat | https://arxiv.org/pdf/2309.11235.pdf | https://huggingface.co/openchat |
| RedGPT-Dataset-V1-CN | ZH | MC | https://github.com/DA-southampton/RedGPT | | |
| Self-Instruct | EN | MC | https://github.com/yizhongw/self-instruct | https://aclanthology.org/2023.acl-long.754.pdf | |
| ShareChat | Multi | MC | | | |
| ShareGPT-Chinese-English-90k | EN & ZH | MC | https://github.com/CrazyBoyM/llama2-Chinese-chat | | https://huggingface.co/datasets/shareAI/ShareGPT-Chinese-English-90k |
| ShareGPT90K | EN | MC | | | https://huggingface.co/datasets/RyokoAI/ShareGPT52K |
| UltraChat | EN | MC | https://github.com/thunlp/UltraChat#UltraLM | https://arxiv.org/pdf/2305.14233.pdf | |
| Unnatural Instructions | EN | MC | https://github.com/orhonovich/unnatural-instructions | https://aclanthology.org/2023.acl-long.806.pdf | |
| WebGLM-QA | EN | MC | https://github.com/THUDM/WebGLM | https://arxiv.org/pdf/2306.07906.pdf | https://huggingface.co/datasets/THUDM/webglm-qa |
| Wizard_evol_instruct_196K | EN | MC | https://github.com/nlpxucan/WizardLM | https://arxiv.org/pdf/2304.12244.pdf | https://huggingface.co/datasets/WizardLM/WizardLM_evol_instruct_V2_196k |
| Wizard_evol_instruct_70K | EN | MC | https://github.com/nlpxucan/WizardLM | https://arxiv.org/pdf/2304.12244.pdf | https://huggingface.co/datasets/WizardLM/WizardLM_evol_instruct_70k |
| CrossFit | EN | CI | https://github.com/INK-USC/CrossFit | https://arxiv.org/pdf/2104.08835.pdf | |
| DialogStudio | EN | CI | https://github.com/salesforce/DialogStudio | https://arxiv.org/pdf/2307.10172.pdf | https://huggingface.co/datasets/Salesforce/dialogstudio |
| Dynosaur | EN | CI | https://github.com/WadeYin9712/Dynosaur | https://arxiv.org/pdf/2305.14327.pdf | https://huggingface.co/datasets?search=dynosaur |
| Flan-mini | EN | CI | https://github.com/declare-lab/flacuna | https://arxiv.org/pdf/2307.02053.pdf | https://huggingface.co/datasets/declare-lab/flan-mini |
| Flan | Multi | CI | https://github.com/google-research/flan | https://arxiv.org/pdf/2109.01652.pdf | |
| Flan v2 | Multi | CI | https://github.com/google-research/FLAN/tree/main/flan/v2 | https://arxiv.org/pdf/2301.13688.pdf | https://huggingface.co/datasets/SirNeural/flan_v2 |
| InstructDial | EN | CI | https://github.com/prakharguptaz/Instructdial | https://arxiv.org/pdf/2205.12673.pdf | |
| NATURAL INSTRUCTIONS | EN | CI | https://github.com/allenai/natural-instructions | https://aclanthology.org/2022.acl-long.244.pdf | https://instructions.apps.allenai.org/ |
| OIG | EN | CI | | | https://huggingface.co/datasets/laion/OIG |
| Open-Platypus | EN | CI | https://github.com/arielnlee/Platypus | https://arxiv.org/pdf/2308.07317.pdf | https://huggingface.co/datasets/garage-bAInd/Open-Platypus |
| OPT-IML | Multi | CI | https://github.com/facebookresearch/metaseq | https://arxiv.org/pdf/2212.12017.pdf | |
| PromptSource | EN | CI | https://github.com/bigscience-workshop/promptsource | https://aclanthology.org/2022.acl-demo.9.pdf | |
| SUPER-NATURAL INSTRUCTIONS | Multi | CI | https://github.com/allenai/natural-instructions | https://arxiv.org/pdf/2204.07705.pdf | |
| T0 | EN | CI | | https://arxiv.org/pdf/2110.08207.pdf | |
| UnifiedSKG | EN | CI | https://github.com/xlang-ai/UnifiedSKG | https://arxiv.org/pdf/2201.05966.pdf | |
| xP3 | Multi (46) | CI | https://github.com/bigscience-workshop/xmtf | https://aclanthology.org/2023.acl-long.891.pdf | |
| IEPile | EN & ZH | CI | https://github.com/zjunlp/IEPile | https://arxiv.org/abs/2402.14710 | https://huggingface.co/datasets/zjunlp/iepile |
| Firefly | ZH | HG & CI | https://github.com/yangjianxin1/Firefly | | https://huggingface.co/datasets/YeungNLP/firefly-train-1.1M |
| LIMA-sft | EN | HG & CI | | https://arxiv.org/pdf/2305.11206.pdf | https://huggingface.co/datasets/GAIR/lima |
| COIG-CQIA | ZH | HG & CI | | https://arxiv.org/abs/2403.18058 | https://huggingface.co/datasets/m-a-p/COIG-CQIA |
| InstructGPT-sft | EN | HG & MC | | https://arxiv.org/pdf/2203.02155.pdf | |
| Alpaca_GPT4_data | EN | CI & MC | https://github.com/Instruction-Tuning-with-GPT-4/GPT-4-LLM#data-release | https://arxiv.org/pdf/2304.03277.pdf | |
| Alpaca_GPT4_data_zh | ZH | CI & MC | https://github.com/Instruction-Tuning-with-GPT-4/GPT-4-LLM#data-release | | https://huggingface.co/datasets/shibing624/alpaca-zh |
| Bactrian-X | Multi (52) | CI & MC | https://github.com/mbzuai-nlp/bactrian-x | https://arxiv.org/pdf/2305.15011.pdf | https://huggingface.co/datasets/MBZUAI/Bactrian-X |
| Baize | EN | CI & MC | https://github.com/project-baize/baize-chatbot | https://arxiv.org/pdf/2304.01196.pdf | https://github.com/project-baize/baize-chatbot/tree/main/data |
| GPT4All | EN | CI & MC | https://github.com/nomic-ai/gpt4all | https://gpt4all.io/reports/GPT4All_Technical_Report_3.pdf | https://huggingface.co/datasets/QingyiSi/Alpaca-CoT/tree/main/GPT4all |
| GuanacoDataset | Multi | CI & MC | | | https://huggingface.co/datasets/JosephusCheung/GuanacoDataset |
| LaMini-LM | EN | CI & MC | https://github.com/mbzuai-nlp/LaMini-LM | https://arxiv.org/pdf/2304.14402.pdf | https://huggingface.co/datasets/MBZUAI/LaMini-instruction |
| LogiCoT | EN & ZH | CI & MC | https://github.com/csitfun/logicot | https://arxiv.org/pdf/2305.12147.pdf | https://huggingface.co/datasets/csitfun/LogiCoT |
| LongForm | EN | CI & MC | https://github.com/akoksal/LongForm | https://arxiv.org/pdf/2304.08460.pdf | https://huggingface.co/datasets/akoksal/LongForm |
| Luotuo-QA-B | EN & ZH | CI & MC | https://github.com/LC1332/Luotuo-QA | | https://huggingface.co/datasets/Logic123456789/Luotuo-QA-B |
| OpenOrca | Multi | CI & MC | | https://arxiv.org/pdf/2306.02707.pdf | https://huggingface.co/datasets/Open-Orca/OpenOrca |
| Wizard_evol_instruct_zh | ZH | CI & MC | https://github.com/LC1332/Chinese-alpaca-lora | | https://huggingface.co/datasets/silk-road/Wizard-LM-Chinese-instruct-evol |
| COIG | ZH | HG & CI & MC | https://github.com/FlagOpen/FlagInstruct | https://arxiv.org/pdf/2304.07987.pdf | https://huggingface.co/datasets/BAAI/COIG |
| HC3 | EN & ZH | HG & CI & MC | https://github.com/Hello-SimpleAI/chatgpt-comparison-detection | https://arxiv.org/pdf/2301.07597.pdf | |
| Phoenix-sft-data-v1 | Multi | HG & CI & MC | https://github.com/FreedomIntelligence/LLMZoo | https://arxiv.org/pdf/2304.10453.pdf | https://huggingface.co/datasets/FreedomIntelligence/phoenix-sft-data-v1 |
| TigerBot_sft_en | EN | HG & CI & MC | https://github.com/TigerResearch/TigerBot | https://arxiv.org/abs/2312.08688 | https://huggingface.co/datasets/TigerResearch/sft_en |
| TigerBot_sft_zh | ZH | HG & CI & MC | https://github.com/TigerResearch/TigerBot | https://arxiv.org/abs/2312.08688 | https://huggingface.co/datasets/TigerResearch/sft_zh |
| Aya Collection | Multi (114) | HG & CI & MC | | https://arxiv.org/abs/2402.06619 | https://hf.co/datasets/CohereForAI/aya_collection |
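
Most of the SFT sets above share an instruction/input/response layout. A minimal sketch that loads databricks-dolly-15k and renders records with the Stanford Alpaca prompt template (a common default, not the only choice):

```python
from datasets import load_dataset

ds = load_dataset("databricks/databricks-dolly-15k", split="train")

def to_prompt(ex):
    """Render one record with the Alpaca-style template."""
    if ex["context"]:
        return (
            "Below is an instruction that describes a task, paired with an input "
            "that provides further context. Write a response that appropriately "
            f"completes the request.\n\n### Instruction:\n{ex['instruction']}\n\n"
            f"### Input:\n{ex['context']}\n\n### Response:\n{ex['response']}"
        )
    return (
        "Below is an instruction that describes a task. Write a response that "
        f"appropriately completes the request.\n\n### Instruction:\n"
        f"{ex['instruction']}\n\n### Response:\n{ex['response']}"
    )

print(to_prompt(ds[0]))
```
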
Domain Specific Textual SFT Datasets:

| Dataset Name | Language | Domain | Construction Method | Github Link | Paper Link | Dataset Link |
| --- | --- | --- | --- | --- | --- | --- |
| ChatDoctor | EN | Medical | HG & MC | https://github.com/Kent0n-Li/ChatDoctor | https://arxiv.org/ftp/arxiv/papers/2303/2303.14070.pdf | |
| ChatMed_Consult_Dataset | ZH | Medical | MC | https://github.com/michael-wzhu/ChatMed | | https://huggingface.co/datasets/michaelwzhu/ChatMed_Consult_Dataset |
| CMtMedQA | ZH | Medical | HG | https://github.com/SupritYoung/Zhongjing | https://arxiv.org/pdf/2308.03549.pdf | https://huggingface.co/datasets/Suprit/CMtMedQA |
| DISC-Med-SFT | ZH | Medical | HG & CI | https://github.com/FudanDISC/DISC-MedLLM | https://arxiv.org/pdf/2308.14346.pdf | https://huggingface.co/datasets/Flmc/DISC-Med-SFT |
| HuatuoGPT-sft-data-v1 | ZH | Medical | HG & MC | https://github.com/FreedomIntelligence/HuatuoGPT | https://arxiv.org/pdf/2305.15075.pdf | https://huggingface.co/datasets/FreedomIntelligence/HuatuoGPT-sft-data-v1 |
| Huatuo-26M | ZH | Medical | CI | https://github.com/FreedomIntelligence/Huatuo-26M | https://arxiv.org/pdf/2305.01526.pdf | |
| MedDialog | EN & ZH | Medical | HG | https://github.com/UCSD-AI4H/Medical-Dialogue-System | https://aclanthology.org/2020.emnlp-main.743.pdf | |
| Medical Meadow | EN | Medical | HG & CI | https://github.com/kbressem/medAlpaca | https://arxiv.org/pdf/2304.08247.pdf | https://huggingface.co/medalpaca |
| Medical-sft | EN & ZH | Medical | CI | https://github.com/shibing624/MedicalGPT | | https://huggingface.co/datasets/shibing624/medical |
| QiZhenGPT-sft-20k | ZH | Medical | CI | https://github.com/CMKRG/QiZhenGPT | | |
| ShenNong_TCM_Dataset | ZH | Medical | MC | https://github.com/michael-wzhu/ShenNong-TCM-LLM | | https://huggingface.co/datasets/michaelwzhu/ShenNong_TCM_Dataset |
| Code_Alpaca_20K | EN & PL | Code | MC | https://github.com/sahil280114/codealpaca | | |
| CodeContest | EN & PL | Code | CI | https://github.com/google-deepmind/code_contests | https://arxiv.org/pdf/2203.07814.pdf | |
| CommitPackFT | EN & PL (277) | Code | HG | https://github.com/bigcode-project/octopack | https://arxiv.org/pdf/2308.07124.pdf | https://huggingface.co/datasets/bigcode/commitpackft |
| ToolAlpaca | EN & PL | Code | HG & MC | https://github.com/tangqiaoyu/ToolAlpaca | https://arxiv.org/pdf/2306.05301.pdf | |
| ToolBench | EN & PL | Code | HG & MC | https://github.com/OpenBMB/ToolBench | https://arxiv.org/pdf/2307.16789v2.pdf | |
| DISC-Law-SFT | ZH | Law | HG & CI & MC | https://github.com/FudanDISC/DISC-LawLLM | https://arxiv.org/pdf/2309.11325.pdf | |
| HanFei 1.0 | ZH | Law | - | https://github.com/siat-nlp/HanFei | | |
| LawGPT_zh | ZH | Law | CI & MC | https://github.com/LiuHC0428/LAW-GPT | | |
| Lawyer LLaMA_sft | ZH | Law | CI & MC | https://github.com/AndrewZhe/lawyer-llama | https://arxiv.org/pdf/2305.15062.pdf | https://github.com/AndrewZhe/lawyer-llama/tree/main/data |
| BELLE_School_Math | ZH | Math | MC | https://github.com/LianjiaTech/BELLE/tree/main/data/10M | | https://huggingface.co/datasets/BelleGroup/school_math_0.25M |
| Goat | EN | Math | HG | https://github.com/liutiedong/goat | https://arxiv.org/pdf/2305.14201.pdf | https://huggingface.co/datasets/tiedong/goat |
| MWP | EN & ZH | Math | CI | https://github.com/LYH-YF/MWPToolkit | https://arxiv.org/pdf/2109.00799.pdf | https://huggingface.co/datasets/Macropodus/MWP-Instruct |
| OpenMathInstruct-1 | EN | Math | CI & MC | https://github.com/Kipok/NeMo-Skills | https://arxiv.org/abs/2402.10176 | https://huggingface.co/datasets/nvidia/OpenMathInstruct-1 |
| Child_chat_data | ZH | Education | HG & MC | https://github.com/HIT-SCIR-SC/QiaoBan | | |
| Educhat-sft-002-data-osm | EN & ZH | Education | CI | https://github.com/icalk-nlp/EduChat | https://arxiv.org/pdf/2308.02773.pdf | https://huggingface.co/datasets/ecnu-icalk/educhat-sft-002-data-osm |
| TaoLi_data | ZH | Education | HG & CI | https://github.com/blcuicall/taoli | | |
| DISC-Fin-SFT | ZH | Financial | HG & CI & MC | https://github.com/FudanDISC/DISC-FinLLM | http://arxiv.org/abs/2310.15205 | |
| AlphaFin | EN & ZH | Financial | HG & CI & MC | https://github.com/AlphaFin-proj/AlphaFin | https://arxiv.org/abs/2403.12582 | https://huggingface.co/datasets/AlphaFin/AlphaFin-dataset-v1 |
| GeoSignal | EN | Geoscience | HG & CI & MC | https://github.com/davendw49/k2 | https://arxiv.org/pdf/2306.05064.pdf | https://huggingface.co/datasets/daven3/geosignal |
| MeChat | ZH | Mental Health | CI & MC | https://github.com/qiuhuachuan/smile | https://arxiv.org/pdf/2305.00450.pdf | https://github.com/qiuhuachuan/smile/tree/main/data |
| Mol-Instructions | EN | Biology | HG & CI & MC | https://github.com/zjunlp/Mol-Instructions | https://arxiv.org/pdf/2306.08018.pdf | https://huggingface.co/datasets/zjunlp/Mol-Instructions |
| Owl-Instruction | EN & ZH | IT | HG & MC | https://github.com/HC-Guo/Owl | https://arxiv.org/pdf/2309.09298.pdf | |
| PROSOCIALDIALOG | EN | Social Norms | HG & MC | | https://arxiv.org/pdf/2205.12688.pdf | https://huggingface.co/datasets/allenai/prosocial-dialog |
| TransGPT-sft | ZH | Transportation | HG | https://github.com/DUOMO/TransGPT | | https://huggingface.co/datasets/DUOMO-Lab/TransGPT-sft |

Multimodal SFT Datasets:

| Dataset Name | Modality | Link |
| --- | --- | --- |
| LRV-Instruction | Image | https://huggingface.co/datasets/VictorSanh/LrvInstruction |
| Clotho-Detail | Audio | https://github.com/magic-research/bubogpt/blob/main/dataset/README.md#audio-dataset-instruction |
| CogVLM-SFT-311K | Image | https://huggingface.co/datasets/THUDM/CogVLM-SFT-311K |
| ComVint | Image | https://drive.google.com/file/d/1eH5t8YoI2CGR2dTqZO0ETWpBukjcZWsd/view |
| DataEngine-InstData | Image | https://opendatalab.com/OpenDataLab/DataEngine-InstData |
| GranD_f | Image | https://huggingface.co/datasets/MBZUAI/GranD-f/tree/main |
| LLaVA | Image | https://huggingface.co/datasets/liuhaotian/LLaVA-Instruct-150K |
| LLaVA-1.5 | Image | https://huggingface.co/datasets/liuhaotian/LLaVA-Instruct-150K/blob/main/llava_v1_5_mix665k.json |
| LVLM_NLF | Image | https://huggingface.co/datasets/YangyiYY/LVLM_NLF/tree/main |
| M3IT | Image | https://huggingface.co/datasets/MMInstruction/M3IT |
| MMC-Instruction Dataset | Image | https://github.com/FuxiaoLiu/MMC/blob/main/README.md |
| MiniGPT-4 | Image | https://drive.google.com/file/d/1nJXhoEcy3KTExr17I7BXqY5Y9Lx_-n-9/view |
| MiniGPT-v2 | Image | https://github.com/Vision-CAIR/MiniGPT-4/blob/main/dataset/README_MINIGPTv2_FINETUNE.md |
| PVIT | Image | https://huggingface.co/datasets/PVIT/pvit_data_stage2/tree/main |
| PointLLM Instruction data | 3D | https://huggingface.co/datasets/RunsenXu/PointLLM/tree/main |
| ShareGPT4V | Image | https://huggingface.co/datasets/Lin-Chen/ShareGPT4V/tree/main |
| Shikra-RD | Image | https://drive.google.com/file/d/1CNLu1zJKPtliQEYCZlZ8ykH00ppInnyN/view |
| SparklesDialogue | Image | https://github.com/HYPJUDY/Sparkles/tree/main/dataset |
| T2M | Image, Video, Audio | https://github.com/NExT-GPT/NExT-GPT/tree/main/data/IT_data/T-T+X_data |
| TextBind | Image | https://drive.google.com/drive/folders/1-SkzQRInSfrVyZeB0EZJzpCPXXwHb27W |
| TextMonkey | Image | https://www.modelscope.cn/datasets/lvskiller/TextMonkey_data/files |
| VGGSS-Instruction | Image, Audio | https://bubo-gpt.github.io/ |
| VIGC-InstData | Image | https://opendatalab.com/OpenDataLab/VIGC-InstData |
| VILA | Image | https://github.com/Efficient-Large-Model/VILA/tree/main/data_prepare |
| VLSafe | Image | https://arxiv.org/abs/2312.07533 |
| Video-ChatGPT-video-it-data | Video | https://github.com/mbzuai-oryx/Video-ChatGPT |
| VideoChat-video-it-data | Video | https://github.com/OpenGVLab/InternVideo/tree/main/Data/instruction_data |
| X-InstructBLIP-it-data | Image, Video, Audio | https://github.com/salesforce/LAVIS/tree/main/projects/xinstructblip |
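
The image-grounded SFT sets above are typically plain JSON: each record points to an image and carries a multi-turn conversation. A sketch of inspecting one LLaVA-Instruct record; the filename is an assumption about the repo layout:

```python
import json
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="liuhaotian/LLaVA-Instruct-150K",
    repo_type="dataset",
    filename="llava_instruct_150k.json",  # assumed filename, check the repo
)
records = json.load(open(path))

rec = records[0]
print(rec["image"])  # the COCO image this conversation is grounded in
for turn in rec["conversations"]:
    print(turn["from"], ":", turn["value"][:120])
```
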
## Data-centric pretraining

### Domain mixture

1. **Doremi: Optimizing data mixtures speeds up language model pretraining** - [paper](https://arxiv.org/abs/2305.10429)
2. **Data selection for language models via importance resampling** - [paper](https://arxiv.org/abs/2302.03169)
3. **Glam: Efficient scaling of language models with mixture-of-experts** - [paper](https://arxiv.org/abs/2112.06905)
4. **Videollm: Modeling video sequence with large language models** - [paper](https://arxiv.org/abs/2305.13292)
5. **Vast: A vision-audio-subtitle-text omni-modality foundation model and dataset** - [paper](https://arxiv.org/abs/2305.18500)
6. **Moviechat: From dense token to sparse memory for long video understanding** - [paper](https://arxiv.org/abs/2307.16449)
7. **Internvid: A large-scale video-text dataset for multimodal understanding and generation** - [paper](https://arxiv.org/abs/2307.06942)
8. **Youku-mplug: A 10 million large-scale chinese video-language dataset for pre-training and benchmarks** - [paper](https://arxiv.org/abs/2306.04362)
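
Several of the works above (e.g. Doremi, Glam) revolve around choosing per-domain mixture weights and sampling pretraining examples accordingly. A toy sketch of weighted domain sampling; the domains and weights are made-up placeholders, not values from any paper:

```python
import random

# Toy domain-reweighted sampling: draw each pretraining example from a
# domain chosen according to mixture weights.
domains = {
    "web":  ["web doc 1", "web doc 2"],
    "code": ["def f(): ...", "int main() {}"],
    "wiki": ["wiki article 1"],
}
weights = {"web": 0.6, "code": 0.3, "wiki": 0.1}  # placeholder weights

def mixed_stream(n, seed=0):
    rng = random.Random(seed)
    names = list(domains)
    probs = [weights[d] for d in names]
    for _ in range(n):
        d = rng.choices(names, weights=probs, k=1)[0]
        yield d, rng.choice(domains[d])

for domain, doc in mixed_stream(5):
    print(domain, "->", doc)
```
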
### Modality Mixture

1. **MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training** - [paper](https://arxiv.org/abs/2403.09611)
2. **From scarcity to efficiency: Improving clip training via visual-enriched captions** - [paper](https://arxiv.org/abs/2310.07699v1)
3. **Valor: Vision-audio-language omni-perception pretraining model and dataset** - [paper](https://arxiv.org/abs/2304.08345)
4. **AutoAD: Movie description in context** - [paper](https://arxiv.org/abs/2303.16899)
5. **Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding** - [paper](https://arxiv.org/abs/2311.08046)
6. **VideoChat: Chat-Centric Video Understanding** - [paper](https://arxiv.org/abs/2305.06355)
7. **Mvbench: A comprehensive multi-modal video understanding benchmark** - [paper](https://arxiv.org/abs/2311.17005)
8. **LLaMA-VID: An image is worth 2 tokens in large language models** - [paper](https://arxiv.org/abs/2311.17043)
9. **Video-llava: Learning united visual representation by alignment before projection** - [paper](https://arxiv.org/abs/2311.10122)
10. **Valley: Video assistant with large language model enhanced ability** - [paper](https://arxiv.org/abs/2306.07207)
11. **Video-llama: An instruction-tuned audio-visual language model for video understanding** - [paper](https://arxiv.org/abs/2306.02858)
12. **Macaw-LLM: Multi-Modal Language Modeling with Image, Audio, Video, and Text Integration** - [paper](https://arxiv.org/abs/2306.09093)
13. **Audio-Visual LLM for Video Understanding** - [paper](https://arxiv.org/abs/2312.06720)
14. **Fine-grained Audio-Visual Joint Representations for Multimodal Large Language Models** - [paper](https://arxiv.org/abs/2310.05863)

### Quality Selection

1. **DataComp: In search of the next generation of multimodal datasets** - [paper](https://arxiv.org/abs/2304.14108)
2. **Finetuned Multimodal Language Models Are High-Quality Image-Text Data Filters** - [paper](https://arxiv.org/abs/2403.02677)
3. **CiT: Curation in Training for Effective Vision-Language Data** - [paper](https://arxiv.org/abs/2301.02241)
4. **Sieve: Multimodal Dataset Pruning Using Image Captioning Models** - [paper](https://arxiv.org/abs/2310.02110)
5. **Variance Alignment Score: A Simple But Tough-to-Beat Data Selection Method for Multimodal Contrastive Learning** - [paper](https://arxiv.org/abs/2402.02055)
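
A recurring recipe in the works above (DataComp, the LAION datasets) is filtering web-scale pairs by CLIP image-text similarity. A minimal sketch with `transformers`; the 0.3 cutoff is the one reported for LAION-400M with CLIP ViT-B/32 and is illustrative only:

```python
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def clip_similarity(image, caption):
    """Cosine similarity between CLIP image and text embeddings."""
    inputs = processor(text=[caption], images=image,
                       return_tensors="pt", padding=True, truncation=True)
    with torch.no_grad():
        out = model(**inputs)
    img = out.image_embeds / out.image_embeds.norm(dim=-1, keepdim=True)
    txt = out.text_embeds / out.text_embeds.norm(dim=-1, keepdim=True)
    return float((img * txt).sum())

def filter_pairs(pairs, threshold=0.3):
    """Keep (PIL image, caption) pairs whose similarity clears the threshold."""
    return [(im, cap) for im, cap in pairs if clip_similarity(im, cap) >= threshold]
```
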
## Data-centric adaptation

### Data-Centric Supervised Finetuning

1. **Unnatural instructions: Tuning language models with (almost) no human labor** - [paper](https://arxiv.org/abs/2212.09689)
2. **Active Learning for Convolutional Neural Networks: A Core-Set Approach** - [paper](https://arxiv.org/abs/1708.00489)
3. **Moderate Coreset: A Universal Method of Data Selection for Real-world Data-efficient Deep Learning** - [paper](https://openreview.net/forum?id=7D5EECbOaf9)
4. **Similar: Submodular information measures based active learning in realistic scenarios** - [paper](https://arxiv.org/abs/2107.00717)
5. **Practical coreset constructions for machine learning** - [paper](https://arxiv.org/abs/1703.06476)
6. **Deep learning on a data diet: Finding important examples early in training** - [paper](https://arxiv.org/abs/2107.07075)
7. **A new active labeling method for deep learning** - [paper](https://ieeexplore.ieee.org/document/6889457)
8. **Maybe only 0.5% data is needed: A preliminary exploration of low training data instruction tuning** - [paper](https://arxiv.org/abs/2305.09246)
9. **DEFT: Data Efficient Fine-Tuning for Pre-Trained Language Models via Unsupervised Core-Set Selection** - [paper](https://arxiv.org/abs/2310.16776)
10. **Beyond neural scaling laws: beating power law scaling via data pruning** - [paper](https://arxiv.org/abs/2206.14486)
11. **Mods: Model-oriented data selection for instruction tuning** - [paper](https://arxiv.org/abs/2311.15653)
12. **DeBERTa: Decoding-enhanced BERT with Disentangled Attention** - [paper](https://arxiv.org/abs/2006.03654)
13. **Alpagasus: Training a better alpaca with fewer data** - [paper](https://arxiv.org/abs/2307.08701)
14. **Rethinking the Instruction Quality: LIFT is What You Need** - [paper](https://arxiv.org/abs/2312.11508)
15. **What Makes Good Data for Alignment? A Comprehensive Study of Automatic Data Selection in Instruction Tuning** - [paper](https://arxiv.org/abs/2312.15685)
16. **InsTag: Instruction Tagging for Analyzing Supervised Fine-tuning of Large Language Models** - [paper](https://arxiv.org/abs/2308.07074)
17. **SelectLLM: Can LLMs Select Important Instructions to Annotate?** - [paper](https://arxiv.org/abs/2401.16553)
18. **Improved Baselines with Visual Instruction Tuning** - [paper](https://arxiv.org/abs/2310.03744)
19. **NLU on Data Diets: Dynamic Data Subset Selection for NLP Classification Tasks** - [paper](https://arxiv.org/abs/2306.03208)
20. **LESS: Selecting Influential Data for Targeted Instruction Tuning** - [paper](https://arxiv.org/abs/2402.04333)
21. **From quantity to quality: Boosting llm performance with self-guided data selection for instruction tuning** - [paper](https://arxiv.org/abs/2308.12032)
22. **One shot learning as instruction data prospector for large language models** - [paper](https://arxiv.org/abs/2312.10302)
23. **Active Instruction Tuning: Improving Cross-Task Generalization by Training on Prompt Sensitive Tasks** - [paper](https://arxiv.org/abs/2311.00288)
24. **SelectIT: Selective Instruction Tuning for Large Language Models via Uncertainty-Aware Self-Reflection** - [paper](https://arxiv.org/abs/2402.16705)
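
Several entries above score and rank candidate instruction pairs instead of training on everything; item 21, for example, keeps pairs whose response is hard to produce without the instruction. A toy sketch of that perplexity-ratio (IFD-style) score, using GPT-2 purely as a stand-in scorer:

```python
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")           # stand-in scorer model
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def answer_ppl(prompt, answer):
    """Perplexity of `answer` given `prompt`, with prompt tokens masked out."""
    prompt_len = len(tok(prompt).input_ids)
    ids = tok(prompt + answer, return_tensors="pt").input_ids
    labels = ids.clone()
    labels[:, :prompt_len] = -100  # don't score the prompt itself
    with torch.no_grad():
        loss = model(ids, labels=labels).loss
    return math.exp(loss.item())

def ifd_score(instruction, response):
    # lower means the instruction strongly guides the response;
    # the paper selects training pairs by ranking these scores
    return answer_ppl(instruction + "\n\n", response) / answer_ppl("", response)

print(ifd_score("Name three primary colors.", "Red, yellow and blue."))
```
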
### Data-Centric Human Preference Alignment

1. **Training language models to follow instructions with human feedback** - [paper](https://arxiv.org/abs/2203.02155)
2. **LLaMA-VID: An image is worth 2 tokens in large language models** - [paper](https://arxiv.org/abs/2311.17043)
3. **Aligning large multimodal models with factually augmented rlhf** - [paper](https://arxiv.org/abs/2309.14525)
4. **Dress: Instructing large vision-language models to align and interact with humans via natural language feedback** - [paper](https://arxiv.org/abs/2311.10081)

## Evaluation

1. **Gans trained by a two time-scale update rule converge to a local nash equilibrium** - [paper](https://arxiv.org/abs/1706.08500)
2. **Assessing generative models via precision and recall** - [paper](https://arxiv.org/abs/1806.00035)
3. **Unsupervised Quality Estimation for Neural Machine Translation** - [paper](https://arxiv.org/abs/2005.10608)
4. **Mixture models for diverse machine translation: Tricks of the trade** - [paper](https://arxiv.org/abs/1902.07816)
5. **The vendi score: A diversity evaluation metric for machine learning** - [paper](https://arxiv.org/abs/2210.02410)
6. **Cousins Of The Vendi Score: A Family Of Similarity-Based Diversity Metrics For Science And Machine Learning** - [paper](https://arxiv.org/abs/2310.12952)
7. **Navigating text-to-image customization: From lycoris fine-tuning to model evaluation** - [paper](https://arxiv.org/abs/2309.14859)
8. **TRUE: Re-evaluating factual consistency evaluation** - [paper](https://arxiv.org/abs/2204.04991)
9. **Object hallucination in image captioning** - [paper](https://arxiv.org/abs/1809.02156)
10. **Faithscore: Evaluating hallucinations in large vision-language models** - [paper](https://arxiv.org/abs/2311.01477)
11. **Deep coral: Correlation alignment for deep domain adaptation** - [paper](https://arxiv.org/abs/1607.01719)
12. **Transferability in deep learning: A survey** - [paper](https://arxiv.org/abs/2201.05867)
13. **Mauve scores for generative models: Theory and practice** - [paper](https://arxiv.org/abs/2212.14578)
14. **Translating Videos to Natural Language Using Deep Recurrent Neural Networks** - [paper](https://arxiv.org/abs/1412.4729)
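
Items 5 and 6 above measure dataset diversity with the Vendi Score: the exponential of the entropy of the eigenvalues of a sample-similarity matrix, read as an "effective number of distinct samples". A small NumPy sketch under a cosine-kernel assumption:

```python
import numpy as np

def vendi_score(embeddings):
    """Vendi Score of n samples from an (n, d) array of embeddings.

    With a cosine kernel the score ranges from 1 (all samples identical)
    to n (all samples mutually orthogonal).
    """
    X = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    K = X @ X.T                           # n x n similarity (Gram) matrix
    lam = np.linalg.eigvalsh(K / len(X))  # eigenvalues sum to 1
    lam = lam[lam > 1e-12]
    return float(np.exp(-np.sum(lam * np.log(lam))))

rng = np.random.default_rng(0)
print(vendi_score(rng.normal(size=(100, 16))))  # ~16: diversity capped by rank
```
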
### Evaluation Datasets:

| Dataset | Modality | Type | Link |
| --- | --- | --- | --- |
| MMMU | Image | Caption and General VQA | [https://mmmu-benchmark.github.io](https://mmmu-benchmark.github.io/) |
| MME | Image | Caption and General VQA | https://arxiv.org/abs/2306.13394 |
| Nocaps | Image | Caption and General VQA | https://github.com/nocaps-org |
| GQA | Image | Caption and General VQA | https://cs.stanford.edu/people/dorarad/gqa/about.html |
| DVQA | Image | Caption and General VQA | https://github.com/kushalkafle/DVQA_dataset |
| VSR | Image | Caption and General VQA | https://github.com/cambridgeltl/visual-spatial-reasoning |
| OKVQA | Image | Caption and General VQA | https://okvqa.allenai.org/ |
| Vizwiz | Image | Caption and General VQA | https://vizwiz.org/ |
| POPE | Image | Caption and General VQA | https://github.com/RUCAIBox/POPE |
| TextVQA | Image | Text-Oriented VQA | https://textvqa.org/ |
| DocVQA | Image | Text-Oriented VQA | https://www.docvqa.org/ |
| ChartQA | Image | Text-Oriented VQA | https://github.com/vis-nlp/ChartQA |
| AI2D | Image | Text-Oriented VQA | https://allenai.org/data/diagrams |
| OCR-VQA | Image | Text-Oriented VQA | https://ocr-vqa.github.io/ |
| ScienceQA | Image | Text-Oriented VQA | https://scienceqa.github.io/ |
| MathV | Image | Text-Oriented VQA | https://mathvista.github.io/ |
| MMVet | Image | Text-Oriented VQA | https://github.com/yuweihao/MM-Vet |
| RefCOCO, RefCOCO+, RefCOCOg | Image | Referring Expression Comprehension | https://github.com/lichengunc/refer |
| GRIT | Image | Referring Expression Comprehension | https://allenai.org/project/grit/home |
| TouchStone | Image | Instruction Following | https://github.com/OFA-Sys/TouchStone |
| SEED-Bench | Image | Instruction Following | https://github.com/AILab-CVC/SEED-Bench |
| MME | Image | Instruction Following | https://arxiv.org/abs/2306.13394 |
| LLaVAW | Image | Instruction Following | https://github.com/haotian-liu/LLaVA |
| HM | Image | Other | https://ai.meta.com/blog/hateful-memes-challenge-and-data-set/ |
| MMB | Image | Other | https://github.com/open-compass/MMBench |
| MSVD | Video | Video question answering | https://paperswithcode.com/dataset/msvd |
| MSRVTT | Video | Video question answering | https://paperswithcode.com/dataset/msr-vtt |
| TGIF-QA | Video | Video question answering | https://paperswithcode.com/dataset/tgif-qa |
| ActivityNet-QA | Video | Video question answering | https://paperswithcode.com/dataset/activitynet-qa |
| LSMDC | Video | Video question answering | https://paperswithcode.com/dataset/lsmdc |
| MoVQA | Video | Video question answering | https://arxiv.org/abs/2312.04817 |
| DiDeMo | Video | Video captioning and Video retrieval | https://paperswithcode.com/dataset/didemo |
| VATEX | Video | Video captioning and Video retrieval | https://eric-xw.github.io/vatex-website/about.html |
| MVBench | Video | Other | https://github.com/OpenGVLab/Ask-Anything/tree/main/video_chat2 |
| EgoSchema | Video | Other | https://egoschema.github.io/ |
| VideoChatGPT | Video | Other | https://github.com/mbzuai-oryx/Video-ChatGPT/tree/main |
| Charades-STA | Video | Other | https://github.com/jiyanggao/TALL |
| QVHighlight | Video | Other | https://github.com/jayleicn/moment_detr/tree/main/data |
| AudioCaps | Audio | Audio retrieval | https://audiocaps.github.io/ |
| Clotho | Audio | Audio retrieval | https://zenodo.org/records/4743815 |
| ClothoAQA | Audio | Audio question answering | https://zenodo.org/records/6473207 |
| Audio-MusicAVQA | Audio | Audio question answering | https://gewu-lab.github.io/MUSIC-AVQA/ |