├── README.md
└── figs
    └── fig1-comparison.png

/README.md:
--------------------------------------------------------------------------------
# M3I Pre-training

[](https://paperswithcode.com/sota/object-detection-on-coco?p=towards-all-in-one-pre-training-via)

[](https://paperswithcode.com/sota/object-detection-on-coco-minival?p=towards-all-in-one-pre-training-via)

[](https://paperswithcode.com/sota/object-detection-on-lvis-v1-0-minival?p=towards-all-in-one-pre-training-via)

[](https://paperswithcode.com/sota/semantic-segmentation-on-ade20k?p=towards-all-in-one-pre-training-via) [](https://paperswithcode.com/sota/image-classification-on-imagenet?p=towards-all-in-one-pre-training-via)

This repository is the official implementation of the CVPR 2023 paper [Towards All-in-one Pre-training via Maximizing Multi-modal Mutual Information](https://arxiv.org/abs/2211.09807).

By [Weijie Su](https://scholar.google.com/citations?user=ECDe6IIAAAAJ&hl=en), [Xizhou Zhu](https://scholar.google.com/citations?user=02RXI00AAAAJ&hl=en), [Chenxin Tao](https://scholar.google.com/citations?user=sXHFIBkAAAAJ&hl=en), [Lewei Lu](https://scholar.google.com/citations?user=zdgKJXIAAAAJ&hl=en), [Bin Li](http://staff.ustc.edu.cn/~binli/), [Gao Huang](http://www.gaohuang.net/), [Yu Qiao](https://scholar.google.com/citations?user=gFtI-8QAAAAJ&hl=en), [Xiaogang Wang](https://scholar.google.com/citations?user=-B5JgjsAAAAJ&hl=en), [Jie Zhou](https://scholar.google.com/citations?user=6a79aPwAAAAJ&hl=en), [Jifeng Dai](https://jifengdai.org/).

Code will be available.

## Introduction

**M**aximizing **M**ulti-modal **M**utual **I**nformation Pre-training (**M3I Pre-training**), initially described in [arXiv](https://arxiv.org/abs/2211.09807), is a simple yet effective one-stage pre-training paradigm.
It can integrate existing pre-training methods (supervised, weakly-supervised, and self-supervised pre-training) under a unified mutual information perspective and maintain all desired properties through single-stage pre-training. Notably, we successfully pre-train a 1B-parameter model ([InternImage-H](https://arxiv.org/abs/2211.05778)) with M3I Pre-training and achieve new records of `65.4 mAP` on COCO detection test-dev, `62.5 mAP` on LVIS detection minival, and `62.9 mIoU` on ADE20K.