├── tasks.png
├── segmentation emerge.PNG
├── README.md
├── 2-Segmentation Emerge.md
├── 4-PIS.md
└── 3-GIS.md
/tasks.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/stanley-313/ImageSegFM-Survey/HEAD/tasks.png
--------------------------------------------------------------------------------
/segmentation emerge.PNG:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/stanley-313/ImageSegFM-Survey/HEAD/segmentation emerge.PNG
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
3 |
4 |
5 |
# Image Segmentation in Foundation Model Era: A Survey
6 |
7 |
8 | Tianfei Zhou, Wang Xia, Fei Zhang, Boyu Chang, Wenguan Wang, Ye Yuan, Ender Konukoglu, Daniel Cremers
23 |
24 |
38 | This repository compiles a collection of resources on image segmentation in the foundation model era,
39 | and will be continuously updated to track developments in the field.
40 | Please feel free to submit a pull request if you find any work missing.
41 |
42 | ## 1. Introduction
43 | Image segmentation is a long-standing challenge in computer vision, studied continuously over several decades, as
44 | evidenced by seminal algorithms such as N-Cut, FCN, and MaskFormer. With the advent of foundation models (FMs), contemporary
45 | segmentation methodologies have embarked on a new epoch by either adapting FMs (e.g., CLIP, Stable Diffusion, DINO) for image
46 | segmentation or developing dedicated segmentation foundation models (e.g., SAM, SAM2). These approaches not only deliver
47 | superior segmentation performance, but also herald newfound segmentation capabilities previously unseen in the deep learning context.
48 | However, current research in image segmentation lacks a detailed analysis of distinct characteristics, challenges, and solutions
49 | associated with these advancements. This survey seeks to fill this gap by providing a thorough review of cutting-edge research
50 | centered around FM-driven image segmentation. We investigate two basic lines of research (as shown in the following figure) – generic image segmentation (i.e.,
51 | semantic segmentation, instance segmentation, panoptic segmentation), and promptable image segmentation (i.e., interactive
52 | segmentation, referring segmentation, few-shot segmentation) – by delineating their respective task settings, background concepts,
53 | and key challenges. Furthermore, we provide insights into the emergence of segmentation knowledge from FMs like CLIP, Stable
54 | Diffusion, and DINO. An exhaustive overview of over 300 segmentation approaches is provided to encapsulate the breadth of current
55 | research efforts. Subsequently, we engage in a discussion of open issues and potential avenues for future research.
56 |
57 |
58 |
59 |
60 |
61 | ***
62 |
63 | ## 2. Segmentation Knowledge Emerges From FMs
64 | Given the emergent capabilities of LLMs, a natural question arises: *Do segmentation properties emerge from FMs?* The
65 | answer is **positive**, even for FMs not explicitly designed for
66 | segmentation, such as CLIP, DINO and Diffusion Models. This also unlocks a new frontier in image segmentation,
67 | i.e., **acquiring segmentation without any training.** The following figure illustrates how to approach this and shows some examples:
68 |
69 |
70 |
71 |
72 |
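To make this concrete, below is a minimal, training-free sketch in the spirit of the CLIP-based methods listed in Sec. 2.1 (e.g., MaskCLIP): per-patch CLIP features are matched against text embeddings of class names to produce a coarse semantic mask without any segmentation training. This is an illustrative assumption-laden sketch, not code from any of the surveyed papers: it assumes the Hugging Face `transformers` CLIP implementation (`openai/clip-vit-base-patch16`), the image path and label set are placeholders, and projecting every patch token through the final layer norm plus visual projection is a naive simplification of what the surveyed methods actually do.

```python
# Hedged sketch: training-free semantic masks from a frozen CLIP (MaskCLIP-style idea).
# Assumptions: HF transformers CLIP; "example.jpg" and the class list are placeholders.
import torch
import torch.nn.functional as F
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

MODEL_ID = "openai/clip-vit-base-patch16"
model = CLIPModel.from_pretrained(MODEL_ID).eval()
processor = CLIPProcessor.from_pretrained(MODEL_ID)

class_names = ["cat", "dog", "grass", "sky"]        # placeholder open-vocabulary label set
image = Image.open("example.jpg").convert("RGB")    # placeholder image path

with torch.no_grad():
    # 1) Class-name text embeddings in CLIP's joint space.
    text_inputs = processor(text=[f"a photo of a {c}" for c in class_names],
                            return_tensors="pt", padding=True)
    text_emb = F.normalize(model.get_text_features(**text_inputs), dim=-1)      # (C, D)

    # 2) Dense image embeddings: drop the CLS token and map every patch token into the
    #    joint space with the same layer norm + projection CLIP applies to the CLS token.
    pixel_values = processor(images=image, return_tensors="pt").pixel_values
    tokens = model.vision_model(pixel_values=pixel_values).last_hidden_state    # (1, 1+P, H)
    patch_emb = model.visual_projection(
        model.vision_model.post_layernorm(tokens[:, 1:, :]))                    # (1, P, D)
    patch_emb = F.normalize(patch_emb, dim=-1)

    # 3) Patch-text similarity -> per-patch class index -> coarse mask at patch resolution.
    sim = patch_emb @ text_emb.T                                                # (1, P, C)
    side = int(sim.shape[1] ** 0.5)                                             # 14 for ViT-B/16 at 224px
    mask = sim.argmax(dim=-1).reshape(1, 1, side, side).float()
    mask = F.interpolate(mask, size=pixel_values.shape[-2:], mode="nearest")    # (1, 1, 224, 224)

present = [class_names[int(i)] for i in mask.unique()]
print(mask.shape, present)
```

Roughly speaking, the papers below differ in where the dense features come from (CLIP's final attention layer, diffusion cross-/self-attention maps, or DINO's self-attention) and in how those features are grouped into masks; the CLIP-based works in Sec. 2.1, in particular, rework the last attention layer to obtain much cleaner dense features than the naive projection used above.
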
73 | - [2.1 Segmentation Emerges from CLIP](2-Segmentation%20Emerge.md#21-segmentation-emerges-from-clip)
74 | - [2.2 Segmentation Emerges from DMs](2-Segmentation%20Emerge.md#22-segmentation-emerges-from-dms)
75 | - [2.3 Segmentation Emerges from DINO](2-Segmentation%20Emerge.md#23-segmentation-emerges-from-dino)
76 |
77 | ***
78 |
79 | ## 3. Foundation Model-based GIS (Generic Image Segmentation)
80 | - [3.1 Semantic Segmentation](3-GIS.md#31-semantic-segmentation)
81 | - [3.1.1 CLIP-based Solution](3-GIS.md#311-clip-based-solution)
82 | - [3.1.2 DM-based Solution](3-GIS.md#312-dm-based-solution)
83 | - [3.1.3 DINO-based Solution](3-GIS.md#313-dino-based-solution)
84 | - [3.1.4 SAM-based Solution](3-GIS.md#314-sam-based-solution)
85 | - [3.1.5 Composition of FMs](3-GIS.md#315-composition-of-fms)
86 | - [3.2 Instance Segmentation](3-GIS.md#32-instance-segmentation)
87 | - [3.2.1 CLIP-based Solution](3-GIS.md#321-clip-based-solution)
88 | - [3.2.2 DM-based Solution](3-GIS.md#322-dm-based-solution)
89 | - [3.2.3 DINO-based Solution](3-GIS.md#323-dino-based-solution)
90 | - [3.2.4 Composition of FMs](3-GIS.md#324-composition-of-fms)
91 | - [3.3 Panoptic Segmentation](3-GIS.md#33-panoptic-segmentation)
92 | - [3.3.1 CLIP-based Solution](3-GIS.md#331-clip-based-solution)
93 | - [3.3.2 DM-based Solution](3-GIS.md#332-dm-based-solution)
94 | - [3.3.3 DINO-based Solution](3-GIS.md#333-dino-based-solution)
95 | - [3.3.4 SAM-based Solution](3-GIS.md#334-sam-based-solution)
96 |
97 |
98 | ***
99 |
100 | ## 4. Foundation Model-based PIS (Promptable Image Segmentation)
101 | - [4.1 Interactive Segmentation](4-PIS.md#41-interactive-segmentation)
102 | - [4.1.1 SAM-based Solution](4-PIS.md#411-sam-based-solution)
103 | - [4.2 Referring Segmentation](4-PIS.md#42-referring-segmentation)
104 | - [4.2.1 CLIP-based Solution](4-PIS.md#421-clip-based-solution)
105 | - [4.2.2 DM-based Solution](4-PIS.md#422-dm-based-solution)
106 | - [4.2.3 LLMs/MLLMs-based Solution](4-PIS.md#423-llmsmllms-based-solution)
107 | - [4.2.4 Composition of FMs](4-PIS.md#424-composition-of-fms)
108 | - [4.3 Few-shot Segmentation](4-PIS.md#43-few-shot-segmentation)
109 | - [4.3.1 CLIP-based Solution](4-PIS.md#431-clip-based-solution)
110 | - [4.3.2 DM-based Solution](4-PIS.md#432-dm-based-solution)
111 | - [4.3.3 DINO-based Solution](4-PIS.md#433-dino-based-solution)
112 | - [4.3.4 SAM-based Solution](4-PIS.md#434-sam-based-solution)
113 | - [4.3.5 MLLMs-based Solution](4-PIS.md#435-mllms-based-solution)
114 | - [4.3.6 In-Context Segmentation](4-PIS.md#436-in-context-segmentation)
115 | ## Citation
116 |
117 | If you find our survey and repository useful for your research, please consider citing our paper:
118 | ```bibtex
119 | @article{zhou2024SegFMSurvey,
120 | title={Image Segmentation in Foundation Model Era: A Survey},
121 | author={Zhou, Tianfei and Xia, Wang and Zhang, Fei and Chang, Boyu and Wang, Wenguan and Yuan, Ye and Konukoglu, Ender and Cremers, Daniel},
122 | journal={arXiv preprint arXiv:2408.12957},
123 | year={2024},
124 | }
125 | ```
126 |
--------------------------------------------------------------------------------
/2-Segmentation Emerge.md:
--------------------------------------------------------------------------------
1 | ## 2.1 Segmentation Emerges from CLIP
2 | | Year | Publication | Paper Title | Project |
3 | |:----:|:-----------:|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------------------------:|
4 | | 2022 | ECCV | [Extract Free Dense Labels from CLIP](https://www.ecva.net/papers/eccv_2022/papers_ECCV/papers/136880687.pdf) | [Code](https://github.com/chongzhou96/MaskCLIP), [Project](https://www.mmlab-ntu.com/project/maskclip/) |
5 | | 2023 | arXiv | [A Closer Look at the Explainability of Contrastive Language-Image Pre-training](https://arxiv.org/pdf/2304.05653) | [Code](https://github.com/xmed-lab/CLIP_Surgery), - |
6 | | 2023 | ICCV | [ Perceptual Grouping in Contrastive Vision-Language Models](https://openaccess.thecvf.com/content/ICCV2023/papers/Ranasinghe_Perceptual_Grouping_in_Contrastive_Vision-Language_Models_ICCV_2023_paper.pdf) | [Code](https://github.com/kahnchana/clippy), - |
7 | | 2024 | CVPR | [ Grounding Everything: Emerging Localization Properties in Vision-Language Transformers](https://openaccess.thecvf.com/content/CVPR2024/papers/Bousselham_Grounding_Everything_Emerging_Localization_Properties_in_Vision-Language_Transformers_CVPR_2024_paper.pdf) | [Code](https://github.com/WalBouss/GEM), - |
8 | | 2024 | ECCV | [SCLIP: Rethinking Self-Attention for Dense Vision-Language Inference](https://link.springer.com/chapter/10.1007/978-3-031-72664-4_18) | [Code](https://github.com/wangf3014/SCLIP), - |
9 | | 2025 | WACV | [Pay Attention to Your Neighbours: Training-Free Open-Vocabulary Semantic Segmentation](https://arxiv.org/pdf/2404.08181) | [Code](https://github.com/sinahmr/NACLIP), - |
10 |
11 | ## 2.2 Segmentation Emerges from DMs
12 | | Year | Publication | Paper Title | Project |
13 | |:----:|:-----------:|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:---------------------------------------------------------------------------------------------------------------:|
14 | | 2023 | ACL | [What the DAAM: Interpreting Stable Diffusion Using Cross Attention](https://aclanthology.org/2023.acl-long.310.pdf) | [Code](https://github.com/castorini/daam), - |
15 | | 2023 | arXiv | [Diffusion Model is Secretly a Training-free Open Vocabulary Semantic Segmenter](https://arxiv.org/pdf/2309.02773) | [Code](https://github.com/VCG-team/DiffSegmenter), [Project](https://vcg-team.github.io/DiffSegmenter-webpage/) |
16 | | 2023 | arXiv | [Attention as Annotation: Generating Images and Pseudo-masks for Weakly Supervised Semantic Segmentation with Diffusion](https://arxiv.org/pdf/2309.01369v1) | -, - |
17 | | 2023 | NeurIPS | [Dataset Diffusion: Diffusion-based Synthetic Dataset Generation for Pixel-Level Semantic Segmentation](https://proceedings.neurips.cc/paper_files/paper/2023/file/f2957e48240c1d90e62b303574871b47-Paper-Conference.pdf) | [Code](https://github.com/VinAIResearch/Dataset-Diffusion), - |
18 | | 2024 | arXiv | [FreeSeg-Diff: Training-Free Open-Vocabulary Segmentation with Diffusion Models](https://arxiv.org/pdf/2403.20105) | -, [Project](https://bcorrad.github.io/freesegdiff/) |
19 | | 2024 | arXiv | [MaskDiffusion: Exploiting Pre-trained Diffusion Models for Semantic Segmentation](https://arxiv.org/pdf/2403.11194) | [Code](https://github.com/Valkyrja3607/MaskDiffusion), - |
20 | | 2024 | CVPR | [Open-Vocabulary Attention Maps with Token Optimization for Semantic Segmentation in Diffusion Models](https://openaccess.thecvf.com/content/CVPR2024/papers/Marcos-Manchon_Open-Vocabulary_Attention_Maps_with_Token_Optimization_for_Semantic_Segmentation_in_CVPR_2024_paper.pdf) | [Code](https://github.com/vpulab/ovam), - |
21 | | 2024 | ECCV | [Diffusion Models for Open-Vocabulary Segmentation](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/00794.pdf) | -, [Project](https://www.robots.ox.ac.uk/~vgg/research/ovdiff/) |
22 |
23 |
24 | ## 2.3 Segmentation Emerges from DINO
25 | | Year | Publication | Paper Title | Project |
26 | |:----:|:-----------:|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------:|
27 | | 2021 | arXiv | [Deep ViT Features as Dense Visual Descriptors](https://dino-vit-features.github.io/paper.pdf) | [Code](https://github.com/ShirAmir/dino-vit-features), [Project](https://dino-vit-features.github.io/) |
28 | | 2021 | ICCV | [Emerging Properties in Self-Supervised Vision Transformers](https://openaccess.thecvf.com/content/ICCV2021/papers/Caron_Emerging_Properties_in_Self-Supervised_Vision_Transformers_ICCV_2021_paper.pdf) | [Code](https://github.com/facebookresearch/dino), - |
29 | | 2021 | BMVC | [Localizing Objects with Self-Supervised Transformers and no Labels](https://www.bmvc2021-virtualconference.com/assets/papers/1339.pdf) | [Code](https://github.com/valeoai/LOST), - |
30 | | 2022 | arXiv | [Discovering object masks with transformers for unsupervised semantic segmentation](https://arxiv.org/abs/2206.06363) | [Code](https://github.com/wvangansbeke/MaskDistill), - |
31 | | 2022 | CVPR | [Deep Spectral Methods: A Surprisingly Strong Baseline for Unsupervised Semantic Segmentation and Localization](https://openaccess.thecvf.com/content/CVPR2022/papers/Melas-Kyriazi_Deep_Spectral_Methods_A_Surprisingly_Strong_Baseline_for_Unsupervised_Semantic_CVPR_2022_paper.pdf) | [Code](https://github.com/lukemelas/deep-spectral-segmentation), [Project](https://lukemelas.github.io/deep-spectral-segmentation/) |
32 | | 2023 | ICLR | [Unsupervised Semantic Segmentation with Self-supervised Object-centric Representations](https://openreview.net/pdf?id=1_jFneF07YC) | [Code](https://github.com/zadaianchuk/comus), - |
33 | | 2024 | TMLR | [DINOv2: Learning Robust Visual Features without Supervision](https://openreview.net/pdf?id=a68SUt6zFt) | [Code](https://github.com/facebookresearch/dinov2), - |
34 |
--------------------------------------------------------------------------------
/4-PIS.md:
--------------------------------------------------------------------------------
1 | # 4. Foundation Model-based PIS (Promptable Image Segmentation)
2 | ***
3 | ## 4.1 Interactive Segmentation
4 | ### 4.1.1 SAM-based Solution
5 | | Year | Publication | Paper Title | Project |
6 | |-------|:------------------:|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------------------:|
7 | | 2023 | arXiv | [SAM on Medical Images: A Comprehensive Study on Three Prompt Modes](https://arxiv.org/pdf/2305.00035) | -,- |
8 | | 2023 | arXiv | [Customized segment anything model for medical image segmentation](https://arxiv.org/pdf/2304.13785) | [Code](https://github.com/hitachinsk/SAMed), - |
9 | | 2023 | arXiv | [Medical SAM Adapter: Adapting Segment Anything Model for Medical Image Segmentation](https://arxiv.org/pdf/2304.12620) | [Code](https://github.com/SuperMedIntel/Medical-SAM-Adapter), - |
10 | | 2023 | MedIA | [Segment anything model for medical image analysis: an experimental study](https://www.sciencedirect.com/science/article/pii/S1361841523001780/pdfft?md5=398c81e13674a45f4a0b611f468b4ea8&pid=1-s2.0-S1361841523001780-main.pdf) | [Code](https://github.com/mazurowski-lab/segment-anything-medical-evaluation), - |
11 | | 2023 | MIDL | [SAM.MD: Zero-shot medical image segmentation capabilities of the Segment Anything Model](https://arxiv.org/pdf/2304.05396) | -, - |
12 | | 2023 | MICCAI BrainLes | [Cheap Lunch for Medical Image Segmentation by Fine-tuning SAM on Few Exemplars](https://arxiv.org/pdf/2308.14133) | -, - |
13 | | 2023 | MICCAI Society | [SAM Meets Robotic Surgery: An Empirical Study on Generalization, Robustness and Adaptation](https://link.springer.com/content/pdf/10.1007/978-3-031-47401-9_23.pdf?pdf=inline%20link) | -, - |
14 | | 2023 | NeurIPS | [Segment Anything in High Quality](https://dl.acm.org/doi/10.5555/3666122.3667425) | [Code](https://github.com/SysCV/SAM-HQ), - |
15 | | 2024 | Comput. Biol. Med | [Segment anything model for medical image segmentation: Current applications and future directions](https://www.sciencedirect.com/science/article/pii/S0010482524003226/pdfft?md5=68de9e9c773354807446ee39cc3b1cb0&pid=1-s2.0-S0010482524003226-main.pdf) | [Code](https://github.com/YichiZhang98/SAM4MIS), - |
16 | | 2024 | CVPR | [GraCo: Granularity-Controllable Interactive Segmentation](https://openaccess.thecvf.com/content/CVPR2024/papers/Zhao_GraCo_Granularity-Controllable_Interactive_Segmentation_CVPR_2024_paper.pdf) | [Code](https://github.com/Zhao-Yian/GraCo), [Project](https://zhao-yian.github.io/GraCo) |
17 | | 2024 | ECCV | [Tokenize Anything via Prompting](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/06366.pdf) | [Code](https://github.com/baaivision/tokenize-anything), - |
18 | | 2024 | ECCV | [Open-Vocabulary SAM: Segment and Recognize Twenty-thousand Classes Interactively](https://link.springer.com/content/pdf/10.1007/978-3-031-72775-7_24) | [Code](https://github.com/HarborYuan/ovsam), [Project](https://www.mmlab-ntu.com/project/ovsam/) |
19 | | 2024 | ECCV | [Semantic-SAM: Segment and Recognize Anything at Any Granularity](https://arxiv.org/abs/2307.04767) | [Code](https://github.com/UX-Decoder/Semantic-SAM), - |
20 | | 2024 | MedIA | [3DSAM-adapter: Holistic adaptation of SAM from 2D to 3D for promptable tumor segmentation](https://www.sciencedirect.com/science/article/pii/S1361841524002494/pdfft?md5=69d14d00b8d36553854e02366ab6bb36&pid=1-s2.0-S1361841524002494-main.pdf) | [Code](https://github.com/med-air/3DSAM-adapter), - |
21 | | 2024 | Nat. Commun | [Segment anything in medical images](https://www.nature.com/articles/s41467-024-44824-z.pdf) | [Code](https://github.com/bowang-lab/MedSAM), - |
22 | | 2024 | Strahlenther Onkol | [The Segment Anything foundation model achieves favorable brain tumor auto-segmentation accuracy in MRI to support radiotherapy treatment planning](https://link.springer.com/content/pdf/10.1007/s00066-024-02313-8.pdf) | -, - |
23 |
24 |
25 | ## 4.2 Referring Segmentation
26 | ### 4.2.1 CLIP-based Solution
27 | | Year | Publication | Paper Title | Project |
28 | |------|:-----------:|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------------:|
29 | | 2022 | arXiv | [Weakly-supervised segmentation of referring expressions](https://arxiv.org/pdf/2205.04725) | -, - |
30 | | 2022 | CVPR | [CRIS: CLIP-Driven Referring Image Segmentation](https://openaccess.thecvf.com/content/CVPR2022/papers/Wang_CRIS_CLIP-Driven_Referring_Image_Segmentation_CVPR_2022_paper.pdf) | [Code](https://github.com/DerrickWang005/CRIS.pytorch), - |
31 | | 2022 | CVPR | [Image Segmentation Using Text and Image Prompts](https://openaccess.thecvf.com/content/CVPR2022/papers/Luddecke_Image_Segmentation_Using_Text_and_Image_Prompts_CVPR_2022_paper.pdf) | [Code](https://github.com/timojl/clipseg), - |
32 | | 2023 | CVPR | [Zero-shot Referring Image Segmentation with Global-Local Context Features](https://openaccess.thecvf.com/content/CVPR2023/papers/Yu_Zero-Shot_Referring_Image_Segmentation_With_Global-Local_Context_Features_CVPR_2023_paper.pdf) | [Code](https://github.com/Seonghoon-Yu/Zero-shot-RIS), - |
33 | | 2023 | EMNLP | [Text Augmented Spatial-aware Zero-shot Referring Image Segmentation](https://aclanthology.org/2023.findings-emnlp.73.pdf) | [Code](https://github.com/suoych/TAS), - |
34 | | 2023 | ICCV | [ Bridging Vision and Language Encoders: Parameter-Efficient Tuning for Referring Image Segmentation](https://openaccess.thecvf.com/content/ICCV2023/papers/Xu_Bridging_Vision_and_Language_Encoders_Parameter-Efficient_Tuning_for_Referring_Image_ICCV_2023_paper.pdf) | [Code](https://github.com/kkakkkka/ETRIS), - |
35 | | 2023 | ICCV | [ Referring Image Segmentation Using Text Supervision](https://openaccess.thecvf.com/content/ICCV2023/papers/Liu_Referring_Image_Segmentation_Using_Text_Supervision_ICCV_2023_paper.pdf) | [Code](https://github.com/fawnliu/TRIS), - |
36 | | 2023 | NeurIPS | [Text Promptable Surgical Instrument Segmentation with Vision-Language Models](https://papers.nips.cc/paper_files/paper/2023/file/5af741d487c5f0b08bfe56e11d1883e4-Paper-Conference.pdf) | [Code](https://github.com/franciszzj/TP-SIS), - |
37 | | 2024 | AAAI | [EAVL: Explicitly Align Vision and Language for Referring Image Segmentation](https://www.researchgate.net/profile/Yichen-Yan-9/publication/373263673_EAVL_Explicitly_Align_Vision_and_Language_for_Referring_Image_Segmentation/links/6617ad9d43f8df018dee471d/EAVL-Explicitly-Align-Vision-and-Language-for-Referring-Image-Segmentation.pdf) | -, - |
38 | | 2024 | NAACL | [Extending CLIP’s Image-Text Alignment to Referring Image Segmentation](https://aclanthology.org/2024.naacl-long.258.pdf) | -, - |
39 | | 2024 | arXiv | [Improving Referring Image Segmentation using Vision-Aware Text Features](https://arxiv.org/pdf/2404.08590v1) | -, [Project](https://nero1342.github.io/VATEX_RIS/) |
40 | | 2024 | CVPR | [Unveiling Parts Beyond Objects: Towards Finer-Granularity Referring Expression Segmentation](https://openaccess.thecvf.com/content/CVPR2024/papers/Wang_Unveiling_Parts_Beyond_Objects_Towards_Finer-Granularity_Referring_Expression_Segmentation_CVPR_2024_paper.pdf) | [Code](https://github.com/Rubics-Xuan/MRES), [Project](https://rubics-xuan.github.io/MRES/) |
41 | | 2024 | ICLR | [BarLeRIa: An Efficient Tuning Framework for Referring Image Segmentation](https://openreview.net/pdf?id=wHLDHRkmEu) | [Code](https://github.com/NastrondAd/BarLeRIa), - |
42 |
43 | ### 4.2.2 DM-based Solution
44 | | Year | Publication | Paper Title | Project |
45 | |-------|:-----------:|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------:|
46 | | 2022 | arXiv | [Peekaboo: Text to Image Diffusion Models are Zero-Shot Segmentors](https://arxiv.org/pdf/2211.13224) | [Code](https://github.com/RyannDaGreat/Peekaboo), [Project](https://ryanndagreat.github.io/peekaboo) |
47 | | 2023 | arXiv | [Ref-Diff: Zero-shot Referring Image Segmentation with Generative Models](https://arxiv.org/pdf/2308.16777) | [Code](https://github.com/kodenii/Ref-Diff), - |
48 | | 2024 | CVPR | [UniGS: Unified Representation for Image Generation and Segmentation](https://openaccess.thecvf.com/content/CVPR2024/papers/Qi_UniGS_Unified_Representation_for_Image_Generation_and_Segmentation_CVPR_2024_paper.pdf) | -, - |
49 | | 2023 | ICCV | [Unleashing Text-to-Image Diffusion Models for Visual Perception](https://openaccess.thecvf.com/content/ICCV2023/papers/Zhao_Unleashing_Text-to-Image_Diffusion_Models_for_Visual_Perception_ICCV_2023_paper.pdf) | [Code](https://github.com/wl-zhao/VPD), [Project](https://vpd.ivg-research.xyz/) |
50 | | 2023 | ICCV | [ LD-ZNet: A Latent Diffusion Approach for Text-Based Image Segmentation](https://openaccess.thecvf.com/content/ICCV2023/papers/PNVR_LD-ZNet_A_Latent_Diffusion_Approach_for_Text-Based_Image_Segmentation_ICCV_2023_paper.pdf) | [Code](https://koutilya-pnvr.github.io/LD-ZNet/), [Project](https://koutilya-pnvr.github.io/LD-ZNet/) |
51 | | 2023 | IROS | [Multimodal Diffusion Segmentation Model for Object Segmentation from Manipulation Instructions](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10341402) | -, - |
52 |
53 |
54 | ### 4.2.3 LLMs/MLLMs-based Solution
55 | | Year | Publication | Paper Title | Project |
56 | |------|:-----------:|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------:|
57 | | 2023 | arXiv | [LISA++: An Improved Baseline for Reasoning Segmentation with Large Language Model](https://arxiv.org/pdf/2312.17240) | -, - |
58 | | 2023 | arXiv | [NExT-Chat: An LMM for Chat, Detection and Segmentation](https://arxiv.org/pdf/2311.04498) | [Code](https://github.com/NExT-ChatV/NExT-Chat), [Project](https://next-chatv.github.io/) |
59 | | 2024 | arXiv | [LaSagnA: Language-based Segmentation Assistant for Complex Queries](https://arxiv.org/pdf/2404.08506) | [Code](https://github.com/congvvc/LaSagnA), - |
60 | | 2024 | arXiv | [Empowering Segmentation Ability to Multi-modal Large Language Models](https://arxiv.org/pdf/2403.14141) | -, - |
61 | | 2024 | CVPR | [LISA: Reasoning Segmentation via Large Language Model](https://openaccess.thecvf.com/content/CVPR2024/papers/Lai_LISA_Reasoning_Segmentation_via_Large_Language_Model_CVPR_2024_paper.pdf) | [Code](https://github.com/dvlab-research/LISA), - |
62 | | 2024 | CVPR | [PixelLM: Pixel Reasoning with Large Multimodal Model](https://openaccess.thecvf.com/content/CVPR2024/papers/Ren_PixelLM_Pixel_Reasoning_with_Large_Multimodal_Model_CVPR_2024_paper.pdf) | [Code](https://github.com/MaverickRen/PixelLM), [Project](https://pixellm.github.io/) |
63 | | 2024 | CVPR | [GSVA: Generalized Segmentation via Multimodal Large Language Models](https://openaccess.thecvf.com/content/CVPR2024/papers/Xia_GSVA_Generalized_Segmentation_via_Multimodal_Large_Language_Models_CVPR_2024_paper.pdf) | [Code](https://github.com/LeapLabTHU/GSVA), - |
64 | | 2024 | CVPR | [Osprey: Pixel Understanding with Visual Instruction Tuning](https://openaccess.thecvf.com/content/CVPR2024/papers/Yuan_Osprey_Pixel_Understanding_with_Visual_Instruction_Tuning_CVPR_2024_paper.pdf) | [Code](https://github.com/CircleRadon/Osprey), - |
65 | | 2024 | CVPRW | [LLM-Seg: Bridging Image Segmentation and Large Language Model Reasoning](https://openaccess.thecvf.com/content/CVPR2024W/MMFM/papers/Wang_LLM-Seg_Bridging_Image_Segmentation_and_Large_Language_Model_Reasoning_CVPRW_2024_paper.pdf) | [Code](https://github.com/wangjunchi/LLMSeg), - |
66 |
67 |
68 | ### 4.2.4 Composition of FMs
69 | | Year | Publication | Paper Title | Project |
70 | |-------|:-----------:|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------------------------:|
71 | | 2022 | arXiv | [Weakly-supervised segmentation of referring expressions](https://arxiv.org/pdf/2205.04725) | -, - |
72 | | 2022 | CVPR | [LAVT: Language-Aware Vision Transformer for Referring Image Segmentation](https://openaccess.thecvf.com/content/CVPR2022/papers/Yang_LAVT_Language-Aware_Vision_Transformer_for_Referring_Image_Segmentation_CVPR_2022_paper.pdf) | -, - |
73 | | 2022 | NeurIPS | [CoupAlign: Coupling Word-Pixel with Sentence-Mask Alignments for Referring Image Segmentation](https://proceedings.neurips.cc/paper_files/paper/2022/file/5e773d319e310f1e4d695159484143b8-Paper-Conference.pdf) | -, - |
74 | | 2023 | AAAI | [ Semantics-Aware Dynamic Localization and Refinement for Referring Image Segmentation](https://ojs.aaai.org/index.php/AAAI/article/download/25428/25200) | -, - |
75 | | 2023 | ACM MM | [CARIS: Context-Aware Referring Image Segmentation](https://web.archive.org/web/20231028201140id_/https://dl.acm.org/doi/pdf/10.1145/3581783.3612117) | [Code](https://github.com/lsa1997/CARIS), - |
76 | | 2023 | arXiv | [NExT-Chat: An LMM for Chat, Detection and Segmentation](https://arxiv.org/pdf/2311.04498) | [Code](https://github.com/NExT-ChatV/NExT-Chat), [Project](https://next-chatv.github.io/) |
77 | | 2023 | arXiv | [Interactive Data Synthesis for Systematic Vision Adaptation via LLMs-AIGCs Collaboration](https://arxiv.org/pdf/2305.12799) | [Code](https://github.com/Yuqifan1117/Labal-Anything-Pipeline), - |
78 | | 2023 | CVPR | [GRES: Generalized Referring Expression Segmentation](https://openaccess.thecvf.com/content/CVPR2023/papers/Liu_GRES_Generalized_Referring_Expression_Segmentation_CVPR_2023_paper.pdf) | [Code](https://github.com/henghuiding/ReLA), [Project](https://henghuiding.github.io/GRES/) |
79 | | 2023 | CVPR | [PolyFormer: Referring Image Segmentation as Sequential Polygon Generation](https://openaccess.thecvf.com/content/CVPR2023/papers/Liu_PolyFormer_Referring_Image_Segmentation_As_Sequential_Polygon_Generation_CVPR_2023_paper.pdf) | [Code](https://github.com/amazon-science/polygon-transformer), [Project](https://polyformer.github.io/) |
80 | | 2023 | ICCV | [Beyond One-to-One: Rethinking the Referring Image Segmentation](https://openaccess.thecvf.com/content/ICCV2023/papers/Hu_Beyond_One-to-One_Rethinking_the_Referring_Image_Segmentation_ICCV_2023_paper.pdf) | [Code](https://github.com/toggle1995/RIS-DMMI), - |
81 | | 2023 | ICCV | [Shatter and Gather: Learning Referring Image Segmentation with Text Supervision](https://openaccess.thecvf.com/content/ICCV2023/papers/Kim_Shatter_and_Gather_Learning_Referring_Image_Segmentation_with_Text_Supervision_ICCV_2023_paper.pdf) | [Code](https://github.com/kdwonn/SaG), [Project](https://southflame.github.io/sag/) |
82 | | 2023 | ICCV | [Segment Every Reference Object in Spatial and Temporal Spaces](https://openaccess.thecvf.com/content/ICCV2023/papers/Wu_Segment_Every_Reference_Object_in_Spatial_and_Temporal_Spaces_ICCV_2023_paper.pdf) | [Code](https://github.com/FoundationVision/UniRef), - |
83 | | 2023 | TOMM | [Towards Complex-query Referring Image Segmentation: A Novel Benchmark](https://dl.acm.org/doi/pdf/10.1145/3701733) | [Code](https://github.com/lili0415/DuMoGa), - |
84 | | 2024 | ACM MM | [Deep Instruction Tuning for Segment Anything Model](https://dl.acm.org/doi/pdf/10.1145/3664647.3680571) | [Code](https://github.com/wysnzzzz/DIT), - |
85 | | 2024 | arXiv | [LLMBind: A Unified Modality-Task Integration Framework](https://arxiv.org/pdf/2402.14891) | [Code](https://github.com/PKU-YuanGroup/LLMBind), - |
86 | | 2024 | arXiv | [Training-Free Semantic Segmentation via LLM-Supervision](https://arxiv.org/pdf/2404.00701) | -, - |
87 | | 2024 | arXiv | [F-LMM: Grounding Frozen Large Multimodal Models](https://arxiv.org/pdf/2406.05821) | [Code](https://github.com/wusize/F-LMM), - |
88 | | 2024 | CVPR | [Mask Grounding for Referring Image Segmentation](https://openaccess.thecvf.com/content/CVPR2024/papers/Chng_Mask_Grounding_for_Referring_Image_Segmentation_CVPR_2024_paper.pdf) | [Code](https://github.com/yxchng/mask-grounding), - |
89 | | 2024 | CVPR | [LQMFormer: Language-aware Query Mask Transformer for Referring Image Segmentation](https://openaccess.thecvf.com/content/CVPR2024/papers/Shah_LQMFormer_Language-aware_Query_Mask_Transformer_for_Referring_Image_Segmentation_CVPR_2024_paper.pdf) | - ,- |
90 | | 2024 | CVPR | [PerceptionGPT: Effectively Fusing Visual Perception into LLM](https://openaccess.thecvf.com/content/CVPR2024/papers/Pi_PerceptionGPT_Effectively_Fusing_Visual_Perception_into_LLM_CVPR_2024_paper.pdf) | [Code](https://github.com/pipilurj/perceptionGPT), - |
91 | | 2024 | CVPR | [Prompt-Driven Referring Image Segmentation with Instance Contrasting](https://openaccess.thecvf.com/content/CVPR2024/papers/Shang_Prompt-Driven_Referring_Image_Segmentation_with_Instance_Contrasting_CVPR_2024_paper.pdf) | -, - |
92 | | 2024 | CVPR | [Curriculum Point Prompting for Weakly-Supervised Referring Image Segmentation](https://openaccess.thecvf.com/content/CVPR2024/papers/Dai_Curriculum_Point_Prompting_for_Weakly-Supervised_Referring_Image_Segmentation_CVPR_2024_paper.pdf) | -, - |
93 |
94 | ## 4.3 Few-shot Segmentation
95 | ### 4.3.1 CLIP-based Solution
96 | | Year | Publication | Paper Title | Project |
97 | |------|:-----------:|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:----------------------------------------------------------:|
98 | | 2023 | arXiv | [PartSeg: Few-shot Part Segmentation via Part-aware Prompt Learning](https://arxiv.org/pdf/2308.12757) | -, - |
99 | | 2023 | CVPR | [ WinCLIP: Zero-/Few-Shot Anomaly Classification and Segmentation](https://openaccess.thecvf.com/content/CVPR2023/papers/Jeong_WinCLIP_Zero-Few-Shot_Anomaly_Classification_and_Segmentation_CVPR_2023_paper.pdf) | [Code](https://github.com/caoyunkang/WinClip), - |
100 | | 2024 | arXiv | [Embedding Generalized Semantic Knowledge into Few-Shot Remote Sensing Segmentation](https://arxiv.org/pdf/2405.13686) | -, - |
101 | | 2024 | CVPR | [Rethinking Prior Information Generation with CLIP for Few-Shot Segmentation](https://openaccess.thecvf.com/content/CVPR2024/papers/Wang_Rethinking_Prior_Information_Generation_with_CLIP_for_Few-Shot_Segmentation_CVPR_2024_paper.pdf) | [Code](https://github.com/vangjin/PI-CLIP), - |
102 | | 2024 | CVPR | [Unlocking the Potential of Pre-trained Vision Transformers for Few-Shot Semantic Segmentation through Relationship Descriptors](https://openaccess.thecvf.com/content/CVPR2024/papers/Zhou_Unlocking_the_Potential_of_Pre-trained_Vision_Transformers_for_Few-Shot_Semantic_CVPR_2024_paper.pdf) | [Code](https://github.com/ZiqinZhou66/FewSegwithRD.git), - |
103 | | 2024 | ICASSP | [Language-Guided Few-Shot Semantic Segmentation](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10447456) | -, - |
104 | | 2024 | ICASSP | [Weakly Supervised Few-Shot Segmentation Through Textual Prompt](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10446831) | [Code](https://github.com/Joseph-Lee-V/Text-WS-FSS), - |
105 | | 2024 | TMM | [Visual and Textual Prior Guided Mask Assemble for Few-Shot Segmentation and Beyond](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10418263) | -, - |
106 |
107 | ### 4.3.2 DM-based Solution
108 | | Year | Publication | Paper Title | Project |
109 | |------|:-----------:|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:---------------------------------------------------:|
110 | | 2023 | arXiv | [DifFSS: Diffusion Model for Few-Shot Semantic Segmentation](https://arxiv.org/pdf/2307.00773) | [Code](https://github.com/TrinitialChan/DifFSS), - |
111 | | 2024 | AAAI | [MaskDiff: Modeling Mask Distribution with Diffusion Probabilistic Model for Few-Shot Instance Segmentation](https://ojs.aaai.org/index.php/AAAI/article/view/28068/28143) | [Code](https://github.com/minhquanlecs/MaskDiff), - |
112 | | 2024 | arXiv | [SegICL: A Universal In-context Learning Framework for Enhanced Segmentation in Medical Imaging](https://arxiv.org/pdf/2403.16578) | -, - |
113 | | 2024 | TCE | [Few-Shot Semantic Segmentation for Consumer Electronics: An Inter-Class Relation Mining Approach](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10460319) | -, - |
114 |
115 | ### 4.3.3 DINO-based Solution
116 | | Year | Publication | Paper Title | Project |
117 | |-------|:------------------------:|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------:|
118 | | 2023 | CVPR | [Distilling Self-Supervised Vision Transformers for Weakly-Supervised Few-Shot Classification & Segmentation](https://openaccess.thecvf.com/content/CVPR2023/papers/Kang_Distilling_Self-Supervised_Vision_Transformers_for_Weakly-Supervised_Few-Shot_Classification__Segmentation_CVPR_2023_paper.pdf) | [Code](https://github.com/dahyun-kang/cst), - |
119 | | 2023 | NeurIPS R0-FoMo Workshop | [One-shot Localization and Segmentation of Medical Images with Foundation Models](https://arxiv.org/pdf/2310.18642) | -, - |
120 | | 2024 | arXiv | [A Novel Benchmark for Few-Shot Semantic Segmentation in the Era of Foundation Models](https://arxiv.org/pdf/2401.11311) | [Code](https://github.com/RedaBensaidDS/Foundation_FewShot), - |
121 | | 2024 | ICRA | [Few-Shot Panoptic Segmentation With Foundation Models](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10611624) | [Code](https://github.com/robot-learning-freiburg/SPINO), [Project](http://spino.cs.uni-freiburg.de/) |
122 |
123 | ### 4.3.4 SAM-based Solution
124 | | Year | Publication | Paper Title | Project |
125 | |-------|:-----------:|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:----------------------------------------------------------:|
126 | | 2024 | arXiv | [Boosting Few-Shot Semantic Segmentation via Segment Anything Model](https://arxiv.org/pdf/2401.09826) | -, - |
127 | | 2024 | arXiv | [Part-aware Personalized Segment Anything Model for Patient-Specific Segmentation](https://arxiv.org/pdf/2403.05433) | [Code](https://github.com/Zch0414/p2sam), - |
128 | | 2024 | CVPR | [VRP-SAM: SAM with Visual Reference Prompt](https://openaccess.thecvf.com/content/CVPR2024/papers/Sun_VRP-SAM_SAM_with_Visual_Reference_Prompt_CVPR_2024_paper.pdf) | [Code](https://github.com/syp2ysy/VRP-SAM), - |
129 | | 2024 | CVPR | [APSeg: Auto-Prompt Network for Cross-Domain Few-Shot Semantic Segmentation](https://openaccess.thecvf.com/content/CVPR2024/papers/He_APSeg_Auto-Prompt_Network_for_Cross-Domain_Few-Shot_Semantic_Segmentation_CVPR_2024_paper.pdf) | -, - |
130 | | 2024 | ICLR | [Personalize Segment Anything Model with One Shot](https://openreview.net/pdf?id=6Gzkhoc6YS) | [Code](https://github.com/ZrrSkywalker/Personalize-SAM), - |
131 |
132 | ### 4.3.5 MLLMs-based Solution
133 | | Year | Publication | Paper Title | Project |
134 | |-------|:-----------:|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------:|
135 | | 2023 | arXiv | [Few-Shot Classification & Segmentation Using Large Language Models Agent](https://arxiv.org/pdf/2311.12065) | -, - |
136 | | 2024 | CVPR | [LLaFS: When Large Language Models Meet Few-Shot Segmentation](https://openaccess.thecvf.com/content/CVPR2024/papers/Zhu_LLaFS_When_Large_Language_Models_Meet_Few-Shot_Segmentation_CVPR_2024_paper.pdf) | [Code](https://github.com/lanyunzhu99/LLaFS), - |
137 |
138 | ### 4.3.6 In-Context Segmentation
139 | | Year | Publication | Paper Title | Project |
140 | |-------|:-----------:|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:---------------------------------------------------------------------------------------------------------------------------------:|
141 | | 2022 | NeurIPS | [Visual Prompting via Image Inpainting](https://proceedings.neurips.cc/paper_files/paper/2022/file/9f09f316a3eaf59d9ced5ffaefe97e0f-Paper-Conference.pdf) | [Code](https://github.com/amirbar/visual_prompting), [Project](https://yossigandelsman.github.io/visual_prompt/) |
142 | | 2023 | arXiv | [Exploring Effective Factors for Improving Visual In-Context Learning](https://arxiv.org/pdf/2304.04748) | [Code](https://github.com/syp2ysy/prompt-SelF), - |
143 | | 2023 | CVPR | [Images Speak in Images: A Generalist Painter for In-Context Visual Learning](https://openaccess.thecvf.com/content/CVPR2023/papers/Wang_Images_Speak_in_Images_A_Generalist_Painter_for_In-Context_Visual_CVPR_2023_paper.pdf) | [Code](https://github.com/baaivision/Painter), - |
144 | | 2023 | NeurIPS | [What Makes Good Examples for Visual In-Context Learning?](https://proceedings.neurips.cc/paper_files/paper/2023/file/398ae57ed4fda79d0781c65c926d667b-Paper-Conference.pdf) | [Code](https://github.com/ZhangYuanhan-AI/visual_prompt_retrieval), - |
145 | | 2023 | NeurIPS | [In-Context Learning Unlocked for Diffusion Models](https://proceedings.neurips.cc/paper_files/paper/2023/file/1b3750390ca8b931fb9ca988647940cb-Paper-Conference.pdf) | [Code](https://github.com/Zhendong-Wang/Prompt-Diffusion), [Project](https://zhendong-wang.github.io/prompt-diffusion.github.io/) |
146 | | 2024 | CVPR | [Sequential Modeling Enables Scalable Learning for Large Vision Models](https://openaccess.thecvf.com/content/CVPR2024/papers/Bai_Sequential_Modeling_Enables_Scalable_Learning_for_Large_Vision_Models_CVPR_2024_paper.pdf) | [Code](https://github.com/ytongbai/LVM), - |
147 | | 2024 | CVPR | [Towards More Unified In-context Visual Understanding](https://openaccess.thecvf.com/content/CVPR2024/papers/Sheng_Towards_More_Unified_In-context_Visual_Understanding_CVPR_2024_paper.pdf) | -, - |
148 | | 2024 | CVPR | [Tyche: Stochastic In-Context Learning for Medical Image Segmentation](https://openaccess.thecvf.com/content/CVPR2024/papers/Rakic_Tyche_Stochastic_In-Context_Learning_for_Medical_Image_Segmentation_CVPR_2024_paper.pdf) | [Code](https://github.com/mariannerakic/tyche), [Project](https://tyche.csail.mit.edu/) |
149 | | 2024 | ICLR | [Matcher: Segment Anything with One Shot Using All-Purpose Feature Matching](https://openreview.net/pdf?id=yzRXdhk2he) | [Code](https://github.com/aim-uofa/Matcher), - |
150 | | 2024 | WACV | [Instruct Me More! Random Prompting for Visual In-Context Learning](https://openaccess.thecvf.com/content/WACV2024/papers/Zhang_Instruct_Me_More_Random_Prompting_for_Visual_In-Context_Learning_WACV_2024_paper.pdf) | [Code](https://github.com/Jackieam/InMeMo), - |
151 |
--------------------------------------------------------------------------------
/3-GIS.md:
--------------------------------------------------------------------------------
1 | # 3. Foundation Model-based GIS (Generic Image Segmentation)
2 |
3 | ## 3.1 Semantic Segmentation
4 | ### 3.1.1 CLIP-based Solution
5 | | Year | Publication | Paper Title | Project |
6 | |:-----:|:-----------:|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------------------------------:|
7 | | 2022 | BMVC | [Open-vocabulary Semantic Segmentation with Frozen Vision-Language Models](https://bmvc2022.mpi-inf.mpg.de/0045.pdf) | [Code](https://github.com/chaofanma/Fusioner), [Project](https://yyh-rain-song.github.io/Fusioner_webpage/) |
8 | | 2022 | CVPR | [DenseCLIP: Language-Guided Dense Prediction with Context-Aware Prompting](https://openaccess.thecvf.com/content/CVPR2022/papers/Rao_DenseCLIP_Language-Guided_Dense_Prediction_With_Context-Aware_Prompting_CVPR_2022_paper.pdf) | [Code](https://github.com/raoyongming/DenseCLIP), [Project](https://denseclip.ivg-research.xyz/) |
9 | | 2022 | CVPR | [GroupViT: Semantic Segmentation Emerges from Text Supervision](https://openaccess.thecvf.com/content/CVPR2022/papers/Xu_GroupViT_Semantic_Segmentation_Emerges_From_Text_Supervision_CVPR_2022_paper.pdf) | [Code](https://github.com/NVlabs/GroupViT), [Project](https://jerryxu.net/GroupViT/) |
10 | | 2022 | CVPR | [Decoupling Zero-Shot Semantic Segmentation](https://openaccess.thecvf.com/content/CVPR2022/papers/Ding_Decoupling_Zero-Shot_Semantic_Segmentation_CVPR_2022_paper.pdf) | [Code](https://github.com/dingjiansw101/ZegFormer), - |
11 | | 2022 | ECCV | [A Simple Baseline for Open-Vocabulary Semantic Segmentation with Pre-trained Vision-language Model](https://www.ecva.net/papers/eccv_2022/papers_ECCV/papers/136890725.pdf) | [Code](https://github.com/MendelXu/zsseg.baseline), - |
12 | | 2022 | ECCV | [Extract Free Dense Labels from CLIP](https://www.ecva.net/papers/eccv_2022/papers_ECCV/papers/136880687.pdf) | [Code](https://github.com/chongzhou96/MaskCLIP), [Project](https://www.mmlab-ntu.com/project/maskclip/) |
13 | | 2022 | ICLR | [Language-driven Semantic Segmentation](https://openreview.net/pdf?id=RriDjddCLN) | [Code](https://github.com/isl-org/lang-seg), - |
14 | | 2023 | arXiv | [A Closer Look at the Explainability of Contrastive Language-Image Pre-training](https://arxiv.org/pdf/2304.05653) | [Code](https://github.com/xmed-lab/CLIP_Surgery), - |
15 | | 2023 | arXiv | [ZegOT: Zero-shot Segmentation Through Optimal Transport of Text Prompts](https://arxiv.org/pdf/2301.12171) | [Code](https://github.com/cubeyoung/ZegOT), [Project](https://cubeyoung.github.io/zegot.github.io/) |
16 | | 2023 | arXiv | [CLIP is Also a Good Teacher: A New Training Framework for Inductive Zero-shot Semantic Segmentation](https://arxiv.org/pdf/2310.02296) | -, - |
17 | | 2023 | arXiv | [Associating Spatially-Consistent Grouping with Text-supervised Semantic Segmentation](https://arxiv.org/pdf/2304.01114) | -, - |
18 | | 2023 | arXiv | [TagAlign: Improving Vision-Language Alignment with Multi-Tag Classification](https://arxiv.org/pdf/2312.14149) | [Code](https://github.com/Qinying-Liu/TagAlign), [Project](https://qinying-liu.github.io/Tag-Align/) |
19 | | 2023 | arXiv | [CLIP-DINOiser: Teaching CLIP a few DINO tricks for Open-Vocabulary Semantic Segmentation](https://arxiv.org/pdf/2312.12359) | [Code](https://github.com/wysoczanska/clip_dinoiser), [Project](https://wysoczanska.github.io/CLIP_DINOiser/) |
20 | | 2023 | CVPR | [Probabilistic Prompt Learning for Dense Prediction](https://openaccess.thecvf.com/content/CVPR2023/papers/Kwon_Probabilistic_Prompt_Learning_for_Dense_Prediction_CVPR_2023_paper.pdf) | -, - |
21 | | 2023 | CVPR | [ZegCLIP: Towards Adapting CLIP for Zero-shot Semantic Segmentation](http://openaccess.thecvf.com/content/CVPR2023/papers/Zhou_ZegCLIP_Towards_Adapting_CLIP_for_Zero-Shot_Semantic_Segmentation_CVPR_2023_paper.pdf) | [Code](https://github.com/ZiqinZhou66/ZegCLIP), - |
22 | | 2023 | CVPR | [ Open Vocabulary Semantic Segmentation with Patch Aligned Contrastive Learning](https://openaccess.thecvf.com/content/CVPR2023/papers/Mukhoti_Open_Vocabulary_Semantic_Segmentation_With_Patch_Aligned_Contrastive_Learning_CVPR_2023_paper.pdf) | -, - |
23 | | 2023 | CVPR | [ Side Adapter Network for Open-Vocabulary Semantic Segmentation](https://openaccess.thecvf.com/content/CVPR2023/papers/Xu_Side_Adapter_Network_for_Open-Vocabulary_Semantic_Segmentation_CVPR_2023_paper.pdf) | [Code](https://github.com/MendelXu/SAN), [Project](https://mendelxu.github.io/SAN/) |
24 | | 2023 | CVPR | [ CLIP is Also an Efficient Segmenter: A Text-Driven Approach for Weakly Supervised Semantic Segmentation](https://openaccess.thecvf.com/content/CVPR2023/papers/Lin_CLIP_Is_Also_an_Efficient_Segmenter_A_Text-Driven_Approach_for_CVPR_2023_paper.pdf) | [Code](https://github.com/linyq2117/CLIP-ES), - |
25 | | 2023 | CVPR | [Open-Vocabulary Semantic Segmentation with Mask-adapted CLIP](https://openaccess.thecvf.com/content/CVPR2023/papers/Liang_Open-Vocabulary_Semantic_Segmentation_With_Mask-Adapted_CLIP_CVPR_2023_paper.pdf) | [Code](https://github.com/facebookresearch/ov-seg), [Project](https://jeff-liangf.github.io/projects/ovseg/) |
26 | | 2023 | CVPR | [CLIP-S4: Language-Guided Self-Supervised Semantic Segmentation](https://openaccess.thecvf.com/content/CVPR2023/papers/He_CLIP-S4_Language-Guided_Self-Supervised_Semantic_Segmentation_CVPR_2023_paper.pdf) | -, - |
27 | | 2023 | CVPR | [Delving into Shape-aware Zero-shot Semantic Segmentation](https://openaccess.thecvf.com/content/CVPR2023/papers/Liu_Delving_Into_Shape-Aware_Zero-Shot_Semantic_Segmentation_CVPR_2023_paper.pdf) | [Code](https://github.com/Liuxinyv/SAZS), - |
28 | | 2023 | CVPR | [A Simple Framework for Text-Supervised Semantic Segmentation](https://openaccess.thecvf.com/content/CVPR2023/papers/Yi_A_Simple_Framework_for_Text-Supervised_Semantic_Segmentation_CVPR_2023_paper.pdf) | [Code](https://github.com/muyangyi/SimSeg), - |
29 | | 2023 | ICCV | [ Perceptual Grouping in Contrastive Vision-Language Models](https://openaccess.thecvf.com/content/ICCV2023/papers/Ranasinghe_Perceptual_Grouping_in_Contrastive_Vision-Language_Models_ICCV_2023_paper.pdf) | [Code](https://github.com/kahnchana/clippy), - |
30 | | 2023 | ICCV | [Global Knowledge Calibration for Fast Open-Vocabulary Segmentation](https://openaccess.thecvf.com/content/ICCV2023/papers/Han_Global_Knowledge_Calibration_for_Fast_Open-Vocabulary_Segmentation_ICCV_2023_paper.pdf) | [Code](https://github.com/yongliu20/GKC), - |
31 | | 2023 | ICCV | [Open-Vocabulary Semantic Segmentation with Decoupled One-Pass Network](https://openaccess.thecvf.com/content/ICCV2023/papers/Han_Open-Vocabulary_Semantic_Segmentation_with_Decoupled_One-Pass_Network_ICCV_2023_paper.pdf) | [Code](https://github.com/CongHan0808/DeOP), [Project](https://conghan0808.github.io/DeOP/) |
32 | | 2023 | ICCV | [ Exploring Open-Vocabulary Semantic Segmentation from CLIP Vision Encoder Distillation Only](https://openaccess.thecvf.com/content/ICCV2023/papers/Chen_Exploring_Open-Vocabulary_Semantic_Segmentation_from_CLIP_Vision_Encoder_Distillation_Only_ICCV_2023_paper.pdf) | [Code](https://github.com/facebookresearch/ZeroSeg), - |
33 | | 2023 | ICML | [SegCLIP: Patch Aggregation with Learnable Centers for Open-Vocabulary Semantic Segmentation](https://proceedings.mlr.press/v202/luo23a/luo23a.pdf) | [Code](https://github.com/ArrowLuo/SegCLIP), - |
34 | | 2023 | NeurIPS | [Learning Mask-aware CLIP Representations for Zero-Shot Segmentation](https://proceedings.neurips.cc/paper_files/paper/2023/file/6ffe484a646db13891bb6435ca39d667-Paper-Conference.pdf) | [Code](https://github.com/jiaosiyu1999/MAFT), - |
35 | | 2023 | NeurIPS | [Uncovering Prototypical Knowledge for Weakly Open-Vocabulary Semantic Segmentation](https://proceedings.neurips.cc/paper_files/paper/2023/file/e95eb5206c867be843fbc14bbfe8c10e-Paper-Conference.pdf) | [Code](https://github.com/Ferenas/PGSeg), - |
36 | | 2024 | arXiv | [kNN-CLIP: Retrieval Enables Training-Free Segmentation on Continually Expanding Large Vocabularies](https://arxiv.org/pdf/2404.09447) | -, - |
37 | | 2024 | CVPR | [ Grounding Everything: Emerging Localization Properties in Vision-Language Transformers](https://openaccess.thecvf.com/content/CVPR2024/papers/Bousselham_Grounding_Everything_Emerging_Localization_Properties_in_Vision-Language_Transformers_CVPR_2024_paper.pdf) | [Code](https://github.com/WalBouss/GEM), - |
38 | | 2024 | CVPR | [ CAT-Seg: Cost Aggregation for Open-Vocabulary Semantic Segmentation](https://openaccess.thecvf.com/content/CVPR2024/papers/Cho_CAT-Seg_Cost_Aggregation_for_Open-Vocabulary_Semantic_Segmentation_CVPR_2024_paper.pdf) | [Code](https://github.com/cvlab-kaist/CAT-Seg), [Project](https://ku-cvlab.github.io/CAT-Seg/) |
39 | | 2024 | CVPR | [CLIP as RNN: Segment Countless Visual Concepts without Training Endeavor](https://openaccess.thecvf.com/content/CVPR2024/papers/Sun_CLIP_as_RNN_Segment_Countless_Visual_Concepts_without_Training_Endeavor_CVPR_2024_paper.pdf) | [Code](https://github.com/kevin-ssy/CLIP_as_RNN), [Project](https://torrvision.com/clip_as_rnn/) |
40 | | 2024 | ECCV | [SCLIP: Rethinking Self-Attention for Dense Vision-Language Inference](https://link.springer.com/chapter/10.1007/978-3-031-72664-4_18) | [Code](https://github.com/wangf3014/SCLIP), - |
41 | | 2024 | ECCV | [OTSeg: Multi-prompt Sinkhorn Attention for Zero-Shot Semantic Segmentation](https://link.springer.com/content/pdf/10.1007/978-3-031-72980-5_12) | [Code](https://github.com/cubeyoung/OTSeg), - |
42 | | 2024 | ECCV | [SemiVL: Semi-Supervised Semantic Segmentation with Vision-Language Guidance](https://link.springer.com/content/pdf/10.1007/978-3-031-72933-1_15) | [Code](https://github.com/google-research/semivl/), - |
43 | | 2024 | ECCV | [SILC: Improving Vision Language Pretraining with Self-distillation](https://link.springer.com/content/pdf/10.1007/978-3-031-72664-4_3) | -, - |
44 | | 2024 | ICLR | [CLIPSelf: Vision Transformer Distills Itself for Open-Vocabulary Dense Prediction](https://openreview.net/pdf?id=DjzvJCRsVf) | [Code](https://github.com/wusize/CLIPSelf), - |
45 | | 2024 | TCSVT | [Language-Driven Visual Consensus for Zero-Shot Semantic Segmentation](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10764736) | -, - |
46 | | 2025 | WACV | [Pay Attention to Your Neighbours: Training-Free Open-Vocabulary Semantic Segmentation](https://arxiv.org/pdf/2404.08181) | [Code](https://github.com/sinahmr/NACLIP), - |
47 |
48 | ### 3.1.2 DM-based Solution
49 | | Year | Publication | Paper Title | Project |
50 | |:----:|:-----------:|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------------------------------------------------------------------:|
51 | | 2022 | arXiv | [Peekaboo: Text to Image Diffusion Models are Zero-Shot Segmentors](https://arxiv.org//pdf/2211.13224v2) | [Code](https://github.com/RyannDaGreat/Peekaboo), [Project](https://ryanndagreat.github.io/peekaboo/) |
52 | | 2022 | ICLR | [Label-Efficient Semantic Segmentation with Diffusion Models](https://openreview.net/pdf?id=SlxSY2UZQT) | [Code](https://github.com/yandex-research/ddpm-segmentation), [Project](https://yandex-research.github.io/ddpm-segmentation/) |
53 | | 2022 | MIDL | [Diffusion Models for Implicit Image Segmentation Ensembles](https://proceedings.mlr.press/v172/wolleb22a/wolleb22a.pdf) | [Code](https://github.com/JuliaWolleb/Diffusion-based-Segmentation), - |
54 | | 2022 | SASHIMI | [Can Segmentation Models Be Trained with Fully Synthetically Generated Data?](https://link.springer.com/content/pdf/10.1007/978-3-031-16980-9_8.pdf) | -, - |
55 | | 2023 | ACL | [What the DAAM: Interpreting Stable Diffusion Using Cross Attention](https://aclanthology.org/2023.acl-long.310.pdf) | [Code](https://github.com/castorini/daam), - |
56 | | 2023 | arXiv | [Diffusion Model is Secretly a Training-free Open Vocabulary Semantic Segmenter](https://arxiv.org/pdf/2309.02773) | [Code](https://github.com/VCG-team/DiffSegmenter), [Project](https://vcg-team.github.io/DiffSegmenter-webpage/) |
57 | | 2023 | arXiv | [Harnessing Diffusion Models for Visual Perception with Meta Prompts](https://arxiv.org/pdf/2312.14733) | [Code](https://github.com/fudan-zvg/meta-prompts), - |
58 | | 2023 | arXiv | [EMIT-Diff: Enhancing Medical Image Segmentation via Text-Guided Diffusion Model](https://arxiv.org/pdf/2310.12868) | [Code](https://github.com/NUBagciLab/DiffBoost), - |
59 | | 2023 | CVPR | [Ambiguous Medical Image Segmentation using Diffusion Models](https://openaccess.thecvf.com/content/CVPR2023/papers/Rahman_Ambiguous_Medical_Image_Segmentation_Using_Diffusion_Models_CVPR_2023_paper.pdf) | [Code](https://github.com/aimansnigdha/ambiguous-medical-image-segmentation-using-diffusion-models), [Project](https://aimansnigdha.github.io/cimd/) |
60 | | 2023 | ICCV | [DDP: Diffusion Model for Dense Visual Prediction](https://openaccess.thecvf.com/content/ICCV2023/papers/Ji_DDP_Diffusion_Model_for_Dense_Visual_Prediction_ICCV_2023_paper.pdf) | [Code](https://github.com/JiYuanFeng/DDP), [Project](https://github.com/JiYuanFeng/DDP) |
61 | | 2023 | ICCV | [Unleashing Text-to-Image Diffusion Models for Visual Perception](https://openaccess.thecvf.com/content/ICCV2023/papers/Zhao_Unleashing_Text-to-Image_Diffusion_Models_for_Visual_Perception_ICCV_2023_paper.pdf) | [Code](https://github.com/wl-zhao/VPD), [Project](https://vpd.ivg-research.xyz/) |
62 | | 2023 | ICCV | [LD-ZNet: A Latent Diffusion Approach for Text-Based Image Segmentation](http://openaccess.thecvf.com/content/ICCV2023/papers/PNVR_LD-ZNet_A_Latent_Diffusion_Approach_for_Text-Based_Image_Segmentation_ICCV_2023_paper.pdf) | [Code](https://github.com/koutilya-pnvr/LD-ZNet), [Project](https://koutilya-pnvr.github.io/LD-ZNet/) |
63 | | 2023 | ICCV | [Stochastic Segmentation with Conditional Categorical Diffusion Models](https://openaccess.thecvf.com/content/ICCV2023/papers/Zbinden_Stochastic_Segmentation_with_Conditional_Categorical_Diffusion_Models_ICCV_2023_paper.pdf) | [Code](https://github.com/LarsDoorenbos/ccdm-stochastic-segmentation), - |
64 | | 2023 | ICCV | [DiffuMask: Synthesizing Images with Pixel-level Annotations for Semantic Segmentation Using Diffusion Models](https://openaccess.thecvf.com/content/ICCV2023/papers/Wu_DiffuMask_Synthesizing_Images_with_Pixel-level_Annotations_for_Semantic_Segmentation_Using_ICCV_2023_paper.pdf) | [Code](https://github.com/weijiawu/DiffuMask), [Project](https://weijiawu.github.io/DiffusionMask/) |
65 | | 2023 | MICCAI | [Diffusion-Based Data Augmentation for Nuclei Image Segmentation](https://link.springer.com/content/pdf/10.1007/978-3-031-43993-3_57.pdf?pdf=inline%20link) | [Code](https://github.com/xinyiyu/Nudiff), - |
66 | | 2023 | NeurIPS | [Dataset Diffusion: Diffusion-based Synthetic Dataset Generation for Pixel-Level Semantic Segmentation](https://proceedings.neurips.cc/paper_files/paper/2023/file/f2957e48240c1d90e62b303574871b47-Paper-Conference.pdf) | [Code](https://github.com/VinAIResearch/Dataset-Diffusion), - |
67 | | 2023 | NeurIPS | [Unsupervised Semantic Correspondence Using Stable Diffusion](https://proceedings.neurips.cc/paper_files/paper/2023/file/1a074a28c3a6f2056562d00649ae6416-Paper-Conference.pdf) | [Code](https://github.com/ubc-vision/LDM_correspondences), [Project](https://ubc-vision.github.io/LDM_correspondences/) |
68 | | 2023 | NeurIPS | [SegRefiner: Towards Model-Agnostic Segmentation Refinement with Discrete Diffusion Process](https://papers.nips.cc/paper_files/paper/2023/file/fc0cc55dca3d791c4a0bb2d8ddeefe4f-Paper-Conference.pdf) | [Code](https://github.com/MengyuWang826/SegRefiner), - |
69 | | 2023 | NeurIPS | [DatasetDM: Synthesizing Data with Perception Annotations Using Diffusion Models](https://proceedings.neurips.cc/paper_files/paper/2023/file/ab6e7ad2354f350b451b5a8e14d04f51-Paper-Conference.pdf) | [Code](https://github.com/showlab/DatasetDM), [Project](https://weijiawu.github.io/DatasetDM_page/) |
70 | | 2024 | arXiv | [FreeSeg-Diff: Training-Free Open-Vocabulary Segmentation with Diffusion Models](https://arxiv.org/pdf/2403.20105) | -, [Project](https://bcorrad.github.io/freesegdiff/) |
71 | | 2024 | arXiv | [MaskDiffusion: Exploiting Pre-trained Diffusion Models for Semantic Segmentation](https://arxiv.org/pdf/2403.11194) | [Code](https://github.com/Valkyrja3607/MaskDiffusion), - |
72 | | 2024 | CVPR | [Training-Free Open-Vocabulary Segmentation with Offline Diffusion-Augmented Prototype Generation](https://openaccess.thecvf.com/content/CVPR2024/papers/Barsellotti_Training-Free_Open-Vocabulary_Segmentation_with_Offline_Diffusion-Augmented_Prototype_Generation_CVPR_2024_paper.pdf) | [Code](https://github.com/aimagelab/freeda), [Project](https://aimagelab.github.io/freeda/) |
73 | | 2024 | CVPR | [Open-Vocabulary Attention Maps with Token Optimization for Semantic Segmentation in Diffusion Models](https://openaccess.thecvf.com/content/CVPR2024/papers/Marcos-Manchon_Open-Vocabulary_Attention_Maps_with_Token_Optimization_for_Semantic_Segmentation_in_CVPR_2024_paper.pdf) | [Code](https://github.com/vpulab/ovam), - |
74 | | 2024 | CVPR | [Diffuse Attend and Segment: Unsupervised Zero-Shot Segmentation using Stable Diffusion](https://openaccess.thecvf.com/content/CVPR2024/papers/Tian_Diffuse_Attend_and_Segment_Unsupervised_Zero-Shot_Segmentation_using_Stable_Diffusion_CVPR_2024_paper.pdf) | [Code](https://github.com/google/diffseg), [Project](https://sites.google.com/view/diffseg/home) |
75 | | 2024 | CVPR | [Text-image Alignment for Diffusion-based Perception](https://openaccess.thecvf.com/content/CVPR2024/papers/Kondapaneni_Text-Image_Alignment_for_Diffusion-Based_Perception_CVPR_2024_paper.pdf) | [Code](https://github.com/damaggu/TADP), [Project](https://www.vision.caltech.edu/tadp/) |
76 | | 2024 | CVPRW | [ScribbleGen: Generative Data Augmentation Improves Scribble-supervised Semantic Segmentation](http://mengtang.org/scribblegen_cvprw2024.pdf) | [Code](https://github.com/mengtang-lab/scribblegen), - |
77 | | 2024 | ECCV | [Diffusion Models for Open-Vocabulary Segmentation](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/00794.pdf) | -, [Project](https://www.robots.ox.ac.uk/~vgg/research/ovdiff/) |
78 | | 2024 | ECCV | [Do Text-Free Diffusion Models Learn Discriminative Visual Representations?](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/07757-supp.pdf) | [Code](https://github.com/soumik-kanad/diffssl), - |
79 | | 2024 | IJCAI | [Bridging Generative and Discriminative Models for Unified Visual Perception with Diffusion Priors](https://www.ijcai.org/proceedings/2024/0082.pdf) | -, - |
80 | | 2024 | IJCV | [MosaicFusion: Diffusion Models as Data Augmenters for Large Vocabulary Instance Segmentation](https://arxiv.org//pdf/2309.13042.pdf) | [Code](https://github.com/Jiahao000/MosaicFusion), - |
81 |
82 |
83 | ### 3.1.3 DINO-based Solution
84 | | Year | Publication | Paper Title | Project |
85 | |:----:|:-----------:|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------:|
86 | | 2021 | arXiv | [Deep ViT Features as Dense Visual Descriptors](https://dino-vit-features.github.io/paper.pdf) | [Code](https://github.com/ShirAmir/dino-vit-features), [Project](https://dino-vit-features.github.io/) |
87 | | 2021 | BMVC | [Localizing Objects with Self-Supervised Transformers and no Labels](https://www.bmvc2021-virtualconference.com/assets/papers/1339.pdf) | [Code](https://github.com/valeoai/LOST), - |
88 | | 2022 | CVPR | [Deep Spectral Methods: A Surprisingly Strong Baseline for Unsupervised Semantic Segmentation and Localization](https://openaccess.thecvf.com/content/CVPR2022/papers/Melas-Kyriazi_Deep_Spectral_Methods_A_Surprisingly_Strong_Baseline_for_Unsupervised_Semantic_CVPR_2022_paper.pdf) | [Code](https://github.com/lukemelas/deep-spectral-segmentation), [Project](https://lukemelas.github.io/deep-spectral-segmentation/) |
89 | | 2022 | CVPR | [Self-Supervised Transformers for Unsupervised Object Discovery using Normalized Cut](http://openaccess.thecvf.com/content/CVPR2022/papers/Wang_Self-Supervised_Transformers_for_Unsupervised_Object_Discovery_Using_Normalized_Cut_CVPR_2022_paper.pdf) | [Code](https://github.com/YangtaoWANG95/TokenCut), [Project](https://www.m-psi.fr/Papers/TokenCut2022/) |
90 | | 2022 | CVPR | [Self-Supervised Learning of Object Parts for Semantic Segmentation](http://openaccess.thecvf.com/content/CVPR2022/papers/Ziegler_Self-Supervised_Learning_of_Object_Parts_for_Semantic_Segmentation_CVPR_2022_paper.pdf) | [Code](https://github.com/MkuuWaUjinga/leopart), - |
91 | | 2022 | ICLR | [STEGO: Unsupervised Semantic Segmentation by Distilling Feature Correspondences](https://arxiv.org/abs/2203.08414) | [Code](https://github.com/mhamilton723/STEGO), [Project](https://mhamilton.net/stego.html) |
92 | | 2023 | CVPR | [Leveraging Hidden Positives for Unsupervised Semantic Segmentation](https://openaccess.thecvf.com/content/CVPR2023/papers/Seong_Leveraging_Hidden_Positives_for_Unsupervised_Semantic_Segmentation_CVPR_2023_paper.pdf) | [Code](https://github.com/hynnsk/HP), - |
93 | | 2023 | ICLR | [Unsupervised Semantic Segmentation with Self-supervised Object-centric Representations](https://openreview.net/pdf?id=1_jFneF07YC) | [Code](https://github.com/zadaianchuk/comus), - |
94 | | 2023 | TPAMI | [TokenCut: Segmenting Objects in Images and Videos With Self-Supervised Transformer and Normalized Cut](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10224285) | [Code](https://github.com/YangtaoWANG95/TokenCut_video), [Project](https://www.m-psi.fr/Papers/TokenCut2022/) |
95 | | 2024 | CVPR | [Unsupervised Semantic Segmentation Through Depth-Guided Feature Correlation and Sampling](https://openaccess.thecvf.com/content/CVPR2024/papers/Sick_Unsupervised_Semantic_Segmentation_Through_Depth-Guided_Feature_Correlation_and_Sampling_CVPR_2024_paper.pdf) | [Code](https://github.com/leonsick/depthg), [Project](https://leonsick.github.io/depthg/) |
96 | | 2024 | CVPR | [EAGLE: Eigen Aggregation Learning for Object-Centric Unsupervised Semantic Segmentation](https://openaccess.thecvf.com/content/CVPR2024/papers/Kim_EAGLE_Eigen_Aggregation_Learning_for_Object-Centric_Unsupervised_Semantic_Segmentation_CVPR_2024_paper.pdf) | [Code](https://github.com/dnwjddl/EAGLE), - |
97 |
98 | ### 3.1.4 SAM-based Solution
99 | | Year | Publication | Paper Title | Project |
100 | |:----:|:-----------:|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:---------------------------------------------:|
101 | | 2023 | arXiv | [Segment Anything Model (SAM) Enhanced Pseudo Labels for Weakly Supervised Semantic Segmentation](https://arxiv.org/pdf/2305.05803) | -, - |
102 | | 2023 | arXiv | [Weakly-Supervised Semantic Segmentation with Image-Level Labels: from Traditional Models to Foundation Models](https://arxiv.org//pdf/2310.13026.pdf) | -, - |
103 | | 2024 | CVPR | [From SAM to CAMs: Exploring Segment Anything Model for Weakly Supervised Semantic Segmentation](https://openaccess.thecvf.com/content/CVPR2024/papers/Kweon_From_SAM_to_CAMs_Exploring_Segment_Anything_Model_for_Weakly_CVPR_2024_paper.pdf) | [Code](https://github.com/sangrockEG/S2C), - |
104 |
105 | ### 3.1.5 Composition of FMs
106 | | Year | Publication | Paper Title | Project |
107 | |:-----:|:-----------:|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------------------:|
108 | | 2023 | ICCV | [Zero-guidance Segmentation Using Zero Segment Labels](https://openaccess.thecvf.com/content/ICCV2023/papers/Rewatbowornwong_Zero-guidance_Segmentation_Using_Zero_Segment_Labels_ICCV_2023_paper.pdf) | [Code](https://github.com/nessessence/ZeroGuidanceSeg), [Project](https://zero-guide-seg.github.io/) |
109 | | 2023 | MICCAI | [SAM Meets Robotic Surgery: An Empirical Study on Generalization, Robustness and Adaptation](https://arxiv.org/pdf/2308.07156) | -, - |
110 | | 2024 | arXiv | [TAG: Guidance-free Open-Vocabulary Semantic Segmentation](https://arxiv.org/pdf/2403.11197) | [Code](https://github.com/Valkyrja3607/TAG), - |
111 | | 2024 | arXiv | [FreeSeg-Diff: Training-Free Open-Vocabulary Segmentation with Diffusion Models](https://arxiv.org/pdf/2403.20105) | -, [Project](https://bcorrad.github.io/freesegdiff/) |
112 | | 2024 | CVPR | [Image-to-Image Matching via Foundation Models: A New Perspective for Open-Vocabulary Semantic Segmentation](https://openaccess.thecvf.com/content/CVPR2024/papers/Wang_Image-to-Image_Matching_via_Foundation_Models_A_New_Perspective_for_Open-Vocabulary_CVPR_2024_paper.pdf) | -, - |
113 |
114 | ## 3.2 Instance Segmentation
115 | ### 3.2.1 CLIP-based Solution
116 | | Year | Publication | Paper Title | Project |
117 | |-------|:------------------:|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------------------:|
118 | | 2023 | ICML | [Open-Vocabulary Universal Image Segmentation with MaskCLIP](https://dl.acm.org/doi/abs/10.5555/3618408.3618729) | [Code](https://github.com/mlpc-ucsd/MaskCLIP), [Project](https://maskclip.github.io/) |
119 | | 2023 | ICCV | [MasQCLIP for Open-Vocabulary Universal Image Segmentation](https://openaccess.thecvf.com/content/ICCV2023/html/Xu_MasQCLIP_for_Open-Vocabulary_Universal_Image_Segmentation_ICCV_2023_paper.html) | [Code](https://github.com/mlpc-ucsd/MasQCLIP), [Project](https://masqclip.github.io/) |
120 | | 2023 | NeurIPS | [Convolutions Die Hard: Open-Vocabulary Segmentation with Single Frozen Convolutional CLIP](https://proceedings.neurips.cc/paper_files/paper/2023/hash/661caac7729aa7d8c6b8ac0d39ccbc6a-Abstract-Conference.html) | [Code](https://github.com/bytedance/fc-clip), - |
121 | | 2023 | ICCV | [Open-Vocabulary Panoptic Segmentation with Embedding Modulation](https://openaccess.thecvf.com/content/ICCV2023/html/Chen_Open-vocabulary_Panoptic_Segmentation_with_Embedding_Modulation_ICCV_2023_paper.html) | [Code](https://github.com/XavierCHEN34/OPSNet), [Project](https://opsnet-page.github.io/) |
122 | | 2023 | CVPR | [Primitive Generation and Semantic-Related Alignment for Universal Zero-Shot Segmentation](https://openaccess.thecvf.com/content/CVPR2023/html/He_Primitive_Generation_and_Semantic-Related_Alignment_for_Universal_Zero-Shot_Segmentation_CVPR_2023_paper.html) | [Code](https://github.com/heshuting555/PADing), [Project](https://henghuiding.github.io/PADing/) |
123 | | 2023 | CVPR | [Semantic-Promoted Debiasing and Background Disambiguation for Zero-Shot Instance Segmentation](https://openaccess.thecvf.com/content/CVPR2023/html/He_Semantic-Promoted_Debiasing_and_Background_Disambiguation_for_Zero-Shot_Instance_Segmentation_CVPR_2023_paper.html) | [Code](https://github.com/heshuting555/D2Zero), [Project](https://henghuiding.github.io/D2Zero/) |
124 |
125 | ### 3.2.2 DM-based Solution
126 | | Year | Publication | Paper Title | Project |
127 | |-------|:-----------:|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------------------:|
128 | | 2022 | arXiv | [DALL-E for Detection: Language-Driven Compositional Image Synthesis for Object Detection](https://arxiv.org/abs/2206.09592) | [Code](https://github.com/gyhandy/Text2Image-for-Detection), - |
129 | | 2023 | NeurIPS | [DatasetDM: Synthesizing Data with Perception Annotations Using Diffusion Models](https://proceedings.neurips.cc/paper_files/paper/2023/hash/ab6e7ad2354f350b451b5a8e14d04f51-Abstract-Conference.html) | [Code](https://github.com/showlab/DatasetDM), [Project](https://weijiawu.github.io/DatasetDM_page/) |
130 | | 2024 | IJCV | [MosaicFusion: Diffusion Models as Data Augmenters for Large Vocabulary Instance Segmentation](https://link.springer.com/article/10.1007/s11263-024-02223-3) | [Code](https://github.com/Jiahao000/MosaicFusion), - |
131 |
132 |
133 | ### 3.2.3 DINO-based Solution
134 | | Year | Publication | Paper Title | Project |
135 | |-------|:------------------:|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------------------:|
136 | | 2022 | arXiv | [Discovering Object Masks with Transformers for Unsupervised Semantic Segmentation](https://arxiv.org/abs/2206.06363) | [Code](https://github.com/wvangansbeke/MaskDistill), - |
137 | | 2023 | CVPR | [Cut and Learn for Unsupervised Object Detection and Instance Segmentation](https://openaccess.thecvf.com/content/CVPR2023/html/Wang_Cut_and_Learn_for_Unsupervised_Object_Detection_and_Instance_Segmentation_CVPR_2023_paper.html) | [Code](https://github.com/facebookresearch/CutLER), [Project](https://people.eecs.berkeley.edu/~xdwang/projects/CutLER/) |
138 | | 2023 | NeurIPS | [HASSOD: Hierarchical Adaptive Self-Supervised Object Detection](https://proceedings.neurips.cc/paper_files/paper/2023/hash/b9ecf4d84999a61783c360c3782e801e-Abstract-Conference.html) | [Code](https://github.com/Shengcao-Cao/HASSOD), [Project](https://hassod-neurips23.github.io/) |
139 | | 2024 | CVPR | [CuVLER: Enhanced Unsupervised Object Discoveries through Exhaustive Self-Supervised Transformers](https://openaccess.thecvf.com/content/CVPR2024/html/Arica_CuVLER_Enhanced_Unsupervised_Object_Discoveries_through_Exhaustive_Self-Supervised_Transformers_CVPR_2024_paper.html) | [Code](https://github.com/shahaf-arica/CuVLER), - |
140 |
141 | ### 3.2.4 Composition of FMs for Instance Segmentation
142 | | Year | Publication | Paper Title | Project |
143 | |-------|:------------------:|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------------------:|
144 | | 2023 | ICML | [X-Paste: Revisiting Scalable Copy-Paste for Instance Segmentation using CLIP and StableDiffusion](https://proceedings.mlr.press/v202/zhao23f.html) | [Code](https://github.com/yoctta/XPaste), - |
145 | | 2024 | CVPR | [DiverGen: Improving Instance Segmentation by Learning Wider Data Distribution with More Diverse Generative Data](https://openaccess.thecvf.com/content/CVPR2024/html/Fan_DiverGen_Improving_Instance_Segmentation_by_Learning_Wider_Data_Distribution_with_CVPR_2024_paper.html) | [Code](https://github.com/aim-uofa/DiverGen), - |
146 | | 2024 | ICLR | [The devil is in the object boundary: towards annotation-free instance segmentation using Foundation Models](https://arxiv.org/abs/2404.11957) | [Code](https://github.com/chengshiest/zip-your-clip), - |
147 | | 2023 | NeurIPS | [Segment Anything in High Quality](https://proceedings.neurips.cc/paper_files/paper/2023/hash/5f828e38160f31935cfe9f67503ad17c-Abstract-Conference.html) | [Code](https://github.com/SysCV/sam-hq), - |
148 | | 2024 | arXiv | [Grounded SAM: Assembling Open-World Models for Diverse Visual Tasks](https://arxiv.org/abs/2401.14159) | [Code](https://github.com/IDEA-Research/Grounded-Segment-Anything), - |
149 |
150 | ## 3.3 Panoptic Segmentation
151 | ### 3.3.1 CLIP-based Solution
152 | | Year | Publication | Paper Title | Project |
153 | |-------|:------------------:|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------------------:|
154 | | 2023 | ICML | [Open-Vocabulary Universal Image Segmentation with MaskCLIP](https://dl.acm.org/doi/abs/10.5555/3618408.3618729) | [Code](https://github.com/mlpc-ucsd/MaskCLIP), [Project](https://maskclip.github.io/) |
155 | | 2023 | ICCV | [MasQCLIP for Open-Vocabulary Universal Image Segmentation](https://openaccess.thecvf.com/content/ICCV2023/html/Xu_MasQCLIP_for_Open-Vocabulary_Universal_Image_Segmentation_ICCV_2023_paper.html) | [Code](https://github.com/mlpc-ucsd/MasQCLIP), [Project](https://masqclip.github.io/) |
156 | | 2023 | NeurIPS | [Convolutions Die Hard: Open-Vocabulary Segmentation with Single Frozen Convolutional CLIP](https://proceedings.neurips.cc/paper_files/paper/2023/hash/661caac7729aa7d8c6b8ac0d39ccbc6a-Abstract-Conference.html) | [Code](https://github.com/bytedance/fc-clip), - |
157 | | 2023 | ICCV | [Open-Vocabulary Panoptic Segmentation with Embedding Modulation](https://openaccess.thecvf.com/content/ICCV2023/html/Chen_Open-vocabulary_Panoptic_Segmentation_with_Embedding_Modulation_ICCV_2023_paper.html) | [Code](https://github.com/XavierCHEN34/OPSNet), [Project](https://opsnet-page.github.io/) |
158 | | 2023 | CVPR | [Primitive Generation and Semantic-Related Alignment for Universal Zero-Shot Segmentation](https://openaccess.thecvf.com/content/CVPR2023/html/He_Primitive_Generation_and_Semantic-Related_Alignment_for_Universal_Zero-Shot_Segmentation_CVPR_2023_paper.html) | [Code](https://github.com/heshuting555/PADing), [Project](https://henghuiding.github.io/PADing/) |
159 | | 2023 | CVPR | [Generalized Decoding for Pixel, Image, and Language](https://openaccess.thecvf.com/content/CVPR2023/html/Zou_Generalized_Decoding_for_Pixel_Image_and_Language_CVPR_2023_paper.html) | [Code](https://github.com/microsoft/X-Decoder), [Project](https://x-decoder-vl.github.io/) |
160 | | 2024 | arXiv | [Open-Vocabulary Segmentation with Unpaired Mask-Text Supervision](https://arxiv.org/abs/2402.08960) | [Code](https://github.com/DerrickWang005/Unpair-Seg.pytorch), - |
161 | | 2023 | CVPR | [FreeSeg: Unified, Universal and Open-Vocabulary Image Segmentation](https://openaccess.thecvf.com/content/CVPR2023/html/Qin_FreeSeg_Unified_Universal_and_Open-Vocabulary_Image_Segmentation_CVPR_2023_paper.html) | [Code](https://github.com/bytedance/FreeSeg), - |
162 | | 2023 | NeurIPS | [DaTaSeg: Taming a Universal Multi-Dataset Multi-Task Segmentation Model](https://proceedings.neurips.cc/paper_files/paper/2023/hash/d4eed238cf5807c6b75face996302892-Abstract-Conference.html) | -, - |
163 | | 2024 | CVPR | [OMG-Seg: Is One Model Good Enough For All Segmentation?](https://openaccess.thecvf.com/content/CVPR2024/html/Li_OMG-Seg_Is_One_Model_Good_Enough_For_All_Segmentation_CVPR_2024_paper.html) | [Code](https://github.com/lxtgh/omg-seg), - |
164 |
165 | ### 3.3.2 DM-based Solution
166 | | Year | Publication | Paper Title | Project |
167 | |-------|:------------------:|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------------------:|
168 | | 2023 | CVPR | [Open-Vocabulary Panoptic Segmentation With Text-to-Image Diffusion Models](https://openaccess.thecvf.com/content/CVPR2023/html/Xu_Open-Vocabulary_Panoptic_Segmentation_With_Text-to-Image_Diffusion_Models_CVPR_2023_paper.html) | [Code](https://github.com/nvlabs/odise), - |
169 | | 2023 | ICCV | [A Generalist Framework for Panoptic Segmentation of Images and Videos](https://openaccess.thecvf.com/content/ICCV2023/html/Chen_A_Generalist_Framework_for_Panoptic_Segmentation_of_Images_and_Videos_ICCV_2023_paper.html) | [Code](https://github.com/google-research/pix2seq), - |
170 | | 2023 | ICLR | [Analog Bits: Generating Discrete Data using Diffusion Models with Self-Conditioning](https://arxiv.org/abs/2208.04202) | [Code](https://github.com/google-research/pix2seq), - |
171 | | 2023 | ICCV | [DiffuMask: Synthesizing Images with Pixel-level Annotations for Semantic Segmentation Using Diffusion Models](https://openaccess.thecvf.com/content/ICCV2023/html/Wu_DiffuMask_Synthesizing_Images_with_Pixel-level_Annotations_for_Semantic_Segmentation_Using_ICCV_2023_paper.html) | [Code](https://github.com/weijiawu/DiffuMask), [Project](https://weijiawu.github.io/DiffusionMask/) |
172 | | 2024 | ECCV | [A simple latent diffusion approach for panoptic segmentation and mask inpainting](https://link.springer.com/chapter/10.1007/978-3-031-72633-0_5) | [Code](https://github.com/segments-ai/latent-diffusion-segmentation), - |
173 |
174 | ### 3.3.3 DINO-based Solution
175 | | Year | Publication | Paper Title | Project |
176 | |-------|:------------------:|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------------------:|
177 | | 2019 | CVPR | [Panoptic Feature Pyramid Networks](https://openaccess.thecvf.com/content_CVPR_2019/html/Kirillov_Panoptic_Feature_Pyramid_Networks_CVPR_2019_paper.html) | [Code](https://github.com/Angzz/panoptic-fpn-gluon), - |
178 | | 2020 | CVPR | [Panoptic-DeepLab: A Simple, Strong, and Fast Baseline for Bottom-Up Panoptic Segmentation](https://openaccess.thecvf.com/content_CVPR_2020/html/Cheng_Panoptic-DeepLab_A_Simple_Strong_and_Fast_Baseline_for_Bottom-Up_Panoptic_CVPR_2020_paper.html) | [Code](https://github.com/bowenc0221/panoptic-deeplab), - |
179 | | 2022 | ICLR | [Unsupervised Semantic Segmentation by Distilling Feature Correspondences](https://arxiv.org/abs/2203.08414) | [Code](https://github.com/mhamilton723/STEGO), [Project](https://mhamilton.net/stego.html) |
180 | | 2023 | CVPR | [Cut and Learn for Unsupervised Object Detection and Instance Segmentation](https://openaccess.thecvf.com/content/CVPR2023/html/Wang_Cut_and_Learn_for_Unsupervised_Object_Detection_and_Instance_Segmentation_CVPR_2023_paper.html) | [Code](https://github.com/facebookresearch/CutLER), [Project](https://people.eecs.berkeley.edu/~xdwang/projects/CutLER/) |
181 | | 2024 | arXiv | [A Good Foundation is Worth Many Labels: Label-Efficient Panoptic Segmentation](https://arxiv.org/abs/2405.19035) | [Code](https://github.com/robot-learning-freiburg/PASTEL), - |
182 | | 2024 | CVPR | [Unsupervised Universal Image Segmentation](https://openaccess.thecvf.com/content/CVPR2024/html/Niu_Unsupervised_Universal_Image_Segmentation_CVPR_2024_paper.html) | [Code](https://github.com/u2seg/U2Seg), [Project](https://u2seg.github.io/) |
183 |
184 | ### 3.3.4 SAM-based Solution
185 | | Year | Publication | Paper Title | Project |
186 | |-------|:------------------:|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------------------:|
187 | | 2024 | ECCV | [Semantic-SAM: Segment and Recognize Anything at Any Granularity](https://arxiv.org/abs/2307.04767) | [Code](https://github.com/UX-Decoder/Semantic-SAM), - |
188 | | 2023 | NeurIPS | [Segment Everything Everywhere All at Once](https://proceedings.neurips.cc/paper_files/paper/2023/hash/3ef61f7e4afacf9a2c5b71c726172b86-Abstract-Conference.html) | [Code](https://github.com/UX-Decoder/Segment-Everything-Everywhere-All-At-Once), - |
189 | | 2014 | ECCV | [Microsoft COCO: Common Objects in Context](https://link.springer.com/chapter/10.1007/978-3-319-10602-1_48) | -, [Project](https://cocodataset.org/#home) |
190 | | 2017 | CVPR | [Scene Parsing Through ADE20K Dataset](https://openaccess.thecvf.com/content_cvpr_2017/html/Zhou_Scene_Parsing_Through_CVPR_2017_paper.html) | -, - |
191 | | 2015 | IJCV | [The PASCAL Visual Object Classes Challenge: A Retrospective](https://link.springer.com/article/10.1007/s11263-014-0733-5) | -, - |
192 |
--------------------------------------------------------------------------------