├── tasks.png
├── segmentation emerge.PNG
├── README.md
├── 2-Segmentation Emerge.md
├── 4-PIS.md
└── 3-GIS.md
/tasks.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/stanley-313/ImageSegFM-Survey/HEAD/tasks.png
--------------------------------------------------------------------------------
/segmentation emerge.PNG:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/stanley-313/ImageSegFM-Survey/HEAD/segmentation emerge.PNG
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
3 |
4 |
5 |
# Image Segmentation in Foundation Model Era: A Survey
6 |
7 |
8 | Tianfei Zhou, Wang Xia, Fei Zhang, Boyu Chang, Wenguan Wang, Ye Yuan, Ender Konukoglu, Daniel Cremers
23 |
24 |
38 | This repository compiles a collection of resources on image segmentation in the foundation model era,
39 | and will be continuously updated to track developments in the field.
40 | Please feel free to submit a pull request if you find any work missing.
41 |
42 | ## 1. Introduction
43 | Image segmentation is a long-standing challenge in computer vision, studied continuously over several decades, as
44 | evidenced by seminal algorithms such as N-Cut, FCN, and MaskFormer. With the advent of foundation models (FMs), contemporary
45 | segmentation methodologies have embarked on a new epoch by either adapting FMs (e.g., CLIP, Stable Diffusion, DINO) for image
46 | segmentation or developing dedicated segmentation foundation models (e.g., SAM, SAM2). These approaches not only deliver
47 | superior segmentation performance, but also herald newfound segmentation capabilities previously unseen in the deep learning context.
48 | However, current research in image segmentation lacks a detailed analysis of distinct characteristics, challenges, and solutions
49 | associated with these advancements. This survey seeks to fill this gap by providing a thorough review of cutting-edge research
50 | centered around FM-driven image segmentation. We investigate two basic lines of research (as shown in the following figure) – generic image segmentation (i.e.,
51 | semantic segmentation, instance segmentation, panoptic segmentation), and promptable image segmentation (i.e., interactive
52 | segmentation, referring segmentation, few-shot segmentation) – by delineating their respective task settings, background concepts,
53 | and key challenges. Furthermore, we provide insights into the emergence of segmentation knowledge from FMs like CLIP, Stable
54 | Diffusion, and DINO. An exhaustive overview of over 300 segmentation approaches is provided to encapsulate the breadth of current
55 | research efforts. Subsequently, we engage in a discussion of open issues and potential avenues for future research.
56 |
57 |
58 |
59 |
60 |
61 | ***
62 |
63 | ## 2. Segmentation Knowledge Emerges From FMs
64 | Given the emergent capabilities of LLMs, a natural question arises: *Do segmentation properties emerge from FMs?* The
65 | answer is **positive**, even for FMs not explicitly designed for
66 | segmentation, such as CLIP, DINO and Diffusion Models. This also unlocks a new frontier in image segmentation,
67 | i.e., **acquiring segmentation without any training.** The following figure illustrates how to approach this and shows some examples:
68 |
69 |
70 |
71 |
72 |
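To make this concrete, below is a minimal, training-free sketch in the spirit of the CLIP-based methods listed in Sec. 2.1 (e.g., MaskCLIP): per-patch CLIP features are matched against text embeddings of class names to produce a coarse semantic mask without any segmentation training. This is an illustrative assumption-laden sketch, not code from any of the surveyed papers: it assumes the Hugging Face `transformers` CLIP implementation (`openai/clip-vit-base-patch16`), the image path and label set are placeholders, and projecting every patch token through the final layer norm plus visual projection is a naive simplification of what the surveyed methods actually do.

```python
# Hedged sketch: training-free semantic masks from a frozen CLIP (MaskCLIP-style idea).
# Assumptions: HF transformers CLIP; "example.jpg" and the class list are placeholders.
import torch
import torch.nn.functional as F
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

MODEL_ID = "openai/clip-vit-base-patch16"
model = CLIPModel.from_pretrained(MODEL_ID).eval()
processor = CLIPProcessor.from_pretrained(MODEL_ID)

class_names = ["cat", "dog", "grass", "sky"]        # placeholder open-vocabulary label set
image = Image.open("example.jpg").convert("RGB")    # placeholder image path

with torch.no_grad():
    # 1) Class-name text embeddings in CLIP's joint space.
    text_inputs = processor(text=[f"a photo of a {c}" for c in class_names],
                            return_tensors="pt", padding=True)
    text_emb = F.normalize(model.get_text_features(**text_inputs), dim=-1)      # (C, D)

    # 2) Dense image embeddings: drop the CLS token and map every patch token into the
    #    joint space with the same layer norm + projection CLIP applies to the CLS token.
    pixel_values = processor(images=image, return_tensors="pt").pixel_values
    tokens = model.vision_model(pixel_values=pixel_values).last_hidden_state    # (1, 1+P, H)
    patch_emb = model.visual_projection(
        model.vision_model.post_layernorm(tokens[:, 1:, :]))                    # (1, P, D)
    patch_emb = F.normalize(patch_emb, dim=-1)

    # 3) Patch-text similarity -> per-patch class index -> coarse mask at patch resolution.
    sim = patch_emb @ text_emb.T                                                # (1, P, C)
    side = int(sim.shape[1] ** 0.5)                                             # 14 for ViT-B/16 at 224px
    mask = sim.argmax(dim=-1).reshape(1, 1, side, side).float()
    mask = F.interpolate(mask, size=pixel_values.shape[-2:], mode="nearest")    # (1, 1, 224, 224)

present = [class_names[int(i)] for i in mask.unique()]
print(mask.shape, present)
```

Roughly speaking, the papers below differ in where the dense features come from (CLIP's final attention layer, diffusion cross-/self-attention maps, or DINO's self-attention) and in how those features are grouped into masks; the CLIP-based works in Sec. 2.1, in particular, rework the last attention layer to obtain much cleaner dense features than the naive projection used above.
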
73 | - [2.1 Segmentation Emerges from CLIP](2-Segmentation%20Emerge.md#21-segmentation-emerges-from-clip)
74 | - [2.2 Segmentation Emerges from DMs](2-Segmentation%20Emerge.md#22-segmentation-emerges-from-dms)
75 | - [2.3 Segmentation Emerges from DINO](2-Segmentation%20Emerge.md#23-segmentation-emerges-from-dino)
76 |
77 | ***
78 |
79 | ## 3. Foundation Model-based GIS (Generic Image Segmentation)
80 | - [3.1 Semantic Segmentation](3-GIS.md#31-semantic-segmentation)
81 | - [3.1.1 CLIP-based Solution](3-GIS.md#311-clip-based-solution)
82 | - [3.1.2 DM-based Solution](3-GIS.md#312-dm-based-solution)
83 | - [3.1.3 DINO-based Solution](3-GIS.md#313-dino-based-solution)
84 | - [3.1.4 SAM-based Solution](3-GIS.md#314-sam-based-solution)
85 | - [3.1.5 Composition of FMs](3-GIS.md#315-composition-of-fms)
86 | - [3.2 Instance Segmentation](3-GIS.md#32-instance-segmentation)
87 | - [3.2.1 CLIP-based Solution](3-GIS.md#321-clip-based-solution)
88 | - [3.2.2 DM-based Solution](3-GIS.md#322-dm-based-solution)
89 | - [3.2.3 DINO-based Solution](3-GIS.md#323-dino-based-solution)
90 | - [3.2.4 Composition of FMs](3-GIS.md#324-composition-of-fms)
91 | - [3.3 Panoptic Segmentation](3-GIS.md#33-panoptic-segmentation)
92 | - [3.3.1 CLIP-based Solution](3-GIS.md#331-clip-based-solution)
93 | - [3.3.2 DM-based Solution](3-GIS.md#332-dm-based-solution)
94 | - [3.3.3 DINO-based Solution](3-GIS.md#333-dino-based-solution)
95 | - [3.3.4 SAM-based Solution](3-GIS.md#334-sam-based-solution)
96 |
97 |
98 | ***
99 |
100 | ## 4. Foundation Model-based PIS (Promptable Image Segmentation)
101 | - [4.1 Interactive Segmentation](4-PIS.md#41-interactive-segmentation)
102 | - [4.1.1 SAM-based Solution](4-PIS.md#411-sam-based-solution)
103 | - [4.2 Referring Segmentation](4-PIS.md#42-referring-segmentation)
104 | - [4.2.1 CLIP-based Solution](4-PIS.md#421-clip-based-solution)
105 | - [4.2.2 DM-based Solution](4-PIS.md#422-dm-based-solution)
106 | - [4.2.3 LLMs/MLLMs-based Solution](4-PIS.md#423-llmsmllms-based-solution)
107 | - [4.2.4 Composition of FMs](4-PIS.md#424-composition-of-fms)
108 | - [4.3 Few-shot Segmentation](4-PIS.md#43-few-shot-segmentation)
109 | - [4.3.1 CLIP-based Solution](4-PIS.md#431-clip-based-solution)
110 | - [4.3.2 DM-based Solution](4-PIS.md#432-dm-based-solution)
111 | - [4.3.3 DINO-based Solution](4-PIS.md#433-dino-based-solution)
112 | - [4.3.4 SAM-based Solution](4-PIS.md#434-sam-based-solution)
113 | - [4.3.5 MLLMs-based Solution](4-PIS.md#435-mllms-based-solution)
114 | - [4.3.6 In-Context Segmentation](4-PIS.md#436-in-context-segmentation)
115 | ## Citation
116 |
117 | If you find our survey and repository useful for your research, please consider citing our paper:
118 | ```bibtex
119 | @article{zhou2024SegFMSurvey,
120 | title={Image Segmentation in Foundation Model Era: A Survey},
121 | author={Zhou, Tianfei and Xia, Wang and Zhang, Fei and Chang, Boyu and Wang, Wenguan and Yuan, Ye and Konukoglu, Ender and Cremers, Daniel},
122 | journal={arXiv preprint arXiv:2408.12957},
123 | year={2024},
124 | }
125 | ```
126 |
--------------------------------------------------------------------------------
/2-Segmentation Emerge.md:
--------------------------------------------------------------------------------
1 | ## 2.1 Segmentation Emerges from CLIP
2 | | Year | Publication | Paper Title | Project |
3 | |:----:|:-----------:|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------------------------:|
4 | | 2022 | ECCV | [Extract Free Dense Labels from CLIP](https://www.ecva.net/papers/eccv_2022/papers_ECCV/papers/136880687.pdf) | [Code](https://github.com/chongzhou96/MaskCLIP), [Project](https://www.mmlab-ntu.com/project/maskclip/) |
5 | | 2023 | arXiv | [A Closer Look at the Explainability of Contrastive Language-Image Pre-training](https://arxiv.org/pdf/2304.05653) | [Code](https://github.com/xmed-lab/CLIP_Surgery), - |
6 | | 2023 | ICCV | [ Perceptual Grouping in Contrastive Vision-Language Models](https://openaccess.thecvf.com/content/ICCV2023/papers/Ranasinghe_Perceptual_Grouping_in_Contrastive_Vision-Language_Models_ICCV_2023_paper.pdf) | [Code](https://github.com/kahnchana/clippy), - |
7 | | 2024 | CVPR | [ Grounding Everything: Emerging Localization Properties in Vision-Language Transformers](https://openaccess.thecvf.com/content/CVPR2024/papers/Bousselham_Grounding_Everything_Emerging_Localization_Properties_in_Vision-Language_Transformers_CVPR_2024_paper.pdf) | [Code](https://github.com/WalBouss/GEM), - |
8 | | 2024 | ECCV | [SCLIP: Rethinking Self-Attention for Dense Vision-Language Inference](https://link.springer.com/chapter/10.1007/978-3-031-72664-4_18) | [Code](https://github.com/wangf3014/SCLIP), - |
9 | | 2025 | WACV | [Pay Attention to Your Neighbours: Training-Free Open-Vocabulary Semantic Segmentation](https://arxiv.org/pdf/2404.08181) | [Code](https://github.com/sinahmr/NACLIP), - |
10 |
11 | ## 2.2 Segmentation Emerges from DMs
12 | | Year | Publication | Paper Title | Project |
13 | |:----:|:-----------:|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:---------------------------------------------------------------------------------------------------------------:|
14 | | 2023 | ACL | [What the DAAM: Interpreting Stable Diffusion Using Cross Attention](https://aclanthology.org/2023.acl-long.310.pdf) | [Code](https://github.com/castorini/daam), - |
15 | | 2023 | arXiv | [Diffusion Model is Secretly a Training-free Open Vocabulary Semantic Segmenter](https://arxiv.org/pdf/2309.02773) | [Code](https://github.com/VCG-team/DiffSegmenter), [Project](https://vcg-team.github.io/DiffSegmenter-webpage/) |
16 | | 2023 | arXiv | [Attention as Annotation: Generating Images and Pseudo-masks for Weakly Supervised Semantic Segmentation with Diffusion](https://arxiv.org/pdf/2309.01369v1) | -, - |
17 | | 2023 | NeurIPS | [Dataset Diffusion: Diffusion-based Synthetic Dataset Generation for Pixel-Level Semantic Segmentation](https://proceedings.neurips.cc/paper_files/paper/2023/file/f2957e48240c1d90e62b303574871b47-Paper-Conference.pdf) | [Code](https://github.com/VinAIResearch/Dataset-Diffusion), - |
18 | | 2024 | arXiv | [FreeSeg-Diff: Training-Free Open-Vocabulary Segmentation with Diffusion Models](https://arxiv.org/pdf/2403.20105) | -, [Project](https://bcorrad.github.io/freesegdiff/) |
19 | | 2024 | arXiv | [MaskDiffusion: Exploiting Pre-trained Diffusion Models for Semantic Segmentation](https://arxiv.org/pdf/2403.11194) | [Code](https://github.com/Valkyrja3607/MaskDiffusion), - |
20 | | 2024 | CVPR | [Open-Vocabulary Attention Maps with Token Optimization for Semantic Segmentation in Diffusion Models](https://openaccess.thecvf.com/content/CVPR2024/papers/Marcos-Manchon_Open-Vocabulary_Attention_Maps_with_Token_Optimization_for_Semantic_Segmentation_in_CVPR_2024_paper.pdf) | [Code](https://github.com/vpulab/ovam), - |
21 | | 2024 | ECCV | [Diffusion Models for Open-Vocabulary Segmentation](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/00794.pdf) | -, [Project](https://www.robots.ox.ac.uk/~vgg/research/ovdiff/) |
22 |
23 |
24 | ## 2.3 Segmentation Emerges from DINO
25 | | Year | Publication | Paper Title | Project |
26 | |:----:|:-----------:|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------:|
27 | | 2021 | arXiv | [Deep ViT Features as Dense Visual Descriptors](https://dino-vit-features.github.io/paper.pdf) | [Code](https://github.com/ShirAmir/dino-vit-features), [Project](https://dino-vit-features.github.io/) |
28 | | 2021 | ICCV | [Emerging Properties in Self-Supervised Vision Transformers](https://openaccess.thecvf.com/content/ICCV2021/papers/Caron_Emerging_Properties_in_Self-Supervised_Vision_Transformers_ICCV_2021_paper.pdf) | [Code](https://github.com/facebookresearch/dino), - |
29 | | 2021 | BMVC | [Localizing Objects with Self-Supervised Transformers and no Labels](https://www.bmvc2021-virtualconference.com/assets/papers/1339.pdf) | [Code](https://github.com/valeoai/LOST), - |
30 | | 2022 | arXiv | [Discovering object masks with transformers for unsupervised semantic segmentation](https://arxiv.org/abs/2206.06363) | [Code](https://github.com/wvangansbeke/MaskDistill), - |
31 | | 2022 | CVPR | [Deep Spectral Methods: A Surprisingly Strong Baseline for Unsupervised Semantic Segmentation and Localization](https://openaccess.thecvf.com/content/CVPR2022/papers/Melas-Kyriazi_Deep_Spectral_Methods_A_Surprisingly_Strong_Baseline_for_Unsupervised_Semantic_CVPR_2022_paper.pdf) | [Code](https://github.com/lukemelas/deep-spectral-segmentation), [Project](https://lukemelas.github.io/deep-spectral-segmentation/) |
32 | | 2023 | ICLR | [Unsupervised Semantic Segmentation with Self-supervised Object-centric Representations](https://openreview.net/pdf?id=1_jFneF07YC) | [Code](https://github.com/zadaianchuk/comus), - |
33 | | 2024 | TMLR | [DINOv2: Learning Robust Visual Features without Supervision](https://openreview.net/pdf?id=a68SUt6zFt) | [Code](https://github.com/facebookresearch/dinov2), - |
34 |
--------------------------------------------------------------------------------
/4-PIS.md:
--------------------------------------------------------------------------------
1 | # 4. Foundation Model-based PIS (Promptable Image Segmentation)
2 | ***
3 | ## 4.1 Interactive Segmentation
4 | ### 4.1.1 SAM-based Solution
5 | | Year | Publication | Paper Title | Project |
6 | |-------|:------------------:|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------------------:|
7 | | 2023 | arXiv | [SAM on Medical Images: A Comprehensive Study on Three Prompt Modes](https://arxiv.org/pdf/2305.00035) | -,- |
8 | | 2023 | arXiv | [Customized segment anything model for medical image segmentation](https://arxiv.org/pdf/2304.13785) | [Code](https://github.com/hitachinsk/SAMed), - |
9 | | 2023 | arXiv | [Medical SAM Adapter: Adapting Segment Anything Model for Medical Image Segmentation](https://arxiv.org/pdf/2304.12620) | [Code](https://github.com/SuperMedIntel/Medical-SAM-Adapter), - |
10 | | 2023 | MedIA | [Segment anything model for medical image analysis: an experimental study](https://www.sciencedirect.com/science/article/pii/S1361841523001780/pdfft?md5=398c81e13674a45f4a0b611f468b4ea8&pid=1-s2.0-S1361841523001780-main.pdf) | [Code](https://github.com/mazurowski-lab/segment-anything-medical-evaluation), - |
11 | | 2023 | MIDL | [SAM.MD: Zero-shot medical image segmentation capabilities of the Segment Anything Model](https://arxiv.org/pdf/2304.05396) | -, - |
12 | | 2023 | MICCAI BrainLes | [Cheap Lunch for Medical Image Segmentation by Fine-tuning SAM on Few Exemplars](https://arxiv.org/pdf/2308.14133) | -, - |
13 | | 2023 | MICCAI Society | [SAM Meets Robotic Surgery: An Empirical Study on Generalization, Robustness and Adaptation](https://link.springer.com/content/pdf/10.1007/978-3-031-47401-9_23.pdf?pdf=inline%20link) | -, - |
14 | | 2023 | NeurIPS | [Segment Anything in High Quality](https://dl.acm.org/doi/10.5555/3666122.3667425) | [Code](https://github.com/SysCV/SAM-HQ), - |
15 | | 2024 | Comput. Biol. Med | [Segment anything model for medical image segmentation: Current applications and future directions](https://www.sciencedirect.com/science/article/pii/S0010482524003226/pdfft?md5=68de9e9c773354807446ee39cc3b1cb0&pid=1-s2.0-S0010482524003226-main.pdf) | [Code](https://github.com/YichiZhang98/SAM4MIS), - |
16 | | 2024 | CVPR | [GraCo: Granularity-Controllable Interactive Segmentation](https://openaccess.thecvf.com/content/CVPR2024/papers/Zhao_GraCo_Granularity-Controllable_Interactive_Segmentation_CVPR_2024_paper.pdf) | [Code](https://github.com/Zhao-Yian/GraCo), [Project](https://zhao-yian.github.io/GraCo) |
17 | | 2024 | ECCV | [Tokenize Anything via Prompting](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/06366.pdf) | [Code](https://github.com/baaivision/tokenize-anything), - |
18 | | 2024 | ECCV | [Open-Vocabulary SAM: Segment and Recognize Twenty-thousand Classes Interactively](https://link.springer.com/content/pdf/10.1007/978-3-031-72775-7_24) | [Code](https://github.com/HarborYuan/ovsam), [Project](https://www.mmlab-ntu.com/project/ovsam/) |
19 | | 2024 | ECCV | [Semantic-SAM: Segment and Recognize Anything at Any Granularity](https://arxiv.org/abs/2307.04767) | [Code](https://github.com/UX-Decoder/Semantic-SAM), - |
20 | | 2024 | MedIA | [3DSAM-adapter: Holistic adaptation of SAM from 2D to 3D for promptable tumor segmentation](https://www.sciencedirect.com/science/article/pii/S1361841524002494/pdfft?md5=69d14d00b8d36553854e02366ab6bb36&pid=1-s2.0-S1361841524002494-main.pdf) | [Code](https://github.com/med-air/3DSAM-adapter), - |
21 | | 2024 | Nat. Commun | [Segment anything in medical images](https://www.nature.com/articles/s41467-024-44824-z.pdf) | [Code](https://github.com/bowang-lab/MedSAM), - |
22 | | 2024 | Strahlenther Onkol | [The Segment Anything foundation model achieves favorable brain tumor auto-segmentation accuracy in MRI to support radiotherapy treatment planning](https://link.springer.com/content/pdf/10.1007/s00066-024-02313-8.pdf) | -, - |
23 |
24 |
25 | ## 4.2 Referring Segmentation
26 | ### 4.2.1 CLIP-based Solution
27 | | Year | Publication | Paper Title | Project |
28 | |------|:-----------:|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------------:|
29 | | 2022 | arXiv | [Weakly-supervised segmentation of referring expressions](https://arxiv.org/pdf/2205.04725) | -, - |
30 | | 2022 | CVPR | [CRIS: CLIP-Driven Referring Image Segmentation](https://openaccess.thecvf.com/content/CVPR2022/papers/Wang_CRIS_CLIP-Driven_Referring_Image_Segmentation_CVPR_2022_paper.pdf) | [Code](https://github.com/DerrickWang005/CRIS.pytorch), - |
31 | | 2022 | CVPR | [Image Segmentation Using Text and Image Prompts](https://openaccess.thecvf.com/content/CVPR2022/papers/Luddecke_Image_Segmentation_Using_Text_and_Image_Prompts_CVPR_2022_paper.pdf) | [Code](https://github.com/timojl/clipseg), - |
32 | | 2023 | CVPR | [Zero-shot Referring Image Segmentation with Global-Local Context Features](https://openaccess.thecvf.com/content/CVPR2023/papers/Yu_Zero-Shot_Referring_Image_Segmentation_With_Global-Local_Context_Features_CVPR_2023_paper.pdf) | [Code](https://github.com/Seonghoon-Yu/Zero-shot-RIS), - |
33 | | 2023 | EMNLP | [Text Augmented Spatial-aware Zero-shot Referring Image Segmentation](https://aclanthology.org/2023.findings-emnlp.73.pdf) | [Code](https://github.com/suoych/TAS), - |
34 | | 2023 | ICCV | [ Bridging Vision and Language Encoders: Parameter-Efficient Tuning for Referring Image Segmentation](https://openaccess.thecvf.com/content/ICCV2023/papers/Xu_Bridging_Vision_and_Language_Encoders_Parameter-Efficient_Tuning_for_Referring_Image_ICCV_2023_paper.pdf) | [Code](https://github.com/kkakkkka/ETRIS), - |
35 | | 2023 | ICCV | [ Referring Image Segmentation Using Text Supervision](https://openaccess.thecvf.com/content/ICCV2023/papers/Liu_Referring_Image_Segmentation_Using_Text_Supervision_ICCV_2023_paper.pdf) | [Code](https://github.com/fawnliu/TRIS), - |
36 | | 2023 | NeurIPS | [Text Promptable Surgical Instrument Segmentation with Vision-Language Models](https://papers.nips.cc/paper_files/paper/2023/file/5af741d487c5f0b08bfe56e11d1883e4-Paper-Conference.pdf) | [Code](https://github.com/franciszzj/TP-SIS), - |
37 | | 2024 | AAAI | [EAVL: Explicitly Align Vision and Language for Referring Image Segmentation](https://www.researchgate.net/profile/Yichen-Yan-9/publication/373263673_EAVL_Explicitly_Align_Vision_and_Language_for_Referring_Image_Segmentation/links/6617ad9d43f8df018dee471d/EAVL-Explicitly-Align-Vision-and-Language-for-Referring-Image-Segmentation.pdf) | -, - |
38 | | 2024 | NAACL | [Extending CLIP’s Image-Text Alignment to Referring Image Segmentation](https://aclanthology.org/2024.naacl-long.258.pdf) | -, - |
39 | | 2024 | arXiv | [Improving Referring Image Segmentation using Vision-Aware Text Features](https://arxiv.org/pdf/2404.08590v1) | -, [Project](https://nero1342.github.io/VATEX_RIS/) |
40 | | 2024 | CVPR | [Unveiling Parts Beyond Objects: Towards Finer-Granularity Referring Expression Segmentation](https://openaccess.thecvf.com/content/CVPR2024/papers/Wang_Unveiling_Parts_Beyond_Objects_Towards_Finer-Granularity_Referring_Expression_Segmentation_CVPR_2024_paper.pdf) | [Code](https://github.com/Rubics-Xuan/MRES), [Project](https://rubics-xuan.github.io/MRES/) |
41 | | 2024 | ICLR | [BarLeRIa: An Efficient Tuning Framework for Referring Image Segmentation](https://openreview.net/pdf?id=wHLDHRkmEu) | [Code](https://github.com/NastrondAd/BarLeRIa), - |
42 |
43 | ### 4.2.2 DM-based Solution
44 | | Year | Publication | Paper Title | Project |
45 | |-------|:-----------:|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------:|
46 | | 2022 | arXiv | [Peekaboo: Text to Image Diffusion Models are Zero-Shot Segmentors](https://arxiv.org/pdf/2211.13224) | [Code](https://github.com/RyannDaGreat/Peekaboo), [Project](https://ryanndagreat.github.io/peekaboo) |
47 | | 2023 | arXiv | [Ref-Diff: Zero-shot Referring Image Segmentation with Generative Models](https://arxiv.org/pdf/2308.16777) | [Code](https://github.com/kodenii/Ref-Diff), - |
48 | | 2024 | CVPR | [UniGS: Unified Representation for Image Generation and Segmentation](https://openaccess.thecvf.com/content/CVPR2024/papers/Qi_UniGS_Unified_Representation_for_Image_Generation_and_Segmentation_CVPR_2024_paper.pdf) | -, - |
49 | | 2023 | ICCV | [Unleashing Text-to-Image Diffusion Models for Visual Perception](https://openaccess.thecvf.com/content/ICCV2023/papers/Zhao_Unleashing_Text-to-Image_Diffusion_Models_for_Visual_Perception_ICCV_2023_paper.pdf) | [Code](https://github.com/wl-zhao/VPD), [Project](https://vpd.ivg-research.xyz/) |
50 | | 2023 | ICCV | [ LD-ZNet: A Latent Diffusion Approach for Text-Based Image Segmentation](https://openaccess.thecvf.com/content/ICCV2023/papers/PNVR_LD-ZNet_A_Latent_Diffusion_Approach_for_Text-Based_Image_Segmentation_ICCV_2023_paper.pdf) | [Code](https://koutilya-pnvr.github.io/LD-ZNet/), [Project](https://koutilya-pnvr.github.io/LD-ZNet/) |
51 | | 2023 | IROS | [Multimodal Diffusion Segmentation Model for Object Segmentation from Manipulation Instructions](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10341402) | -, - |
52 |
53 |
54 | ### 4.2.3 LLMs/MLLMs-based Solution
55 | | Year | Publication | Paper Title | Project |
56 | |------|:-----------:|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------:|
57 | | 2023 | arXiv | [LISA++: An Improved Baseline for Reasoning Segmentation with Large Language Model](https://arxiv.org/pdf/2312.17240) | -, - |
58 | | 2023 | arXiv | [NExT-Chat: An LMM for Chat, Detection and Segmentation](https://arxiv.org/pdf/2311.04498) | [Code](https://github.com/NExT-ChatV/NExT-Chat), [Project](https://next-chatv.github.io/) |
59 | | 2024 | arXiv | [LaSagnA: Language-based Segmentation Assistant for Complex Queries](https://arxiv.org/pdf/2404.08506) | [Code](https://github.com/congvvc/LaSagnA), - |
60 | | 2024 | arXiv | [Empowering Segmentation Ability to Multi-modal Large Language Models](https://arxiv.org/pdf/2403.14141) | -, - |
61 | | 2024 | CVPR | [LISA: Reasoning Segmentation via Large Language Model](https://openaccess.thecvf.com/content/CVPR2024/papers/Lai_LISA_Reasoning_Segmentation_via_Large_Language_Model_CVPR_2024_paper.pdf) | [Code](https://github.com/dvlab-research/LISA), - |
62 | | 2024 | CVPR | [PixelLM: Pixel Reasoning with Large Multimodal Model](https://openaccess.thecvf.com/content/CVPR2024/papers/Ren_PixelLM_Pixel_Reasoning_with_Large_Multimodal_Model_CVPR_2024_paper.pdf) | [Code](https://github.com/MaverickRen/PixelLM), [Project](https://pixellm.github.io/) |
63 | | 2024 | CVPR | [GSVA: Generalized Segmentation via Multimodal Large Language Models](https://openaccess.thecvf.com/content/CVPR2024/papers/Xia_GSVA_Generalized_Segmentation_via_Multimodal_Large_Language_Models_CVPR_2024_paper.pdf) | [Code](https://github.com/LeapLabTHU/GSVA), - |
64 | | 2024 | CVPR | [Osprey: Pixel Understanding with Visual Instruction Tuning](https://openaccess.thecvf.com/content/CVPR2024/papers/Yuan_Osprey_Pixel_Understanding_with_Visual_Instruction_Tuning_CVPR_2024_paper.pdf) | [Code](https://github.com/CircleRadon/Osprey), - |
65 | | 2024 | CVPRW | [LLM-Seg: Bridging Image Segmentation and Large Language Model Reasoning](https://openaccess.thecvf.com/content/CVPR2024W/MMFM/papers/Wang_LLM-Seg_Bridging_Image_Segmentation_and_Large_Language_Model_Reasoning_CVPRW_2024_paper.pdf) | [Code](https://github.com/wangjunchi/LLMSeg), - |
66 |
67 |
68 | ### 4.2.4 Composition of FMs
69 | | Year | Publication | Paper Title | Project |
70 | |-------|:-----------:|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------------------------:|
71 | | 2022 | arXiv | [Weakly-supervised segmentation of referring expressions](https://arxiv.org/pdf/2205.04725) | -, - |
72 | | 2022 | CVPR | [LAVT: Language-Aware Vision Transformer for Referring Image Segmentation](https://openaccess.thecvf.com/content/CVPR2022/papers/Yang_LAVT_Language-Aware_Vision_Transformer_for_Referring_Image_Segmentation_CVPR_2022_paper.pdf) | -, - |
73 | | 2022 | NeurIPS | [CoupAlign: Coupling Word-Pixel with Sentence-Mask Alignments for Referring Image Segmentation](https://proceedings.neurips.cc/paper_files/paper/2022/file/5e773d319e310f1e4d695159484143b8-Paper-Conference.pdf) | -, - |
74 | | 2023 | AAAI | [ Semantics-Aware Dynamic Localization and Refinement for Referring Image Segmentation](https://ojs.aaai.org/index.php/AAAI/article/download/25428/25200) | -, - |
75 | | 2023 | ACM MM | [CARIS: Context-Aware Referring Image Segmentation](https://web.archive.org/web/20231028201140id_/https://dl.acm.org/doi/pdf/10.1145/3581783.3612117) | [Code](https://github.com/lsa1997/CARIS), - |
76 | | 2023 | arXiv | [NExT-Chat: An LMM for Chat, Detection and Segmentation](https://arxiv.org/pdf/2311.04498) | [Code](https://github.com/NExT-ChatV/NExT-Chat), [Project](https://next-chatv.github.io/) |
77 | | 2023 | arXiv | [Interactive Data Synthesis for Systematic Vision Adaptation via LLMs-AIGCs Collaboration](https://arxiv.org/pdf/2305.12799) | [Code](https://github.com/Yuqifan1117/Labal-Anything-Pipeline), - |
78 | | 2023 | CVPR | [GRES: Generalized Referring Expression Segmentation](https://openaccess.thecvf.com/content/CVPR2023/papers/Liu_GRES_Generalized_Referring_Expression_Segmentation_CVPR_2023_paper.pdf) | [Code](https://github.com/henghuiding/ReLA), [Project](https://henghuiding.github.io/GRES/) |
79 | | 2023 | CVPR | [PolyFormer: Referring Image Segmentation as Sequential Polygon Generation](https://openaccess.thecvf.com/content/CVPR2023/papers/Liu_PolyFormer_Referring_Image_Segmentation_As_Sequential_Polygon_Generation_CVPR_2023_paper.pdf) | [Code](https://github.com/amazon-science/polygon-transformer), [Project](https://polyformer.github.io/) |
80 | | 2023 | ICCV | [Beyond One-to-One: Rethinking the Referring Image Segmentation](https://openaccess.thecvf.com/content/ICCV2023/papers/Hu_Beyond_One-to-One_Rethinking_the_Referring_Image_Segmentation_ICCV_2023_paper.pdf) | [Code](https://github.com/toggle1995/RIS-DMMI), - |
81 | | 2023 | ICCV | [Shatter and Gather: Learning Referring Image Segmentation with Text Supervision](https://openaccess.thecvf.com/content/ICCV2023/papers/Kim_Shatter_and_Gather_Learning_Referring_Image_Segmentation_with_Text_Supervision_ICCV_2023_paper.pdf) | [Code](https://github.com/kdwonn/SaG), [Project](https://southflame.github.io/sag/) |
82 | | 2023 | ICCV | [Segment Every Reference Object in Spatial and Temporal Spaces](https://openaccess.thecvf.com/content/ICCV2023/papers/Wu_Segment_Every_Reference_Object_in_Spatial_and_Temporal_Spaces_ICCV_2023_paper.pdf) | [Code](https://github.com/FoundationVision/UniRef), - |
83 | | 2023 | TOMM | [Towards Complex-query Referring Image Segmentation: A Novel Benchmark](https://dl.acm.org/doi/pdf/10.1145/3701733) | [Code](https://github.com/lili0415/DuMoGa), - |
84 | | 2024 | ACM MM | [Deep Instruction Tuning for Segment Anything Model](https://dl.acm.org/doi/pdf/10.1145/3664647.3680571) | [Code](https://github.com/wysnzzzz/DIT), - |
85 | | 2024 | arXiv | [LLMBind: A Unified Modality-Task Integration Framework](https://arxiv.org/pdf/2402.14891) | [Code](https://github.com/PKU-YuanGroup/LLMBind), - |
86 | | 2024 | arXiv | [Training-Free Semantic Segmentation via LLM-Supervision](https://arxiv.org/pdf/2404.00701) | -, - |
87 | | 2024 | arXiv | [F-LMM: Grounding Frozen Large Multimodal Models](https://arxiv.org/pdf/2406.05821) | [Code](https://github.com/wusize/F-LMM), - |
88 | | 2024 | CVPR | [Mask Grounding for Referring Image Segmentation](https://openaccess.thecvf.com/content/CVPR2024/papers/Chng_Mask_Grounding_for_Referring_Image_Segmentation_CVPR_2024_paper.pdf) | [Code](https://github.com/yxchng/mask-grounding), - |
89 | | 2024 | CVPR | [LQMFormer: Language-aware Query Mask Transformer for Referring Image Segmentation](https://openaccess.thecvf.com/content/CVPR2024/papers/Shah_LQMFormer_Language-aware_Query_Mask_Transformer_for_Referring_Image_Segmentation_CVPR_2024_paper.pdf) | - ,- |
90 | | 2024 | CVPR | [PerceptionGPT: Effectively Fusing Visual Perception into LLM](https://openaccess.thecvf.com/content/CVPR2024/papers/Pi_PerceptionGPT_Effectively_Fusing_Visual_Perception_into_LLM_CVPR_2024_paper.pdf) | [Code](https://github.com/pipilurj/perceptionGPT), - |
91 | | 2024 | CVPR | [Prompt-Driven Referring Image Segmentation with Instance Contrasting](https://openaccess.thecvf.com/content/CVPR2024/papers/Shang_Prompt-Driven_Referring_Image_Segmentation_with_Instance_Contrasting_CVPR_2024_paper.pdf) | -, - |
92 | | 2024 | CVPR | [Curriculum Point Prompting for Weakly-Supervised Referring Image Segmentation](https://openaccess.thecvf.com/content/CVPR2024/papers/Dai_Curriculum_Point_Prompting_for_Weakly-Supervised_Referring_Image_Segmentation_CVPR_2024_paper.pdf) | -, - |
93 |
94 | ## 4.3 Few-shot Segmentation
95 | ### 4.3.1 CLIP-based Solution
96 | | Year | Publication | Paper Title | Project |
97 | |------|:-----------:|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:----------------------------------------------------------:|
98 | | 2023 | arXiv | [PartSeg: Few-shot Part Segmentation via Part-aware Prompt Learning](https://arxiv.org/pdf/2308.12757) | -, - |
99 | | 2023 | CVPR | [ WinCLIP: Zero-/Few-Shot Anomaly Classification and Segmentation](https://openaccess.thecvf.com/content/CVPR2023/papers/Jeong_WinCLIP_Zero-Few-Shot_Anomaly_Classification_and_Segmentation_CVPR_2023_paper.pdf) | [Code](https://github.com/caoyunkang/WinClip), - |
100 | | 2024 | arXiv | [Embedding Generalized Semantic Knowledge into Few-Shot Remote Sensing Segmentation](https://arxiv.org/pdf/2405.13686) | -, - |
101 | | 2024 | CVPR | [Rethinking Prior Information Generation with CLIP for Few-Shot Segmentation](https://openaccess.thecvf.com/content/CVPR2024/papers/Wang_Rethinking_Prior_Information_Generation_with_CLIP_for_Few-Shot_Segmentation_CVPR_2024_paper.pdf) | [Code](https://github.com/vangjin/PI-CLIP), - |
102 | | 2024 | CVPR | [Unlocking the Potential of Pre-trained Vision Transformers for Few-Shot Semantic Segmentation through Relationship Descriptors](https://openaccess.thecvf.com/content/CVPR2024/papers/Zhou_Unlocking_the_Potential_of_Pre-trained_Vision_Transformers_for_Few-Shot_Semantic_CVPR_2024_paper.pdf) | [Code](https://github.com/ZiqinZhou66/FewSegwithRD.git), - |
103 | | 2024 | ICASSP | [Language-Guided Few-Shot Semantic Segmentation](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10447456) | -, - |
104 | | 2024 | ICASSP | [Weakly Supervised Few-Shot Segmentation Through Textual Prompt](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10446831) | [Code](https://github.com/Joseph-Lee-V/Text-WS-FSS), - |
105 | | 2024 | TMM | [Visual and Textual Prior Guided Mask Assemble for Few-Shot Segmentation and Beyond](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10418263) | -, - |
106 |
107 | ### 4.3.2 DM-based Solution
108 | | Year | Publication | Paper Title | Project |
109 | |------|:-----------:|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:---------------------------------------------------:|
110 | | 2023 | arXiv | [DifFSS: Diffusion Model for Few-Shot Semantic Segmentation](https://arxiv.org/pdf/2307.00773) | [Code](https://github.com/TrinitialChan/DifFSS), - |
111 | | 2024 | AAAI | [MaskDiff: Modeling Mask Distribution with Diffusion Probabilistic Model for Few-Shot Instance Segmentation](https://ojs.aaai.org/index.php/AAAI/article/view/28068/28143) | [Code](https://github.com/minhquanlecs/MaskDiff), - |
112 | | 2024 | arXiv | [SegICL: A Universal In-context Learning Framework for Enhanced Segmentation in Medical Imaging](https://arxiv.org/pdf/2403.16578) | -, - |
113 | | 2024 | TCE | [Few-Shot Semantic Segmentation for Consumer Electronics: An Inter-Class Relation Mining Approach](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10460319) | -, - |
114 |
115 | ### 4.3.3 DINO-based Solution
116 | | Year | Publication | Paper Title | Project |
117 | |-------|:------------------------:|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------:|
118 | | 2023 | CVPR | [Distilling Self-Supervised Vision Transformers for Weakly-Supervised Few-Shot Classification & Segmentation](https://openaccess.thecvf.com/content/CVPR2023/papers/Kang_Distilling_Self-Supervised_Vision_Transformers_for_Weakly-Supervised_Few-Shot_Classification__Segmentation_CVPR_2023_paper.pdf) | [Code](https://github.com/dahyun-kang/cst), - |
119 | | 2023 | NeurIPS R0-FoMo Workshop | [One-shot Localization and Segmentation of Medical Images with Foundation Models](https://arxiv.org/pdf/2310.18642) | -, - |
120 | | 2024 | arXiv | [A Novel Benchmark for Few-Shot Semantic Segmentation in the Era of Foundation Models](https://arxiv.org/pdf/2401.11311) | [Code](https://github.com/RedaBensaidDS/Foundation_FewShot), - |
121 | | 2024 | ICRA | [Few-Shot Panoptic Segmentation With Foundation Models](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10611624) | [Code](https://github.com/robot-learning-freiburg/SPINO), [Project](http://spino.cs.uni-freiburg.de/) |
122 |
123 | ### 4.3.4 SAM-based Solution
124 | | Year | Publication | Paper Title | Project |
125 | |-------|:-----------:|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:----------------------------------------------------------:|
126 | | 2024 | arXiv | [Boosting Few-Shot Semantic Segmentation via Segment Anything Model](https://arxiv.org/pdf/2401.09826) | -, - |
127 | | 2024 | arXiv | [Part-aware Personalized Segment Anything Model for Patient-Specific Segmentation](https://arxiv.org/pdf/2403.05433) | [Code](https://github.com/Zch0414/p2sam), - |
128 | | 2024 | CVPR | [VRP-SAM: SAM with Visual Reference Prompt](https://openaccess.thecvf.com/content/CVPR2024/papers/Sun_VRP-SAM_SAM_with_Visual_Reference_Prompt_CVPR_2024_paper.pdf) | [Code](https://github.com/syp2ysy/VRP-SAM), - |
129 | | 2024 | CVPR | [APSeg: Auto-Prompt Network for Cross-Domain Few-Shot Semantic Segmentation](https://openaccess.thecvf.com/content/CVPR2024/papers/He_APSeg_Auto-Prompt_Network_for_Cross-Domain_Few-Shot_Semantic_Segmentation_CVPR_2024_paper.pdf) | -, - |
130 | | 2024 | ICLR | [Personalize Segment Anything Model with One Shot](https://openreview.net/pdf?id=6Gzkhoc6YS) | [Code](https://github.com/ZrrSkywalker/Personalize-SAM), - |
131 |
132 | ### 4.3.5 MLLMs-based Solution
133 | | Year | Publication | Paper Title | Project |
134 | |-------|:-----------:|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------:|
135 | | 2023 | arXiv | [Few-Shot Classification & Segmentation Using Large Language Models Agent](https://arxiv.org/pdf/2311.12065) | -, - |
136 | | 2024 | CVPR | [LLaFS: When Large Language Models Meet Few-Shot Segmentation](https://openaccess.thecvf.com/content/CVPR2024/papers/Zhu_LLaFS_When_Large_Language_Models_Meet_Few-Shot_Segmentation_CVPR_2024_paper.pdf) | [Code](https://github.com/lanyunzhu99/LLaFS), - |
137 |
138 | ### 4.3.6 In-Context Segmentation
139 | | Year | Publication | Paper Title | Project |
140 | |-------|:-----------:|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:---------------------------------------------------------------------------------------------------------------------------------:|
141 | | 2022 | NeurIPS | [Visual Prompting via Image Inpainting](https://proceedings.neurips.cc/paper_files/paper/2022/file/9f09f316a3eaf59d9ced5ffaefe97e0f-Paper-Conference.pdf) | [Code](https://github.com/amirbar/visual_prompting), [Project](https://yossigandelsman.github.io/visual_prompt/) |
142 | | 2023 | arXiv | [Exploring Effective Factors for Improving Visual In-Context Learning](https://arxiv.org/pdf/2304.04748) | [Code](https://github.com/syp2ysy/prompt-SelF), - |
143 | | 2023 | CVPR | [Images Speak in Images: A Generalist Painter for In-Context Visual Learning](https://openaccess.thecvf.com/content/CVPR2023/papers/Wang_Images_Speak_in_Images_A_Generalist_Painter_for_In-Context_Visual_CVPR_2023_paper.pdf) | [Code](https://github.com/baaivision/Painter), - |
144 | | 2023 | NeurIPS | [What Makes Good Examples for Visual In-Context Learning?](https://proceedings.neurips.cc/paper_files/paper/2023/file/398ae57ed4fda79d0781c65c926d667b-Paper-Conference.pdf) | [Code](https://github.com/ZhangYuanhan-AI/visual_prompt_retrieval), - |
145 | | 2023 | NeurIPS | [In-Context Learning Unlocked for Diffusion Models](https://proceedings.neurips.cc/paper_files/paper/2023/file/1b3750390ca8b931fb9ca988647940cb-Paper-Conference.pdf) | [Code](https://github.com/Zhendong-Wang/Prompt-Diffusion), [Project](https://zhendong-wang.github.io/prompt-diffusion.github.io/) |
146 | | 2024 | CVPR | [Sequential Modeling Enables Scalable Learning for Large Vision Models](https://openaccess.thecvf.com/content/CVPR2024/papers/Bai_Sequential_Modeling_Enables_Scalable_Learning_for_Large_Vision_Models_CVPR_2024_paper.pdf) | [Code](https://github.com/ytongbai/LVM), - |
147 | | 2024 | CVPR | [Towards More Unified In-context Visual Understanding](https://openaccess.thecvf.com/content/CVPR2024/papers/Sheng_Towards_More_Unified_In-context_Visual_Understanding_CVPR_2024_paper.pdf) | -, - |
148 | | 2024 | CVPR | [Tyche: Stochastic In-Context Learning for Medical Image Segmentation](https://openaccess.thecvf.com/content/CVPR2024/papers/Rakic_Tyche_Stochastic_In-Context_Learning_for_Medical_Image_Segmentation_CVPR_2024_paper.pdf) | [Code](https://github.com/mariannerakic/tyche), [Project](https://tyche.csail.mit.edu/) |
149 | | 2024 | ICLR | [Matcher: Segment Anything with One Shot Using All-Purpose Feature Matching](https://openreview.net/pdf?id=yzRXdhk2he) | [Code](https://github.com/aim-uofa/Matcher), - |
150 | | 2024 | WACV | [Instruct Me More! Random Prompting for Visual In-Context Learning](https://openaccess.thecvf.com/content/WACV2024/papers/Zhang_Instruct_Me_More_Random_Prompting_for_Visual_In-Context_Learning_WACV_2024_paper.pdf) | [Code](https://github.com/Jackieam/InMeMo), - |
151 |
--------------------------------------------------------------------------------
/3-GIS.md:
--------------------------------------------------------------------------------
1 | # 3. Foundation Model-based GIS (Generic Image Segmentation)
2 |
3 | ## 3.1 Semantic Segmentation
4 | ### 3.1.1 CLIP-based Solution
5 | | Year | Publication | Paper Title | Project |
6 | |:-----:|:-----------:|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------------------------------:|
7 | | 2022 | BMVC | [Open-vocabulary Semantic Segmentation with Frozen Vision-Language Models](https://bmvc2022.mpi-inf.mpg.de/0045.pdf) | [Code](https://github.com/chaofanma/Fusioner), [Project](https://yyh-rain-song.github.io/Fusioner_webpage/) |
8 | | 2022 | CVPR | [DenseCLIP: Language-Guided Dense Prediction with Context-Aware Prompting](https://openaccess.thecvf.com/content/CVPR2022/papers/Rao_DenseCLIP_Language-Guided_Dense_Prediction_With_Context-Aware_Prompting_CVPR_2022_paper.pdf) | [Code](https://github.com/raoyongming/DenseCLIP), [Project](https://denseclip.ivg-research.xyz/) |
9 | | 2022 | CVPR | [GroupViT: Semantic Segmentation Emerges from Text Supervision](https://openaccess.thecvf.com/content/CVPR2022/papers/Xu_GroupViT_Semantic_Segmentation_Emerges_From_Text_Supervision_CVPR_2022_paper.pdf) | [Code](https://github.com/NVlabs/GroupViT), [Project](https://jerryxu.net/GroupViT/) |
10 | | 2022 | CVPR | [Decoupling Zero-Shot Semantic Segmentation](https://openaccess.thecvf.com/content/CVPR2022/papers/Ding_Decoupling_Zero-Shot_Semantic_Segmentation_CVPR_2022_paper.pdf) | [Code](https://github.com/dingjiansw101/ZegFormer), - |
11 | | 2022 | ECCV | [A Simple Baseline for Open-Vocabulary Semantic Segmentation with Pre-trained Vision-language Model](https://www.ecva.net/papers/eccv_2022/papers_ECCV/papers/136890725.pdf) | [Code](https://github.com/MendelXu/zsseg.baseline), - |
12 | | 2022 | ECCV | [Extract Free Dense Labels from CLIP](https://www.ecva.net/papers/eccv_2022/papers_ECCV/papers/136880687.pdf) | [Code](https://github.com/chongzhou96/MaskCLIP), [Project](https://www.mmlab-ntu.com/project/maskclip/) |
13 | | 2022 | ICLR | [Language-driven Semantic Segmentation](https://openreview.net/pdf?id=RriDjddCLN) | [Code](https://github.com/isl-org/lang-seg), - |
14 | | 2023 | arXiv | [A Closer Look at the Explainability of Contrastive Language-Image Pre-training](https://arxiv.org/pdf/2304.05653) | [Code](https://github.com/xmed-lab/CLIP_Surgery), - |
15 | | 2023 | arXiv | [ZegOT: Zero-shot Segmentation Through Optimal Transport of Text Prompts](https://arxiv.org/pdf/2301.12171) | [Code](https://github.com/cubeyoung/ZegOT), [Project](https://cubeyoung.github.io/zegot.github.io/) |
16 | | 2023 | arXiv | [CLIP is Also a Good Teacher: A New Training Framework for Inductive Zero-shot Semantic Segmentation](https://arxiv.org/pdf/2310.02296) | -, - |
17 | | 2023 | arXiv | [Associating Spatially-Consistent Grouping with Text-supervised Semantic Segmentation](https://arxiv.org/pdf/2304.01114) | -, - |
18 | | 2023 | arXiv | [TagAlign: Improving Vision-Language Alignment with Multi-Tag Classification](https://arxiv.org/pdf/2312.14149) | [Code](https://github.com/Qinying-Liu/TagAlign), [Project](https://qinying-liu.github.io/Tag-Align/) |
19 | | 2023 | arXiv | [CLIP-DINOiser: Teaching CLIP a few DINO tricks for Open-Vocabulary Semantic Segmentation](https://arxiv.org/pdf/2312.12359) | [Code](https://github.com/wysoczanska/clip_dinoiser), [Project](https://wysoczanska.github.io/CLIP_DINOiser/) |
20 | | 2023 | CVPR | [Probabilistic Prompt Learning for Dense Prediction](https://openaccess.thecvf.com/content/CVPR2023/papers/Kwon_Probabilistic_Prompt_Learning_for_Dense_Prediction_CVPR_2023_paper.pdf) | -, - |
21 | | 2023 | CVPR | [ZegCLIP: Towards Adapting CLIP for Zero-shot Semantic Segmentation](http://openaccess.thecvf.com/content/CVPR2023/papers/Zhou_ZegCLIP_Towards_Adapting_CLIP_for_Zero-Shot_Semantic_Segmentation_CVPR_2023_paper.pdf) | [Code](https://github.com/ZiqinZhou66/ZegCLIP), - |
22 | | 2023 | CVPR | [ Open Vocabulary Semantic Segmentation with Patch Aligned Contrastive Learning](https://openaccess.thecvf.com/content/CVPR2023/papers/Mukhoti_Open_Vocabulary_Semantic_Segmentation_With_Patch_Aligned_Contrastive_Learning_CVPR_2023_paper.pdf) | -, - |
23 | | 2023 | CVPR | [ Side Adapter Network for Open-Vocabulary Semantic Segmentation](https://openaccess.thecvf.com/content/CVPR2023/papers/Xu_Side_Adapter_Network_for_Open-Vocabulary_Semantic_Segmentation_CVPR_2023_paper.pdf) | [Code](https://github.com/MendelXu/SAN), [Project](https://mendelxu.github.io/SAN/) |
24 | | 2023 | CVPR | [ CLIP is Also an Efficient Segmenter: A Text-Driven Approach for Weakly Supervised Semantic Segmentation](https://openaccess.thecvf.com/content/CVPR2023/papers/Lin_CLIP_Is_Also_an_Efficient_Segmenter_A_Text-Driven_Approach_for_CVPR_2023_paper.pdf) | [Code](https://github.com/linyq2117/CLIP-ES), - |
25 | | 2023 | CVPR | [Open-Vocabulary Semantic Segmentation with Mask-adapted CLIP](https://openaccess.thecvf.com/content/CVPR2023/papers/Liang_Open-Vocabulary_Semantic_Segmentation_With_Mask-Adapted_CLIP_CVPR_2023_paper.pdf) | [Code](https://github.com/facebookresearch/ov-seg), [Project](https://jeff-liangf.github.io/projects/ovseg/) |
26 | | 2023 | CVPR | [CLIP-S4: Language-Guided Self-Supervised Semantic Segmentation](https://openaccess.thecvf.com/content/CVPR2023/papers/He_CLIP-S4_Language-Guided_Self-Supervised_Semantic_Segmentation_CVPR_2023_paper.pdf) | -, - |
27 | | 2023 | CVPR | [Delving into Shape-aware Zero-shot Semantic Segmentation](https://openaccess.thecvf.com/content/CVPR2023/papers/Liu_Delving_Into_Shape-Aware_Zero-Shot_Semantic_Segmentation_CVPR_2023_paper.pdf) | [Code](https://github.com/Liuxinyv/SAZS), - |
28 | | 2023 | CVPR | [A Simple Framework for Text-Supervised Semantic Segmentation](https://openaccess.thecvf.com/content/CVPR2023/papers/Yi_A_Simple_Framework_for_Text-Supervised_Semantic_Segmentation_CVPR_2023_paper.pdf) | [Code](https://github.com/muyangyi/SimSeg), - |
29 | | 2023 | ICCV | [ Perceptual Grouping in Contrastive Vision-Language Models](https://openaccess.thecvf.com/content/ICCV2023/papers/Ranasinghe_Perceptual_Grouping_in_Contrastive_Vision-Language_Models_ICCV_2023_paper.pdf) | [Code](https://github.com/kahnchana/clippy), - |
30 | | 2023 | ICCV | [Global Knowledge Calibration for Fast Open-Vocabulary Segmentation](https://openaccess.thecvf.com/content/ICCV2023/papers/Han_Global_Knowledge_Calibration_for_Fast_Open-Vocabulary_Segmentation_ICCV_2023_paper.pdf) | [Code](https://github.com/yongliu20/GKC), - |
31 | | 2023 | ICCV | [Open-Vocabulary Semantic Segmentation with Decoupled One-Pass Network](https://openaccess.thecvf.com/content/ICCV2023/papers/Han_Open-Vocabulary_Semantic_Segmentation_with_Decoupled_One-Pass_Network_ICCV_2023_paper.pdf) | [Code](https://github.com/CongHan0808/DeOP), [Project](https://conghan0808.github.io/DeOP/) |
32 | | 2023 | ICCV | [ Exploring Open-Vocabulary Semantic Segmentation from CLIP Vision Encoder Distillation Only](https://openaccess.thecvf.com/content/ICCV2023/papers/Chen_Exploring_Open-Vocabulary_Semantic_Segmentation_from_CLIP_Vision_Encoder_Distillation_Only_ICCV_2023_paper.pdf) | [Code](https://github.com/facebookresearch/ZeroSeg), - |
33 | | 2023 | ICML | [SegCLIP: Patch Aggregation with Learnable Centers for Open-Vocabulary Semantic Segmentation](https://proceedings.mlr.press/v202/luo23a/luo23a.pdf) | [Code](https://github.com/ArrowLuo/SegCLIP), - |
34 | | 2023 | NeurIPS | [Learning Mask-aware CLIP Representations for Zero-Shot Segmentation](https://proceedings.neurips.cc/paper_files/paper/2023/file/6ffe484a646db13891bb6435ca39d667-Paper-Conference.pdf) | [Code](https://github.com/jiaosiyu1999/MAFT), - |
35 | | 2023 | NeurIPS | [Uncovering Prototypical Knowledge for Weakly Open-Vocabulary Semantic Segmentation](https://proceedings.neurips.cc/paper_files/paper/2023/file/e95eb5206c867be843fbc14bbfe8c10e-Paper-Conference.pdf) | [Code](https://github.com/Ferenas/PGSeg), - |
36 | | 2024 | arXiv | [kNN-CLIP: Retrieval Enables Training-Free Segmentation on Continually Expanding Large Vocabularies](https://arxiv.org/pdf/2404.09447) | -, - |
37 | | 2024 | CVPR | [ Grounding Everything: Emerging Localization Properties in Vision-Language Transformers](https://openaccess.thecvf.com/content/CVPR2024/papers/Bousselham_Grounding_Everything_Emerging_Localization_Properties_in_Vision-Language_Transformers_CVPR_2024_paper.pdf) | [Code](https://github.com/WalBouss/GEM), - |
38 | | 2024 | CVPR | [ CAT-Seg: Cost Aggregation for Open-Vocabulary Semantic Segmentation](https://openaccess.thecvf.com/content/CVPR2024/papers/Cho_CAT-Seg_Cost_Aggregation_for_Open-Vocabulary_Semantic_Segmentation_CVPR_2024_paper.pdf) | [Code](https://github.com/cvlab-kaist/CAT-Seg), [Project](https://ku-cvlab.github.io/CAT-Seg/) |
39 | | 2024 | CVPR | [CLIP as RNN: Segment Countless Visual Concepts without Training Endeavor](https://openaccess.thecvf.com/content/CVPR2024/papers/Sun_CLIP_as_RNN_Segment_Countless_Visual_Concepts_without_Training_Endeavor_CVPR_2024_paper.pdf) | [Code](https://github.com/kevin-ssy/CLIP_as_RNN), [Project](https://torrvision.com/clip_as_rnn/) |
40 | | 2024 | ECCV | [SCLIP: Rethinking Self-Attention for Dense Vision-Language Inference](https://link.springer.com/chapter/10.1007/978-3-031-72664-4_18) | [Code](https://github.com/wangf3014/SCLIP), - |
41 | | 2024 | ECCV | [OTSeg: Multi-prompt Sinkhorn Attention for Zero-Shot Semantic Segmentation](https://link.springer.com/content/pdf/10.1007/978-3-031-72980-5_12) | [Code](https://github.com/cubeyoung/OTSeg), - |
42 | | 2024 | ECCV | [SemiVL: Semi-Supervised Semantic Segmentation with Vision-Language Guidance](https://link.springer.com/content/pdf/10.1007/978-3-031-72933-1_15) | [Code](https://github.com/google-research/semivl/), - |
43 | | 2024 | ECCV | [SILC: Improving Vision Language Pretraining with Self-distillation](https://link.springer.com/content/pdf/10.1007/978-3-031-72664-4_3) | -, - |
44 | | 2024 | ICLR | [CLIPSelf: Vision Transformer Distills Itself for Open-Vocabulary Dense Prediction](https://openreview.net/pdf?id=DjzvJCRsVf) | [Code](https://github.com/wusize/CLIPSelf), - |
45 | | 2024 | TCSVT | [Language-Driven Visual Consensus for Zero-Shot Semantic Segmentation](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10764736) | -, - |
46 | | 2025 | WACV | [Pay Attention to Your Neighbours: Training-Free Open-Vocabulary Semantic Segmentation](https://arxiv.org/pdf/2404.08181) | [Code](https://github.com/sinahmr/NACLIP), - |
47 |
48 | ### 3.1.2 DM-based Solution
49 | | Year | Publication | Paper Title | Project |
50 | |:----:|:-----------:|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------------------------------------------------------------------:|
51 | | 2022 | arXiv | [Peekaboo: Text to Image Diffusion Models are Zero-Shot Segmentors](https://arxiv.org//pdf/2211.13224v2) | [Code](https://github.com/RyannDaGreat/Peekaboo), [Project](https://ryanndagreat.github.io/peekaboo/) |
52 | | 2022 | ICLR | [Label-Efficient Semantic Segmentation with Diffusion Models](https://openreview.net/pdf?id=SlxSY2UZQT) | [Code](https://github.com/yandex-research/ddpm-segmentation), [Project](https://yandex-research.github.io/ddpm-segmentation/) |
53 | | 2022 | MIDL | [Diffusion Models for Implicit Image Segmentation Ensembles](https://proceedings.mlr.press/v172/wolleb22a/wolleb22a.pdf) | [Code](https://github.com/JuliaWolleb/Diffusion-based-Segmentation), - |
54 | | 2022 | SASHIMI | [Can Segmentation Models Be Trained with Fully Synthetically Generated Data?](https://link.springer.com/content/pdf/10.1007/978-3-031-16980-9_8.pdf) | -, - |
55 | | 2023 | ACL | [What the DAAM: Interpreting Stable Diffusion Using Cross Attention](https://aclanthology.org/2023.acl-long.310.pdf) | [Code](https://github.com/castorini/daam), - |
56 | | 2023 | arXiv | [Diffusion Model is Secretly a Training-free Open Vocabulary Semantic Segmenter](https://arxiv.org/pdf/2309.02773) | [Code](https://github.com/VCG-team/DiffSegmenter), [Project](https://vcg-team.github.io/DiffSegmenter-webpage/) |
57 | | 2023 | arXiv | [Harnessing Diffusion Models for Visual Perception with Meta Prompts](https://arxiv.org/pdf/2312.14733) | [Code](https://github.com/fudan-zvg/meta-prompts), - |
58 | | 2023 | arXiv | [EMIT-Diff: Enhancing Medical Image Segmentation via Text-Guided Diffusion Model](https://arxiv.org/pdf/2310.12868) | [Code](https://github.com/NUBagciLab/DiffBoost), - |
59 | | 2023 | CVPR | [Ambiguous Medical Image Segmentation using Diffusion Models](https://openaccess.thecvf.com/content/CVPR2023/papers/Rahman_Ambiguous_Medical_Image_Segmentation_Using_Diffusion_Models_CVPR_2023_paper.pdf) | [Code](https://github.com/aimansnigdha/ambiguous-medical-image-segmentation-using-diffusion-models), [Project](https://aimansnigdha.github.io/cimd/) |
60 | | 2023 | ICCV | [DDP: Diffusion Model for Dense Visual Prediction](https://openaccess.thecvf.com/content/ICCV2023/papers/Ji_DDP_Diffusion_Model_for_Dense_Visual_Prediction_ICCV_2023_paper.pdf) | [Code](https://github.com/JiYuanFeng/DDP), [Project](https://github.com/JiYuanFeng/DDP) |
61 | | 2023 | ICCV | [Unleashing Text-to-Image Diffusion Models for Visual Perception](https://openaccess.thecvf.com/content/ICCV2023/papers/Zhao_Unleashing_Text-to-Image_Diffusion_Models_for_Visual_Perception_ICCV_2023_paper.pdf) | [Code](https://github.com/wl-zhao/VPD), [Project](https://vpd.ivg-research.xyz/) |
62 | | 2023 | ICCV | [LD-ZNet: A Latent Diffusion Approach for Text-Based Image Segmentation](http://openaccess.thecvf.com/content/ICCV2023/papers/PNVR_LD-ZNet_A_Latent_Diffusion_Approach_for_Text-Based_Image_Segmentation_ICCV_2023_paper.pdf) | [Code](https://github.com/koutilya-pnvr/LD-ZNet), [Project](https://koutilya-pnvr.github.io/LD-ZNet/) |
63 | | 2023 | ICCV | [Stochastic Segmentation with Conditional Categorical Diffusion Models](https://openaccess.thecvf.com/content/ICCV2023/papers/Zbinden_Stochastic_Segmentation_with_Conditional_Categorical_Diffusion_Models_ICCV_2023_paper.pdf) | [Code](https://github.com/LarsDoorenbos/ccdm-stochastic-segmentation), - |
64 | | 2023 | ICCV | [DiffuMask: Synthesizing Images with Pixel-level Annotations for Semantic Segmentation Using Diffusion Models](https://openaccess.thecvf.com/content/ICCV2023/papers/Wu_DiffuMask_Synthesizing_Images_with_Pixel-level_Annotations_for_Semantic_Segmentation_Using_ICCV_2023_paper.pdf) | [Code](https://github.com/weijiawu/DiffuMask), [Project](https://weijiawu.github.io/DiffusionMask/) |
65 | | 2023 | MICCAI | [Diffusion-Based Data Augmentation for Nuclei Image Segmentation](https://link.springer.com/content/pdf/10.1007/978-3-031-43993-3_57.pdf?pdf=inline%20link) | [Code](https://github.com/xinyiyu/Nudiff), - |
66 | | 2023 | NeurIPS | [Dataset Diffusion: Diffusion-based Synthetic Dataset Generation for Pixel-Level Semantic Segmentation](https://proceedings.neurips.cc/paper_files/paper/2023/file/f2957e48240c1d90e62b303574871b47-Paper-Conference.pdf) | [Code](https://github.com/VinAIResearch/Dataset-Diffusion), - |
67 | | 2023 | NeurIPS | [Unsupervised Semantic Correspondence Using Stable Diffusion](https://proceedings.neurips.cc/paper_files/paper/2023/file/1a074a28c3a6f2056562d00649ae6416-Paper-Conference.pdf) | [Code](https://github.com/ubc-vision/LDM_correspondences), [Project](https://ubc-vision.github.io/LDM_correspondences/) |
68 | | 2023 | NeurIPS | [SegRefiner: Towards Model-Agnostic Segmentation Refinement with Discrete Diffusion Process](https://papers.nips.cc/paper_files/paper/2023/file/fc0cc55dca3d791c4a0bb2d8ddeefe4f-Paper-Conference.pdf) | [Code](https://github.com/MengyuWang826/SegRefiner), - |
69 | | 2023 | NeurIPS | [DatasetDM: Synthesizing Data with Perception Annotations Using Diffusion Models](https://proceedings.neurips.cc/paper_files/paper/2023/file/ab6e7ad2354f350b451b5a8e14d04f51-Paper-Conference.pdf) | [Code](https://github.com/showlab/DatasetDM), [Project](https://weijiawu.github.io/DatasetDM_page/) |
70 | | 2024 | arXiv | [FreeSeg-Diff: Training-Free Open-Vocabulary Segmentation with Diffusion Models](https://arxiv.org/pdf/2403.20105) | -, [Project](https://bcorrad.github.io/freesegdiff/) |
71 | | 2024 | arXiv | [MaskDiffusion: Exploiting Pre-trained Diffusion Models for Semantic Segmentation](https://arxiv.org/pdf/2403.11194) | [Code](https://github.com/Valkyrja3607/MaskDiffusion), - |
72 | | 2024 | CVPR | [Training-Free Open-Vocabulary Segmentation with Offline Diffusion-Augmented Prototype Generation](https://openaccess.thecvf.com/content/CVPR2024/papers/Barsellotti_Training-Free_Open-Vocabulary_Segmentation_with_Offline_Diffusion-Augmented_Prototype_Generation_CVPR_2024_paper.pdf) | [Code](https://github.com/aimagelab/freeda), [Project](https://aimagelab.github.io/freeda/) |
73 | | 2024 | CVPR | [Open-Vocabulary Attention Maps with Token Optimization for Semantic Segmentation in Diffusion Models](https://openaccess.thecvf.com/content/CVPR2024/papers/Marcos-Manchon_Open-Vocabulary_Attention_Maps_with_Token_Optimization_for_Semantic_Segmentation_in_CVPR_2024_paper.pdf) | [Code](https://github.com/vpulab/ovam), - |
74 | | 2024 | CVPR | [Diffuse Attend and Segment: Unsupervised Zero-Shot Segmentation using Stable Diffusion](https://openaccess.thecvf.com/content/CVPR2024/papers/Tian_Diffuse_Attend_and_Segment_Unsupervised_Zero-Shot_Segmentation_using_Stable_Diffusion_CVPR_2024_paper.pdf) | [Code](https://github.com/google/diffseg), [Project](https://sites.google.com/view/diffseg/home) |
75 | | 2024 | CVPR | [Text-image Alignment for Diffusion-based Perception](https://openaccess.thecvf.com/content/CVPR2024/papers/Kondapaneni_Text-Image_Alignment_for_Diffusion-Based_Perception_CVPR_2024_paper.pdf) | [Code](https://github.com/damaggu/TADP), [Project](https://www.vision.caltech.edu/tadp/) |
76 | | 2024 | CVPRW | [ScribbleGen: Generative Data Augmentation Improves Scribble-supervised Semantic Segmentation](http://mengtang.org/scribblegen_cvprw2024.pdf) | [Code](https://github.com/mengtang-lab/scribblegen), - |
77 | | 2024 | ECCV | [Diffusion Models for Open-Vocabulary Segmentation](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/00794.pdf) | -, [Project](https://www.robots.ox.ac.uk/~vgg/research/ovdiff/) |
78 | | 2024 | ECCV | [Do Text-Free Diffusion Models Learn Discriminative Visual Representations?](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/07757-supp.pdf) | [Code](https://github.com/soumik-kanad/diffssl), - |
79 | | 2024 | IJCAI | [Bridging Generative and Discriminative Models for Unified Visual Perception with Diffusion Priors](https://www.ijcai.org/proceedings/2024/0082.pdf) | -, - |
80 | | 2024 | IJCV | [MosaicFusion: Diffusion Models as Data Augmenters for Large Vocabulary Instance Segmentation](https://arxiv.org//pdf/2309.13042.pdf) | [Code](https://github.com/Jiahao000/MosaicFusion), - |
81 |
82 |
83 | ### 3.1.3 DINO-based Solution
84 | | Year | Publication | Paper Title | Project |
85 | |:----:|:-----------:|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------:|
86 | | 2021 | arXiv | [Deep ViT Features as Dense Visual Descriptors](https://dino-vit-features.github.io/paper.pdf) | [Code](https://github.com/ShirAmir/dino-vit-features), [Project](https://dino-vit-features.github.io/) |
87 | | 2021 | BMVC | [Localizing Objects with Self-Supervised Transformers and no Labels](https://www.bmvc2021-virtualconference.com/assets/papers/1339.pdf) | [Code](https://github.com/valeoai/LOST), - |
88 | | 2022 | CVPR | [Deep Spectral Methods: A Surprisingly Strong Baseline for Unsupervised Semantic Segmentation and Localization](https://openaccess.thecvf.com/content/CVPR2022/papers/Melas-Kyriazi_Deep_Spectral_Methods_A_Surprisingly_Strong_Baseline_for_Unsupervised_Semantic_CVPR_2022_paper.pdf) | [Code](https://github.com/lukemelas/deep-spectral-segmentation), [Project](https://lukemelas.github.io/deep-spectral-segmentation/) |
89 | | 2022 | CVPR | [Self-Supervised Transformers for Unsupervised Object Discovery using Normalized Cut](http://openaccess.thecvf.com/content/CVPR2022/papers/Wang_Self-Supervised_Transformers_for_Unsupervised_Object_Discovery_Using_Normalized_Cut_CVPR_2022_paper.pdf) | [Code](https://github.com/YangtaoWANG95/TokenCut), [Project](https://www.m-psi.fr/Papers/TokenCut2022/) |
90 | | 2022 | CVPR | [Self-Supervised Learning of Object Parts for Semantic Segmentation](http://openaccess.thecvf.com/content/CVPR2022/papers/Ziegler_Self-Supervised_Learning_of_Object_Parts_for_Semantic_Segmentation_CVPR_2022_paper.pdf) | [Code](https://github.com/MkuuWaUjinga/leopart), - |
91 | | 2022 | ICLR | [STEGO: Unsupervised Semantic Segmentation by Distilling Feature Correspondences](https://arxiv.org/abs/2203.08414) | [Code](https://github.com/mhamilton723/STEGO), [Project](https://mhamilton.net/stego.html) |
92 | | 2023 | CVPR | [Leveraging Hidden Positives for Unsupervised Semantic Segmentation](https://openaccess.thecvf.com/content/CVPR2023/papers/Seong_Leveraging_Hidden_Positives_for_Unsupervised_Semantic_Segmentation_CVPR_2023_paper.pdf) | [Code](https://github.com/hynnsk/HP), - |
93 | | 2023 | ICLR | [Unsupervised Semantic Segmentation with Self-supervised Object-centric Representations](https://openreview.net/pdf?id=1_jFneF07YC) | [Code](https://github.com/zadaianchuk/comus), - |
94 | | 2023 | TPAMI | [TokenCut: Segmenting Objects in Images and Videos With Self-Supervised Transformer and Normalized Cut](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10224285) | [Code](https://github.com/YangtaoWANG95/TokenCut_video), [Project](https://www.m-psi.fr/Papers/TokenCut2022/) |
95 | | 2024 | CVPR | [Unsupervised Semantic Segmentation Through Depth-Guided Feature Correlation and Sampling](https://openaccess.thecvf.com/content/CVPR2024/papers/Sick_Unsupervised_Semantic_Segmentation_Through_Depth-Guided_Feature_Correlation_and_Sampling_CVPR_2024_paper.pdf) | [Code](https://github.com/leonsick/depthg), [Project](https://leonsick.github.io/depthg/) |
96 | | 2024 | CVPR | [EAGLE: Eigen Aggregation Learning for Object-Centric Unsupervised Semantic Segmentation](https://openaccess.thecvf.com/content/CVPR2024/papers/Kim_EAGLE_Eigen_Aggregation_Learning_for_Object-Centric_Unsupervised_Semantic_Segmentation_CVPR_2024_paper.pdf) | [Code](https://github.com/dnwjddl/EAGLE), - |
97 |
98 | ### 3.1.4 SAM-based Solution
99 | | Year | Publication | Paper Title | Project |
100 | |:----:|:-----------:|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:---------------------------------------------:|
101 | | 2023 | arXiv | [Segment Anything Model (SAM) Enhanced Pseudo Labels for Weakly Supervised Semantic Segmentation](https://arxiv.org/pdf/2305.05803) | -, - |
102 | | 2023 | arXiv | [Weakly-Supervised Semantic Segmentation with Image-Level Labels: from Traditional Models to Foundation Models](https://arxiv.org//pdf/2310.13026.pdf) | -, - |
103 | | 2024 | CVPR | [From SAM to CAMs: Exploring Segment Anything Model for Weakly Supervised Semantic Segmentation](https://openaccess.thecvf.com/content/CVPR2024/papers/Kweon_From_SAM_to_CAMs_Exploring_Segment_Anything_Model_for_Weakly_CVPR_2024_paper.pdf) | [Code](https://github.com/sangrockEG/S2C), - |
104 |
105 | ### 3.1.5 Composition of FMs
106 | | Year | Publication | Paper Title | Project |
107 | |:-----:|:-----------:|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------------------:|
108 | | 2023 | ICCV | [Zero-guidance Segmentation Using Zero Segment Labels](https://openaccess.thecvf.com/content/ICCV2023/papers/Rewatbowornwong_Zero-guidance_Segmentation_Using_Zero_Segment_Labels_ICCV_2023_paper.pdf) | [Code](https://github.com/nessessence/ZeroGuidanceSeg), [Project](https://zero-guide-seg.github.io/) |
109 | | 2023 | MICCAI | [SAM Meets Robotic Surgery: An Empirical Study on Generalization, Robustness and Adaptation](https://arxiv.org/pdf/2308.07156) | -, - |
110 | | 2024 | arXiv | [TAG: Guidance-free Open-Vocabulary Semantic Segmentation](https://arxiv.org/pdf/2403.11197) | [Code](https://github.com/Valkyrja3607/TAG), - |
111 | | 2024 | arXiv | [FreeSeg-Diff: Training-Free Open-Vocabulary Segmentation with Diffusion Models](https://arxiv.org/pdf/2403.20105) | -, [Project](https://bcorrad.github.io/freesegdiff/) |
112 | | 2024 | CVPR | [Image-to-Image Matching via Foundation Models: A New Perspective for Open-Vocabulary Semantic Segmentation](https://openaccess.thecvf.com/content/CVPR2024/papers/Wang_Image-to-Image_Matching_via_Foundation_Models_A_New_Perspective_for_Open-Vocabulary_CVPR_2024_paper.pdf) | -, - |
113 |
114 | ## 3.2 Instance Segmentation
115 | ### 3.2.1 CLIP-based Solution
116 | | Year | Publication | Paper Title | Project |
117 | |-------|:------------------:|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------------------:|
118 | | 2023 | ICML | [Open-Vocabulary Universal Image Segmentation with MaskCLIP](https://dl.acm.org/doi/abs/10.5555/3618408.3618729) | [Code](https://github.com/mlpc-ucsd/MaskCLIP), [Project](https://maskclip.github.io/) |
119 | | 2023 | ICCV | [MasQCLIP for Open-Vocabulary Universal Image Segmentation](https://openaccess.thecvf.com/content/ICCV2023/html/Xu_MasQCLIP_for_Open-Vocabulary_Universal_Image_Segmentation_ICCV_2023_paper.html) | [Code](https://github.com/mlpc-ucsd/MasQCLIP), [Project](https://masqclip.github.io/) |
120 | | 2023 | NeurIPS | [Convolutions Die Hard: Open-Vocabulary Segmentation with Single Frozen Convolutional CLIP](https://proceedings.neurips.cc/paper_files/paper/2023/hash/661caac7729aa7d8c6b8ac0d39ccbc6a-Abstract-Conference.html) | [Code](https://github.com/bytedance/fc-clip), - |
121 | | 2023 | ICCV | [Open-Vocabulary Panoptic Segmentation with Embedding Modulation](https://openaccess.thecvf.com/content/ICCV2023/html/Chen_Open-vocabulary_Panoptic_Segmentation_with_Embedding_Modulation_ICCV_2023_paper.html) | [Code](https://github.com/XavierCHEN34/OPSNet), [Project](https://opsnet-page.github.io/) |
122 | | 2023 | CVPR | [Primitive Generation and Semantic-Related Alignment for Universal Zero-Shot Segmentation](https://openaccess.thecvf.com/content/CVPR2023/html/He_Primitive_Generation_and_Semantic-Related_Alignment_for_Universal_Zero-Shot_Segmentation_CVPR_2023_paper.html) | [Code](https://github.com/heshuting555/PADing), [Project](https://henghuiding.github.io/PADing/) |
123 | | 2023 | CVPR | [Semantic-Promoted Debiasing and Background Disambiguation for Zero-Shot Instance Segmentation](https://openaccess.thecvf.com/content/CVPR2023/html/He_Semantic-Promoted_Debiasing_and_Background_Disambiguation_for_Zero-Shot_Instance_Segmentation_CVPR_2023_paper.html) | [Code](https://github.com/heshuting555/D2Zero), [Project](https://henghuiding.github.io/D2Zero/) |
124 |
125 | ### 3.2.2 DM-based Solution
126 | | Year | Publication | Paper Title | Project |
127 | |-------|:-----------:|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------------------:|
128 | | 2022 | arXiv | [DALL-E for Detection: Language-Driven Compositional Image Synthesis for Object Detection](https://arxiv.org/abs/2206.09592) | [Code](https://github.com/gyhandy/Text2Image-for-Detection), - |
129 | | 2023 | NeurIPS | [DatasetDM: Synthesizing Data with Perception Annotations Using Diffusion Models](https://proceedings.neurips.cc/paper_files/paper/2023/hash/ab6e7ad2354f350b451b5a8e14d04f51-Abstract-Conference.html) | [Code](https://github.com/showlab/DatasetDM), [Project](https://weijiawu.github.io/DatasetDM_page/) |
130 | | 2024 | IJCV | [MosaicFusion: Diffusion Models as Data Augmenters for Large Vocabulary Instance Segmentation](https://link.springer.com/article/10.1007/s11263-024-02223-3) | [Code](https://github.com/Jiahao000/MosaicFusion), - |
131 |
132 |
133 | ### 3.2.3 DINO-based Solution
134 | | Year | Publication | Paper Title | Project |
135 | |-------|:------------------:|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------------------:|
136 | | 2022 | arXiv | [Discovering Object Masks with Transformers for Unsupervised Semantic Segmentation](https://arxiv.org/abs/2206.06363) | [Code](https://github.com/wvangansbeke/MaskDistill), - |
137 | | 2023 | CVPR | [Cut and Learn for Unsupervised Object Detection and Instance Segmentation](https://openaccess.thecvf.com/content/CVPR2023/html/Wang_Cut_and_Learn_for_Unsupervised_Object_Detection_and_Instance_Segmentation_CVPR_2023_paper.html) | [Code](https://github.com/facebookresearch/CutLER), [Project](https://people.eecs.berkeley.edu/~xdwang/projects/CutLER/) |
138 | | 2023 | NeurIPS | [HASSOD: Hierarchical Adaptive Self-Supervised Object Detection](https://proceedings.neurips.cc/paper_files/paper/2023/hash/b9ecf4d84999a61783c360c3782e801e-Abstract-Conference.html) | [Code](https://github.com/Shengcao-Cao/HASSOD), [Project](https://hassod-neurips23.github.io/) |
139 | | 2024 | CVPR | [CuVLER: Enhanced Unsupervised Object Discoveries through Exhaustive Self-Supervised Transformers](https://openaccess.thecvf.com/content/CVPR2024/html/Arica_CuVLER_Enhanced_Unsupervised_Object_Discoveries_through_Exhaustive_Self-Supervised_Transformers_CVPR_2024_paper.html) | [Code](https://github.com/shahaf-arica/CuVLER), - |
140 |
141 | ### 3.2.4 Composition of FMs for Instance Segmentation
142 | | Year | Publication | Paper Title | Project |
143 | |-------|:------------------:|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------------------:|
144 | | 2023 | ICML | [X-Paste: Revisiting Scalable Copy-Paste for Instance Segmentation using CLIP and StableDiffusion](https://proceedings.mlr.press/v202/zhao23f.html) | [Code](https://github.com/yoctta/XPaste), - |
145 | | 2024 | CVPR | [DiverGen: Improving Instance Segmentation by Learning Wider Data Distribution with More Diverse Generative Data](https://openaccess.thecvf.com/content/CVPR2024/html/Fan_DiverGen_Improving_Instance_Segmentation_by_Learning_Wider_Data_Distribution_with_CVPR_2024_paper.html) | [Code](https://github.com/aim-uofa/DiverGen), - |
146 | | 2024 | ICLR | [The devil is in the object boundary: towards annotation-free instance segmentation using Foundation Models](https://arxiv.org/abs/2404.11957) | [Code](https://github.com/chengshiest/zip-your-clip), - |
147 | | 2023 | NeurIPS | [Segment Anything in High Quality](https://proceedings.neurips.cc/paper_files/paper/2023/hash/5f828e38160f31935cfe9f67503ad17c-Abstract-Conference.html) | [Code](https://github.com/SysCV/sam-hq), - |
148 | | 2024 | arXiv | [Grounded SAM: Assembling Open-World Models for Diverse Visual Tasks](https://arxiv.org/abs/2401.14159) | [Code](https://github.com/IDEA-Research/Grounded-Segment-Anything), - |
149 |
150 | ## 3.3 Panoptic Segmentation
151 | ### 3.3.1 CLIP-based Solution
152 | | Year | Publication | Paper Title | Project |
153 | |-------|:------------------:|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------------------:|
154 | | 2023 | ICML | [Open-Vocabulary Universal Image Segmentation with MaskCLIP](https://dl.acm.org/doi/abs/10.5555/3618408.3618729) | [Code](https://github.com/mlpc-ucsd/MaskCLIP), [Project](https://maskclip.github.io/) |
155 | | 2023 | ICCV | [MasQCLIP for Open-Vocabulary Universal Image Segmentation](https://openaccess.thecvf.com/content/ICCV2023/html/Xu_MasQCLIP_for_Open-Vocabulary_Universal_Image_Segmentation_ICCV_2023_paper.html) | [Code](https://github.com/mlpc-ucsd/MasQCLIP), [Project](https://masqclip.github.io/) |
156 | | 2023 | NeurIPS | [Convolutions Die Hard: Open-Vocabulary Segmentation with Single Frozen Convolutional CLIP](https://proceedings.neurips.cc/paper_files/paper/2023/hash/661caac7729aa7d8c6b8ac0d39ccbc6a-Abstract-Conference.html) | [Code](https://github.com/bytedance/fc-clip), - |
157 | | 2023 | ICCV | [Open-Vocabulary Panoptic Segmentation with Embedding Modulation](https://openaccess.thecvf.com/content/ICCV2023/html/Chen_Open-vocabulary_Panoptic_Segmentation_with_Embedding_Modulation_ICCV_2023_paper.html) | [Code](https://github.com/XavierCHEN34/OPSNet), [Project](https://opsnet-page.github.io/) |
158 | | 2023 | CVPR | [Primitive Generation and Semantic-Related Alignment for Universal Zero-Shot Segmentation](https://openaccess.thecvf.com/content/CVPR2023/html/He_Primitive_Generation_and_Semantic-Related_Alignment_for_Universal_Zero-Shot_Segmentation_CVPR_2023_paper.html) | [Code](https://github.com/heshuting555/PADing), [Project](https://henghuiding.github.io/PADing/) |
159 | | 2023 | CVPR | [Generalized Decoding for Pixel, Image, and Language](https://openaccess.thecvf.com/content/CVPR2023/html/Zou_Generalized_Decoding_for_Pixel_Image_and_Language_CVPR_2023_paper.html) | [Code](https://github.com/microsoft/X-Decoder), [Project](https://x-decoder-vl.github.io/) |
160 | | 2024 | arXiv | [Open-Vocabulary Segmentation with Unpaired Mask-Text Supervision](https://arxiv.org/abs/2402.08960) | [Code](https://github.com/DerrickWang005/Unpair-Seg.pytorch), - |
161 | | 2023 | CVPR | [FreeSeg: Unified, Universal and Open-Vocabulary Image Segmentation](https://openaccess.thecvf.com/content/CVPR2023/html/Qin_FreeSeg_Unified_Universal_and_Open-Vocabulary_Image_Segmentation_CVPR_2023_paper.html) | [Code](https://github.com/bytedance/FreeSeg), - |
162 | | 2023 | NeurIPS | [DaTaSeg: Taming a Universal Multi-Dataset Multi-Task Segmentation Model](https://proceedings.neurips.cc/paper_files/paper/2023/hash/d4eed238cf5807c6b75face996302892-Abstract-Conference.html) | -, - |
163 | | 2024 | CVPR | [OMG-Seg: Is One Model Good Enough For All Segmentation?](https://openaccess.thecvf.com/content/CVPR2024/html/Li_OMG-Seg_Is_One_Model_Good_Enough_For_All_Segmentation_CVPR_2024_paper.html) | [Code](https://github.com/lxtgh/omg-seg), - |
164 |
165 | ### 3.3.2 DM-based Solution
166 | | Year | Publication | Paper Title | Project |
167 | |-------|:------------------:|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------------------:|
168 | | 2023 | CVPR | [Open-Vocabulary Panoptic Segmentation With Text-to-Image Diffusion Models](https://openaccess.thecvf.com/content/CVPR2023/html/Xu_Open-Vocabulary_Panoptic_Segmentation_With_Text-to-Image_Diffusion_Models_CVPR_2023_paper.html) | [Code](https://github.com/nvlabs/odise), - |
169 | | 2023 | ICCV | [A Generalist Framework for Panoptic Segmentation of Images and Videos](https://openaccess.thecvf.com/content/ICCV2023/html/Chen_A_Generalist_Framework_for_Panoptic_Segmentation_of_Images_and_Videos_ICCV_2023_paper.html) | [Code](https://github.com/google-research/pix2seq), - |
170 | | 2023 | ICLR | [Analog Bits: Generating Discrete Data using Diffusion Models with Self-Conditioning](https://arxiv.org/abs/2208.04202) | [Code](https://github.com/google-research/pix2seq), - |
171 | | 2023 | ICCV | [DiffuMask: Synthesizing Images with Pixel-level Annotations for Semantic Segmentation Using Diffusion Models](https://openaccess.thecvf.com/content/ICCV2023/html/Wu_DiffuMask_Synthesizing_Images_with_Pixel-level_Annotations_for_Semantic_Segmentation_Using_ICCV_2023_paper.html) | [Code](https://github.com/weijiawu/DiffuMask), [Project](https://weijiawu.github.io/DiffusionMask/) |
172 | | 2024 | ECCV | [A simple latent diffusion approach for panoptic segmentation and mask inpainting](https://link.springer.com/chapter/10.1007/978-3-031-72633-0_5) | [Code](https://github.com/segments-ai/latent-diffusion-segmentation), - |
173 |
174 | ### 3.3.3 DINO-based Solution
175 | | Year | Publication | Paper Title | Project |
176 | |-------|:------------------:|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------------------:|
177 | | 2019 | CVPR | [Panoptic Feature Pyramid Networks](https://openaccess.thecvf.com/content_CVPR_2019/html/Kirillov_Panoptic_Feature_Pyramid_Networks_CVPR_2019_paper.html) | [Code](https://github.com/Angzz/panoptic-fpn-gluon), - |
178 | | 2020 | CVPR | [Panoptic-DeepLab: A Simple, Strong, and Fast Baseline for Bottom-Up Panoptic Segmentation](https://openaccess.thecvf.com/content_CVPR_2020/html/Cheng_Panoptic-DeepLab_A_Simple_Strong_and_Fast_Baseline_for_Bottom-Up_Panoptic_CVPR_2020_paper.html) | [Code](https://github.com/bowenc0221/panoptic-deeplab), - |
179 | | 2022 | ICLR | [Unsupervised Semantic Segmentation by Distilling Feature Correspondences](https://arxiv.org/abs/2203.08414) | [Code](https://github.com/mhamilton723/STEGO), [Project](https://mhamilton.net/stego.html) |
180 | | 2023 | CVPR | [Cut and Learn for Unsupervised Object Detection and Instance Segmentation](https://openaccess.thecvf.com/content/CVPR2023/html/Wang_Cut_and_Learn_for_Unsupervised_Object_Detection_and_Instance_Segmentation_CVPR_2023_paper.html) | [Code](https://github.com/facebookresearch/CutLER), [Project](https://people.eecs.berkeley.edu/~xdwang/projects/CutLER/) |
181 | | 2024 | arXiv | [A Good Foundation is Worth Many Labels: Label-Efficient Panoptic Segmentation](https://arxiv.org/abs/2405.19035) | [Code](https://github.com/robot-learning-freiburg/PASTEL), - |
182 | | 2024 | CVPR | [Unsupervised Universal Image Segmentation](https://openaccess.thecvf.com/content/CVPR2024/html/Niu_Unsupervised_Universal_Image_Segmentation_CVPR_2024_paper.html) | [Code](https://github.com/u2seg/U2Seg), [Project](https://u2seg.github.io/) |
183 |
184 | ### 3.3.4 SAM-based Solution
185 | | Year | Publication | Paper Title | Project |
186 | |-------|:------------------:|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------------------:|
187 | | 2024 | ECCV | [Semantic-SAM: Segment and Recognize Anything at Any Granularity](https://arxiv.org/abs/2307.04767) | [Code](https://github.com/UX-Decoder/Semantic-SAM), - |
188 | | 2023 | NeurIPS | [Segment Everything Everywhere All at Once](https://proceedings.neurips.cc/paper_files/paper/2023/hash/3ef61f7e4afacf9a2c5b71c726172b86-Abstract-Conference.html) | [Code](https://github.com/UX-Decoder/Segment-Everything-Everywhere-All-At-Once), - |
189 | | 2014 | ECCV | [Microsoft COCO: Common Objects in Context](https://link.springer.com/chapter/10.1007/978-3-319-10602-1_48) | -, [Project](https://cocodataset.org/#home) |
190 | | 2017 | CVPR | [Scene Parsing Through ADE20K Dataset](https://openaccess.thecvf.com/content_cvpr_2017/html/Zhou_Scene_Parsing_Through_CVPR_2017_paper.html) | -, - |
191 | | 2015 | IJCV | [The PASCAL Visual Object Classes Challenge: A Retrospective](https://link.springer.com/article/10.1007/s11263-014-0733-5) | -, - |
192 |
--------------------------------------------------------------------------------