├── tasks.png ├── segmentation emerge.PNG ├── README.md ├── 2-Segmentation Emerge.md ├── 4-PIS.md └── 3-GIS.md /tasks.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/stanley-313/ImageSegFM-Survey/HEAD/tasks.png -------------------------------------------------------------------------------- /segmentation emerge.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/stanley-313/ImageSegFM-Survey/HEAD/segmentation emerge.PNG -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | [![Maintenance](https://img.shields.io/badge/Maintained%3F-yes-green.svg)](https://GitHub.com/Naereen/StrapDown.js/graphs/commit-activity) 2 | [![PR's Welcome](https://img.shields.io/badge/PRs-welcome-brightgreen.svg?style=flat)](http://makeapullrequest.com) 3 | 4 |

5 |

Image Segmentation in Foundation Model Era: A Survey

6 | 7 |

8 | Tianfei Zhou 9 | , 10 | Wang Xia 11 | , 12 | Fei Zhang 13 | , 14 | Boyu Chang 15 | , 16 | Wenguan Wang 17 | , 18 | Ye Yuan 19 | , 20 | Ender Konukoglu 21 | , 22 | Daniel Cremers 23 | 24 |

25 | 26 |

27 | 28 | arXiv PDF 29 | 30 | 31 | Project Page 32 | 33 |

34 |

35 |
36 | 37 | 38 | This repository compiles a collection of resources on image segmentation in the foundation model era, 39 | and will be continuously updated to track developments in the field. 40 | Please feel free to submit a pull request if you find any work missing. 41 | 42 | ## 1. Introduction 43 | Image segmentation is a long-standing challenge in computer vision, studied continuously over several decades, as 44 | evidenced by seminal algorithms such as N-Cut, FCN, and MaskFormer. With the advent of foundation models (FMs), contemporary 45 | segmentation methodologies have embarked on a new epoch by either adapting FMs (e.g., CLIP, Stable Diffusion, DINO) for image 46 | segmentation or developing dedicated segmentation foundation models (e.g., SAM, SAM2). These approaches not only deliver 47 | superior segmentation performance, but also herald newfound segmentation capabilities previously unseen in the deep learning context. 48 | However, current research in image segmentation lacks a detailed analysis of the distinct characteristics, challenges, and solutions 49 | associated with these advancements. This survey seeks to fill this gap by providing a thorough review of cutting-edge research 50 | centered around FM-driven image segmentation. We investigate two basic lines of research (as shown in the following figure) – generic image segmentation (i.e., 51 | semantic segmentation, instance segmentation, panoptic segmentation), and promptable image segmentation (i.e., interactive 52 | segmentation, referring segmentation, few-shot segmentation) – by delineating their respective task settings, background concepts, 53 | and key challenges. Furthermore, we provide insights into the emergence of segmentation knowledge from FMs like CLIP, Stable 54 | Diffusion, and DINO. An exhaustive overview of over 300 segmentation approaches is provided to encapsulate the breadth of current 55 | research efforts. Subsequently, we discuss open issues and potential avenues for future research. 56 | 57 |

58 | <img src="tasks.png"> 59 |
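
To make the promptable setting surveyed in Sec. 4 concrete, below is a minimal, illustrative sketch (not taken from any paper listed here) of point-prompted segmentation with the dedicated segmentation FM SAM, using the official `segment_anything` package; the checkpoint path, image path, and click coordinates are placeholder assumptions.

```python
# Illustrative sketch: promptable (interactive) segmentation with SAM.
# Assumptions: `segment_anything` is installed, the official ViT-B checkpoint has been
# downloaded to the path below, and `example.jpg` is any test image.
import numpy as np
import cv2
from segment_anything import SamPredictor, sam_model_registry

sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")
predictor = SamPredictor(sam)

# SAM embeds the (RGB, uint8) image once, after which prompts are cheap to evaluate.
image = cv2.cvtColor(cv2.imread("example.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

# A single foreground click (x, y) is the prompt; label 1 marks it as foreground.
masks, scores, _ = predictor.predict(
    point_coords=np.array([[500, 375]]),
    point_labels=np.array([1]),
    multimask_output=True,  # return candidate masks at several granularities
)
print(masks.shape, scores)  # (3, H, W) boolean masks and their predicted quality scores
```

Referring and few-shot segmentation follow the same prompt-then-predict pattern, with the prompt supplied as free-form text or as annotated support images instead of clicks.
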

60 | 61 | *** 62 | 63 | ## 2. Segmentation Knowledge Emerges From FMs 64 | Given the emergent capabilities of LLMs, a natural question arises: *Do segmentation properties emerge from FMs?* The 65 | answer is **affirmative**, even for FMs not explicitly designed for 66 | segmentation, such as CLIP, DINO, and Diffusion Models. This also unlocks a new frontier in image segmentation, 67 | i.e., **acquiring segmentation without any training.** The following figure illustrates how to approach this and shows some examples: 68 | 69 |

70 | <img src="segmentation emerge.PNG"> 71 |
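
As a concrete, minimal illustration of such training-free emergence (in the spirit of the DINO papers collected in Sec. 2.3, not a specific method from this survey), the sketch below thresholds the [CLS]-to-patch self-attention of a pre-trained DINO ViT-S/16 to obtain a coarse foreground mask; the image path, input resolution, and mean-value threshold are illustrative assumptions.

```python
# Illustrative sketch: a coarse, training-free foreground mask from DINO self-attention.
# Assumptions: internet access for torch.hub, and `example.jpg` is any test image.
import torch
from PIL import Image
from torchvision import transforms

# Official self-supervised DINO ViT-S/16 released by the authors.
model = torch.hub.load("facebookresearch/dino:main", "dino_vits16").eval()

preprocess = transforms.Compose([
    transforms.Resize((480, 480)),
    transforms.ToTensor(),
    transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225)),
])
img = preprocess(Image.open("example.jpg").convert("RGB")).unsqueeze(0)  # (1, 3, 480, 480)

with torch.no_grad():
    # (1, heads, N+1, N+1) attention of the last block; token 0 is [CLS].
    attn = model.get_last_selfattention(img)

patch = 16
h, w = img.shape[-2] // patch, img.shape[-1] // patch
# Attention of [CLS] to every image patch, averaged over heads -> a saliency map.
cls_attn = attn[0, :, 0, 1:].mean(dim=0).reshape(h, w)

# Thresholding the map yields a rough object mask without any segmentation training.
mask = (cls_attn > cls_attn.mean()).float()
print(mask.shape, int(mask.sum()), "foreground patches")
```

Analogous training-free readouts exist for CLIP (e.g., dense value-projection features matched against text embeddings, as in MaskCLIP) and for diffusion models (e.g., cross-attention maps, as in DAAM); the subsections linked below collect the corresponding papers.
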

72 | 73 | - [2.1 Segmentation Emerges from CLIP](2-Segmentation%20Emerge.md#21-segmentation-emerges-from-clip) 74 | - [2.2 Segmentation Emerges from DMs](2-Segmentation%20Emerge.md#22-segmentation-emerges-from-dms) 75 | - [2.3 Segmentation Emerges from DINO](2-Segmentation%20Emerge.md#23-segmentation-emerges-from-dino) 76 | 77 | *** 78 | 79 | ## 3. Foundation Model based GIS 80 | - [3.1 Semantic Segmentation](3-GIS.md#31-semantic-segmentation) 81 | - [3.1.1 CLIP-based Solution](3-GIS.md#311-clip-based-solution) 82 | - [3.1.2 DM-based Solution](3-GIS.md#312-dm-based-solution) 83 | - [3.1.3 DINO-based Solution](3-GIS.md#313-dino-based-solution) 84 | - [3.1.4 SAM-based Solution](3-GIS.md#314-sam-based-solution) 85 | - [3.1.5 Composition of FMs](3-GIS.md#315-composition-of-fms) 86 | - [3.2 Instance Segmentation](3-GIS.md#32-instance-segmentation) 87 | - [3.2.1 CLIP-based Solution](3-GIS.md#321-clip-based-solution) 88 | - [3.2.2 DM-based Solution](3-GIS.md#322-dm-based-solution) 89 | - [3.2.3 DINO-based Solution](3-GIS.md#323-dino-based-solution) 90 | - [3.2.4 Composition of FMs](3-GIS.md#324-composition-of-fms) 91 | - [3.3 Panoptic Segmentation](3-GIS.md#33-panoptic-segmentation) 92 | - [3.3.1 CLIP-based Solution](3-GIS.md#331-clip-based-solution) 93 | - [3.3.2 DM-based Solution](3-GIS.md#332-dm-based-solution) 94 | - [3.3.3 DINO-based Solution](3-GIS.md#333-dino-based-solution) 95 | - [3.3.4 SAM-based Solution](3-GIS.md#334-sam-based-solution) 96 | 97 | 98 | *** 99 | 100 | ## 4. Foundation Model based PIS 101 | - [4.1 Interactive Segmentation](4-PIS.md#41-interactive-segmentation) 102 | - [4.1.1 SAM-based Solution](4-PIS.md#411-sam-based-solution) 103 | - [4.2 Referring Segmentation](4-PIS.md#42-referring-segmentation) 104 | - [4.2.1 CLIP-based Solution](4-PIS.md#421-clip-based-solution) 105 | - [4.2.2 DM-based Solution](4-PIS.md#422-dm-based-solution) 106 | - [4.2.3 LLMs/MLLMs-based Solution](4-PIS.md#423-llmsmllms-based-solution) 107 | - [4.2.4 Composition of FMs](4-PIS.md#424-composition-of-fms) 108 | - [4.3 Few-shot Segmentation](4-PIS.md#43-few-shot-segmentation) 109 | - [4.3.1 CLIP-based Solution](4-PIS.md#431-clip-based-solution) 110 | - [4.3.2 DM-based Solution](4-PIS.md#432-dm-based-solution) 111 | - [4.3.3 DINO-based Solution](4-PIS.md#433-dino-based-solution) 112 | - [4.3.4 SAM-based Solution](4-PIS.md#434-sam-based-solution) 113 | - [4.3.5 MLLMs-based Solution](4-PIS.md#435-mllms-based-solution) 114 | - [4.3.6 In-Context Segmentation](4-PIS.md#436-in-context-segmentation) 115 | ## Citation 116 | 117 | If you find our survey and repository useful for your research, please consider citing our paper: 118 | ```bibtex 119 | @article{zhou2024SegFMSurvey, 120 | title={Image Segmentation in Foundation Model Era: A Survey}, 121 | author={Zhou, Tianfei and Xia, Wang and Zhang, Fei and Chang, Boyu and Wang, Wenguan and Yuan, Ye and Konukoglu, Ender and Cremers, Daniel}, 122 | journal={arXiv preprint arXiv:2408.12957}, 123 | year={2024}, 124 | } 125 | ``` 126 | -------------------------------------------------------------------------------- /2-Segmentation Emerge.md: -------------------------------------------------------------------------------- 1 | ## 2.1 Segmentation Emerges from CLIP 2 | | Year | Publication | Paper Title | Project | 3 | 
|:----:|:-----------:|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------------------------:| 4 | | 2022 | ECCV | [Extract Free Dense Labels from CLIP](https://www.ecva.net/papers/eccv_2022/papers_ECCV/papers/136880687.pdf) | [Code](https://github.com/chongzhou96/MaskCLIP), [Project](https://www.mmlab-ntu.com/project/maskclip/) | 5 | | 2023 | arXiv | [A Closer Look at the Explainability of Contrastive Language-Image Pre-training](https://arxiv.org/pdf/2304.05653) | [Code](https://github.com/xmed-lab/CLIP_Surgery), - | 6 | | 2023 | ICCV | [ Perceptual Grouping in Contrastive Vision-Language Models](https://openaccess.thecvf.com/content/ICCV2023/papers/Ranasinghe_Perceptual_Grouping_in_Contrastive_Vision-Language_Models_ICCV_2023_paper.pdf) | [Code](https://github.com/kahnchana/clippy), - | 7 | | 2024 | CVPR | [ Grounding Everything: Emerging Localization Properties in Vision-Language Transformers](https://openaccess.thecvf.com/content/CVPR2024/papers/Bousselham_Grounding_Everything_Emerging_Localization_Properties_in_Vision-Language_Transformers_CVPR_2024_paper.pdf) | [Code](https://github.com/WalBouss/GEM), - | 8 | | 2024 | ECCV | [SCLIP: Rethinking Self-Attention for Dense Vision-Language Inference](https://link.springer.com/chapter/10.1007/978-3-031-72664-4_18) | [Code](https://github.com/wangf3014/SCLIP), - | 9 | | 2025 | WACV | [Pay Attention to Your Neighbours: Training-Free Open-Vocabulary Semantic Segmentation](https://arxiv.org/pdf/2404.08181) | [Code](https://github.com/sinahmr/NACLIP), - | 10 | 11 | ## 2.2 Segmentation Emerges from DMs 12 | | Year | Publication | Paper Title | Project | 13 | |:----:|:-----------:|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:---------------------------------------------------------------------------------------------------------------:| 14 | | 2023 | ACL | [What the DAAM:Interpreting Stable Diffusion Using Cross Attention](https://aclanthology.org/2023.acl-long.310.pdf) | [Code](https://github.com/castorini/daam), - | 15 | | 2023 | arXiv | [Diffusion Model is Secretly a Training-free Open Vocabulary Semantic Segmenter](https://arxiv.org/pdf/2309.02773) | [Code](https://github.com/VCG-team/DiffSegmenter), [Project](https://vcg-team.github.io/DiffSegmenter-webpage/) | 16 | | 2023 | arXiv | [Attention as Annotation: Generating Images and Pseudo-masks for Weakly Supervised Semantic Segmentation with Diffusion](https://arxiv.org/pdf/2309.01369v1) | -, - | 17 | | 2023 | NeurIPS | [Dataset Diffusion: Diffusion-based Synthetic Dataset Generation for Pixel-Level Semantic Segmentation](https://proceedings.neurips.cc/paper_files/paper/2023/file/f2957e48240c1d90e62b303574871b47-Paper-Conference.pdf) | [Code](https://github.com/VinAIResearch/Dataset-Diffusion), - | 18 | | 2024 | arXiv | [FreeSeg-Diff: Training-Free Open-Vocabulary Segmentation with Diffusion Models](https://arxiv.org/pdf/2403.20105) | -, [Project](https://bcorrad.github.io/freesegdiff/) | 19 | | 2024 | arXiv | [MaskDiffusion: Exploiting Pre-trained Diffusion Models for 
Semantic Segmentation](https://arxiv.org/pdf/2403.11194) | [Code](https://github.com/Valkyrja3607/MaskDiffusion), - | 20 | | 2024 | CVPR | [Open-Vocabulary Attention Maps with Token Optimization for Semantic Segmentation in Diffusion Models](https://openaccess.thecvf.com/content/CVPR2024/papers/Marcos-Manchon_Open-Vocabulary_Attention_Maps_with_Token_Optimization_for_Semantic_Segmentation_in_CVPR_2024_paper.pdf) | [Code](https://github.com/vpulab/ovam), - | 21 | | 2024 | ECCV | [Diffusion Models for Open-Vocabulary Segmentation](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/00794.pdf) | -, [Project](https://www.robots.ox.ac.uk/~vgg/research/ovdiff/) | 22 | 23 | 24 | ## 2.3 Segmentation Emerges from DINO 25 | | Year | Publication | Paper Title | Project | 26 | |:----:|:-----------:|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------:| 27 | | 2021 | arXiv | [Deep ViT Features as Dense Visual Descriptors](https://dino-vit-features.github.io/paper.pdf) | [Code](https://github.com/ShirAmir/dino-vit-features), [Project](https://dino-vit-features.github.io/) | 28 | | 2021 | ICCV | [Emerging Properties in Self-Supervised Vision Transformers](https://openaccess.thecvf.com/content/ICCV2021/papers/Caron_Emerging_Properties_in_Self-Supervised_Vision_Transformers_ICCV_2021_paper.pdf) | [Code](https://github.com/facebookresearch/dino), - | 29 | | 2021 | BMVC | [Localizing Objects with Self-Supervised Transformers and no Labels](https://www.bmvc2021-virtualconference.com/assets/papers/1339.pdf) | [Code](https://github.com/valeoai/LOST), - | 30 | | 2022 | arXiv | [Discovering object masks with transformers for unsupervised semantic segmentation](https://arxiv.org/abs/2206.06363) | [Code](https://github.com/wvangansbeke/MaskDistill), - | 31 | | 2022 | CVPR | [Deep Spectral Methods: A Surprisingly Strong Baseline for Unsupervised Semantic Segmentation and Localization](https://openaccess.thecvf.com/content/CVPR2022/papers/Melas-Kyriazi_Deep_Spectral_Methods_A_Surprisingly_Strong_Baseline_for_Unsupervised_Semantic_CVPR_2022_paper.pdf) | [Code](https://github.com/lukemelas/deep-spectral-segmentation), [Project](https://lukemelas.github.io/deep-spectral-segmentation/) | 32 | | 2023 | ICLR | [Unsupervised Semantic Segmentation with Self-supervised Object-centric Representations](https://openreview.net/pdf?id=1_jFneF07YC) | [Code](https://github.com/zadaianchuk/comus), - | 33 | | 2024 | TMLR | [DINOv2: Learning Robust Visual Features without Supervision](https://openreview.net/pdf?id=a68SUt6zFt) | [Code](https://github.com/facebookresearch/dinov2), - | 34 | -------------------------------------------------------------------------------- /4-PIS.md: -------------------------------------------------------------------------------- 1 | # 4. 
Foundation Model based PIS 2 | *** 3 | ## 4.1 Interactive Segmentation 4 | ### 4.1.1 SAM-based Solution 5 | | Year | Publication | Paper Title | Project | 6 | |-------|:------------------:|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------------------:| 7 | | 2023 | arXiv | [SAM on Medical Images: A Comprehensive Study on Three Prompt Modes](https://arxiv.org/pdf/2305.00035) | -,- | 8 | | 2023 | arXiv | [Customized segment anything model for medical image segmentation](https://arxiv.org/pdf/2304.13785) | [Code](https://github.com/hitachinsk/SAMed), - | 9 | | 2023 | arXiv | [Medical SAM Adapter: Adapting Segment Anything Model for Medical Image Segmentation](https://arxiv.org/pdf/2304.12620) | [Code](https://github.com/SuperMedIntel/Medical-SAM-Adapter), - | 10 | | 2023 | MedIA | [Segment anything model for medical image analysis: an experimental study](https://www.sciencedirect.com/science/article/pii/S1361841523001780/pdfft?md5=398c81e13674a45f4a0b611f468b4ea8&pid=1-s2.0-S1361841523001780-main.pdf) | [Code](https://github.com/mazurowski-lab/segment-anything-medical-evaluation), - | 11 | | 2023 | MIDL | [SAM.MD: Zero-shot medical image segmentation capabilities of the Segment Anything Model](https://arxiv.org/pdf/2304.05396) | -, - | 12 | | 2023 | MICCAI BrainLes | [Cheap Lunch for Medical Image Segmentation by Fine-tuning SAM on Few Exemplars](https://arxiv.org/pdf/2308.14133) | -, - | 13 | | 2023 | MICCAI Society | [SAM Meets Robotic Surgery: An Empirical Study on Generalization, Robustness and Adaptation](https://link.springer.com/content/pdf/10.1007/978-3-031-47401-9_23.pdf?pdf=inline%20link) | -, - | 14 | | 2023 | NeurIPS | [Segment Anything in High Quality](https://dl.acm.org/doi/10.5555/3666122.3667425) | [Code](https://github.com/SysCV/SAM-HQ), - | 15 | | 2024 | Comput. Biol. 
Med | [Segment anything model for medical image segmentation: Current applications and future directions](https://www.sciencedirect.com/science/article/pii/S0010482524003226/pdfft?md5=68de9e9c773354807446ee39cc3b1cb0&pid=1-s2.0-S0010482524003226-main.pdf) | [Code](https://github.com/YichiZhang98/SAM4MIS), - | 16 | | 2024 | CVPR | [Graco: Granularity-controllable interactive segmentation](https://openaccess.thecvf.com/content/CVPR2024/papers/Zhao_GraCo_Granularity-Controllable_Interactive_Segmentation_CVPR_2024_paper.pdf) | [Code](https://github.com/Zhao-Yian/GraCo), [Project](https://zhao-yian.github.io/GraCo) | 17 | | 2024 | ECCV | [Tokenize anything via prompting](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/06366.pdf) | [Code](https://github.com/baaivision/tokenize-anything), - | 18 | | 2024 | ECCV | [Open vocabulary sam: Segment and recognize twenty-thousand classes interactively](https://link.springer.com/content/pdf/10.1007/978-3-031-72775-7_24) | [Code](https://github.com/HarborYuan/ovsam), [Project](https://www.mmlab-ntu.com/project/ovsam/) | 19 | | 2024 | ECCV | [Semantic-sam: Segment and recognize anything at any granularity](https://arxiv.org/abs/2307.04767) | [Code](https://github.com/UX-Decoder/Semantic-SAM), - | 20 | | 2024 | MedIA | [3DSAM-adapter: Holistic adaptation of SAM from 2D to 3D for promptable tumor segmentation](https://www.sciencedirect.com/science/article/pii/S1361841524002494/pdfft?md5=69d14d00b8d36553854e02366ab6bb36&pid=1-s2.0-S1361841524002494-main.pdf) | [Code](https://github.com/med-air/3DSAM-adapter), - | 21 | | 2024 | Nat. Commun | [Segment anything in medical images](https://www.nature.com/articles/s41467-024-44824-z.pdf) | [Code](https://github.com/bowang-lab/MedSAM), - | 22 | | 2024 | Strahlenther Onkol | [The Segment Anything foundation model achieves favorable brain tumor auto-segmentation accuracy in MRI to support radiotherapy treatment planning](https://link.springer.com/content/pdf/10.1007/s00066-024-02313-8.pdf) | -, - | 23 | 24 | 25 | ## 4.2 Referring Segmentation 26 | ### 4.2.1 CLIP-based Solution 27 | | Year | Publication | Paper Title | Project | 28 | |------|:-----------:|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------------:| 29 | | 2022 | arXiv | [Weakly-supervised segmentation of referring expressions](https://arxiv.org/pdf/2205.04725) | -, - | 30 | | 2022 | CVPR | [CRIS: CLIP-Driven Referring Image Segmentation](https://openaccess.thecvf.com/content/CVPR2022/papers/Wang_CRIS_CLIP-Driven_Referring_Image_Segmentation_CVPR_2022_paper.pdf) | [Code](https://github.com/DerrickWang005/CRIS.pytorch), - | 31 | | 2022 | CVPR | [Image Segmentation Using Text and Image Prompts](https://openaccess.thecvf.com/content/CVPR2022/papers/Luddecke_Image_Segmentation_Using_Text_and_Image_Prompts_CVPR_2022_paper.pdf) | [Code](https://github.com/timojl/clipseg), - | 32 | | 2023 | CVPR | [Zero-shot Referring Image Segmentation with Global-Local Context Features](https://openaccess.thecvf.com/content/CVPR2023/papers/Yu_Zero-Shot_Referring_Image_Segmentation_With_Global-Local_Context_Features_CVPR_2023_paper.pdf) | [Code](https://github.com/Seonghoon-Yu/Zero-shot-RIS), - | 33 | | 
2023 | EMNLP | [Text Augmented Spatial-aware Zero-shot Referring Image Segmentation](https://aclanthology.org/2023.findings-emnlp.73.pdf) | [Code](https://github.com/suoych/TAS), - | 34 | | 2023 | ICCV | [ Bridging Vision and Language Encoders: Parameter-Efficient Tuning for Referring Image Segmentation](https://openaccess.thecvf.com/content/ICCV2023/papers/Xu_Bridging_Vision_and_Language_Encoders_Parameter-Efficient_Tuning_for_Referring_Image_ICCV_2023_paper.pdf) | [Code](https://github.com/kkakkkka/ETRIS), - | 35 | | 2023 | ICCV | [ Referring Image Segmentation Using Text Supervision](https://openaccess.thecvf.com/content/ICCV2023/papers/Liu_Referring_Image_Segmentation_Using_Text_Supervision_ICCV_2023_paper.pdf) | [Code](https://github.com/fawnliu/TRIS), - | 36 | | 2023 | NeurIPS | [Text Promptable Surgical Instrument Segmentation with Vision-Language Models](https://papers.nips.cc/paper_files/paper/2023/file/5af741d487c5f0b08bfe56e11d1883e4-Paper-Conference.pdf) | [Code](https://github.com/franciszzj/TP-SIS), - | 37 | | 2024 | AAAI | [EAVL: Explicitly Align Vision and Language for Referring Image Segmentation](https://www.researchgate.net/profile/Yichen-Yan-9/publication/373263673_EAVL_Explicitly_Align_Vision_and_Language_for_Referring_Image_Segmentation/links/6617ad9d43f8df018dee471d/EAVL-Explicitly-Align-Vision-and-Language-for-Referring-Image-Segmentation.pdf) | -, - | 38 | | 2024 | ACL | [Extending CLIP’s Image-Text Alignment to Referring Image Segmentation](https://aclanthology.org/2024.naacl-long.258.pdf) | -, - | 39 | | 2024 | arXiv | [Improving Referring Image Segmentation using Vision-Aware Text Features](https://arxiv.org/pdf/2404.08590v1) | [Code](https://arxiv.org/pdf/2404.08590v1), [Project](https://nero1342.github.io/VATEX_RIS/) | 40 | | 2024 | CVPR | [Unveiling Parts Beyond Objects: Towards Finer-Granularity Referring Expression Segmentation](https://openaccess.thecvf.com/content/CVPR2024/papers/Wang_Unveiling_Parts_Beyond_Objects_Towards_Finer-Granularity_Referring_Expression_Segmentation_CVPR_2024_paper.pdf) | [Code](https://github.com/Rubics-Xuan/MRES), [Project](https://rubics-xuan.github.io/MRES/) | 41 | | 2024 | ICLR | [BarLeRIa: An Efficient Tuning Framework for Referring Image Segmentation](https://openreview.net/pdf?id=wHLDHRkmEu) | [Code](https://github.com/NastrondAd/BarLeRIa), - | 42 | 43 | ### 4.2.2 DM-based Solution 44 | | Year | Publication | Paper Title | Project | 45 | |-------|:-----------:|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------:| 46 | | 2022 | arXiv | [Peekaboo: Text to Image Diffusion Models are Zero-Shot Segmentors](https://arxiv.org/pdf/2211.13224) | [Code](https://github.com/RyannDaGreat/Peekaboo), [Project](https://ryanndagreat.github.io/peekaboo) | 47 | | 2023 | arXiv | [Ref-Diff: Zero-shot Referring Image Segmentation with Generative Models](https://arxiv.org/pdf/2308.16777) | [Code](https://github.com/kodenii/Ref-Diff), - | 48 | | 2024 | CVPR | [UniGS: Unified Representation for Image Generation and Segmentation](https://openaccess.thecvf.com/content/CVPR2024/papers/Qi_UniGS_Unified_Representation_for_Image_Generation_and_Segmentation_CVPR_2024_paper.pdf) | -, - | 49 | | 2023 | ICCV | [Unleashing Text-to-Image Diffusion Models for Visual 
Perception](https://openaccess.thecvf.com/content/ICCV2023/papers/Zhao_Unleashing_Text-to-Image_Diffusion_Models_for_Visual_Perception_ICCV_2023_paper.pdf) | [Code](https://github.com/wl-zhao/VPD), [Project](https://vpd.ivg-research.xyz/) | 50 | | 2023 | ICCV | [ LD-ZNet: A Latent Diffusion Approach for Text-Based Image Segmentation](https://openaccess.thecvf.com/content/ICCV2023/papers/PNVR_LD-ZNet_A_Latent_Diffusion_Approach_for_Text-Based_Image_Segmentation_ICCV_2023_paper.pdf) | [Code](https://koutilya-pnvr.github.io/LD-ZNet/), [Project](https://koutilya-pnvr.github.io/LD-ZNet/) | 51 | | 2023 | IROS | [Multimodal Diffusion Segmentation Model for Object Segmentation from Manipulation Instructions](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10341402) | -, - | 52 | 53 | 54 | ### 4.2.3 LLMs/MLLMs-based Solution 55 | | Year | Publication | Paper Title | Project | 56 | |------|:-----------:|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------:| 57 | | 2023 | arXiv | [LISA++: An Improved Baseline for Reasoning Segmentation with Large Language Model](https://arxiv.org/pdf/2312.17240) | -, - | 58 | | 2023 | arXiv | [NExT-Chat: An LMMfor Chat, Detection and Segmentation](https://arxiv.org/pdf/2311.04498) | [Code](https://github.com/NExT-ChatV/NExT-Chat), [Project](https://next-chatv.github.io/) | 59 | | 2024 | arXiv | [LaSagnA: Language-based Segmentation Assistant for Complex Queries](https://arxiv.org/pdf/2404.08506) | [Code](https://github.com/congvvc/LaSagnA), - | 60 | | 2024 | arXiv | [Empowering Segmentation Ability to Multi-modal Large Language Models](https://arxiv.org/pdf/2403.14141) | -, - | 61 | | 2024 | CVPR | [LISA: Reasoning Segmentation via Large Language Model](https://openaccess.thecvf.com/content/CVPR2024/papers/Lai_LISA_Reasoning_Segmentation_via_Large_Language_Model_CVPR_2024_paper.pdf) | [Code](https://github.com/dvlab-research/LISA), - | 62 | | 2024 | CVPR | [PixelLM:Pixel Reasoning with Large Multimodal Model](https://openaccess.thecvf.com/content/CVPR2024/papers/Ren_PixelLM_Pixel_Reasoning_with_Large_Multimodal_Model_CVPR_2024_paper.pdf) | [Code](https://github.com/MaverickRen/PixelLM), [Project](https://pixellm.github.io/) | 63 | | 2024 | CVPR | [GSVA: Generalized Segmentation via Multimodal Large Language Models](https://openaccess.thecvf.com/content/CVPR2024/papers/Xia_GSVA_Generalized_Segmentation_via_Multimodal_Large_Language_Models_CVPR_2024_paper.pdf) | [Code](https://github.com/LeapLabTHU/GSVA), - | 64 | | 2024 | CVPR | [Osprey: Pixel Understanding with Visual Instruction Tuning](https://openaccess.thecvf.com/content/CVPR2024/papers/Yuan_Osprey_Pixel_Understanding_with_Visual_Instruction_Tuning_CVPR_2024_paper.pdf) | [Code](https://github.com/CircleRadon/Osprey), - | 65 | | 2024 | CVPRW | [LLM-Seg: Bridging Image Segmentation and Large Language Model Reasoning](https://openaccess.thecvf.com/content/CVPR2024W/MMFM/papers/Wang_LLM-Seg_Bridging_Image_Segmentation_and_Large_Language_Model_Reasoning_CVPRW_2024_paper.pdf) | [Code](https://github.com/wangjunchi/LLMSeg), - | 66 | 67 | 68 | ### 4.2.4 Composition of FMs 69 | | Year | Publication | Paper Title | Project | 70 | 
|-------|:-----------:|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------------------------:| 71 | | 2022 | arXiv | [Weakly-supervised segmentation of referring expressions](https://arxiv.org/pdf/2205.04725) | -, - | 72 | | 2022 | CVPR | [LAVT: Language-Aware Vision Transformer for Referring Image Segmentation](https://openaccess.thecvf.com/content/CVPR2022/papers/Yang_LAVT_Language-Aware_Vision_Transformer_for_Referring_Image_Segmentation_CVPR_2022_paper.pdf) | -, - | 73 | | 2022 | NeurIPS | [CoupAlign: Coupling Word-Pixel with Sentence-Mask Alignments for Referring Image Segmentation](https://proceedings.neurips.cc/paper_files/paper/2022/file/5e773d319e310f1e4d695159484143b8-Paper-Conference.pdf) | -, - | 74 | | 2023 | AAAI | [ Semantics-Aware Dynamic Localization and Refinement for Referring Image Segmentation](https://ojs.aaai.org/index.php/AAAI/article/download/25428/25200) | -, - | 75 | | 2023 | ACM MM | [CARIS: Context-Aware Referring Image Segmentation](https://web.archive.org/web/20231028201140id_/https://dl.acm.org/doi/pdf/10.1145/3581783.3612117) | [Code](https://github.com/lsa1997/CARIS), - | | 76 | | 2023 | arXiv | [NExT-Chat: An LMMfor Chat, Detection and Segmentation](https://arxiv.org/pdf/2311.04498) | [Code](https://github.com/NExT-ChatV/NExT-Chat), [Project](https://next-chatv.github.io/) | 77 | | 2023 | arXiv | [Interactive Data Synthesis for Systematic Vision Adaptation via LLMs-AIGCs Collaboration](https://arxiv.org/pdf/2305.12799) | [Code](https://github.com/Yuqifan1117/Labal-Anything-Pipeline), - | 78 | | 2023 | CVPR | [GRES: Generalized Referring Expression Segmentation](https://openaccess.thecvf.com/content/CVPR2023/papers/Liu_GRES_Generalized_Referring_Expression_Segmentation_CVPR_2023_paper.pdf) | [Code](https://github.com/henghuiding/ReLA), [Project](https://henghuiding.github.io/GRES/) | 79 | | 2023 | CVPR | [PolyFormer: Referring Image Segmentation as Sequential Polygon Generation](https://openaccess.thecvf.com/content/CVPR2023/papers/Liu_PolyFormer_Referring_Image_Segmentation_As_Sequential_Polygon_Generation_CVPR_2023_paper.pdf) | [Code](https://github.com/amazon-science/polygon-transformer), [Project](https://polyformer.github.io/) | 80 | | 2023 | ICCV | [Beyond One-to-One: Rethinking the Referring Image Segmentation](https://openaccess.thecvf.com/content/ICCV2023/papers/Hu_Beyond_One-to-One_Rethinking_the_Referring_Image_Segmentation_ICCV_2023_paper.pdf) | [Code](https://github.com/toggle1995/RIS-DMMI), - | 81 | | 2023 | ICCV | [Shatter and Gather:Learning Referring Image Segmentation with Text Supervision](https://openaccess.thecvf.com/content/ICCV2023/papers/Kim_Shatter_and_Gather_Learning_Referring_Image_Segmentation_with_Text_Supervision_ICCV_2023_paper.pdf) | [Code](https://github.com/kdwonn/SaG), [Project](https://southflame.github.io/sag/) | 82 | | 2023 | ICCV | [Segment Every Reference Object in Spatial and Temporal Spaces](https://openaccess.thecvf.com/content/ICCV2023/papers/Wu_Segment_Every_Reference_Object_in_Spatial_and_Temporal_Spaces_ICCV_2023_paper.pdf) | [Code](https://github.com/FoundationVision/UniRef), - | 83 | | 2023 | TOMM | [Towards Complex-query Referring Image Segmentation: A Novel Benchmark](https://dl.acm.org/doi/pdf/10.1145/3701733) | 
[Code](https://github.com/lili0415/DuMoGa), - | 84 | | 2024 | ACM MM | [Deep Instruction Tuning for Segment Anything Model](https://dl.acm.org/doi/pdf/10.1145/3664647.3680571) | [Code](https://github.com/wysnzzzz/DIT), - | 85 | | 2024 | arXiv | [LLMBind: A Unified Modality-Task Integration Framework](https://arxiv.org/pdf/2402.14891) | [Code](https://github.com/PKU-YuanGroup/LLMBind), - | 86 | | 2024 | arXiv | [Training-Free Semantic Segmentation via LLM-Supervision](https://arxiv.org/pdf/2404.00701) | -, - | 87 | | 2024 | arXiv | [F-LMM: Grounding Frozen Large Multimodal Models](https://arxiv.org/pdf/2406.05821) | [Code](https://github.com/wusize/F-LMM), - | 88 | | 2024 | CVPR | [Mask Grounding for Referring Image Segmentation](https://openaccess.thecvf.com/content/CVPR2024/papers/Chng_Mask_Grounding_for_Referring_Image_Segmentation_CVPR_2024_paper.pdf) | [Code](https://github.com/yxchng/mask-grounding), - | 89 | | 2024 | CVPR | [LQMFormer: Language-aware Query Mask Transformer for Referring Image Segmentation](https://openaccess.thecvf.com/content/CVPR2024/papers/Shah_LQMFormer_Language-aware_Query_Mask_Transformer_for_Referring_Image_Segmentation_CVPR_2024_paper.pdf) | - ,- | 90 | | 2024 | CVPR | [PerceptionGPT: Effectively Fusing Visual Perception into LLM](https://openaccess.thecvf.com/content/CVPR2024/papers/Pi_PerceptionGPT_Effectively_Fusing_Visual_Perception_into_LLM_CVPR_2024_paper.pdf) | [Code](https://github.com/pipilurj/perceptionGPT), - | 91 | | 2024 | CVPR | [Prompt-Driven Referring Image Segmentation with Instance Contrasting](https://openaccess.thecvf.com/content/CVPR2024/papers/Shang_Prompt-Driven_Referring_Image_Segmentation_with_Instance_Contrasting_CVPR_2024_paper.pdf) | -, - | 92 | | 2024 | CVPR | [Curriculum Point Prompting for Weakly-Supervised Referring Image Segmentation](https://openaccess.thecvf.com/content/CVPR2024/papers/Dai_Curriculum_Point_Prompting_for_Weakly-Supervised_Referring_Image_Segmentation_CVPR_2024_paper.pdf) | -, - | 93 | 94 | ## 4.3 Few-shot Segmentation 95 | ### 4.3.1 CLIP-based Solution 96 | | Year | Publication | Paper Title | Project | 97 | |------|:-----------:|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:----------------------------------------------------------:| 98 | | 2023 | arXiv | [PartSeg: Few-shot Part Segmentation via Part-aware Prompt Learning](https://arxiv.org/pdf/2308.12757) | -, - | 99 | | 2023 | CVPR | [ WinCLIP: Zero-/Few-Shot Anomaly Classification and Segmentation](https://openaccess.thecvf.com/content/CVPR2023/papers/Jeong_WinCLIP_Zero-Few-Shot_Anomaly_Classification_and_Segmentation_CVPR_2023_paper.pdf) | [Code](https://github.com/caoyunkang/WinClip), - | 100 | | 2024 | arXiv | [Embedding Generalized Semantic Knowledge into Few-Shot Remote Sensing Segmentation](https://arxiv.org/pdf/2405.13686) | -, - | 101 | | 2024 | CVPR | [Rethinking Prior Information Generation with CLIP for Few-Shot Segmentation](https://openaccess.thecvf.com/content/CVPR2024/papers/Wang_Rethinking_Prior_Information_Generation_with_CLIP_for_Few-Shot_Segmentation_CVPR_2024_paper.pdf) | [Code](https://github.com/vangjin/PI-CLIP), - | 102 | | 2024 | CVPR | [Unlocking the Potential of Pre-trained Vision Transformers for Few-Shot Semantic Segmentation through Relationship 
Descriptors](https://openaccess.thecvf.com/content/CVPR2024/papers/Zhou_Unlocking_the_Potential_of_Pre-trained_Vision_Transformers_for_Few-Shot_Semantic_CVPR_2024_paper.pdf) | [Code](https://github.com/ZiqinZhou66/FewSegwithRD.git), - | 103 | | 2024 | ICASSP | [Language-Guided Few-Shot Semantic Segmentation](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10447456) | -, - | 104 | | 2024 | ICASSP | [Weakly Supervised Few-Shot Segmentation Through Textual Prompt](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10446831) | [Code](https://github.com/Joseph-Lee-V/Text-WS-FSS), - | 105 | | 2024 | TMM | [Visual and Textual Prior Guided Mask Assemble for Few-Shot Segmentation and Beyond](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10418263) | -, - | 106 | 107 | ### 4.3.2 DM-based Solution 108 | | Year | Publication | Paper Title | Project | 109 | |------|:-----------:|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:---------------------------------------------------:| 110 | | 2023 | arXiv | [DifFSS: Diffusion Model for Few-Shot Semantic Segmentation](https://arxiv.org/pdf/2307.00773) | [Code](https://github.com/TrinitialChan/DifFSS), - | 111 | | 2024 | AAAI | [MaskDiff: Modeling Mask Distribution with Diffusion Probabilistic Model for Few-Shot Instance Segmentation](https://ojs.aaai.org/index.php/AAAI/article/view/28068/28143) | [Code](https://github.com/minhquanlecs/MaskDiff), - | 112 | | 2024 | arXiv | [SegICL: A Universal In-context Learning Framework for Enhanced Segmentation in Medical Imaging](https://arxiv.org/pdf/2403.16578) | -, - | 113 | | 2024 | TCE | [Few-Shot Semantic Segmentation for Consumer Electronics: An Inter-Class Relation Mining Approach](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10460319) | -, - | 114 | 115 | ### 4.3.3 DINO-based Solution 116 | | Year | Publication | Paper Title | Project | 117 | |-------|:------------------------:|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------:| 118 | | 2023 | ICCV | [Distilling Self-Supervised Vision Transformers for Weakly-Supervised Few-Shot Classification & Segmentation](https://openaccess.thecvf.com/content/CVPR2023/papers/Kang_Distilling_Self-Supervised_Vision_Transformers_for_Weakly-Supervised_Few-Shot_Classification__Segmentation_CVPR_2023_paper.pdf) | [Code](https://github.com/dahyun-kang/cst), - | 119 | | 2023 | NeurIPS R0-FoMo Workshop | [One-shot Localization and Segmentation of Medical Images with Foundation Models](https://arxiv.org/pdf/2310.18642) | -, - | 120 | | 2024 | arXiv | [A Novel Benchmark for Few-Shot Semantic Segmentation in the Era of Foundation Models](https://arxiv.org/pdf/2401.11311) | [Code](https://github.com/RedaBensaidDS/Foundation_FewShot), - | 121 | | 2024 | ICRA | [Few-Shot Panoptic Segmentation With Foundation Models](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10611624) | [Code](https://github.com/robot-learning-freiburg/SPINO), [Project](http://spino.cs.uni-freiburg.de/) | 122 | 123 | ### 4.3.4 SAM-based Solution 124 | | Year | Publication | Paper Title | Project | 125 | 
|-------|:-----------:|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:----------------------------------------------------------:| 126 | | 2024 | arXiv | [Boosting few shot semantic segmentation via segment anything model](https://arxiv.org/pdf/2401.09826) | -, - | 127 | | 2024 | arXiv | [Part-aware Personalized Segment Anything Model for Patient-Specific Segmentation](https://arxiv.org/pdf/2403.05433) | [Code](https://github.com/Zch0414/p2sam), - | 128 | | 2024 | CVPR | [VRP-SAM: SAMwithVisual Reference Prompt](https://openaccess.thecvf.com/content/CVPR2024/papers/Sun_VRP-SAM_SAM_with_Visual_Reference_Prompt_CVPR_2024_paper.pdf) | [Code](https://github.com/syp2ysy/VRP-SAM), - | 129 | | 2024 | CVPR | [APSeg: Auto-Prompt Network for Cross-Domain Few-Shot Semantic Segmentation](https://openaccess.thecvf.com/content/CVPR2024/papers/He_APSeg_Auto-Prompt_Network_for_Cross-Domain_Few-Shot_Semantic_Segmentation_CVPR_2024_paper.pdf) | -, - | 130 | | 2024 | ICLR | [Personalize Segment Anything Model with One Shot](https://openreview.net/pdf?id=6Gzkhoc6YS) | [Code](https://github.com/ZrrSkywalker/Personalize-SAM), - | 131 | 132 | ### 4.3.5 MLLMs-based Solution 133 | | Year | Publication | Paper Title | Project | 134 | |-------|:-----------:|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------:| 135 | | 2023 | arXiv | [Few-Shot Classification & Segmentation Using Large Language Models Agent](https://arxiv.org/pdf/2311.12065) | -, - | 136 | | 2024 | CVPR | [Llafs: When large language models meet few-shot segmentation](https://openaccess.thecvf.com/content/CVPR2024/papers/Zhu_LLaFS_When_Large_Language_Models_Meet_Few-Shot_Segmentation_CVPR_2024_paper.pdf) | [Code](https://github.com/lanyunzhu99/LLaFS), - | 137 | 138 | ### 4.3.6 In-Context Segmentation 139 | | Year | Publication | Paper Title | Project | 140 | |-------|:-----------:|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:---------------------------------------------------------------------------------------------------------------------------------:| 141 | | 2022 | NeurIPS | [Visual Prompting via Image Inpainting](https://proceedings.neurips.cc/paper_files/paper/2022/file/9f09f316a3eaf59d9ced5ffaefe97e0f-Paper-Conference.pdf) | [Code](https://github.com/amirbar/visual_prompting), [Project](https://yossigandelsman.github.io/visual_prompt/) | 142 | | 2023 | arXiv | [Exploring Effective Factors for Improving Visual In-Context Learning](https://arxiv.org/pdf/2304.04748) | [Code](https://github.com/syp2ysy/prompt-SelF), - | 143 | | 2023 | CVPR | [Images Speak in Images: A Generalist Painter for In-Context Visual Learning](https://openaccess.thecvf.com/content/CVPR2023/papers/Wang_Images_Speak_in_Images_A_Generalist_Painter_for_In-Context_Visual_CVPR_2023_paper.pdf) | [Code](https://github.com/baaivision/Painter), - | 144 | | 2023 | NeurIPS | [What Makes Good Examples for Visual In-Context 
Learning?](https://proceedings.neurips.cc/paper_files/paper/2023/file/398ae57ed4fda79d0781c65c926d667b-Paper-Conference.pdf) | [Code](https://github.com/ZhangYuanhan-AI/visual_prompt_retrieval), - | 145 | | 2023 | NeurIPS | [In-Context Learning Unlocked for Diffusion Models](https://proceedings.neurips.cc/paper_files/paper/2023/file/1b3750390ca8b931fb9ca988647940cb-Paper-Conference.pdf) | [Code](https://github.com/Zhendong-Wang/Prompt-Diffusion), [Project](https://zhendong-wang.github.io/prompt-diffusion.github.io/) | 146 | | 2024 | CVPR | [Sequential Modeling Enables Scalable Learning for Large Vision Models](https://openaccess.thecvf.com/content/CVPR2024/papers/Bai_Sequential_Modeling_Enables_Scalable_Learning_for_Large_Vision_Models_CVPR_2024_paper.pdf) | [Code](https://github.com/ytongbai/LVM), - | 147 | | 2024 | CVPR | [Towards More Unified In-context Visual Understanding](https://openaccess.thecvf.com/content/CVPR2024/papers/Sheng_Towards_More_Unified_In-context_Visual_Understanding_CVPR_2024_paper.pdf) | -, - | 148 | | 2024 | CVPR | [Tyche: Stochastic In-Context Learning for Medical Image Segmentation](https://openaccess.thecvf.com/content/CVPR2024/papers/Rakic_Tyche_Stochastic_In-Context_Learning_for_Medical_Image_Segmentation_CVPR_2024_paper.pdf) | [Code](https://github.com/mariannerakic/tyche), [Project](https://tyche.csail.mit.edu/) | 149 | | 2024 | ICLR | [Matcher: Segment Anything with One Shot Using All-Purpose Feature Matching](https://openreview.net/pdf?id=yzRXdhk2he) | [Code](https://github.com/aim-uofa/Matcher), - | 150 | | 2024 | WACV | [Instruct Me More! Random Prompting for Visual In-Context Learning](https://openaccess.thecvf.com/content/WACV2024/papers/Zhang_Instruct_Me_More_Random_Prompting_for_Visual_In-Context_Learning_WACV_2024_paper.pdf) | [Code](https://github.com/Jackieam/InMeMo), - | 151 | -------------------------------------------------------------------------------- /3-GIS.md: -------------------------------------------------------------------------------- 1 | # 3. 
Foundation Model based GIS 2 | 3 | ## 3.1 Semantic Segmentation 4 | ### 3.1.1 CLIP-based Solution 5 | | Year | Publication | Paper Title | Project | 6 | |:-----:|:-----------:|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------------------------------:| 7 | | 2022 | BMVC | [Open-vocabulary Semantic Segmentation with Frozen Vision-Language Models](https://bmvc2022.mpi-inf.mpg.de/0045.pdf) | [Code](https://github.com/chaofanma/Fusioner), [Project](https://yyh-rain-song.github.io/Fusioner_webpage/) | 8 | | 2022 | CVPR | [DenseCLIP: Language-Guided Dense Prediction with Context-Aware Prompting](https://openaccess.thecvf.com/content/CVPR2022/papers/Rao_DenseCLIP_Language-Guided_Dense_Prediction_With_Context-Aware_Prompting_CVPR_2022_paper.pdf) | [Code](https://github.com/raoyongming/DenseCLIP), [Project](https://denseclip.ivg-research.xyz/) | 9 | | 2022 | CVPR | [GroupViT: Semantic Segmentation Emerges from Text Supervision](https://openaccess.thecvf.com/content/CVPR2022/papers/Xu_GroupViT_Semantic_Segmentation_Emerges_From_Text_Supervision_CVPR_2022_paper.pdf) | [Code](https://github.com/NVlabs/GroupViT), [Project](https://jerryxu.net/GroupViT/) | 10 | | 2022 | CVPR | [Decoupling Zero-Shot Semantic Segmentation](https://openaccess.thecvf.com/content/CVPR2022/papers/Ding_Decoupling_Zero-Shot_Semantic_Segmentation_CVPR_2022_paper.pdf) | [Code](https://github.com/dingjiansw101/ZegFormer), - | 11 | | 2022 | ECCV | [A Simple Baseline for Open-Vocabulary Semantic Segmentation with Pre-trained Vision-language Model](https://www.ecva.net/papers/eccv_2022/papers_ECCV/papers/136890725.pdf) | [Code](https://github.com/MendelXu/zsseg.baseline), - | 12 | | 2022 | ECCV | [Extract Free Dense Labels from CLIP](https://www.ecva.net/papers/eccv_2022/papers_ECCV/papers/136880687.pdf) | [Code](https://github.com/chongzhou96/MaskCLIP), [Project](https://www.mmlab-ntu.com/project/maskclip/) | 13 | | 2022 | ICLR | [Language-driven Semantic Segmentation](https://openreview.net/pdf?id=RriDjddCLN) | [Code](https://github.com/isl-org/lang-seg), - | 14 | | 2023 | arXiv | [A Closer Look at the Explainability of Contrastive Language-Image Pre-training](https://arxiv.org/pdf/2304.05653) | [Code](https://github.com/xmed-lab/CLIP_Surgery), - | 15 | | 2023 | arXiv | [ZegOT: Zero-shot Segmentation Through Optimal Transport of Text Prompts](https://arxiv.org/pdf/2301.12171) | [Code](https://github.com/cubeyoung/ZegOT), [Project](https://cubeyoung.github.io/zegot.github.io/) | 16 | | 2023 | arXiv | [CLIP is Also a Good Teacher: A New Training Framework for Inductive Zero-shot Semantic Segmentation](https://arxiv.org/pdf/2310.02296) | -, - | 17 | | 2023 | arXiv | [Associating Spatially-Consistent Grouping with Text-supervised Semantic Segmentation](https://arxiv.org/pdf/2304.01114) | -, - | 18 | | 2023 | arXiv | [TagAlign: Improving Vision-Language Alignment with Multi-Tag Classification](https://arxiv.org/pdf/2312.14149) | [Code](https://github.com/Qinying-Liu/TagAlign), [Project](https://qinying-liu.github.io/Tag-Align/) | 19 | | 2023 | arXiv | [CLIP-DINOiser: Teaching CLIP a few DINO tricks for Open-Vocabulary Semantic Segmentation](https://arxiv.org/pdf/2312.12359) | [Code](https://github.com/wysoczanska/clip_dinoiser), 
[Project](https://wysoczanska.github.io/CLIP_DINOiser/) | 20 | | 2023 | CVPR | [Probabilistic Prompt Learning for Dense Prediction](https://openaccess.thecvf.com/content/CVPR2023/papers/Kwon_Probabilistic_Prompt_Learning_for_Dense_Prediction_CVPR_2023_paper.pdf) | -, - | 21 | | 2023 | CVPR | [ZegCLIP: Towards Adapting CLIP for Zero-shot Semantic Segmentation](http://openaccess.thecvf.com/content/CVPR2023/papers/Zhou_ZegCLIP_Towards_Adapting_CLIP_for_Zero-Shot_Semantic_Segmentation_CVPR_2023_paper.pdf) | [Code](https://github.com/ZiqinZhou66/ZegCLIP), - | 22 | | 2023 | CVPR | [ Open Vocabulary Semantic Segmentation with Patch Aligned Contrastive Learning](https://openaccess.thecvf.com/content/CVPR2023/papers/Mukhoti_Open_Vocabulary_Semantic_Segmentation_With_Patch_Aligned_Contrastive_Learning_CVPR_2023_paper.pdf) | -, - | 23 | | 2023 | CVPR | [ Side Adapter Network for Open-Vocabulary Semantic Segmentation](https://openaccess.thecvf.com/content/CVPR2023/papers/Xu_Side_Adapter_Network_for_Open-Vocabulary_Semantic_Segmentation_CVPR_2023_paper.pdf) | [Code](https://github.com/MendelXu/SAN), [Project](https://mendelxu.github.io/SAN/) | 24 | | 2023 | CVPR | [ CLIP is Also an Efficient Segmenter: A Text-Driven Approach for Weakly Supervised Semantic Segmentation](https://openaccess.thecvf.com/content/CVPR2023/papers/Lin_CLIP_Is_Also_an_Efficient_Segmenter_A_Text-Driven_Approach_for_CVPR_2023_paper.pdf) | [Code](https://github.com/linyq2117/CLIP-ES), - | 25 | | 2023 | CVPR | [Open-Vocabulary Semantic Segmentation with Mask-adapted CLIP](https://openaccess.thecvf.com/content/CVPR2023/papers/Liang_Open-Vocabulary_Semantic_Segmentation_With_Mask-Adapted_CLIP_CVPR_2023_paper.pdf) | [Code](https://github.com/facebookresearch/ov-seg), [Project](https://jeff-liangf.github.io/projects/ovseg/) | 26 | | 2023 | CVPR | [CLIP-S4: Language-Guided Self-Supervised Semantic Segmentation](https://openaccess.thecvf.com/content/CVPR2023/papers/He_CLIP-S4_Language-Guided_Self-Supervised_Semantic_Segmentation_CVPR_2023_paper.pdf) | -, - | 27 | | 2023 | CVPR | [Delving into Shape-aware Zero-shot Semantic Segmentation](https://openaccess.thecvf.com/content/CVPR2023/papers/Liu_Delving_Into_Shape-Aware_Zero-Shot_Semantic_Segmentation_CVPR_2023_paper.pdf) | [Code](https://github.com/Liuxinyv/SAZS), - | 28 | | 2023 | CVPR | [A Simple Framework for Text-Supervised Semantic Segmentation](https://openaccess.thecvf.com/content/CVPR2023/papers/Yi_A_Simple_Framework_for_Text-Supervised_Semantic_Segmentation_CVPR_2023_paper.pdf) | [Code](https://github.com/muyangyi/SimSeg), - | 29 | | 2023 | ICCV | [ Perceptual Grouping in Contrastive Vision-Language Models](https://openaccess.thecvf.com/content/ICCV2023/papers/Ranasinghe_Perceptual_Grouping_in_Contrastive_Vision-Language_Models_ICCV_2023_paper.pdf) | [Code](https://github.com/kahnchana/clippy), - | 30 | | 2023 | ICCV | [Global Knowledge Calibration for Fast Open-Vocabulary Segmentation](https://openaccess.thecvf.com/content/ICCV2023/papers/Han_Global_Knowledge_Calibration_for_Fast_Open-Vocabulary_Segmentation_ICCV_2023_paper.pdf) | [Code](https://github.com/yongliu20/GKC), - | 31 | | 2023 | ICCV | [Open-Vocabulary Semantic Segmentation with Decoupled One-Pass Network](https://openaccess.thecvf.com/content/ICCV2023/papers/Han_Open-Vocabulary_Semantic_Segmentation_with_Decoupled_One-Pass_Network_ICCV_2023_paper.pdf) | [Code](https://github.com/CongHan0808/DeOP), [Project](https://conghan0808.github.io/DeOP/) | 32 | | 2023 | ICCV | [ Exploring Open-Vocabulary Semantic Segmentation 
from CLIP Vision Encoder Distillation Only](https://openaccess.thecvf.com/content/ICCV2023/papers/Chen_Exploring_Open-Vocabulary_Semantic_Segmentation_from_CLIP_Vision_Encoder_Distillation_Only_ICCV_2023_paper.pdf) | [Code](https://github.com/facebookresearch/ZeroSeg), - | 33 | | 2023 | ICML | [SegCLIP: Patch Aggregation with Learnable Centers for Open-Vocabulary Semantic Segmentation](https://proceedings.mlr.press/v202/luo23a/luo23a.pdf) | [Code](https://github.com/ArrowLuo/SegCLIP), | 34 | | 2023 | NeurIPS | [Learning Mask-aware CLIP Representations for Zero-Shot Segmentation](https://proceedings.neurips.cc/paper_files/paper/2023/file/6ffe484a646db13891bb6435ca39d667-Paper-Conference.pdf) | [Code](https://github.com/jiaosiyu1999/MAFT), - | 35 | | 2023 | NeurIPS | [Uncovering Prototypical Knowledge for Weakly Open-Vocabulary Semantic Segmentation](https://proceedings.neurips.cc/paper_files/paper/2023/file/e95eb5206c867be843fbc14bbfe8c10e-Paper-Conference.pdf) | [Code](https://github.com/Ferenas/PGSeg), - | 36 | | 2024 | arXiv | [kNN-CLIP: Retrieval Enables Training-Free Segmentation on Continually Expanding Large Vocabularies](https://arxiv.org/pdf/2404.09447) | -, - | 37 | | 2024 | CVPR | [ Grounding Everything: Emerging Localization Properties in Vision-Language Transformers](https://openaccess.thecvf.com/content/CVPR2024/papers/Bousselham_Grounding_Everything_Emerging_Localization_Properties_in_Vision-Language_Transformers_CVPR_2024_paper.pdf) | [Code](https://github.com/WalBouss/GEM), - | 38 | | 2024 | CVPR | [ CAT-Seg: Cost Aggregation for Open-Vocabulary Semantic Segmentation](https://openaccess.thecvf.com/content/CVPR2024/papers/Cho_CAT-Seg_Cost_Aggregation_for_Open-Vocabulary_Semantic_Segmentation_CVPR_2024_paper.pdf) | [Code](https://github.com/cvlab-kaist/CAT-Seg), [Project](https://ku-cvlab.github.io/CAT-Seg/) | 39 | | 2024 | CVPR | [CLIP as RNN:Segment Countless Visual Concepts without Training Endeavor](https://openaccess.thecvf.com/content/CVPR2024/papers/Sun_CLIP_as_RNN_Segment_Countless_Visual_Concepts_without_Training_Endeavor_CVPR_2024_paper.pdf) | [Code](https://github.com/kevin-ssy/CLIP_as_RNN), [Project](https://torrvision.com/clip_as_rnn/) | 40 | | 2024 | ECCV | [SCLIP: Rethinking Self-Attention for Dense Vision-Language Inference](https://link.springer.com/chapter/10.1007/978-3-031-72664-4_18) | [Code](https://github.com/wangf3014/SCLIP), - | 41 | | 2024 | ECCV | [OTSeg: Multi-prompt Sinkhorn Attention for Zero-Shot Semantic Segmentation](https://link.springer.com/content/pdf/10.1007/978-3-031-72980-5_12) | [Code](https://github.com/cubeyoung/OTSeg), - | 42 | | 2024 | ECCV | [SemiVL: Semi-Supervised Semantic Segmentation with Vision-Language Guidance](https://link.springer.com/content/pdf/10.1007/978-3-031-72933-1_15) | [Code](https://github.com/google-research/semivl/), - | 43 | | 2024 | ECCV | [SILC: Improving Vision Language Pretraining with Self-distillation](https://link.springer.com/content/pdf/10.1007/978-3-031-72664-4_3) | 44 | | 2024 | ICLR | [CLIPSelf: Vision Transformer Distills Itself for Open-Vocabulary Dense Prediction](https://openreview.net/pdf?id=DjzvJCRsVf) | [Code](https://github.com/wusize/CLIPSelf), - | 45 | | 2024 | TCSVT | [Language-Driven Visual Consensus for Zero-Shot Semantic Segmentation](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10764736) | -, - | 46 | | 2025 | WACV | [Pay Attention to Your Neighbours: Training-Free Open-Vocabulary Semantic Segmentation](https://arxiv.org/pdf/2404.08181) | 
[Code](https://github.com/sinahmr/NACLIP), - |
47 | 
48 | ### 3.1.2 DM-based Solution
49 | | Year | Publication | Paper Title | Project |
50 | |:----:|:-----------:|-------------|:-------:|
51 | | 2022 | arXiv | [Peekaboo: Text to Image Diffusion Models are Zero-Shot Segmentors](https://arxiv.org//pdf/2211.13224v2) | [Code](https://github.com/RyannDaGreat/Peekaboo), [Project](https://ryanndagreat.github.io/peekaboo/) |
52 | | 2022 | ICLR | [Label-Efficient Semantic Segmentation with Diffusion Models](https://openreview.net/pdf?id=SlxSY2UZQT) | [Code](https://github.com/yandex-research/ddpm-segmentation), [Project](https://yandex-research.github.io/ddpm-segmentation/) |
53 | | 2022 | MIDL | [Diffusion Models for Implicit Image Segmentation Ensembles](https://proceedings.mlr.press/v172/wolleb22a/wolleb22a.pdf) | [Code](https://github.com/JuliaWolleb/Diffusion-based-Segmentation), - |
54 | | 2022 | SASHIMI | [Can Segmentation Models Be Trained with Fully Synthetically Generated Data?](https://link.springer.com/content/pdf/10.1007/978-3-031-16980-9_8.pdf) | -, - |
55 | | 2023 | ACL | [What the DAAM: Interpreting Stable Diffusion Using Cross Attention](https://aclanthology.org/2023.acl-long.310.pdf) | [Code](https://github.com/castorini/daam), - |
56 | | 2023 | arXiv | [Diffusion Model is Secretly a Training-free Open Vocabulary Semantic Segmenter](https://arxiv.org/pdf/2309.02773) | [Code](https://github.com/VCG-team/DiffSegmenter), [Project](https://vcg-team.github.io/DiffSegmenter-webpage/) |
57 | | 2023 | arXiv | [Harnessing Diffusion Models for Visual Perception with Meta Prompts](https://arxiv.org/pdf/2312.14733) | [Code](https://github.com/fudan-zvg/meta-prompts), - |
58 | | 2023 | arXiv | [EMIT-Diff: Enhancing Medical Image Segmentation via Text-Guided Diffusion Model](https://arxiv.org/pdf/2310.12868) | [Code](https://github.com/NUBagciLab/DiffBoost), - |
59 | | 2023 | CVPR | [Ambiguous Medical Image Segmentation using Diffusion Models](https://openaccess.thecvf.com/content/CVPR2023/papers/Rahman_Ambiguous_Medical_Image_Segmentation_Using_Diffusion_Models_CVPR_2023_paper.pdf) | [Code](https://github.com/aimansnigdha/ambiguous-medical-image-segmentation-using-diffusion-models), [Project](https://aimansnigdha.github.io/cimd/) |
60 | | 2023 | ICCV | [DDP: Diffusion Model for Dense Visual Prediction](https://openaccess.thecvf.com/content/ICCV2023/papers/Ji_DDP_Diffusion_Model_for_Dense_Visual_Prediction_ICCV_2023_paper.pdf) | [Code](https://github.com/JiYuanFeng/DDP), [Project](https://github.com/JiYuanFeng/DDP) |
61 | | 2023 | ICCV | [Unleashing Text-to-Image Diffusion Models for Visual Perception](https://openaccess.thecvf.com/content/ICCV2023/papers/Zhao_Unleashing_Text-to-Image_Diffusion_Models_for_Visual_Perception_ICCV_2023_paper.pdf) | [Code](https://github.com/wl-zhao/VPD), [Project](https://vpd.ivg-research.xyz/) |
62 | | 2023 | ICCV | [LD-ZNet: A Latent Diffusion Approach for Text-Based Image Segmentation](http://openaccess.thecvf.com/content/ICCV2023/papers/PNVR_LD-ZNet_A_Latent_Diffusion_Approach_for_Text-Based_Image_Segmentation_ICCV_2023_paper.pdf) | [Code](https://github.com/koutilya-pnvr/LD-ZNet), [Project](https://koutilya-pnvr.github.io/LD-ZNet/) |
63 | | 2023 | ICCV | [Stochastic Segmentation with Conditional Categorical Diffusion Models](https://openaccess.thecvf.com/content/ICCV2023/papers/Zbinden_Stochastic_Segmentation_with_Conditional_Categorical_Diffusion_Models_ICCV_2023_paper.pdf) | [Code](https://github.com/LarsDoorenbos/ccdm-stochastic-segmentation), - |
64 | | 2023 | ICCV | [DiffuMask: Synthesizing Images with Pixel-level Annotations for Semantic Segmentation Using Diffusion Models](https://openaccess.thecvf.com/content/ICCV2023/papers/Wu_DiffuMask_Synthesizing_Images_with_Pixel-level_Annotations_for_Semantic_Segmentation_Using_ICCV_2023_paper.pdf) | [Code](https://github.com/weijiawu/DiffuMask), [Project](https://weijiawu.github.io/DiffusionMask/) |
65 | | 2023 | MICCAI | [Diffusion-Based Data Augmentation for Nuclei Image Segmentation](https://link.springer.com/content/pdf/10.1007/978-3-031-43993-3_57.pdf?pdf=inline%20link) | [Code](https://github.com/xinyiyu/Nudiff), - |
66 | | 2023 | NeurIPS | [Dataset Diffusion: Diffusion-based Synthetic Dataset Generation for Pixel-Level Semantic Segmentation](https://proceedings.neurips.cc/paper_files/paper/2023/file/f2957e48240c1d90e62b303574871b47-Paper-Conference.pdf) | [Code](https://github.com/VinAIResearch/Dataset-Diffusion), - |
67 | | 2023 | NeurIPS | [Unsupervised Semantic Correspondence Using Stable Diffusion](https://proceedings.neurips.cc/paper_files/paper/2023/file/1a074a28c3a6f2056562d00649ae6416-Paper-Conference.pdf) | [Code](https://github.com/ubc-vision/LDM_correspondences), [Project](https://ubc-vision.github.io/LDM_correspondences/) |
68 | | 2023 | NeurIPS | [SegRefiner: Towards Model-Agnostic Segmentation Refinement with Discrete Diffusion Process](https://papers.nips.cc/paper_files/paper/2023/file/fc0cc55dca3d791c4a0bb2d8ddeefe4f-Paper-Conference.pdf) | [Code](https://github.com/MengyuWang826/SegRefiner), - |
69 | | 2023 | NeurIPS | [DatasetDM: Synthesizing Data with Perception Annotations Using Diffusion Models](https://proceedings.neurips.cc/paper_files/paper/2023/file/ab6e7ad2354f350b451b5a8e14d04f51-Paper-Conference.pdf) | [Code](https://github.com/showlab/DatasetDM), [Project](https://weijiawu.github.io/DatasetDM_page/) |
70 | | 2024 | arXiv | [FreeSeg-Diff: Training-Free Open-Vocabulary Segmentation with Diffusion Models](https://arxiv.org/pdf/2403.20105) | -, [Project](https://bcorrad.github.io/freesegdiff/) |
71 | | 2024 | arXiv | [MaskDiffusion: Exploiting Pre-trained Diffusion Models for Semantic Segmentation](https://arxiv.org/pdf/2403.11194) | [Code](https://github.com/Valkyrja3607/MaskDiffusion), - |
72 | | 2024 | CVPR | [Training-Free Open-Vocabulary Segmentation with Offline Diffusion-Augmented Prototype Generation](https://openaccess.thecvf.com/content/CVPR2024/papers/Barsellotti_Training-Free_Open-Vocabulary_Segmentation_with_Offline_Diffusion-Augmented_Prototype_Generation_CVPR_2024_paper.pdf) | [Code](https://github.com/aimagelab/freeda), [Project](https://aimagelab.github.io/freeda/) |
73 | | 2024 | CVPR | [Open-Vocabulary Attention Maps with Token Optimization for Semantic Segmentation in Diffusion Models](https://openaccess.thecvf.com/content/CVPR2024/papers/Marcos-Manchon_Open-Vocabulary_Attention_Maps_with_Token_Optimization_for_Semantic_Segmentation_in_CVPR_2024_paper.pdf) | [Code](https://github.com/vpulab/ovam), - |
74 | | 2024 | CVPR | [Diffuse Attend and Segment: Unsupervised Zero-Shot Segmentation using Stable Diffusion](https://openaccess.thecvf.com/content/CVPR2024/papers/Tian_Diffuse_Attend_and_Segment_Unsupervised_Zero-Shot_Segmentation_using_Stable_Diffusion_CVPR_2024_paper.pdf) | [Code](https://github.com/google/diffseg), [Project](https://sites.google.com/view/diffseg/home) |
75 | | 2024 | CVPR | [Text-Image Alignment for Diffusion-Based Perception](https://openaccess.thecvf.com/content/CVPR2024/papers/Kondapaneni_Text-Image_Alignment_for_Diffusion-Based_Perception_CVPR_2024_paper.pdf) | [Code](https://github.com/damaggu/TADP), [Project](https://www.vision.caltech.edu/tadp/) |
76 | | 2024 | CVPRW | [ScribbleGen: Generative Data Augmentation Improves Scribble-supervised Semantic Segmentation](http://mengtang.org/scribblegen_cvprw2024.pdf) | [Code](https://github.com/mengtang-lab/scribblegen), - |
77 | | 2024 | ECCV | [Diffusion Models for Open-Vocabulary Segmentation](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/00794.pdf) | -, [Project](https://www.robots.ox.ac.uk/~vgg/research/ovdiff/) |
78 | | 2024 | ECCV | [Do Text-Free Diffusion Models Learn Discriminative Visual Representations?](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/07757-supp.pdf) | [Code](https://github.com/soumik-kanad/diffssl), - |
79 | | 2024 | IJCAI | [Bridging Generative and Discriminative Models for Unified Visual Perception with Diffusion Priors](https://www.ijcai.org/proceedings/2024/0082.pdf) | -, - |
80 | | 2024 | IJCV | [MosaicFusion: Diffusion Models as Data Augmenters for Large Vocabulary Instance Segmentation](https://arxiv.org//pdf/2309.13042.pdf) | [Code](https://github.com/Jiahao000/MosaicFusion), - |
81 | 
82 | 
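Many of the DM-based entries above share a training-free recipe: perturb an image, run the frozen denoiser once, and read segmentation structure out of its internal activations or attention maps. The snippet below is a minimal, illustrative sketch of that idea rather than the pipeline of any listed paper; the model id, timestep, hooked decoder block, and cluster count are all illustrative assumptions.

```python
# Illustrative sketch only: probe a frozen Stable Diffusion UNet for segmentation
# signal by clustering intermediate decoder features from a single denoising pass.
# Assumptions: diffusers + scikit-learn installed; the model id, timestep (100),
# hooked block (up_blocks[1]) and k=5 clusters are arbitrary illustrative choices.
import numpy as np
import torch
from diffusers import StableDiffusionPipeline
from PIL import Image
from sklearn.cluster import KMeans

device = "cuda" if torch.cuda.is_available() else "cpu"
pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5").to(device)

# Encode the image into the VAE latent space.
img = Image.open("example.jpg").convert("RGB").resize((512, 512))
x = torch.from_numpy(np.array(img)).float().permute(2, 0, 1)[None] / 127.5 - 1.0
with torch.no_grad():
    latents = pipe.vae.encode(x.to(device)).latent_dist.sample() * pipe.vae.config.scaling_factor

# Add noise at a single, moderate timestep.
t = torch.tensor([100], device=device)
noisy = pipe.scheduler.add_noise(latents, torch.randn_like(latents), t)

# Capture an intermediate decoder feature map with a forward hook.
feats = {}
hook = pipe.unet.up_blocks[1].register_forward_hook(
    lambda module, inputs, output: feats.update(f=output.detach()))

# One denoising forward pass, conditioned on an empty prompt.
tok = pipe.tokenizer([""], padding="max_length",
                     max_length=pipe.tokenizer.model_max_length, return_tensors="pt")
with torch.no_grad():
    emb = pipe.text_encoder(tok.input_ids.to(device))[0]
    pipe.unet(noisy, t, encoder_hidden_states=emb)
hook.remove()

# Cluster per-pixel features into coarse pseudo-segments (upsample as needed).
f = feats["f"][0]                                   # (C, h, w)
c, h, w = f.shape
labels = KMeans(n_clusters=5, n_init=10).fit_predict(
    f.permute(1, 2, 0).reshape(-1, c).cpu().float().numpy())
print(labels.reshape(h, w))
```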
83 | ### 3.1.3 DINO-based Solution
84 | | Year | Publication | Paper Title | Project |
85 | |:----:|:-----------:|-------------|:-------:|
86 | | 2021 | arXiv | [Deep ViT Features as Dense Visual Descriptors](https://dino-vit-features.github.io/paper.pdf) | [Code](https://github.com/ShirAmir/dino-vit-features), [Project](https://dino-vit-features.github.io/) |
87 | | 2021 | BMVC | [Localizing Objects with Self-Supervised Transformers and no Labels](https://www.bmvc2021-virtualconference.com/assets/papers/1339.pdf) | [Code](https://github.com/valeoai/LOST), - |
88 | | 2022 | CVPR | [Deep Spectral Methods: A Surprisingly Strong Baseline for Unsupervised Semantic Segmentation and Localization](https://openaccess.thecvf.com/content/CVPR2022/papers/Melas-Kyriazi_Deep_Spectral_Methods_A_Surprisingly_Strong_Baseline_for_Unsupervised_Semantic_CVPR_2022_paper.pdf) | [Code](https://github.com/lukemelas/deep-spectral-segmentation), [Project](https://lukemelas.github.io/deep-spectral-segmentation/) |
89 | | 2022 | CVPR | [Self-Supervised Transformers for Unsupervised Object Discovery using Normalized Cut](http://openaccess.thecvf.com/content/CVPR2022/papers/Wang_Self-Supervised_Transformers_for_Unsupervised_Object_Discovery_Using_Normalized_Cut_CVPR_2022_paper.pdf) | [Code](https://github.com/YangtaoWANG95/TokenCut), [Project](https://www.m-psi.fr/Papers/TokenCut2022/) |
90 | | 2022 | CVPR | [Self-Supervised Learning of Object Parts for Semantic Segmentation](http://openaccess.thecvf.com/content/CVPR2022/papers/Ziegler_Self-Supervised_Learning_of_Object_Parts_for_Semantic_Segmentation_CVPR_2022_paper.pdf) | [Code](https://github.com/MkuuWaUjinga/leopart), - |
91 | | 2022 | ICLR | [STEGO: Unsupervised Semantic Segmentation by Distilling Feature Correspondences](https://arxiv.org/abs/2203.08414) | [Code](https://github.com/mhamilton723/STEGO), [Project](https://mhamilton.net/stego.html) |
92 | | 2023 | CVPR | [Leveraging Hidden Positives for Unsupervised Semantic Segmentation](https://openaccess.thecvf.com/content/CVPR2023/papers/Seong_Leveraging_Hidden_Positives_for_Unsupervised_Semantic_Segmentation_CVPR_2023_paper.pdf) | [Code](https://github.com/hynnsk/HP), - |
93 | | 2023 | ICLR | [Unsupervised Semantic Segmentation with Self-supervised Object-centric Representations](https://openreview.net/pdf?id=1_jFneF07YC) | [Code](https://github.com/zadaianchuk/comus), - |
94 | | 2023 | TPAMI | [TokenCut: Segmenting Objects in Images and Videos With Self-Supervised Transformer and Normalized Cut](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10224285) | [Code](https://github.com/YangtaoWANG95/TokenCut_video), [Project](https://www.m-psi.fr/Papers/TokenCut2022/) |
95 | | 2024 | CVPR | [Unsupervised Semantic Segmentation Through Depth-Guided Feature Correlation and Sampling](https://openaccess.thecvf.com/content/CVPR2024/papers/Sick_Unsupervised_Semantic_Segmentation_Through_Depth-Guided_Feature_Correlation_and_Sampling_CVPR_2024_paper.pdf) | [Code](https://github.com/leonsick/depthg), [Project](https://leonsick.github.io/depthg/) |
96 | | 2024 | CVPR | [EAGLE: Eigen Aggregation Learning for Object-Centric Unsupervised Semantic Segmentation](https://openaccess.thecvf.com/content/CVPR2024/papers/Kim_EAGLE_Eigen_Aggregation_Learning_for_Object-Centric_Unsupervised_Semantic_Segmentation_CVPR_2024_paper.pdf) | [Code](https://github.com/dnwjddl/EAGLE), - |
97 | 
98 | ### 3.1.4 SAM-based Solution
99 | | Year | Publication | Paper Title | Project |
100 | |:----:|:-----------:|-------------|:-------:|
101 | | 2023 | arXiv | [Segment Anything Model (SAM) Enhanced Pseudo Labels for Weakly Supervised Semantic Segmentation](https://arxiv.org/pdf/2305.05803) | -, - |
102 | | 2023 | arXiv | [Weakly-Supervised Semantic Segmentation with Image-Level Labels: from Traditional Models to Foundation Models](https://arxiv.org//pdf/2310.13026.pdf) | -, - |
103 | | 2024 | CVPR | [From SAM to CAMs: Exploring Segment Anything Model for Weakly Supervised Semantic Segmentation](https://openaccess.thecvf.com/content/CVPR2024/papers/Kweon_From_SAM_to_CAMs_Exploring_Segment_Anything_Model_for_Weakly_CVPR_2024_paper.pdf) | [Code](https://github.com/sangrockEG/S2C), - |
104 | 
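The SAM-based works listed above typically use SAM's class-agnostic masks to sharpen coarse CAM-style pseudo-labels for weakly supervised semantic segmentation. The snippet below is a hedged sketch of that general recipe, not any listed paper's exact pipeline; the checkpoint path, input files, and majority-vote rule are illustrative assumptions.

```python
# Illustrative sketch only: refine a coarse CAM-style pseudo-label with SAM's
# class-agnostic masks via per-region majority voting. Checkpoint path, input
# files and the voting rule are assumptions, not taken from any listed paper.
import cv2
import numpy as np
from segment_anything import SamAutomaticMaskGenerator, sam_model_registry

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
mask_generator = SamAutomaticMaskGenerator(sam)

image = cv2.cvtColor(cv2.imread("example.jpg"), cv2.COLOR_BGR2RGB)
cam = np.load("cam_scores.npy")            # assumed shape: (num_classes, H, W)
coarse = cam.argmax(0)                     # per-pixel class index (0 = background here)

refined = coarse.copy()
for proposal in mask_generator.generate(image):   # dicts with a boolean "segmentation"
    region = proposal["segmentation"]
    votes = coarse[region]
    if votes.size:
        # Snap the whole SAM region to the majority class of the coarse label.
        refined[region] = np.bincount(votes).argmax()

np.save("refined_pseudo_label.npy", refined)
```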
105 | ### 3.1.5 Composition of FMs
106 | | Year | Publication | Paper Title | Project |
107 | |:-----:|:-----------:|-------------|:-------:|
108 | | 2023 | ICCV | [Zero-guidance Segmentation Using Zero Segment Labels](https://openaccess.thecvf.com/content/ICCV2023/papers/Rewatbowornwong_Zero-guidance_Segmentation_Using_Zero_Segment_Labels_ICCV_2023_paper.pdf) | [Code](https://github.com/nessessence/ZeroGuidanceSeg), [Project](https://zero-guide-seg.github.io/) |
109 | | 2023 | MICCAI | [SAM Meets Robotic Surgery: An Empirical Study on Generalization, Robustness and Adaptation](https://arxiv.org/pdf/2308.07156) | -, - |
110 | | 2024 | arXiv | [TAG: Guidance-free Open-Vocabulary Semantic Segmentation](https://arxiv.org/pdf/2403.11197) | [Code](https://github.com/Valkyrja3607/TAG), - |
111 | | 2024 | arXiv | [FreeSeg-Diff: Training-Free Open-Vocabulary Segmentation with Diffusion Models](https://arxiv.org/pdf/2403.20105) | [Code](https://bcorrad.github.io/freesegdiff/#), [Project](https://bcorrad.github.io/freesegdiff/) |
112 | | 2024 | CVPR | [Image-to-Image Matching via Foundation Models: A New Perspective for Open-Vocabulary Semantic Segmentation](https://openaccess.thecvf.com/content/CVPR2024/papers/Wang_Image-to-Image_Matching_via_Foundation_Models_A_New_Perspective_for_Open-Vocabulary_CVPR_2024_paper.pdf) | -, - |
113 | 
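A recurring composition pattern in the table above (and in Section 3.2.4 below) pairs a class-agnostic mask source with CLIP for open-vocabulary labeling. The sketch below illustrates only the CLIP side under simple assumptions: masks are assumed to be given as boolean arrays (e.g., from SAM or a DINO-based discovery method), each mask's bounding-box crop is scored against prompt templates, and the best-matching class is painted back. The prompt template, crop strategy, and class list are illustrative, not taken from any listed paper.

```python
# Hedged sketch of a "composition of FMs" recipe: class-agnostic masks from any
# FM are labelled zero-shot by CLIP. Prompts, crops and class names are assumptions.
import numpy as np
import torch
import clip                      # OpenAI CLIP: pip install git+https://github.com/openai/CLIP.git
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

def label_masks(image: Image.Image, masks: list, class_names: list) -> np.ndarray:
    """Assign one class index per binary mask and return an (H, W) label map."""
    text = clip.tokenize([f"a photo of a {c}" for c in class_names]).to(device)
    with torch.no_grad():
        text_feat = model.encode_text(text)
        text_feat /= text_feat.norm(dim=-1, keepdim=True)

    label_map = np.full((image.height, image.width), -1, dtype=np.int64)  # -1 = unlabeled
    for m in masks:                                   # m: (H, W) boolean numpy array
        ys, xs = np.where(m)
        if ys.size == 0:
            continue
        crop = image.crop((xs.min(), ys.min(), xs.max() + 1, ys.max() + 1))
        with torch.no_grad():
            img_feat = model.encode_image(preprocess(crop).unsqueeze(0).to(device))
            img_feat /= img_feat.norm(dim=-1, keepdim=True)
        label_map[m] = int((img_feat @ text_feat.T).argmax())
    return label_map

# Example usage: masks could come from SamAutomaticMaskGenerator (cf. 3.1.4) or
# from an unsupervised object-discovery method (cf. 3.1.3).
# seg = label_masks(Image.open("example.jpg").convert("RGB"), masks, ["cat", "dog", "grass"])
```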
114 | ## 3.2 Instance Segmentation
115 | ### 3.2.1 CLIP-based Solution
116 | | Year | Publication | Paper Title | Project |
117 | |-------|:------------------:|-------------|:-------:|
118 | | 2023 | ICML | [Open-Vocabulary Universal Image Segmentation with MaskCLIP](https://dl.acm.org/doi/abs/10.5555/3618408.3618729) | [Code](https://github.com/mlpc-ucsd/MaskCLIP), [Project](https://maskclip.github.io/) |
119 | | 2023 | ICCV | [MasQCLIP for Open-Vocabulary Universal Image Segmentation](https://openaccess.thecvf.com/content/ICCV2023/html/Xu_MasQCLIP_for_Open-Vocabulary_Universal_Image_Segmentation_ICCV_2023_paper.html) | [Code](https://github.com/mlpc-ucsd/MasQCLIP), [Project](https://masqclip.github.io/) |
120 | | 2023 | NeurIPS | [Convolutions Die Hard: Open-Vocabulary Segmentation with Single Frozen Convolutional CLIP](https://proceedings.neurips.cc/paper_files/paper/2023/hash/661caac7729aa7d8c6b8ac0d39ccbc6a-Abstract-Conference.html) | [Code](https://github.com/bytedance/fc-clip), - |
121 | | 2023 | ICCV | [Open-Vocabulary Panoptic Segmentation with Embedding Modulation](https://openaccess.thecvf.com/content/ICCV2023/html/Chen_Open-vocabulary_Panoptic_Segmentation_with_Embedding_Modulation_ICCV_2023_paper.html) | [Code](https://github.com/XavierCHEN34/OPSNet), [Project](https://opsnet-page.github.io/) |
122 | | 2023 | CVPR | [Primitive Generation and Semantic-Related Alignment for Universal Zero-Shot Segmentation](https://openaccess.thecvf.com/content/CVPR2023/html/He_Primitive_Generation_and_Semantic-Related_Alignment_for_Universal_Zero-Shot_Segmentation_CVPR_2023_paper.html) | [Code](https://github.com/heshuting555/PADing), [Project](https://henghuiding.github.io/PADing/) |
123 | | 2023 | CVPR | [Semantic-Promoted Debiasing and Background Disambiguation for Zero-Shot Instance Segmentation](https://openaccess.thecvf.com/content/CVPR2023/html/He_Semantic-Promoted_Debiasing_and_Background_Disambiguation_for_Zero-Shot_Instance_Segmentation_CVPR_2023_paper.html) | [Code](https://github.com/heshuting555/D2Zero), [Project](https://henghuiding.github.io/D2Zero/) |
124 | 
125 | ### 3.2.2 DM-based Solution
126 | | Year | Publication | Paper Title | Project |
127 | |-------|:-----------:|-------------|:-------:|
128 | | 2023 | arXiv | [MosaicFusion: Diffusion Models as Data Augmenters for Large Vocabulary Instance Segmentation](https://link.springer.com/article/10.1007/s11263-024-02223-3) | [Code](https://github.com/Jiahao000/MosaicFusion), - |
129 | | 2022 | arXiv | [DALL-E for Detection: Language-Driven Compositional Image Synthesis for Object Detection](https://arxiv.org/abs/2206.09592) | [Code](https://github.com/gyhandy/Text2Image-for-Detection), - |
130 | | 2023 | NeurIPS | [DatasetDM: Synthesizing Data with Perception Annotations Using Diffusion Models](https://proceedings.neurips.cc/paper_files/paper/2023/hash/ab6e7ad2354f350b451b5a8e14d04f51-Abstract-Conference.html) | [Code](https://github.com/showlab/DatasetDM), [Project](https://weijiawu.github.io/DatasetDM_page/) |
131 | 
132 | 
133 | ### 3.2.3 DINO-based Solution
134 | | Year | Publication | Paper Title | Project |
135 | |-------|:------------------:|-------------|:-------:|
136 | | 2022 | arXiv | [Discovering Object Masks with Transformers for Unsupervised Semantic Segmentation](https://arxiv.org/abs/2206.06363) | [Code](https://github.com/wvangansbeke/MaskDistill), - |
137 | | 2023 | CVPR | [Cut and Learn for Unsupervised Object Detection and Instance Segmentation](https://openaccess.thecvf.com/content/CVPR2023/html/Wang_Cut_and_Learn_for_Unsupervised_Object_Detection_and_Instance_Segmentation_CVPR_2023_paper.html) | [Code](https://github.com/facebookresearch/CutLER), [Project](https://people.eecs.berkeley.edu/~xdwang/projects/CutLER/) |
138 | | 2023 | NeurIPS | [HASSOD: Hierarchical Adaptive Self-Supervised Object Detection](https://proceedings.neurips.cc/paper_files/paper/2023/hash/b9ecf4d84999a61783c360c3782e801e-Abstract-Conference.html) | [Code](https://github.com/Shengcao-Cao/HASSOD), [Project](https://hassod-neurips23.github.io/) |
139 | | 2024 | CVPR | [CuVLER: Enhanced Unsupervised Object Discoveries through Exhaustive Self-Supervised Transformers](https://openaccess.thecvf.com/content/CVPR2024/html/Arica_CuVLER_Enhanced_Unsupervised_Object_Discoveries_through_Exhaustive_Self-Supervised_Transformers_CVPR_2024_paper.html) | [Code](https://github.com/shahaf-arica/CuVLER), - |
140 | 
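Several DINO-based entries above (notably the TokenCut and CutLER lines of work) start from a normalized cut over a patch affinity graph built on self-supervised ViT features. The sketch below is a rough illustration of that graph-partitioning step, not a faithful re-implementation of either paper: it uses last-layer output patch tokens rather than attention keys, an arbitrary affinity threshold, and a crude rule for picking the foreground side of the bipartition.

```python
# Rough, illustrative sketch of TokenCut/CutLER-style object discovery on DINO
# features: build a patch affinity graph and split it with the second smallest
# generalized eigenvector (Fiedler vector) of its Laplacian. All thresholds and
# the foreground heuristic are assumptions, not the original papers' settings.
import numpy as np
import torch
from PIL import Image
from scipy.linalg import eigh
from torchvision import transforms

model = torch.hub.load("facebookresearch/dino:main", "dino_vits16")
model.eval()

tf = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225)),
])
x = tf(Image.open("example.jpg").convert("RGB")).unsqueeze(0)

with torch.no_grad():
    tokens = model.get_intermediate_layers(x, n=1)[0]   # (1, 1+N, C), CLS token first
feats = torch.nn.functional.normalize(tokens[0, 1:], dim=-1).numpy()  # (N, C)

# Affinity graph over the 14x14 grid of ViT-S/16 patches.
W = feats @ feats.T
W = np.where(W > 0.2, 1.0, 1e-5)        # binarize; small floor keeps the graph connected
D = np.diag(W.sum(axis=1))

# Second smallest generalized eigenvector of (D - W) v = lambda * D v.
_, vecs = eigh(D - W, D)
fiedler = vecs[:, 1]

# Crude heuristic: take the smaller side of the bipartition as foreground.
fg = fiedler > fiedler.mean()
if fg.sum() > fg.size / 2:
    fg = ~fg
mask = fg.reshape(14, 14)                # upsample (e.g., cv2.resize) for a full-resolution mask
print(mask.astype(int))
```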
141 | ### 3.2.4 Composition of FMs for Instance Segmentation
142 | | Year | Publication | Paper Title | Project |
143 | |-------|:------------------:|-------------|:-------:|
144 | | 2023 | ICML | [X-Paste: Revisiting Scalable Copy-Paste for Instance Segmentation using CLIP and StableDiffusion](https://proceedings.mlr.press/v202/zhao23f.html) | [Code](https://github.com/yoctta/XPaste), - |
145 | | 2024 | CVPR | [DiverGen: Improving Instance Segmentation by Learning Wider Data Distribution with More Diverse Generative Data](https://openaccess.thecvf.com/content/CVPR2024/html/Fan_DiverGen_Improving_Instance_Segmentation_by_Learning_Wider_Data_Distribution_with_CVPR_2024_paper.html) | [Code](https://github.com/aim-uofa/DiverGen), - |
146 | | 2024 | ICLR | [The Devil is in the Object Boundary: Towards Annotation-Free Instance Segmentation using Foundation Models](https://arxiv.org/abs/2404.11957) | [Code](https://github.com/chengshiest/zip-your-clip), - |
147 | | 2023 | NeurIPS | [Segment Anything in High Quality](https://proceedings.neurips.cc/paper_files/paper/2023/hash/5f828e38160f31935cfe9f67503ad17c-Abstract-Conference.html) | [Code](https://github.com/SysCV/sam-hq), - |
148 | | 2024 | arXiv | [Grounded SAM: Assembling Open-World Models for Diverse Visual Tasks](https://arxiv.org/abs/2401.14159) | [Code](https://github.com/IDEA-Research/Grounded-Segment-Anything), - |
149 | 
150 | ## 3.3 Panoptic Segmentation
151 | ### 3.3.1 CLIP-based Solution
152 | | Year | Publication | Paper Title | Project |
153 | |-------|:------------------:|-------------|:-------:|
154 | | 2023 | ICML | [Open-Vocabulary Universal Image Segmentation with MaskCLIP](https://dl.acm.org/doi/abs/10.5555/3618408.3618729) | [Code](https://github.com/mlpc-ucsd/MaskCLIP), [Project](https://maskclip.github.io/) |
155 | | 2023 | ICCV | [MasQCLIP for Open-Vocabulary Universal Image Segmentation](https://openaccess.thecvf.com/content/ICCV2023/html/Xu_MasQCLIP_for_Open-Vocabulary_Universal_Image_Segmentation_ICCV_2023_paper.html) | [Code](https://github.com/mlpc-ucsd/MasQCLIP), [Project](https://masqclip.github.io/) |
156 | | 2023 | NeurIPS | [Convolutions Die Hard: Open-Vocabulary Segmentation with Single Frozen Convolutional CLIP](https://proceedings.neurips.cc/paper_files/paper/2023/hash/661caac7729aa7d8c6b8ac0d39ccbc6a-Abstract-Conference.html) | [Code](https://github.com/bytedance/fc-clip), - |
157 | | 2023 | ICCV | [Open-Vocabulary Panoptic Segmentation with Embedding Modulation](https://openaccess.thecvf.com/content/ICCV2023/html/Chen_Open-vocabulary_Panoptic_Segmentation_with_Embedding_Modulation_ICCV_2023_paper.html) | [Code](https://github.com/XavierCHEN34/OPSNet), [Project](https://opsnet-page.github.io/) |
158 | | 2023 | CVPR | [Primitive Generation and Semantic-Related Alignment for Universal Zero-Shot Segmentation](https://openaccess.thecvf.com/content/CVPR2023/html/He_Primitive_Generation_and_Semantic-Related_Alignment_for_Universal_Zero-Shot_Segmentation_CVPR_2023_paper.html) | [Code](https://github.com/heshuting555/PADing), [Project](https://henghuiding.github.io/PADing/) |
159 | | 2023 | CVPR | [Generalized Decoding for Pixel, Image, and Language](https://openaccess.thecvf.com/content/CVPR2023/html/Zou_Generalized_Decoding_for_Pixel_Image_and_Language_CVPR_2023_paper) | [Code](https://github.com/microsoft/X-Decoder), [Project](https://x-decoder-vl.github.io/) |
160 | | 2024 | arXiv | [Open-Vocabulary Segmentation with Unpaired Mask-Text Supervision](https://arxiv.org/abs/2402.08960) | [Code](https://github.com/DerrickWang005/Unpair-Seg.pytorch), - |
161 | | 2023 | CVPR | [FreeSeg: Unified, Universal and Open-Vocabulary Image Segmentation](https://openaccess.thecvf.com/content/CVPR2023/html/Qin_FreeSeg_Unified_Universal_and_Open-Vocabulary_Image_Segmentation_CVPR_2023_paper.html) | [Code](https://github.com/bytedance/FreeSeg), - |
162 | | 2023 | NeurIPS | [DaTaSeg: Taming a Universal Multi-Dataset Multi-Task Segmentation Model](https://proceedings.neurips.cc/paper_files/paper/2023/hash/d4eed238cf5807c6b75face996302892-Abstract-Conference.html) | -, - |
163 | | 2024 | CVPR | [OMG-Seg: Is One Model Good Enough For All Segmentation?](https://openaccess.thecvf.com/content/CVPR2024/html/Li_OMG-Seg_Is_One_Model_Good_Enough_For_All_Segmentation_CVPR_2024_paper.html) | [Code](https://github.com/lxtgh/omg-seg), - |
164 | 
165 | ### 3.3.2 DM-based Solution
166 | | Year | Publication | Paper Title | Project |
167 | |-------|:------------------:|-------------|:-------:|
168 | | 2023 | CVPR | [Open-Vocabulary Panoptic Segmentation With Text-to-Image Diffusion Models](https://openaccess.thecvf.com/content/CVPR2023/html/Xu_Open-Vocabulary_Panoptic_Segmentation_With_Text-to-Image_Diffusion_Models_CVPR_2023_paper.html) | [Code](https://github.com/nvlabs/odise), - |
169 | | 2023 | ICCV | [A Generalist Framework for Panoptic Segmentation of Images and Videos](https://openaccess.thecvf.com/content/ICCV2023/html/Chen_A_Generalist_Framework_for_Panoptic_Segmentation_of_Images_and_Videos_ICCV_2023_paper.html) | [Code](https://github.com/google-research/pix2seq), - |
170 | | 2023 | ICLR | [Analog Bits: Generating Discrete Data using Diffusion Models with Self-Conditioning](https://arxiv.org/abs/2208.04202) | [Code](https://github.com/google-research/pix2seq), - |
171 | | 2023 | ICCV | [DiffuMask: Synthesizing Images with Pixel-level Annotations for Semantic Segmentation Using Diffusion Models](https://openaccess.thecvf.com/content/ICCV2023/html/Wu_DiffuMask_Synthesizing_Images_with_Pixel-level_Annotations_for_Semantic_Segmentation_Using_ICCV_2023_paper.html) | [Code](https://github.com/weijiawu/DiffusionMask), - |
172 | | 2024 | ECCV | [A Simple Latent Diffusion Approach for Panoptic Segmentation and Mask Inpainting](https://link.springer.com/chapter/10.1007/978-3-031-72633-0_5) | [Code](https://github.com/segments-ai/latent-diffusion-segmentation), - |
173 | 
174 | ### 3.3.3 DINO-based Solution
175 | | Year | Publication | Paper Title | Project |
176 | |-------|:------------------:|-------------|:-------:|
177 | | 2022 | ICLR | [Unsupervised Semantic Segmentation by Distilling Feature Correspondences](https://arxiv.org/abs/2203.08414) | [Code](https://github.com/mhamilton723/STEGO), [Project](https://mhamilton.net/stego.html) |
178 | | 2023 | CVPR | [Cut and Learn for Unsupervised Object Detection and Instance Segmentation](https://openaccess.thecvf.com/content/CVPR2023/html/Wang_Cut_and_Learn_for_Unsupervised_Object_Detection_and_Instance_Segmentation_CVPR_2023_paper.html) | [Code](https://github.com/facebookresearch/CutLER), [Project](https://people.eecs.berkeley.edu/~xdwang/projects/CutLER/) |
179 | | 2024 | CVPR | [Unsupervised Universal Image Segmentation](https://openaccess.thecvf.com/content/CVPR2024/html/Niu_Unsupervised_Universal_Image_Segmentation_CVPR_2024_paper.html) | [Code](https://github.com/u2seg/U2Seg), [Project](https://u2seg.github.io/) |
180 | | 2019 | CVPR | [Panoptic Feature Pyramid Networks](https://openaccess.thecvf.com/content_CVPR_2019/html/Kirillov_Panoptic_Feature_Pyramid_Networks_CVPR_2019_paper.html) | [Code](https://github.com/Angzz/panoptic-fpn-gluon), - |
181 | | 2024 | arXiv | [A Good Foundation is Worth Many Labels: Label-Efficient Panoptic Segmentation](https://arxiv.org/abs/2405.19035) | [Code](https://github.com/robot-learning-freiburg/PASTEL), - |
182 | | 2020 | CVPR | [Panoptic-DeepLab: A Simple, Strong, and Fast Baseline for Bottom-Up Panoptic Segmentation](https://openaccess.thecvf.com/content_CVPR_2020/html/Cheng_Panoptic-DeepLab_A_Simple_Strong_and_Fast_Baseline_for_Bottom-Up_Panoptic_CVPR_2020_paper.html) | [Code](https://github.com/bowenc0221/panoptic-deeplab), - |
183 | 
184 | ### 3.3.4 SAM-based Solution
185 | | Year | Publication | Paper Title | Project |
186 | |-------|:------------------:|-------------|:-------:|
187 | | 2024 | ECCV | [Semantic-SAM: Segment and Recognize Anything at Any Granularity](https://arxiv.org/abs/2307.04767) | [Code](https://github.com/UX-Decoder/Semantic-SAM), - |
188 | | 2023 | NeurIPS | [Segment Everything Everywhere All at Once](https://proceedings.neurips.cc/paper_files/paper/2023/hash/3ef61f7e4afacf9a2c5b71c726172b86-Abstract-Conference.html) | [Code](https://github.com/UX-Decoder/Segment-Everything-Everywhere-All-At-Once), - |
189 | | 2014 | ECCV | [Microsoft COCO: Common Objects in Context](https://link.springer.com/chapter/10.1007/978-3-319-10602-1_48) | -, [Project](https://cocodataset.org/#home) |
190 | | 2017 | CVPR | [Scene Parsing Through ADE20K Dataset](https://openaccess.thecvf.com/content_cvpr_2017/html/Zhou_Scene_Parsing_Through_CVPR_2017_paper.html) | -, - |
191 | | 2015 | IJCV | [The PASCAL Visual Object Classes Challenge: A Retrospective](https://link.springer.com/article/10.1007/s11263-014-0733-5) | -, - |
192 | 
--------------------------------------------------------------------------------