└── README.md


/README.md:
--------------------------------------------------------------------------------
 1 | # Awesome-Open-Vocabulary-Perception  [![Awesome](https://cdn.rawgit.com/sindresorhus/awesome/d7305f38d29fed78fa85652e3a63e154dd8e8829/media/badge.svg)](https://github.com/sindresorhus/awesome)
 2 | 
 3 | Papers and code for open-vocabulary perception (3D & 2D). 😎
 4 | 
 5 | This repo mainly focuses on open-vocabulary perception tasks (both 3D and 2D). Please open a pull request or email me at `yangcao.cs@gmail.com` if you want to recommend papers.
 6 | 
 7 | ## 3D
 8 | 
 9 | 
10 | ### Open-Vocabulary 3D Object Detection
11 | 1. <span id = "16001">[CoDAv2] [Collaborative Novel Object Discovery and Box-Guided Cross-Modal Alignment for Open-Vocabulary 3D Object Detection](https://arxiv.org/pdf/2406.00830), `TPAMI2025`. [[Code](https://github.com/yangcaoai/CoDA_NeurIPS2023)]
12 | 2. <span id = "16001">[ImOV3D] [ImOV3D: Learning Open Vocabulary Point Clouds 3D Object Detection from Only 2D Images](https://arxiv.org/pdf/2410.24001), `NeurIPS2024`. [[Code](https://github.com/yangtiming/ImOV3D)]
13 | 3. <span id = "16001">[INHA] [Unlocking Textual and Visual Wisdom: Open-Vocabulary 3D Object Detection Enhanced by Comprehensive Guidance from Text and Image](https://arxiv.org/pdf/2407.05256), `ECCV2024`.
14 | 4. <span id = "16001">[GLIS] [Global-Local Collaborative Inference with LLM for Lidar-Based Open-Vocabulary Detection](https://arxiv.org/pdf/2407.08931), `ECCV2024`. [[Code](https://github.com/GradiusTwinbee/GLIS)]
15 | 5. <span id = "16001">[CoDA] [Collaborative Novel Box Discovery and Cross-modal Alignment for Open-vocabulary 3D Object Detection](https://openreview.net/pdf?id=QW5ouyyIgG), `NeurIPS2023`. [[Code](https://github.com/yangcaoai/CoDA_NeurIPS2023)]
16 | 6. <span id = "18001">[OV-3DET] [Open-Vocabulary Point-Cloud Object Detection without 3D Annotation](https://openaccess.thecvf.com/content/CVPR2023/papers/Lu_Open-Vocabulary_Point-Cloud_Object_Detection_Without_3D_Annotation_CVPR_2023_paper.pdf), `CVPR2023`. [[Code](https://github.com/lyhdet/OV-3DET)]
17 | 7. <span id = "16001">[FM-OV3D] [FM-OV3D: Foundation Model-based Cross-modal Knowledge Blending for Open-Vocabulary 3D Detection](https://arxiv.org/pdf/2312.14465.pdf), `AAAI2024`. [[Code](https://github.com/dmzhang0425/FM-OV3D)]
18 | 
19 | 
20 | 
21 | 
22 | ### Open-Vocabulary 3D Segmentation
23 | 1. <span id = "16001">[OpenMask3D] [OpenMask3D: Open-Vocabulary 3D Instance Segmentation](https://openmask3d.github.io/static/pdf/openmask3d.pdf), `NeurIPS2023`. [[Code](https://github.com/OpenMask3D/openmask3d)]
24 | 2. <span id = "16001">[OpenScene] [OpenScene: 3D Scene Understanding with Open Vocabularies](https://arxiv.org/pdf/2211.15654), `CVPR2023`. [[Code](https://github.com/pengsongyou/openscene)]
25 | 3. <span id = "16001">[3D-OVS] [Weakly Supervised 3D Open-vocabulary Segmentation](https://arxiv.org/pdf/2305.14093), `CVPR2023`. [[Code](https://github.com/Kunhao-Liu/3D-OVS)]
26 | 4. <span id = "16001">[PLA] [PLA: Language-Driven Open-Vocabulary 3D Scene Understanding](https://arxiv.org/pdf/2211.16312.pdf), `CVPR2023`. [[Code](https://github.com/CVMI-Lab/PLA)]
27 | 5. <span id = "16001">[Open3DIS] [Open3DIS: Open-Vocabulary 3D Instance Segmentation with 2D Mask Guidance](https://arxiv.org/abs/2312.10671), `CVPR2024`. [[Code](https://open3dis.github.io/)]
28 | 6. <span id = "16001">[MaskClustering] [MaskClustering: View Consensus based Mask Graph Clustering for Open-Vocabulary 3D Instance Segmentation](https://arxiv.org/pdf/2401.07745), `CVPR2024`. [[Code](https://github.com/PKU-EPIC/MaskClustering)]
29 | 7. <span id = "16001">[LEGaussians] [LEGaussians: Language Embedded 3D Gaussians for Open-Vocabulary Scene Understanding](https://arxiv.org/pdf/2311.18482.pdf), `CVPR2024`. [[Code](https://github.com/buaavrcg/LEGaussians)]
30 | 
31 | 
32 | ## 2D
33 | 
34 | ### Open-Vocabulary 2D Object Detection  
35 | 1. <span id = "16001">[DetCLIP] [DetCLIP: Dictionary-Enriched Visual-Concept Paralleled Pre-training for Open-world Detection](https://proceedings.neurips.cc/paper_files/paper/2022/file/3ba960559212691be13fa81d9e5e0047-Paper-Conference.pdf), `NeurIPS2022`
36 | 2. <span id = "16001">[DetCLIPv2] [DetCLIPv2: Scalable Open-Vocabulary Object Detection Pre-training via Word-Region Alignment](http://openaccess.thecvf.com/content/CVPR2023/papers/Yao_DetCLIPv2_Scalable_Open-Vocabulary_Object_Detection_Pre-Training_via_Word-Region_Alignment_CVPR_2023_paper.pdf), `CVPR2023`
37 | 3. <span id = "16001">[DetCLIPv3] DetCLIPv3: Towards Versatile Generative Open-vocabulary Object Detection, `CVPR2024`
38 | 4. <span id = "16001">[YOLO-World] [YOLO-World: Real-Time Open-Vocabulary Object Detection](https://arxiv.org/pdf/2401.17270v3), `CVPR2024`. [[Code](https://github.com/AILab-CVC/YOLO-World)]
39 | 5. <span id = "16001">[NOD] [Enhancing Novel Object Detection via Cooperative Foundational Models](https://arxiv.org/abs/2311.12068), `WACV2025`. [[Code](https://github.com/rohit901/cooperative-foundational-models)]
40 | 
41 | ### Open-Vocabulary 2D Segmentation  
42 | 1. <span id = "16001">[ODISE] [Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models](https://arxiv.org/pdf/2303.04803), `CVPR2023 Highlight`. [[Code](https://github.com/NVlabs/ODISE)]
43 | 2. <span id = "16001">[FreeDA] [Training-Free Open-Vocabulary Segmentation with Offline Diffusion-Augmented Prototype Generation](https://aimagelab.github.io/freeda/), `CVPR2024`. [[Code](https://aimagelab.github.io/freeda/)]
44 | 3. <span id = "16001">[OVAM] [Open-Vocabulary Attention Maps with Token Optimization for Semantic Segmentation in Diffusion Models](https://arxiv.org/pdf/2403.14291), `CVPR2024`. [[Code](https://github.com/vpulab/ovam)]
45 | 4. <span id = "16001">[PnP-OVSS] [Plug-and-Play, Dense-Label-Free Extraction of Open-Vocabulary Semantic Segmentation from Vision-Language Models](https://arxiv.org/abs/2311.17095), `CVPR2024`. [[Code](https://github.com/vpulab/ovam)]
46 | 5. <span id = "16001">[OVFoodSeg] [OVFoodSeg: Elevating Open-Vocabulary Food Image Segmentation via Image-Informed Textual Representation](https://arxiv.org/pdf/2404.01409), `CVPR2024`. 
47 | 6. <span id = "16001">[SED] [SED: A Simple Encoder-Decoder for Open-Vocabulary Semantic Segmentation](https://arxiv.org/pdf/2311.15537), `CVPR2024`.
48 | 


--------------------------------------------------------------------------------