# Awesome-Open-Vocabulary-Perception [![Awesome](https://cdn.rawgit.com/sindresorhus/awesome/d7305f38d29fed78fa85652e3a63e154dd8e8829/media/badge.svg)](https://github.com/sindresorhus/awesome)

Papers and code for open-vocabulary perception (3D & 2D). 😎

This repo mainly focuses on open-vocabulary perception tasks (both 3D and 2D). Please open a pull request or email me at `yangcao.cs@gmail.com` if you want to recommend a paper.

## 3D

### Open-Vocabulary 3D Object Detection
1. [CoDAv2] [Collaborative Novel Object Discovery and Box-Guided Cross-Modal Alignment for Open-Vocabulary 3D Object Detection](https://arxiv.org/pdf/2406.00830), `TPAMI2025`. [[Code](https://github.com/yangcaoai/CoDA_NeurIPS2023)]
2. [ImOV3D] [ImOV3D: Learning Open Vocabulary Point Clouds 3D Object Detection from Only 2D Images](https://arxiv.org/pdf/2410.24001), `NeurIPS2024`. [[Code](https://github.com/yangtiming/ImOV3D)]
3. [INHA] [Unlocking Textual and Visual Wisdom: Open-Vocabulary 3D Object Detection Enhanced by Comprehensive Guidance from Text and Image](https://arxiv.org/pdf/2407.05256), `ECCV2024`.
4. [GLIS] [Global-Local Collaborative Inference with LLM for Lidar-Based Open-Vocabulary Detection](https://arxiv.org/pdf/2407.08931), `ECCV2024`. [[Code](https://github.com/GradiusTwinbee/GLIS)]
5. [CoDA] [Collaborative Novel Box Discovery and Cross-modal Alignment for Open-vocabulary 3D Object Detection](https://openreview.net/pdf?id=QW5ouyyIgG), `NeurIPS2023`. [[Code](https://github.com/yangcaoai/CoDA_NeurIPS2023)]
6. [OV-3DET] [Open-Vocabulary Point-Cloud Object Detection without 3D Annotation](https://openaccess.thecvf.com/content/CVPR2023/papers/Lu_Open-Vocabulary_Point-Cloud_Object_Detection_Without_3D_Annotation_CVPR_2023_paper.pdf), `CVPR2023`. [[Code](https://github.com/lyhdet/OV-3DET)]
7. [FM-OV3D] [FM-OV3D: Foundation Model-based Cross-modal Knowledge Blending for Open-Vocabulary 3D Detection](https://arxiv.org/pdf/2312.14465.pdf), `AAAI2024`. [[Code](https://github.com/dmzhang0425/FM-OV3D)]

### Open-Vocabulary 3D Segmentation
1. [OpenMask3D] [OpenMask3D: Open-Vocabulary 3D Instance Segmentation](https://openmask3d.github.io/static/pdf/openmask3d.pdf), `NeurIPS2023`. [[Code](https://github.com/OpenMask3D/openmask3d)]
2. [OpenScene] [OpenScene: 3D Scene Understanding with Open Vocabularies](https://arxiv.org/pdf/2211.15654), `CVPR2023`. [[Code](https://github.com/pengsongyou/openscene)]
3. [3D-OVS] [Weakly Supervised 3D Open-vocabulary Segmentation](https://arxiv.org/pdf/2305.14093), `CVPR2023`. [[Code](https://github.com/Kunhao-Liu/3D-OVS)]
4. [PLA] [PLA: Language-Driven Open-Vocabulary 3D Scene Understanding](https://arxiv.org/pdf/2211.16312.pdf), `CVPR2023`. [[Code](https://github.com/CVMI-Lab/PLA)]
5. [Open3DIS] [Open3DIS: Open-Vocabulary 3D Instance Segmentation with 2D Mask Guidance](https://arxiv.org/abs/2312.10671), `CVPR2024`. [[Code](https://open3dis.github.io/)]
6. [MaskClustering] [MaskClustering: View Consensus based Mask Graph Clustering for Open-Vocabulary 3D Instance Segmentation](https://arxiv.org/pdf/2401.07745), `CVPR2024`. [[Code](https://github.com/PKU-EPIC/MaskClustering)]
7. [LEGaussians] [LEGaussians: Language Embedded 3D Gaussians for Open-Vocabulary Scene Understanding](https://arxiv.org/pdf/2311.18482.pdf), `CVPR2024`. [[Code](https://github.com/buaavrcg/LEGaussians)]

## 2D

### Open-Vocabulary 2D Object Detection
1. [DetCLIP] [Dictionary-Enriched Visual-Concept Paralleled Pre-training for Open-world Detection](https://proceedings.neurips.cc/paper_files/paper/2022/file/3ba960559212691be13fa81d9e5e0047-Paper-Conference.pdf), `NeurIPS2022`.
2. [DetCLIPv2] [DetCLIPv2: Scalable Open-Vocabulary Object Detection Pre-training via Word-Region Alignment](http://openaccess.thecvf.com/content/CVPR2023/papers/Yao_DetCLIPv2_Scalable_Open-Vocabulary_Object_Detection_Pre-Training_via_Word-Region_Alignment_CVPR_2023_paper.pdf), `CVPR2023`.
3. [DetCLIPv3] DetCLIPv3: Towards Versatile Generative Open-vocabulary Object Detection, `CVPR2024`.
4. [YOLO-World] [YOLO-World: Real-Time Open-Vocabulary Object Detection](https://arxiv.org/pdf/2401.17270v3), `CVPR2024`. [[Code](https://github.com/AILab-CVC/YOLO-World)]
5. [NOD] [Enhancing Novel Object Detection via Cooperative Foundational Models](https://arxiv.org/abs/2311.12068), `WACV2025`. [[Code](https://github.com/rohit901/cooperative-foundational-models)]

### Open-Vocabulary 2D Segmentation
1. [ODISE] [Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models](https://arxiv.org/pdf/2303.04803), `CVPR2023 Highlight`. [[Code](https://github.com/NVlabs/ODISE)]
2. [FreeDA] [Training-Free Open-Vocabulary Segmentation with Offline Diffusion-Augmented Prototype Generation](https://aimagelab.github.io/freeda/), `CVPR2024`. [[Code](https://aimagelab.github.io/freeda/)]
3. [OVAM] [Open-Vocabulary Attention Maps with Token Optimization for Semantic Segmentation in Diffusion Models](https://arxiv.org/pdf/2403.14291), `CVPR2024`. [[Code](https://github.com/vpulab/ovam)]
4. [PnP-OVSS] [Plug-and-Play, Dense-Label-Free Extraction of Open-Vocabulary Semantic Segmentation from Vision-Language Models](https://arxiv.org/abs/2311.17095), `CVPR2024`. [[Code](https://github.com/vpulab/ovam)]
5. [OVFoodSeg] [OVFoodSeg: Elevating Open-Vocabulary Food Image Segmentation via Image-Informed Textual Representation](https://arxiv.org/pdf/2404.01409), `CVPR2024`.
6. [SED] [SED: A Simple Encoder-Decoder for Open-Vocabulary Semantic Segmentation](https://arxiv.org/pdf/2311.15537), `CVPR2024`.