├── LICENSE
└── README.md

/LICENSE:

MIT License

Copyright (c) 2023 vasgaowei

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

/README.md:

# Awesome Bird's Eye View Perception
This is a repository for Bird's Eye View (BEV) perception, covering 3D object detection, segmentation, online mapping and occupancy prediction.

## News
```
- 2023.05.09: Initial version with recent papers and projects.
- 2023.05.12: Added papers on 3D object detection.
- 2023.05.14: Added papers on BEV segmentation, HD-map construction, occupancy prediction and motion planning.
```

## Contents

## Papers
- [Survey](#survey)
- [3D Object Detection](#3d-object-detection)
  - [Radar Lidar](#radar-lidar)
  - [Radar Camera](#radar-camera)
  - [Lidar Camera](#lidar-camera)
  - [Lidar](#lidar)
  - [Monocular](#monocular)
  - [Multiple Camera](#multiple-camera)
- [BEV Segmentation](#bev-segmentation)
  - [Lidar Camera](#lidar-camera-1)
  - [Lidar](#lidar-1)
  - [Monocular](#monocular-1)
  - [Multiple Camera](#multiple-camera-1)
- [Tracking](#tracking)
- [Perception Prediction Planning](#perception-prediction-planning)
  - [Monocular](#monocular-2)
  - [Multiple Camera](#multiple-camera-2)
- [Mapping](#mapping)
  - [Lidar](#lidar-2)
  - [Lidar Camera](#lidar-camera-2)
  - [Monocular](#monocular-3)
  - [Multiple Camera](#multiple-camera-3)
- [LaneGraph](#lanegraph)
  - [Monocular](#monocular-4)
- [Locate](#locate)
- [Occupancy Prediction](#occupancy-prediction)
- [Occupancy Challenge](#occupancy-challenge)
  - [Challenge](#challenge)
  - [Dataset](#dataset)
- [World Model](#world-model)
- [Other](#other)

### Survey
- Vision-Centric BEV Perception: A Survey (Arxiv 2022) [[Paper]](https://arxiv.org/abs/2208.02797) [[Github]](https://github.com/4DVLab/Vision-Centric-BEV-Perception)
- Delving into the Devils of Bird’s-eye-view Perception: A Review, Evaluation and Recipe (Arxiv 2022) [[Paper]](https://arxiv.org/abs/2209.05324) [[Github]](https://github.com/OpenDriveLab/BEVPerception-Survey-Recipe)
### 3D Object Detection
#### Radar Lidar
- RaLiBEV: Radar and LiDAR BEV Fusion Learning for Anchor Box Free Object Detection System (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2211.06108.pdf)
- Bi-LRFusion: Bi-Directional LiDAR-Radar Fusion for 3D Dynamic Object Detection (CVPR 2023) [[Paper]](https://arxiv.org/pdf/2306.01438.pdf) [[Github]](https://github.com/JessieW0806/Bi-LRFusion)
- MaskBEV: Joint Object Detection and Footprint Completion for Bird’s-eye View 3D Point Clouds (IROS 2023) [[Paper]](https://arxiv.org/pdf/2307.01864.pdf) [[Github]](https://github.com/norlab-ulaval/mask_bev)
#### Radar Camera
- CRAFT: Camera-Radar 3D Object Detection with Spatio-Contextual Fusion Transformer (Arxiv 2022) [[Paper]](https://arxiv.org/pdf/2209.06535.pdf)
- RadSegNet: A Reliable Approach to Radar Camera Fusion (Arxiv 2022) [[Paper]](https://arxiv.org/pdf/2208.03849.pdf)
- Bridging the View Disparity of Radar and Camera Features for Multi-modal Fusion 3D Object Detection (IEEE TIV 2023) [[Paper]](https://arxiv.org/pdf/2208.12079.pdf)
- CRN: Camera Radar Net for Accurate, Robust, Efficient 3D Perception (ICLRW 2023) [[Paper]](https://arxiv.org/pdf/2304.00670.pdf)
- RC-BEVFusion: A Plug-In Module for Radar-Camera Bird’s Eye View Feature Fusion (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2305.15883.pdf)
- LXL: LiDAR Excluded Lean 3D Object Detection with 4D Imaging Radar and Camera Fusion (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2307.00724.pdf)
- RCBEVDet: Radar-camera Fusion in Bird's Eye View for 3D Object Detection (CVPR 2024) [[Paper]](https://arxiv.org/abs/2403.16440) [[Github]](https://github.com/VDIGPKU/RCBEVDet)
- UniBEVFusion: Unified Radar-Vision BEVFusion for 3D Object Detection (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2409.14751)
- SpaRC: Sparse Radar-Camera Fusion for 3D Object Detection (Arxiv 2024) [[Paper]](https://arxiv.org/pdf/2411.19860) [[Github]](https://github.com/phi-wol/sparc)
- HGSFusion: Radar-Camera Fusion with Hybrid Generation and Synchronization for 3D Object Detection (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2412.11489) [[Github]](https://github.com/garfield-cpp/HGSFusion)
- RCTrans: Radar-Camera Transformer via Radar Densifier and Sequential Decoder for 3D Object Detection (AAAI 2025) [[Paper]](https://arxiv.org/pdf/2412.12799) [[Github]](https://github.com/liyih/RCTrans)
#### Lidar Camera
- SemanticBEVFusion: Rethink LiDAR-Camera Fusion in Unified Bird’s-Eye View Representation for 3D Object Detection (IROS 2023) [[Paper]](https://arxiv.org/pdf/2212.04675.pdf)
- Sparse Dense Fusion for 3D Object Detection (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2304.04179.pdf)
- EA-BEV: Edge-aware Bird's-Eye-View Projector for 3D Object Detection (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2303.17895.pdf) [[Github]](https://github.com/hht1996ok/EA-BEV)
- MSMDFusion: Fusing LiDAR and Camera at Multiple Scales with Multi-Depth Seeds for 3D Object Detection (CVPR 2023) [[Paper]](https://arxiv.org/pdf/2209.03102.pdf) [[Github]](https://github.com/SxJyJay/MSMDFusion)
- FULLER: Unified Multi-modality Multi-task 3D Perception via Multi-level Gradient Calibration (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2307.16617.pdf)
- Detecting As Labeling: Rethinking LiDAR-camera Fusion in 3D Object Detection (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2311.07152.pdf)
- SupFusion: Supervised LiDAR-Camera Fusion for 3D Object Detection (ICCV 2023) [[Paper]](https://arxiv.org/pdf/2309.07084.pdf) [[Github]](https://github.com/IranQin/SupFusion)
- 3DifFusionDet: Diffusion Model for 3D Object Detection with Robust LiDAR-Camera Fusion (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2311.03742.pdf)
- FusionViT: Hierarchical 3D Object Detection via LiDAR-Camera Vision Transformer Fusion (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2311.03620.pdf)
- Lift-Attend-Splat: Bird's-eye-view camera-lidar fusion using transformers (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2312.14919.pdf)
- PVTransformer: Point-to-Voxel Transformer for Scalable 3D Object Detection (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2405.02811)
- Learned Multimodal Compression for Autonomous Driving (IEEE MMSP 2024) [[Paper]](https://arxiv.org/pdf/2408.08211)
- Co-Fix3D: Enhancing 3D Object Detection with Collaborative Refinement (Arxiv 2024) [[Paper]](https://arxiv.org/pdf/2408.07999)
- SimpleBEV: Improved LiDAR-Camera Fusion Architecture for 3D Object Detection (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2411.05292)
- Timealign: A Multi-Modal Object Detection Method for Time Misalignment Fusing in Autonomous Driving (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2412.10033)
- Semantic-Supervised Spatial-Temporal Fusion for LiDAR-based 3D Object Detection (ICRA 2025) [[Paper]](https://arxiv.org/abs/2503.10579)
#### Lidar
- MGTANet: Encoding Sequential LiDAR Points Using Long Short-Term Motion-Guided Temporal Attention for 3D Object Detection (AAAI 2023) [[Paper]](https://arxiv.org/pdf/2212.00442.pdf) [[Github]](https://github.com/HYjhkoh/MGTANet)
- PARTNER: Level up the Polar Representation for LiDAR 3D Object Detection (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2308.03982.pdf)
- V-DETR: DETR with Vertex Relative Position Encoding for 3D Object Detection (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2308.04409.pdf)
- SEED: A Simple and Effective 3D DETR in Point Clouds (ECCV 2024) [[Paper]](https://arxiv.org/pdf/2407.10749) [[Github]](https://github.com/happinesslz/SEED)
#### Monocular
- Learning 2D to 3D Lifting for Object Detection in 3D for Autonomous Vehicles (IROS 2019) [[Paper]](https://arxiv.org/abs/1904.08494) [[Project Page]](https://www.nec-labs.com/research/media-analytics/projects/learning-2d-to-3d-lifting-for-object-detection-in-3d-for-autonomous-vehicles/)
- Orthographic Feature Transform for Monocular 3D Object Detection (BMVC 2019) [[Paper]](http://mi.eng.cam.ac.uk/~cipolla/publications/inproceedings/2019-BMVC-Orthographic-Feature-Transform.pdf) [[Github]](https://github.com/tom-roddick/oft)
- BEV-MODNet: Monocular Camera-based Bird's Eye View Moving Object Detection for Autonomous Driving (ITSC 2021) [[Paper]](https://arxiv.org/abs/2107.04937) [[Project Page]](https://sites.google.com/view/bev-modnet)
- Categorical Depth Distribution Network for Monocular 3D Object Detection (CVPR 2021) [[Paper]](https://openaccess.thecvf.com/content/CVPR2021/papers/Reading_Categorical_Depth_Distribution_Network_for_Monocular_3D_Object_Detection_CVPR_2021_paper.pdf) [[Github]](https://github.com/TRAILab/CaDDN)
- PersDet: Monocular 3D Detection in Perspective Bird’s-Eye-View (Arxiv 2022) [[Paper]](https://arxiv.org/pdf/2208.09394.pdf)
- Time3D: End-to-End Joint Monocular 3D Object Detection and Tracking for Autonomous Driving (CVPR 2022) [[Paper]](https://arxiv.org/pdf/2205.14882.pdf)
- Monocular 3D Object Detection with Depth from Motion (ECCV 2022) [[Paper]](https://arxiv.org/pdf/2207.12988.pdf) [[Github]](https://github.com/Tai-Wang/Depth-from-Motion)
- MonoNeRD: NeRF-like Representations for Monocular 3D Object Detection (ICCV 2023) [[Paper]](https://arxiv.org/pdf/2308.09421v1.pdf) [[Github]](https://github.com/cskkxjk/MonoNeRD)
- S3-MonoDETR: Supervised Shape&Scale-perceptive Deformable Transformer for Monocular 3D Object Detection (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2309.00928.pdf) [[Github]](https://github.com/mikasa3lili/S3-MonoDETR)
- MonoGAE: Roadside Monocular 3D Object Detection with Ground-Aware Embeddings (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2310.00400.pdf)
- YOLO-BEV: Generating Bird's-Eye View in the Same Way as 2D Object Detection (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2310.17379.pdf)
- UniMODE: Unified Monocular 3D Object Detection (CVPR 2024) [[Paper]](https://arxiv.org/abs/2402.18573)
- Scalable Vision-Based 3D Object Detection and Monocular Depth Estimation for Autonomous Driving (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2403.02037) [[Github]](https://github.com/Owen-Liuyuxuan/visionfactory)
- MonoDETRNext: Next-generation Accurate and Efficient Monocular 3D Object Detection Method (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2405.15176)
- MonoDGP: Monocular 3D Object Detection with Decoupled-Query and Geometry-Error Priors (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2410.19590)
#### Multiple Camera
- Object DGCNN: 3D Object Detection using Dynamic Graphs (NeurIPS 2021) [[Paper]](https://arxiv.org/pdf/2110.06923.pdf) [[Github]](https://github.com/WangYueFt/detr3d)
- BEVDet: High-Performance Multi-Camera 3D Object Detection in Bird-Eye-View (Arxiv 2022) [[Paper]](https://arxiv.org/abs/2112.11790) [[Github]](https://github.com/HuangJunJie2017/BEVDet)
- DETR3D: 3D Object Detection from Multi-view Images via 3D-to-2D Queries (CoRL 2021) [[Paper]](https://arxiv.org/abs/2110.06922) [[Github]](https://github.com/WangYueFt/detr3d)
- BEVFusion: A Simple and Robust LiDAR-Camera Fusion Framework (NeurIPS 2022) [[Paper]](https://arxiv.org/abs/2205.13790) [[Github]](https://github.com/ADLab-AutoDrive/BEVFusion)
- Unifying Voxel-based Representation with Transformer for 3D Object Detection (NeurIPS 2022) [[Paper]](https://arxiv.org/pdf/2206.00630.pdf) [[Github]](https://github.com/dvlab-research/UVTR)
- Polar Parametrization for Vision-based Surround-View 3D Detection (Arxiv 2022) [[Paper]](https://arxiv.org/abs/2206.10965) [[Github]](https://github.com/hustvl/PolarDETR)
- SRCN3D: Sparse R-CNN 3D Surround-View Camera Object Detection and Tracking for Autonomous Driving (Arxiv 2022) [[Paper]](https://arxiv.org/abs/2206.14451) [[Github]](https://github.com/synsin0/SRCN3D)
- BEVDet4D: Exploit Temporal Cues in Multi-camera 3D Object Detection (Arxiv 2022) [[Paper]](https://arxiv.org/pdf/2203.17054.pdf) [[Github]](https://github.com/HuangJunJie2017/BEVDet)
- BEVStereo: Enhancing Depth Estimation in Multi-view 3D Object Detection with Dynamic Temporal Stereo (Arxiv 2022) [[Paper]](https://arxiv.org/abs/2209.10248) [[Github]](https://github.com/Megvii-BaseDetection/BEVStereo)
- MV-FCOS3D++: Multi-View Camera-Only 4D Object Detection with Pretrained Monocular Backbones (Arxiv 2022) [[Paper]](https://arxiv.org/pdf/2207.12716.pdf) [[Github]](https://github.com/Tai-Wang/Depth-from-Motion)
- Focal-PETR: Embracing Foreground for Efficient Multi-Camera 3D Object Detection (Arxiv 2022) [[Paper]](https://arxiv.org/pdf/2212.05505.pdf)
- DETR4D: Direct Multi-View 3D Object Detection with Sparse Attention (Arxiv 2022) [[Paper]](https://arxiv.org/pdf/2212.07849.pdf)
- Multi-Camera Calibration Free BEV Representation for 3D Object Detection (Arxiv 2022) [[Paper]](https://arxiv.org/pdf/2210.17252.pdf)
- BEV-SAN: Accurate BEV 3D Object Detection via Slice Attention Networks (Arxiv 2022) [[Paper]](https://arxiv.org/pdf/2212.04675.pdf)
- STS: Surround-view Temporal Stereo for Multi-view 3D Detection (Arxiv 2022) [[Paper]](https://arxiv.org/abs/2208.10145)
- BEV-LGKD: A Unified LiDAR-Guided Knowledge Distillation Framework for BEV 3D Object Detection (Arxiv 2022) [[Paper]](https://arxiv.org/pdf/2212.00623.pdf)
- AutoAlign: Pixel-Instance Feature Aggregation for Multi-Modal 3D Object Detection (IJCAI 2022) [[Paper]](https://www.ijcai.org/proceedings/2022/0116.pdf)
- Graph-DETR3D: Rethinking Overlapping Regions for Multi-View 3D Object Detection (ACM MM 2022) [[Github]](https://github.com/zehuichen123/Graph-DETR3D)
- ORA3D: Overlap Region Aware Multi-view 3D Object Detection (BMVC 2022) [[Paper]](https://arxiv.org/pdf/2207.00865.pdf) [[Project Page]](https://kuai-lab.github.io/bmvc2022ora3d/)
- AutoAlignV2: Deformable Feature Aggregation for Dynamic Multi-Modal 3D Object Detection (ECCV 2022) [[Paper]](https://www.ecva.net/papers/eccv_2022/papers_ECCV/papers/136680616.pdf) [[Github]](https://github.com/zehuichen123/AutoAlignV2)
- CenterFormer: Center-based Transformer for 3D Object Detection (ECCV 2022) [[Paper]](https://www.ecva.net/papers/eccv_2022/papers_ECCV/papers/136980487.pdf) [[Github]](https://github.com/TuSimple/centerformer)
- SpatialDETR: Robust Scalable Transformer-Based 3D Object Detection from Multi-View Camera Images with Global Cross-Sensor Attention (ECCV 2022) [[Paper]](https://www.ecva.net/papers/eccv_2022/papers_ECCV/papers/136990226.pdf) [[Github]](https://github.com/cgtuebingen/SpatialDETR)
- PETR: Position Embedding Transformation for Multi-View 3D Object Detection (ECCV 2022) [[Paper]](https://www.ecva.net/papers/eccv_2022/papers_ECCV/papers/136870523.pdf) [[Github]](https://github.com/megvii-research/PETR)
- BEVDepth: Acquisition of Reliable Depth for Multi-view 3D Object Detection (AAAI 2023) [[Paper]](https://arxiv.org/abs/2206.10092) [[Github]](https://github.com/Megvii-BaseDetection/BEVDepth)
- PolarFormer: Multi-camera 3D Object Detection with Polar Transformers (AAAI 2023) [[Paper]](https://arxiv.org/abs/2206.15398) [[Github]](https://github.com/fudan-zvg/PolarFormer)
- A Simple Baseline for Multi-Camera 3D Object Detection (AAAI 2023) [[Paper]](https://arxiv.org/pdf/2208.10035.pdf) [[Github]](https://github.com/zhangyp15/SimMOD)
- Cross Modal Transformer via Coordinates Encoding for 3D Object Detection (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2301.01283.pdf) [[Github]](https://github.com/junjie18/CMT)
- Sparse4D: Multi-view 3D Object Detection with Sparse Spatial-Temporal Fusion (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2211.10581.pdf) [[Github]](https://github.com/linxuewu/Sparse4D)
- BEVSimDet: Simulated Multi-modal Distillation in Bird's-Eye View for Multi-view 3D Object Detection (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2303.16818.pdf) [[Github]](https://github.com/ViTAE-Transformer/BEVSimDet)
- BEVStereo++: Accurate Depth Estimation in Multi-view 3D Object Detection via Dynamic Temporal Stereo (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2304.04185.pdf)
- BSH-Det3D: Improving 3D Object Detection with BEV Shape Heatmap (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2303.02000.pdf) [[Github]](https://github.com/mystorm16/BSH-Det3D)
- DORT: Modeling Dynamic Objects in Recurrent for Multi-Camera 3D Object Detection and Tracking (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2303.16628.pdf) [[Github]](https://github.com/SmartBot-PJLab/DORT)
- Geometric-aware Pretraining for Vision-centric 3D Object Detection (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2304.03105.pdf) [[Github]](https://github.com/OpenDriveLab/BEVPerception-Survey-Recipe)
- Exploring Recurrent Long-term Temporal Fusion for Multi-view 3D Perception (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2303.05970.pdf)
- OA-BEV: Bringing Object Awareness to Bird's-Eye-View Representation for Multi-Camera 3D Object Detection (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2301.05711.pdf)
- Temporal Enhanced Training of Multi-view 3D Object Detector via Historical Object Prediction (ICCV 2023) [[Paper]](https://arxiv.org/pdf/2304.00967.pdf) [[Github]](https://github.com/sense-x/hop)
- VIMI: Vehicle-Infrastructure Multi-view Intermediate Fusion for Camera-based 3D Object Detection (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2303.10975.pdf)
- Object as Query: Equipping Any 2D Object Detector with 3D Detection Ability (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2301.02364.pdf)
- VoxelFormer: Bird’s-Eye-View Feature Generation based on Dual-view Attention for Multi-view 3D Object Detection (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2304.01054.pdf) [[Github]](https://github.com/Lizhuoling/VoxelFormer-public)
- TiG-BEV: Multi-view BEV 3D Object Detection via Target Inner-Geometry Learning (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2212.13979.pdf) [[Github]](https://github.com/ADLab3Ds/TiG-BEV)
- CrossDTR: Cross-view and Depth-guided Transformers for 3D Object Detection (ICRA 2023) [[Paper]](https://arxiv.org/pdf/2209.13507.pdf) [[Github]](https://github.com/sty61010/CrossDTR)
- SOLOFusion: Time Will Tell: New Outlooks and A Baseline for Temporal Multi-View 3D Object Detection (ICLR 2023) [[Paper]](https://arxiv.org/abs/2210.02443) [[Github]](https://github.com/Divadi/SOLOFusion)
- BEVDistill: Cross-Modal BEV Distillation for Multi-View 3D Object Detection (ICLR 2023) [[Paper]](https://openreview.net/pdf?id=-2zfgNS917) [[Github]](https://github.com/zehuichen123/BEVDistill)
- UniDistill: A Universal Cross-Modality Knowledge Distillation Framework for 3D Object Detection in Bird's-Eye View (CVPR 2023) [[Paper]](https://arxiv.org/pdf/2303.15083.pdf) [[Github]](https://github.com/megvii-research/CVPR2023-UniDistill)
- Understanding the Robustness of 3D Object Detection with Bird's-Eye-View Representations in Autonomous Driving (CVPR 2023) [[Paper]](https://arxiv.org/pdf/2303.17297.pdf)
- Uni3D: A Unified Baseline for Multi-dataset 3D Object Detection (CVPR 2023) [[Paper]](https://arxiv.org/pdf/2303.06880.pdf) [[Github]](https://github.com/PJLab-ADG/3DTrans)
- AeDet: Azimuth-invariant Multi-view 3D Object Detection (CVPR 2023) [[Paper]](https://arxiv.org/pdf/2211.12501.pdf) [[Github]](https://github.com/fcjian/AeDet) [[Project]](https://fcjian.github.io/aedet/)
- BEVHeight: A Robust Framework for Vision-based Roadside 3D Object Detection (CVPR 2023) [[Paper]](https://arxiv.org/pdf/2303.08498.pdf) [[Github]](https://github.com/ADLab-AutoDrive/BEVHeight)
156 | - CAPE: Camera View Position Embedding for Multi-View 3D Object Detection (CVPR 2023) [[Paper]](https://arxiv.org/pdf/2303.10209.pdf) [[Github]](https://github.com/kaixinbear/CAPE) 157 | - FrustumFormer: Adaptive Instance-aware Resampling for Multi-view 3D Detection (CVPR 2023) [[Paper]](https://arxiv.org/pdf/2301.04467.pdf) [[Github]](https://github.com/robertwyq/frustum) 158 | - Sparse4D v2 Recurrent Temporal Fusion with Sparse Model (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2305.14018.pdf) [[Github]](https://github.com/linxuewu/Sparse4D) 159 | - DA-BEV : Depth Aware BEV Transformer for 3D Object Detection (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2302.13002.pdf) 160 | - BEV-IO: Enhancing Bird’s-Eye-View 3D Detectionwith Instance Occupancy (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2305.16829.pdf) 161 | - OCBEV: Object-Centric BEV Transformer for Multi-View 3D Object Detection (Arxiv) [[Paper]](https://arxiv.org/pdf/2306.01738.pdf) 162 | - SA-BEV: Generating Semantic-Aware Bird’s-Eye-View Feature for Multi-view 3D Object Detection (ICCV 2023) [[Paper]](https://arxiv.org/pdf/2307.11477.pdf) [[Github]](https://github.com/mengtan00/SA-BEV) 163 | - Predict to Detect: Prediction-guided 3D Object Detection using Sequential Images (Arxiv 2023) [[paper]](https://arxiv.org/pdf/2306.08528.pdf) 164 | - DFA3D: 3D Deformable Attention For 2D-to-3D Feature Lifting (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2307.12972.pdf) 165 | - Far3D: Expanding the Horizon for Surround-view 3D Object Detection (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2308.09616.pdf) 166 | - HeightFormer: Explicit Height Modeling without Extra Data for Camera-only 3D Object Detection in Bird’s Eye View (Arxiv 2023) [[paper]](https://arxiv.org/pdf/2307.13510.pdf) 167 | - Exploring Object-Centric Temporal Modeling for Efficient Multi-View 3D Object Detection (ICCV 2023) [[Paper]](https://arxiv.org/pdf/2303.11926.pdf) [[Github]](https://github.com/exiawsh/StreamPETR) 168 | - 3DPPE: 3D Point 
Positional Encoding for Multi-Camera 3D Object Detection Transformers (ICCV 2023) [[Paper]](https://arxiv.org/pdf/2211.14710.pdf) [[Github]](https://github.com/drilistbox/3DPPE) [[Github]](https://github.com/FiveLu/stream3dppe) 169 | - FB-BEV: BEV Representation from Forward-Backward View Transformations (ICCV 2023) [[paper]](https://arxiv.org/pdf/2308.02236.pdf) [[Github]](https://github.com/NVlabs/FB-BEV) 170 | - QD-BEV : Quantization-aware View-guided Distillation for Multi-view 3D Object Detection (ICCV 2023) [[Paper]](https://arxiv.org/pdf/2308.10515v1.pdf) 171 | - SparseBEV: High-Performance Sparse 3D Object Detection from Multi-Camera Videos (ICCV 2023) [[Paper]](https://arxiv.org/pdf/2308.09244v1.pdf) [[Github]](https://github.com/MCG-NJU/SparseBEV) 172 | - NeRF-Det: Learning Geometry-Aware Volumetric Representation for Multi-View 3D Object Detection (ICCV 2023) [[paper]](https://arxiv.org/pdf/2307.14620.pdf) [[Github]](https://github.com/facebookresearch/NeRF-Det) 173 | - DistillBEV: Boosting Multi-Camera 3D Object Detection with Cross-Modal Knowledge Distillation (ICCV 2023) [[paper]](https://openaccess.thecvf.com/content/ICCV2023/html/Wang_DistillBEV_Boosting_Multi-Camera_3D_Object_Detection_with_Cross-Modal_Knowledge_Distillation_ICCV_2023_paper.html) 174 | - BEVHeight++: Toward Robust Visual Centric 3D Object Detection (Arxiv 2023) [[paper]](https://arxiv.org/pdf/2309.16179.pdf) 175 | - UniBEV: Multi-modal 3D Object Detection with Uniform BEV Encoders for Robustness against Missing Sensor Modalities (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2309.14516.pdf) 176 | - Unsupervised 3D Perception with 2D Vision-Language Distillation for Autonomous Driving (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2309.14491.pdf) 177 | - Pixel-Aligned Recurrent Queries for Multi-View 3D Object Detection (ICCV 2023) [[Paper]](https://browse.arxiv.org/pdf/2310.01401.pdf) [[Github]](https://github.com/ymingxie/parq) [[Project]](https://ymingxie.github.io/parq/) 178 | - 
CoBEVFusion: Cooperative Perception with LiDAR-Camera Bird's-Eye View Fusion (Arxiv 2023) [[paper]](https://arxiv.org/pdf/2310.06008.pdf) 179 | - DynamicBEV: Leveraging Dynamic Queries and Temporal Context for 3D Object Detection (Arxiv 2023) [[paper]](https://arxiv.org/pdf/2310.05989.pdf) 180 | - TOWARDS GENERALIZABLE MULTI-CAMERA 3D OBJECT DETECTION VIA PERSPECTIVE DEBIASING (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2310.11346.pdf) 181 | - Leveraging Vision-Centric Multi-Modal Expertise for 3D Object Detection (NeurIPS 2023) (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2310.15670.pdf) [[Github]](https://github.com/OpenDriveLab/Birds-eye-view-Perception) 182 | - M&M3D: Multi-Dataset Training and Efficient Network for Multi-view 3D Object (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2311.00986.pdf) 183 | - Sparse4D v3 Advancing End-to-End 3D Detection and Tracking (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2311.11722.pdf) [[Github]](https://github.com/linxuewu/Sparse4D) 184 | - BEVNeXt: Reviving Dense BEV Frameworks for 3D Object Detection (Arxiv 2023) [[paper]](https://arxiv.org/pdf/2312.01696.pdf) 185 | - Towards Efficient 3D Object Detection in Bird’s-Eye-View Space for Autonomous Driving: A Convolutional-Only Approach [[Paper]](https://arxiv.org/pdf/2312.00633.pdf) 186 | - Residual Graph Convolutional Network for Bird”s-Eye-View Semantic Segmentation (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2312.04044.pdf) 187 | - Diffusion-Based Particle-DETR for BEV Perception (Arxiv 2023) [[paper]](https://arxiv.org/pdf/2312.11578.pdf) 188 | - M-BEV: Masked BEV Perception for Robust Autonomous Driving (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2312.12144.pdf) 189 | - Explainable Multi-Camera 3D Object Detection with Transformer-Based Saliency Maps (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2312.14606.pdf) 190 | - Sparse Dense Fusion for 3D Object Detection (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2304.04179.pdf) 191 | - WidthFormer: Toward Efficient 
Transformer-based BEV View Transformation (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2401.03836.pdf) [[Github]](https://github.com/ChenhongyiYang/WidthFormer) 192 | - UniVision: A Unified Framework for Vision-Centric 3D Perception (Arxiv 2024) [[Paper]](https://arxiv.org/pdf/2401.06994.pdf) 193 | - DA-BEV: Unsupervised Domain Adaptation for Bird's Eye View Perception (Arxiv 2024) [[Paper]](https://arxiv.org/pdf/2401.08687.pdf) 194 | - Towards Scenario Generalization for Vision-based Roadside 3D Object Detection (Arxiv 2024) [[Paper]](https://arxiv.org/pdf/2401.16110.pdf) [[Github]]() 195 | - CLIP-BEVFormer: Enhancing Multi-View Image-Based BEV Detector with Ground Truth Flow (CVPR 2024) [[Paper]](https://arxiv.org/abs/2403.08919) 196 | - GraphBEV: Towards Robust BEV Feature Alignment for Multi-Modal 3D Object Detection (Arxiv 2024) [[paper]](https://arxiv.org/abs/2403.11848) 197 | - Lifting Multi-View Detection and Tracking to the Bird's Eye View (Arxiv 2024) [[paper]](https://arxiv.org/abs/2403.12573) [[Github]](https://github.com/tteepe/TrackTacular) 198 | - DuoSpaceNet: Leveraging Both Bird's-Eye-View and Perspective View Representations for 3D Object Detection (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2405.10577) 199 | - BEVSpread: Spread Voxel Pooling for Bird’s-Eye-View Representation in Vision-based Roadside 3D Object Detection (CVPR 2024) [[Paper]]() [[Github]](https://github.com/DaTongjie/BEVSpread) 200 | - OPEN: Object-wise Position Embedding for Multi-view 3D Object Detection (ECCV 2024) [[Paper]](https://arxiv.org/pdf/2407.10753) [[Github]](https://github.com/AlmoonYsl/OPEN) 201 | - FSD-BEV: Foreground Self-Distillation for Multi-view 3D Object Detection (ECCV 2024) [[Paper]](https://arxiv.org/pdf/2407.10135) 202 | - PolarBEVDet: Exploring Polar Representation for Multi-View 3D Object Detection in Bird's-Eye-View (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2408.16200) 203 | - GeoBEV: Learning Geometric BEV Representation for Multi-view 3D Object 
Detection (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2409.01816) 204 | - Make Your ViT-based Multi-view 3D Detectors Faster via Token Compression (ECCV 2024) [[Paper]](https://arxiv.org/abs/2409.00633) [[Github]](https://github.com/DYZhang09/ToC3D) 205 | - MambaBEV: An efficient 3D detection model with Mamba2 (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2410.12673) 206 | - ROA-BEV: 2D Region-Oriented Attention for BEV-based 3D Object (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2410.10298) 207 | - Test-time Correction with Human Feedback: An Online 3D Detection System via Visual Prompting (Arxiv 2024) [[paper]](https://arxiv.org/pdf/2412.07768) 208 | - HV-BEV: Decoupling Horizontal and Vertical Feature Sampling for Multi-View 3D Object Detection (Arxiv 2024) [[paper]](https://arxiv.org/abs/2412.18884) 209 | - TiGDistill-BEV: Multi-view BEV 3D Object Detection via Target Inner-Geometry Learning Distillation (Arxiv 2024) [[paper]](https://arxiv.org/abs/2412.20911) [[Github]](https://github.com/Public-BOTs/TiGDistill-BEV) 210 | - DriveGEN: Generalized and Robust 3D Detection in Driving via Controllable Text-to-Image Diffusion Generation (Arxiv 2025) [[Paper]](https://arxiv.org/pdf/2503.11122) 211 | ### BEV Segmentation 212 | #### Lidar Camera 213 | - PETRv2: A Unified Framework for 3D Perception from Multi-Camera Images (Axxiv 2023) [[Paper]](https://arxiv.org/pdf/2206.01256.pdf) [[Github]](https://github.com/megvii-research/PETR) 214 | - X-Align: Cross-Modal Cross-View Alignment for Bird’s-Eye-View Segmentation (WACV 2023) [[Paper]](https://openaccess.thecvf.com/content/WACV2023/papers/Borse_X-Align_Cross-Modal_Cross-View_Alignment_for_Birds-Eye-View_Segmentation_WACV_2023_paper.pdf) 215 | - BEVFusion: Multi-Task Multi-Sensor Fusion with Unified Bird's-Eye View Representation (ICRA 2023) [[Paper]](https://arxiv.org/pdf/2205.13542.pdf) [[Github]](https://github.com/mit-han-lab/bevfusion) [[Project]](https://bevfusion.mit.edu/) 216 | UniM2AE: Multi-modal Masked 
Autoencoders with Unified 3D Representation for 3D Perception in Autonomous Driving (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2308.10421.pdf) 217 | - BEVFusion4D: Learning LiDAR-Camera Fusion Under Bird's-Eye-View via Cross-Modality Guidance and Temporal Aggregation (Arxiv 2023) [[paper]](https://arxiv.org/pdf/2303.17099.pdf) 218 | - Towards Better 3D Knowledge Transfer via Masked Image Modeling for Multi-view 3D Understanding (Arxiv 2023) [[paper]](https://arxiv.org/pdf/2303.11325.pdf) 219 | - LiDAR2Map: In Defense of LiDAR-Based Semantic Map Construction Using Online Camera Distillation (CVPR 2023) [[Paper]](https://arxiv.org/pdf/2304.11379.pdf) [[Github]](https://github.com/songw-zju/LiDAR2Map) 220 | - BEV-Guided Multi-Modality Fusion for Driving Perception (CVPR 2023) [[Paper]](https://openaccess.thecvf.com/content/CVPR2023/papers/Man_BEV-Guided_Multi-Modality_Fusion_for_Driving_Perception_CVPR_2023_paper.pdf) [[Github]](https://yunzeman.github.io/BEVGuide) 221 | - FUSIONFORMER: A MULTI-SENSORY FUSION IN BIRD’S-EYE-VIEW AND TEMPORAL CONSISTENT TRANSFORMER FOR 3D OBJECTION (Arxiv 2023) [[paper]](https://arxiv.org/pdf/2309.05257v1.pdf) 222 | - UniTR: A Unified and Efficient Multi-Modal Transformer for Bird’s-Eye-View Representation (ICCV 2023) [[Paper]](https://arxiv.org/pdf/2308.07732.pdf) [[Github]](https://github.com/Haiyang-W/UniTR) 223 | - BroadBEV: Collaborative LiDAR-camera Fusion for Broad-sighted Bird’s Eye View Map Construction (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2309.11119.pdf) 224 | - BEVCar: Camera-Radar Fusion for BEV Map and Object Segmentation (Arxiv 2024) [[paper]](https://arxiv.org/abs/2403.11761) 225 | - OE-BevSeg: An Object Informed and Environment Aware Multimodal Framework for Bird's-eye-view Vehicle Semantic Segmentation (Arxiv 2024) [[Paper]](https://arxiv.org/pdf/2407.13137) 226 | - BEVPose: Unveiling Scene Semantics through Pose-Guided Multi-Modal BEV Alignment (IROS 2024) [[Paper]](https://arxiv.org/pdf/2410.20969) 
[[Project]](https://m80hz.github.io/bevpose/) 227 | - PC-BEV: An Efficient Polar-Cartesian BEV Fusion Framework for LiDAR Semantic Segmentation (AAAI 2025) [[paper]](https://arxiv.org/abs/2412.14821) [[Github]](https://github.com/skyshoumeng/PC-BEV) 228 | #### Lidar 229 | - LidarMultiNet: Unifying LiDAR Semantic Segmentation, 3D Object Detection, and Panoptic Segmentation in a Single Multi-task Network (Arxiv 2022) [[paper]](https://arxiv.org/pdf/2206.11428.pdf) 230 | - SVQNet: Sparse Voxel-Adjacent Query Network for 4D Spatio-Temporal LiDAR Semantic Segmentation (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2308.13323.pdf) 231 | - BEVContrast: Self-Supervision in BEV Space for Automotive Lidar Point Clouds (3DV 2023) [[Paper]](https://arxiv.org/pdf/2310.17281.pdf) [[Github]](https://github.com/valeoai/BEVContrast) 232 | #### Monocular 233 | - Learning to Look around Objects for Top-View Representations of Outdoor Scenes (ECCV 2018) [[paper]](https://arxiv.org/pdf/1803.10870.pdf) 234 | - A Parametric Top-View Representation of Complex Road Scenes (CVPR 2019) [[Paper]](https://arxiv.org/pdf/1812.06152.pdf) 235 | - Monocular Semantic Occupancy Grid Mapping with Convolutional Variational Encoder-Decoder Networks (ICRA 2019 IEEE RA-L 2019) [[Paper]](https://arxiv.org/pdf/1804.02176.pdf) [[Github]](https://github.com/Chenyang-Lu/mono-semantic-occupancy) 236 | - Short-Term Prediction and Multi-Camera Fusion on Semantic Grids (ICCVW 2019) [[paper]](https://openaccess.thecvf.com/content_ICCVW_2019/papers/CVRSUAD/Hoyer_Short-Term_Prediction_and_Multi-Camera_Fusion_on_Semantic_Grids_ICCVW_2019_paper.pdf) 237 | - Predicting Semantic Map Representations from Images using Pyramid Occupancy Networks (CVPR 2020) [[Paper]](https://openaccess.thecvf.com/content_CVPR_2020/papers/Roddick_Predicting_Semantic_Map_Representations_From_Images_Using_Pyramid_Occupancy_Networks_CVPR_2020_paper.pdf) [[Github]](https://github.com/tom-roddick/mono-semantic-maps) 238 | - MonoLayout: Amodal scene
layout from a single image (WACV 2020) [[Paper]](https://openaccess.thecvf.com/content_WACV_2020/papers/Mani_MonoLayout_Amodal_scene_layout_from_a_single_image_WACV_2020_paper.pdf) [[Github]](https://github.com/manila95/monolayout) 239 | - Bird’s Eye View Segmentation Using Lifted 2D Semantic Features (BMVC 2021) [[Paper]](https://www.bmvc2021-virtualconference.com/assets/papers/0772.pdf) 240 | - Enabling Spatio-temporal aggregation in Birds-Eye-View Vehicle Estimation (ICRA 2021) [[Paper]](https://cvssp.org/Personal/OscarMendez/papers/pdf/SahaICRA2021.pdf) [[mp4]](https://cvssp.org/Personal/OscarMendez/videos/SahaICRA2021.mp4) 241 | - Projecting Your View Attentively: Monocular Road Scene Layout Estimation via Cross-view Transformation (CVPR 2021) [[Paper]](https://openaccess.thecvf.com/content/CVPR2021/papers/Yang_Projecting_Your_View_Attentively_Monocular_Road_Scene_Layout_Estimation_via_CVPR_2021_paper.pdf) [[Github]](https://github.com/JonDoe-297/cross-view) 242 | - ViT BEVSeg: A Hierarchical Transformer Network for Monocular Birds-Eye-View Segmentation (IEEE IJCNN 2022) [[paper]](https://arxiv.org/pdf/2205.15667.pdf) 243 | - Bird's-Eye-View Panoptic Segmentation Using Monocular Frontal View Images (IEEE RA-L 2022) [[Paper]](https://arxiv.org/pdf/2108.03227.pdf) [[Github]](https://github.com/robot-learning-freiburg/PanopticBEV) [[Project]](http://panoptic-bev.cs.uni-freiburg.de/) 244 | - Understanding Bird's-Eye View of Road Semantics using an Onboard Camera (ICRA 2022) [[Paper]](https://arxiv.org/pdf/2012.03040.pdf) [[Github]](https://github.com/ybarancan/BEV_feat_stitch) 245 | - “The Pedestrian next to the Lamppost” Adaptive Object Graphs for Better Instantaneous Mapping (CVPR 2022) [[Paper]](https://openaccess.thecvf.com/content/CVPR2022/papers/Saha_The_Pedestrian_Next_to_the_Lamppost_Adaptive_Object_Graphs_for_CVPR_2022_paper.pdf) 246 | - Weakly But Deeply Supervised Occlusion-Reasoned Parametric Road Layouts (CVPR 2022)
[[Paper]](https://openaccess.thecvf.com/content/CVPR2022/papers/Liu_Weakly_but_Deeply_Supervised_Occlusion-Reasoned_Parametric_Road_Layouts_CVPR_2022_paper.pdf) 247 | - Translating Images into Maps (ICRA 2022) [[Paper]](https://arxiv.org/pdf/2110.00966.pdf) [[Github]](https://github.com/avishkarsaha/translating-images-into-maps) 248 | - GitNet: Geometric Prior-based Transformation for Birds-Eye-View Segmentation (ECCV 2022) [[Paper]](https://www.ecva.net/papers/eccv_2022/papers_ECCV/papers/136610390.pdf) 249 | - SBEVNet: End-to-End Deep Stereo Layout Estimation (WACV 2022) [[Paper]](https://openaccess.thecvf.com/content/WACV2022/papers/Gupta_SBEVNet_End-to-End_Deep_Stereo_Layout_Estimation_WACV_2022_paper.pdf) 250 | - BEVSegFormer: Bird’s Eye View Semantic Segmentation From Arbitrary Camera Rigs (WACV 2023) [[Paper]](https://openaccess.thecvf.com/content/WACV2023/papers/Peng_BEVSegFormer_Birds_Eye_View_Semantic_Segmentation_From_Arbitrary_Camera_Rigs_WACV_2023_paper.pdf) 251 | - DiffBEV: Conditional Diffusion Model for Bird's Eye View Perception (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2303.08333.pdf) [[Github]](https://github.com/JiayuZou2020/DiffBEV) 252 | - HFT: Lifting Perspective Representations via Hybrid Feature Transformation (ICRA 2023) [[Paper]](https://arxiv.org/pdf/2204.05068.pdf) [[Github]](https://github.com/JiayuZou2020/HFT) 253 | - SkyEye: Self-Supervised Bird's-Eye-View Semantic Mapping Using Monocular Frontal View Images (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2302.04233.pdf) 254 | - Calibration-free BEV Representation for Infrastructure Perception (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2303.03583.pdf) 255 | - Semi-Supervised Learning for Visual Bird’s Eye View Semantic Segmentation (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2308.14525.pdf) 256 | - DualCross: Cross-Modality Cross-Domain Adaptation for Monocular BEV Perception (Arxiv 2023) [[paper]](https://arxiv.org/pdf/2305.03724.pdf)
[[Github]](https://github.com/YunzeMan/DualCross) [[Project]](https://yunzeman.github.io/DualCross/) 257 | - CoBEV: Elevating Roadside 3D Object Detection with Depth and Height Complementarity (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2310.02815.pdf) 258 | - SeaBird: Segmentation in Bird’s View with Dice Loss Improves Monocular 3D Detection of Large Objects (CVPR 2024) [[Paper]](https://arxiv.org/abs/2403.20318) [[Github]](https://github.com/abhi1kumar/SeaBird) 259 | - DaF-BEVSeg: Distortion-aware Fisheye Camera based Bird's Eye View Segmentation with Occlusion Reasoning (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2404.06352) [[Video]](https://streamable.com/ge4v51) 260 | - Improved Single Camera BEV Perception Using Multi-Camera Training (ITSC 2024) [[Paper]](https://arxiv.org/abs/2409.02676) 261 | - Focus on BEV: Self-calibrated Cycle View Transformation for Monocular Birds-Eye-View Segmentation (Arxiv 2024) [[Paper]](https://arxiv.org/pdf/2410.15932) 262 | - Geo-ConvGRU: Geographically Masked Convolutional Gated Recurrent Unit for Bird-Eye View Segmentation (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2412.20171) 263 | #### Multiple Camera 264 | - A Sim2Real Deep Learning Approach for the Transformation of Images from Multiple Vehicle-Mounted Cameras to a Semantically Segmented Image in Bird’s Eye View (IEEE ITSC 2020) [[Paper]](https://arxiv.org/pdf/2005.04078.pdf) [[Github]](https://github.com/ika-rwth-aachen/Cam2BEV) 265 | - Cross-view Semantic Segmentation for Sensing Surroundings (IROS 2020 IEEE RA-L 2020) [[Paper]](https://arxiv.org/pdf/1906.03560.pdf) [[Github]](https://github.com/pbw-Berwin/View-Parsing-Network) [[Project]](https://decisionforce.github.io/VPN/) 266 | - Lift, Splat, Shoot: Encoding Images from Arbitrary Camera Rigs by Implicitly Unprojecting to 3D (ECCV 2020) [[Paper]](https://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123590188.pdf) [[Github]](https://github.com/nv-tlabs/lift-splat-shoot)
[[Project]](https://nv-tlabs.github.io/lift-splat-shoot/) 267 | - Cross-view Transformers for real-time Map-view Semantic Segmentation (CVPR 2022) [[Paper]](https://openaccess.thecvf.com/content/CVPR2022/papers/Zhou_Cross-View_Transformers_for_Real-Time_Map-View_Semantic_Segmentation_CVPR_2022_paper.pdf) [[Github]](https://github.com/bradyz/cross_view_transformers) 268 | - Scene Representation in Bird’s-Eye View from Surrounding Cameras with Transformers (CVPRW 2022) [[Paper]](https://openaccess.thecvf.com/content/CVPR2022W/WAD/papers/Zhao_Scene_Representation_in_Birds-Eye_View_From_Surrounding_Cameras_With_Transformers_CVPRW_2022_paper.pdf) 269 | - M2BEV: Multi-Camera Joint 3D Detection and Segmentation with Unified Birds-Eye View Representation (Arxiv 2022) [[Paper]](https://arxiv.org/pdf/2204.05088.pdf) [[Project]](https://nvlabs.github.io/M2BEV/) 270 | - BEVerse: Unified Perception and Prediction in Birds-Eye-View for Vision-Centric Autonomous Driving (Arxiv 2022) [[Paper]](https://arxiv.org/pdf/2205.09743.pdf) [[Github]](https://github.com/zhangyp15/BEVerse) 271 | - Efficient and Robust 2D-to-BEV Representation Learning via Geometry-guided Kernel Transformer (Arxiv 2022) [[Paper]](https://arxiv.org/pdf/2206.04584.pdf) [[Github]](https://github.com/hustvl/GKT) 272 | - A Simple Baseline for BEV Perception Without LiDAR (Arxiv 2022) [[Paper]](https://arxiv.org/pdf/2206.07959.pdf) [[Github]](https://github.com/aharley/simple_bev) [[Project Page]](https://simple-bev.github.io/) 273 | - UniFusion: Unified Multi-view Fusion Transformer for Spatial-Temporal Representation in Bird's-Eye-View (ICCV 2023) [[Paper]](https://arxiv.org/pdf/2207.08536.pdf) [[Github]](https://github.com/cfzd/UniFusion) 274 | - LaRa: Latents and Rays for Multi-Camera Bird’s-Eye-View Semantic Segmentation (CORL 2022) [[Paper]](https://proceedings.mlr.press/v205/bartoccioni23a/bartoccioni23a.pdf) [[Github]](https://github.com/valeoai/LaRa) 275 | - CoBEVT: Cooperative Bird’s Eye View Semantic
Segmentation with Sparse Transformers (CORL 2022) [[Paper]](https://arxiv.org/pdf/2207.02202.pdf) [[Github]](https://github.com/DerrickXuNu/CoBEVT) 276 | - Vision-based Uneven BEV Representation Learning with Polar Rasterization and Surface Estimation (CORL 2022) [[Paper]](https://arxiv.org/pdf/2207.01878.pdf) [[Github]](https://github.com/SuperZ-Liu/PolarBEV) 277 | - BEVFormer: a Cutting-edge Baseline for Camera-based Detection (ECCV 2022) [[Paper]](https://www.ecva.net/papers/eccv_2022/papers_ECCV/papers/136690001.pdf) [[Github]](https://github.com/fundamentalvision/BEVFormer) 278 | - JPerceiver: Joint Perception Network for Depth, Pose and Layout Estimation in Driving Scenes (ECCV 2022) [[Paper]](https://www.ecva.net/papers/eccv_2022/papers_ECCV/papers/136980692.pdf) [[Github]](https://github.com/sunnyHelen/JPerceiver) 279 | - Learning Ego 3D Representation as Ray Tracing (ECCV 2022) [[Paper]](https://www.ecva.net/papers/eccv_2022/papers_ECCV/papers/136860126.pdf) [[Github]](https://github.com/fudan-zvg/Ego3RT) 280 | - Fast-BEV: Towards Real-time On-vehicle Bird's-Eye View Perception (NIPS 2022 Workshop) [[Paper]](https://arxiv.org/pdf/2301.07870.pdf) or [[Paper]](https://arxiv.org/pdf/2301.12511.pdf) [[Github]](https://github.com/Sense-GVT/Fast-BEV) 281 | - Fast-BEV: A Fast and Strong Bird's-Eye View Perception Baseline (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2301.12511.pdf) [[Github]](https://github.com/Sense-GVT/Fast-BEV) 282 | - BEVFormer v2: Adapting Modern Image Backbones to Bird’s-Eye-View Recognition via Perspective Supervision (CVPR 2023) [[Paper]](https://arxiv.org/pdf/2211.10439.pdf) 283 | - MapPrior: Bird’s-Eye View Map Layout Estimation with Generative Models (ICCV 2023) [[Paper]](https://openaccess.thecvf.com/content/ICCV2023/papers/Zhu_MapPrior_Birds-Eye_View_Map_Layout_Estimation_with_Generative_Models_ICCV_2023_paper.pdf) 284 | - Bi-Mapper: Holistic BEV Semantic Mapping for Autonomous Driving (Arxiv 2023)
[[paper]](https://arxiv.org/pdf/2305.04205.pdf) [[Github]](https://github.com/lynn-yu/Bi-Mapper) 285 | - MatrixVT: Efficient Multi-Camera to BEV Transformation for 3D Perception (ICCV 2023) [[Paper]](https://arxiv.org/pdf/2211.10593.pdf) [[Github]](https://github.com/ZRandomize/MatrixVT) 286 | - MetaBEV: Solving Sensor Failures for BEV Detection and Map Segmentation (ICCV 2023) [[paper]](https://arxiv.org/pdf/2304.09801.pdf) [[Github]](https://github.com/ChongjianGE/MetaBEV) [[Project]](https://chongjiange.github.io/metabev.html) 287 | - One Training for Multiple Deployments: Polar-based Adaptive BEV Perception for Autonomous Driving (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2304.00525.pdf) 288 | - RoboBEV: Towards Robust Bird's Eye View Perception under Corruptions (Arxiv 2023) [[paper]](https://arxiv.org/pdf/2304.06719.pdf) [[Github]](https://github.com/Daniel-xsy/RoboBEV) [[Project]](https://daniel-xsy.github.io/robobev/) 289 | - X-Align++: cross-modal cross-view alignment for Bird's-eye-view segmentation (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2306.03810.pdf) 290 | - PowerBEV: A Powerful Yet Lightweight Framework for Instance Prediction in Bird’s-Eye View (Arxiv 2023) [[paper]](https://arxiv.org/pdf/2306.10761.pdf) 291 | - Parametric Depth Based Feature Representation Learning for Object Detection and Segmentation in Bird’s-Eye View (ICCV 2023) [[Paper]](https://arxiv.org/pdf/2307.04106.pdf) 292 | - Towards Viewpoint Robustness in Bird’s Eye View Segmentation (ICCV 2023) [[Paper]](https://arxiv.org/pdf/2309.05192.pdf) [[Project]](https://nvlabs.github.io/viewpoint-robustness/) 294 | - PointBeV: A Sparse Approach to BeV Predictions (Arxiv 2023) [[paper]](https://arxiv.org/pdf/2312.00703.pdf) 295 | - DualBEV: CNN is All You Need in View Transformation
(Arxiv 2024) [[Paper]](https://arxiv.org/abs/2403.05402) 296 | - MIM4D: Masked Modeling with Multi-View Video for Autonomous Driving Representation Learning (Arxiv 2024) [[paper]](https://arxiv.org/abs/2403.08760) 297 | - HENet: Hybrid Encoding for End-to-end Multi-task 3D Perception from Multi-view Cameras (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2404.02517) [[Github]](https://github.com/VDIGPKU/HENet) 298 | - Improving Bird's Eye View Semantic Segmentation by Task Decomposition (CVPR 2024) [[Paper]](https://arxiv.org/abs/2404.01925) [[Github]](https://github.com/happytianhao/TaDe) 299 | - SG-BEV: Satellite-Guided BEV Fusion for Cross-View Semantic Segmentation (CVPR 2024) [[Paper]](https://arxiv.org/abs/2404.02638) [[Github]](https://github.com/yejy53/SG-BEV) 300 | - RoadBEV: Road Surface Reconstruction in Bird's Eye View (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2404.06605) [[Github]](https://github.com/ztsrxh/RoadBEV) 301 | - TempBEV: Improving Learned BEV Encoders with Combined Image and BEV Space Temporal Aggregation (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2404.11803) 302 | - DiffMap: Enhancing Map Segmentation with Map Prior Using Diffusion Model (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2405.02008) 303 | - Bird's-Eye View to Street-View: A Survey (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2405.08961) 304 | - LetsMap: Unsupervised Representation Learning for Semantic BEV Mapping (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2405.18852) 305 | - Navigation Instruction Generation with BEV Perception and Large Language Models (ECCV 2024) [[paper]](https://arxiv.org/abs/2407.15087) [[Github]](https://github.com/FanScy/BEVInstructor) 306 | - GaussianBeV: 3D Gaussian Representation meets Perception Models for BeV Segmentation (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2407.14108) 307 | - MaskBEV: Towards A Unified Framework for BEV Detection and Map Segmentation (ACM MM 2024) [[paper]](https://arxiv.org/abs/2408.09122) 308 | - Robust Bird’s Eye 
View Segmentation by Adapting DINOv2 (ECCV 2024 Workshop) [[Paper]](https://arxiv.org/pdf/2409.10228) 309 | - Unveiling the Black Box: Independent Functional Module Evaluation for Bird’s-Eye-View Perception Model (Arxiv 2024) [[Paper]](https://arxiv.org/pdf/2409.11969) 310 | - RopeBEV: A Multi-Camera Roadside Perception Network in Bird's-Eye-View (Arxiv 2024) [[Paper]](https://arxiv.org/pdf/2409.11706) 311 | - OneBEV: Using One Panoramic Image for Bird's-Eye-View Semantic Mapping (ACCV 2024) [[Paper]](https://arxiv.org/abs/2409.13912) [[Github]](https://github.com/JialeWei/OneBEV) 312 | - VQ-Map: Bird's-Eye-View Map Layout Estimation in Tokenized Discrete Space (NeurIPS 2024) [[Paper]](https://arxiv.org/abs/2411.01618) [[Github]](https://github.com/Z1zyw/VQ-Map) 313 | - Fast and Efficient Transformer-based Method for Bird’s Eye View Instance Prediction (IEEE ITSC 2024) [[Paper]](https://arxiv.org/pdf/2411.06851) [[Github]](https://github.com/miguelag99/Efficient-Instance-Prediction) 314 | - Epipolar Attention Field Transformers for Bird's Eye View Semantic Segmentation (WACV 2025) [[paper]](https://arxiv.org/abs/2412.01595) 315 | - Revisiting Birds Eye View Perception Models with Frozen Foundation Models: DINOv2 and Metric3Dv2 (Arxiv 2025) [[Paper]](https://arxiv.org/pdf/2501.08118) 316 | - SegLocNet: Multimodal Localization Network for Autonomous Driving via Bird's-Eye-View Segmentation (Arxiv 2025) [[Paper]](https://arxiv.org/abs/2502.20077) 317 | - BEVDiffuser: Plug-and-Play Diffusion Model for BEV Denoising with Ground-Truth Guidance (Arxiv 2025) [[Paper]](https://arxiv.org/pdf/2502.19694) 318 | - Dur360BEV: A Real-world 360-degree Single Camera Dataset and Benchmark for Bird-Eye View Mapping in Autonomous Driving (Arxiv 2025) [[Paper]](https://arxiv.org/abs/2503.00675) 319 | - TS-CGNet: Temporal-Spatial Fusion Meets Centerline-Guided Diffusion for BEV Mapping (Arxiv 2025) [[Paper]](https://arxiv.org/abs/2503.02578) [[Github]](https://github.com/krabs-H/TS-CGNet) 320 | - 
BEVMOSNet: Multimodal Fusion for BEV Moving Object Segmentation (Arxiv 2025) 321 | - HierDAMap: Towards Universal Domain Adaptive BEV Mapping via Hierarchical Perspective Priors (Arxiv 2025) [[Paper]](https://arxiv.org/abs/2503.06821) 322 | - MamBEV: Enabling State Space Models to Learn Birds-Eye-View Representations (ICLR 2025) [[Paper]](https://arxiv.org/abs/2503.13858) 323 | ### Perception Prediction Planning 324 | #### Monocular 325 | - Driving among Flatmobiles: Bird-Eye-View occupancy grids from a monocular camera for holistic trajectory planning (WACV 2021) [[Paper]](https://openaccess.thecvf.com/content/WACV2021/papers/Loukkal_Driving_Among_Flatmobiles_Bird-Eye-View_Occupancy_Grids_From_a_Monocular_Camera_WACV_2021_paper.pdf) 326 | - HOPE: Hierarchical Spatial-temporal Network for Occupancy Flow Prediction (CVPRW 2022) [[paper]](https://arxiv.org/pdf/2206.10118.pdf) 327 | #### Multiple Camera 328 | - FIERY: Future Instance Prediction in Bird’s-Eye View from Surround Monocular Cameras (ICCV 2021) [[Paper]](https://openaccess.thecvf.com/content/ICCV2021/papers/Hu_FIERY_Future_Instance_Prediction_in_Birds-Eye_View_From_Surround_Monocular_ICCV_2021_paper.pdf) [[Github]](https://github.com/wayveai/fiery) [[Project]](https://anthonyhu.github.io/fiery) 329 | - NEAT: Neural Attention Fields for End-to-End Autonomous Driving (ICCV 2021) [[Paper]](https://arxiv.org/pdf/2109.04456.pdf) [[Github]](https://github.com/autonomousvision/neat) 330 | - ST-P3: End-to-end Vision-based Autonomous Driving via Spatial-Temporal Feature Learning (ECCV 2022) [[Paper]](https://www.ecva.net/papers/eccv_2022/papers_ECCV/papers/136980522.pdf) [[Github]](https://github.com/OpenPerceptionX/ST-P3) 331 | - StretchBEV: Stretching Future Instance Prediction Spatially and Temporally (ECCV 2022) [[Paper]](https://www.ecva.net/papers/eccv_2022/papers_ECCV/papers/136980436.pdf)
[[Github]](https://github.com/kaanakan/stretchbev) [[Project]](https://kuis-ai.github.io/stretchbev/) 332 | - TBP-Former: Learning Temporal Bird's-Eye-View Pyramid for Joint Perception and Prediction in Vision-Centric Autonomous Driving (CVPR 2023) [[Paper]](https://arxiv.org/pdf/2303.09998.pdf) [[Github]](https://github.com/MediaBrain-SJTU/TBP-Former) 333 | - Planning-oriented Autonomous Driving (CVPR 2023, Occupancy Prediction) [[paper]](https://arxiv.org/pdf/2212.10156.pdf) [[Github]](https://github.com/OpenDriveLab/UniAD) [[Project]](https://opendrivelab.github.io/UniAD/) 334 | - Think Twice before Driving: Towards Scalable Decoders for End-to-End Autonomous Driving (CVPR 2023) [[Paper]](https://arxiv.org/pdf/2305.06242.pdf) [[Github]](https://github.com/OpenDriveLab/ThinkTwice) 335 | - ReasonNet: End-to-End Driving with Temporal and Global Reasoning (CVPR 2023) [[Paper]](https://arxiv.org/pdf/2305.10507.pdf) 336 | - LiDAR-BEVMTN: Real-Time LiDAR Bird’s-Eye View Multi-Task Perception Network for Autonomous Driving (Arxiv 2023) [[paper]](https://arxiv.org/pdf/2307.08850.pdf) 337 | - FusionAD: Multi-modality Fusion for Prediction and Planning Tasks of Autonomous Driving (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2308.01006.pdf) 338 | - VADv2: End-to-End Vectorized Autonomous Driving via Probabilistic Planning (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2402.13243) [[Project]](https://hgao-cv.github.io/VADv2/) 339 | - SparseAD: Sparse Query-Centric Paradigm for Efficient End-to-End Autonomous Driving (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2404.06892) 340 | - SparseDrive: End-to-End Autonomous Driving via Sparse Scene Representation (Arxiv 2024) [[paper]](https://arxiv.org/abs/2405.19620) [[Github]](https://github.com/swc-17/SparseDrive) 341 | - DualAD: Disentangling the Dynamic and Static World for End-to-End Driving (CVPR 2024)
[[Paper]](https://openaccess.thecvf.com/content/CVPR2024/papers/Doll_DualAD_Disentangling_the_Dynamic_and_Static_World_for_End-to-End_Driving_CVPR_2024_paper.pdf) 342 | - Solving Motion Planning Tasks with a Scalable Generative Model (ECCV 2024) [[Paper]](https://arxiv.org/pdf/2407.02797) [[Github]](https://github.com/HorizonRobotics/GUMP/) 343 | ### Mapping 344 | #### Lidar 345 | - Hierarchical Recurrent Attention Networks for Structured Online Maps (CVPR 2018) [[Paper]](https://openaccess.thecvf.com/content_cvpr_2018/papers/Homayounfar_Hierarchical_Recurrent_Attention_CVPR_2018_paper.pdf) 346 | #### Lidar Camera 347 | - End-to-End Deep Structured Models for Drawing Crosswalks (ECCV 2018) [[Paper]](https://arxiv.org/pdf/2012.11585.pdf) 348 | - Probabilistic Semantic Mapping for Urban Autonomous Driving Applications (IROS 2020) [[Paper]](http://ras.papercept.net/images/temp/IROS/files/2186.pdf) 349 | - Convolutional Recurrent Network for Road Boundary Extraction (CVPR 2019) [[Paper]](https://nhoma.github.io/papers/road_cvpr19.pdf) 350 | - Lane Graph Estimation for Scene Understanding in Urban Driving (IEEE RAL 2021) [[Paper]](https://arxiv.org/pdf/2105.00195.pdf) 351 | - M^2-3DLaneNet: Multi-Modal 3D Lane Detection (Arxiv 2022) [[paper]](https://arxiv.org/pdf/2209.05996.pdf) [[Github]](https://github.com/JMoonr/mmlane) 352 | - HDMapNet: An Online HD Map Construction and Evaluation Framework (ICRA 2022) [[paper]](https://arxiv.org/pdf/2107.06307.pdf) [[Github]](https://github.com/Tsinghua-MARS-Lab/HDMapNet) [[Project]](https://tsinghua-mars-lab.github.io/HDMapNet/) 353 | - SuperFusion: Multilevel LiDAR-Camera Fusion for Long-Range HD Map Generation (Arxiv 2023) [[paper]](https://arxiv.org/pdf/2211.15656.pdf) [[Github]](https://github.com/haomo-ai/SuperFusion) 354 | - VMA: Divide-and-Conquer Vectorized Map Annotation System for Large-Scale Driving Scene (Arxiv 2023)
[[Paper]](https://arxiv.org/pdf/2304.09807.pdf) 355 | - THMA: Tencent HD Map AI System for Creating HD Map Annotations (AAAI 2023) [[paper]](https://arxiv.org/pdf/2212.11123.pdf) 356 | #### Monocular 357 | - RoadTracer: Automatic Extraction of Road Networks from Aerial Images (CVPR 2018) [[Paper]](https://arxiv.org/pdf/1802.03680.pdf) [[Github]](https://github.com/mitroadmaps/roadtracer) 358 | - DAGMapper: Learning to Map by Discovering Lane Topology (ICCV 2019) [[paper]](https://arxiv.org/pdf/2012.12377.pdf) 359 | - End-to-end Lane Detection through Differentiable Least-Squares Fitting (ICCVW 2019) [[paper]](https://openaccess.thecvf.com/content_ICCVW_2019/papers/CVRSUAD/Van_Gansbeke_End-to-end_Lane_Detection_through_Differentiable_Least-Squares_Fitting_ICCVW_2019_paper.pdf) 360 | - VecRoad: Point-based Iterative Graph Exploration for Road Graphs Extraction (CVPR 2020) [[Paper]](https://openaccess.thecvf.com/content_CVPR_2020/papers/Tan_VecRoad_Point-Based_Iterative_Graph_Exploration_for_Road_Graphs_Extraction_CVPR_2020_paper.pdf) [[Github]](https://github.com/tansor/VecRoad) [[Project]](https://mmcheng.net/vecroad/) 361 | - Sat2Graph: Road Graph Extraction through Graph-Tensor Encoding (ECCV 2020) [[paper]](https://arxiv.org/pdf/2007.09547.pdf) [[Github]](https://github.com/songtaohe/Sat2Graph) 362 | - iCurb: Imitation Learning-based Detection of Road Curbs using Aerial Images for Autonomous Driving (ICRA 2021 IEEE RA-L) [[paper]](https://arxiv.org/pdf/2103.17118.pdf) [[Github]](https://github.com/TonyXuQAQ/Topo-boundary/tree/master/graph_based_baselines/iCurb) [[Project]](https://tonyxuqaq.github.io/projects/iCurb/) 363 | - HDMapGen: A Hierarchical Graph Generative Model of High Definition Maps (CVPR 2021) [[paper]](https://openaccess.thecvf.com/content/CVPR2021/papers/Mi_HDMapGen_A_Hierarchical_Graph_Generative_Model_of_High_Definition_Maps_CVPR_2021_paper.pdf) 364 | - Structured Bird’s-Eye-View Traffic Scene Understanding from Onboard Images (ICCV 2021) 
[[Paper]](https://openaccess.thecvf.com/content/ICCV2021/papers/Can_Structured_Birds-Eye-View_Traffic_Scene_Understanding_From_Onboard_Images_ICCV_2021_paper.pdf) [[Github]](https://github.com/ybarancan/STSU) 365 | - RNGDet: Road Network Graph Detection by Transformer in Aerial Images (IEEE TGRS 2022) [[Paper]](https://arxiv.org/pdf/2202.07824.pdf) [[Project]](https://tonyxuqaq.github.io/projects/RNGDet/) 366 | - RNGDet++: Road Network Graph Detection by Transformer with Instance Segmentation and Multi-scale Features Enhancement (IEEE RA-L 2022) [[Paper]](https://tonyxuqaq.github.io/assets/pdf/2022_RAL_RNGDetPlusPlus.pdf) [[Github]](https://github.com/TonyXuQAQ/RNGDetPlusPlus) 367 | - SPIN Road Mapper: Extracting Roads from Aerial Images via Spatial and Interaction Space Graph Reasoning for Autonomous Driving (ICRA 2022) [[paper]](https://arxiv.org/pdf/2109.07701.pdf) [[Github]](https://github.com/wgcban/SPIN_RoadMapper) 368 | - Laneformer: Object-aware Row-Column Transformers for Lane Detection (AAAI 2022) [[Paper]](https://arxiv.org/pdf/2203.09830.pdf) 369 | - Lane-Level Street Map Extraction from Aerial Imagery (WACV 2022) [[Paper]](https://openaccess.thecvf.com/content/WACV2022/papers/He_Lane-Level_Street_Map_Extraction_From_Aerial_Imagery_WACV_2022_paper.pdf) [[Github]](https://github.com/songtaohe/LaneExtraction) 370 | - Reconstruct from Top View: A 3D Lane Detection Approach based on Geometry Structure Prior (CVPRW 2022) [[paper]](https://openaccess.thecvf.com/content/CVPR2022W/WAD/papers/Li_Reconstruct_From_Top_View_A_3D_Lane_Detection_Approach_Based_CVPRW_2022_paper.pdf) 371 | - PolyWorld: Polygonal Building Extraction with Graph Neural Networks in Satellite Images (CVPR 2022) [[Paper]](https://arxiv.org/pdf/2111.15491.pdf) [[Github]](https://github.com/zorzi-s/PolyWorldPretrainedNetwork) 372 | - Topology Preserving Local Road Network Estimation from Single Onboard Camera Image (CVPR 2022)
[[Paper]](https://openaccess.thecvf.com/content/CVPR2022/papers/Can_Topology_Preserving_Local_Road_Network_Estimation_From_Single_Onboard_Camera_CVPR_2022_paper.pdf) [[Github]](https://github.com/ybarancan/TopologicalLaneGraph) 373 | - TD-Road: Top-Down Road Network Extraction with Holistic Graph Construction (ECCV 2022) [[Paper]](https://www.ecva.net/papers/eccv_2022/papers_ECCV/papers/136690553.pdf) 374 | - CLiNet: Joint Detection of Road Network Centerlines in 2D and 3D (IEEE IVS 2023) [[Paper]](https://arxiv.org/pdf/2302.02259.pdf) 375 | - Polygonizer: An auto-regressive building delineator (ICLRW 2023) [[Paper]](https://arxiv.org/pdf/2304.04048.pdf) 376 | - CurveFormer: 3D Lane Detection by Curve Propagation with Curve Queries and Attention (ICRA 2023) [[Paper]](https://arxiv.org/pdf/2209.07989.pdf) 377 | - Anchor3DLane: Learning to Regress 3D Anchors for Monocular 3D Lane Detection (CVPR 2023) [[paper]](https://arxiv.org/pdf/2301.02371.pdf) [[Github]](https://github.com/tusen-ai/Anchor3DLane) 378 | - Learning and Aggregating Lane Graphs for Urban Automated Driving (Arxiv 2023) [[paper]](https://arxiv.org/pdf/2302.06175.pdf) 379 | - Online Lane Graph Extraction from Onboard Video (Arxiv 2023) [[paper]](https://arxiv.org/pdf/2304.00930.pdf) [[Github]](https://github.com/hustvl/LaneGAP) 380 | - Video Killed the HD-Map: Predicting Driving Behavior Directly From Drone Images (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2305.11856.pdf) 381 | - Prior Based Online Lane Graph Extraction from Single Onboard Camera Image (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2307.13344.pdf) 382 | - Online Monocular Lane Mapping Using Catmull-Rom Spline (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2307.11653.pdf) [[Github]](https://github.com/HKUST-Aerial-Robotics/MonoLaneMapping) 383 | - Improving Online Lane Graph Extraction by Object-Lane Clustering (ICCV 2023) [[Paper]](https://arxiv.org/pdf/2307.10947.pdf) 384 | - LATR: 3D Lane Detection from Monocular Images with Transformer
(ICCV 2023) [[Paper]](https://arxiv.org/pdf/2308.04583v1.pdf) [[Github]](https://github.com/JMoonr/LATR) 385 | - Patched Line Segment Learning for Vector Road Mapping (Arxiv 2023) [[paper]](https://arxiv.org/pdf/2309.02923.pdf) 386 | - Sparse Point Guided 3D Lane Detection (ICCV 2023) [[Paper]](https://openaccess.thecvf.com/content/ICCV2023/papers/Yao_Sparse_Point_Guided_3D_Lane_Detection_ICCV_2023_paper.pdf) [[Github]](https://github.com/YaoChengTang/Sparse-Point-Guided-3D-Lane-Detection) 387 | - Recursive Video Lane Detection (ICCV 2023) [[Paper]](https://browse.arxiv.org/pdf/2308.11106.pdf) [[Github]](https://github.com/dongkwonjin/RVLD) 389 | - Occlusion-Aware 2D and 3D Centerline Detection for Urban Driving via Automatic Label Generation (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2311.02044.pdf) 390 | - Building Lane-Level Maps from Aerial Images (Arxiv 2023) [[paper]](https://arxiv.org/pdf/2312.13449.pdf) 391 | - LaneCPP: Continuous 3D Lane Detection using Physical Priors (CVPR 2024) [[Paper]](https://arxiv.org/pdf/2406.08381) 392 | - DeepAerialMapper: Deep Learning-based Semi-automatic HD Map Creation for Highly Automated Vehicles (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2410.00769) 393 | #### Multiple Camera 394 | - PersFormer: a New Baseline for 3D Laneline Detection (ECCV 2022) [[Paper]](https://www.ecva.net/papers/eccv_2022/papers_ECCV/papers/136980539.pdf) [[Github]](https://github.com/OpenDriveLab/PersFormer_3DLane) 395 | - Continuity-preserving Path-wise Modeling for Online Lane Graph Construction (Arxiv 2023) [[paper]](https://arxiv.org/pdf/2303.08815.pdf) [[Github]](https://github.com/hustvl/LaneGAP) 396 | - VAD: Vectorized Scene Representation for Efficient Autonomous Driving (Arxiv 2023) [[paper]](https://arxiv.org/pdf/2303.12077.pdf)
[[Github]](https://github.com/hustvl/VAD) 397 | - InstaGraM: Instance-level Graph Modeling for Vectorized HD Map Learning (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2301.04470.pdf) 398 | - VectorMapNet: End-to-end Vectorized HD Map Learning (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2206.08920.pdf) [[Github]](https://github.com/Mrmoore98/VectorMapNet_code) [[Project]](https://tsinghua-mars-lab.github.io/vectormapnet/) 399 | - Road Genome: A Topology Reasoning Benchmark for Scene Understanding in Autonomous Driving (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2304.10440.pdf) [[Github]](https://github.com/OpenDriveLab/OpenLane-V2) 400 | - Topology Reasoning for Driving Scenes (Arxiv 2023) [[paper]](https://arxiv.org/pdf/2304.05277.pdf) [[Github]](https://github.com/OpenDriveLab/TopoNet) 401 | - MV-Map: Offboard HD-Map Generation with Multi-view Consistency (Arxiv 2023) [[paper]](https://arxiv.org/pdf/2305.08851.pdf) [[Github]](https://github.com/ZiYang-xie/MV-Map) 402 | - CenterLineDet: Road Lane CenterLine Graph Detection With Vehicle-Mounted Sensors by Transformer for High-definition Map Creation (ICRA 2023) [[paper]](https://arxiv.org/pdf/2209.07734.pdf) [[Github]](https://github.com/TonyXuQAQ/CenterLineDet) 403 | - Structured Modeling and Learning for Online Vectorized HD Map Construction (ICLR 2023) [[paper]](https://arxiv.org/pdf/2208.14437.pdf) [[Github]](https://github.com/hustvl/MapTR) 404 | - Neural Map Prior for Autonomous Driving (CVPR 2023) [[Paper]](https://arxiv.org/pdf/2304.08481.pdf) 405 | - An Efficient Transformer for Simultaneous Learning of BEV and Lane Representations in 3D Lane Detection (Arxiv 2023) [[paper]](https://arxiv.org/pdf/2306.04927.pdf) 406 | - TopoMask: Instance-Mask-Based Formulation for the Road Topology Problem via Transformer-Based Architecture (Arxiv 2023) [[paper]](https://arxiv.org/pdf/2306.05419.pdf) 407 | - PolyDiffuse: Polygonal Shape Reconstruction via Guided Set Diffusion Models (Arxiv 2023)
[[paper]](https://arxiv.org/pdf/2306.01461.pdf) [[Github]](https://github.com/woodfrog/poly-diffuse) [[Project]](https://poly-diffuse.github.io/) 408 | - Online Map Vectorization for Autonomous Driving: A Rasterization Perspective (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2306.10502.pdf) 409 | - NeMO: Neural Map Growing System for Spatiotemporal Fusion in Bird’s-Eye-View and BDD-Map Benchmark (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2306.04540.pdf) 410 | - MachMap: End-to-End Vectorized Solution for Compact HD-Map Construction (CVPR 2023 Workshop) [[Paper]](https://arxiv.org/pdf/2306.10301.pdf) 412 | - End-to-End Vectorized HD-map Construction with Piecewise Bézier Curve (CVPR 2023) [[Paper]](https://arxiv.org/pdf/2306.09700.pdf) [[Github]](https://github.com/er-muyue/BeMapNet) 413 | - GroupLane: End-to-End 3D Lane Detection with Channel-wise Grouping (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2307.09472.pdf) 414 | - MapTRv2: An End-to-End Framework for Online Vectorized HD Map Construction (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2308.05736.pdf) 416 | - InsightMapper: A Closer Look at Inner-Instance Information for Vectorized High-Definition Mapping (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2308.08543.pdf) [[Project]](https://tonyxuqaq.github.io/InsightMapper/) [[Github]](https://github.com/TonyXuQAQ/InsightMapper/tree/main) 417 | - HD Map Generation from Noisy Multi-Route Vehicle Fleet Data on Highways with Expectation Maximization (Arxiv 2023) [[Paper]](https://browse.arxiv.org/pdf/2305.02080.pdf) 418 | - StreamMapNet: Streaming Mapping Network for Vectorized Online HD Map Construction (WACV 2024) [[Paper]](https://arxiv.org/pdf/2308.12570.pdf)
[[Github]](https://github.com/yuantianyuan01/StreamMapNet) 419 | - PivotNet: Vectorized Pivot Learning for End-to-end HD Map Construction (ICCV 2023) [[Paper]](https://arxiv.org/pdf/2308.16477.pdf) 420 | - Translating Images to Road Network: A Non-Autoregressive Sequence-to-Sequence Approach (ICCV 2023) [[paper]](https://openaccess.thecvf.com/content/ICCV2023/papers/Lu_Translating_Images_to_Road_Network_A_Non-Autoregressive_Sequence-to-Sequence_Approach_ICCV_2023_paper.pdf) 421 | - TopoMLP: A Simple yet Strong Pipeline for Driving Topology Reasoning (Arxiv 2023) [[paper]](https://arxiv.org/pdf/2310.06753.pdf) [[Github]](https://github.com/wudongming97/TopoMLP) 422 | - ScalableMap: Scalable Map Learning for Online Long-Range Vectorized HD Map Construction (CoRL 2023) [[Paper]](https://arxiv.org/pdf/2310.13378.pdf) [[Github]](https://github.com/jingy1yu/ScalableMap) 423 | - Mind the map! Accounting for existing map information when estimating online HDMaps from sensor data (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2311.10517.pdf) 424 | - Augmenting Lane Perception and Topology Understanding with Standard Definition Navigation Maps (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2311.04079.pdf) [[Github]](https://github.com/NVlabs/SMERF) 425 | - P-MapNet: Far-Seeing Map Constructor Enhanced by both SDMap and HDMap Priors (ICLR 2024 submitted paper) [[Openreview]](https://openreview.net/forum?id=lgDrVM9Rpx) [[Paper]](https://openreview.net/pdf?id=lgDrVM9Rpx) 426 | - Online Vectorized HD Map Construction using Geometry (Arxiv 2023) [[paper]](https://arxiv.org/pdf/2312.03341.pdf) [[Github]](https://github.com/cnzzx/GeMap) 427 | - LaneSegNet: Map Learning with Lane Segment Perception for Autonomous Driving (Arxiv 2023) [[paper]](https://arxiv.org/pdf/2312.16108.pdf) [[Github]](https://github.com/OpenDriveLab/LaneSegNet) 428 | - 3D Lane Detection from Front or Surround-View using Joint-Modeling & Matching (Arxiv 2024) [[Paper]](https://arxiv.org/pdf/2401.08036.pdf) 429 | -
MapNeXt: Revisiting Training and Scaling Practices for Online Vectorized HD Map Construction (Arxiv 2024) [[Paper]](https://arxiv.org/pdf/2401.07323.pdf) 430 | - Stream Query Denoising for Vectorized HD Map Construction (Arxiv 2024) [[Paper]](https://arxiv.org/pdf/2401.09112.pdf) 431 | - ADMap: Anti-disturbance framework for reconstructing online vectorized HD map (Arxiv 2024) [[Paper]](https://arxiv.org/pdf/2401.13172.pdf) 432 | - PLCNet: Patch-wise Lane Correction Network for Automatic Lane Correction in High-definition Maps (Arxiv 2024) [[Paper]](https://arxiv.org/pdf/2401.14024.pdf) 433 | - LaneGraph2Seq: Lane Topology Extraction with Language Model via Vertex-Edge Encoding and Connectivity Enhancement (AAAI 2024) [[paper]](https://arxiv.org/pdf/2401.17609.pdf) 434 | - VI-Map: Infrastructure-Assisted Real-Time HD Mapping for Autonomous Driving (MobiCom 2023) [[Paper]](https://yanzhenyu.com/assets/pdf/VI-Map-MobiCom23.pdf) 435 | - CurveFormer++: 3D Lane Detection by Curve Propagation with Temporal Curve Queries and Attention (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2402.06423) 437 | - Lane2Seq: Towards Unified Lane Detection via Sequence Generation (CVPR 2024) [[Paper]](https://arxiv.org/abs/2402.17172) 438 | - Leveraging Enhanced Queries of Point Sets for Vectorized Map Construction (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2402.17430) [[Github]](https://github.com/HXMap/MapQR) 439 | - MapTracker: Tracking with Strided Memory Fusion for Consistent Vector HD Mapping (Arxiv 2024) [[paper]](https://arxiv.org/abs/2403.15951) [[Project]](https://map-tracker.github.io/) 440 | - Producing and Leveraging Online Map Uncertainty in Trajectory Prediction (CVPR 2024) [[Paper]](https://arxiv.org/abs/2403.16439) [[Github]](https://github.com/alfredgu001324/MapUncertaintyPrediction) 441 | - MGMap: Mask-Guided
Learning for Online Vectorized HD Map Construction (CVPR 2024) [[Paper]](https://arxiv.org/abs/2404.00876) [[Github]](https://github.com/xiaolul2/MGMap) 442 | - HIMap: HybrId Representation Learning for End-to-end Vectorized HD Map Construction (CVPR 2024) [[Paper]](https://arxiv.org/abs/2403.08639) 443 | - SemVecNet: Generalizable Vector Map Generation for Arbitrary Sensor Configurations (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2405.00250) 444 | - DTCLMapper: Dual Temporal Consistent Learning for Vectorized HD Map Construction (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2405.05518) 445 | - Addressing Diverging Training Costs using Local Restoration for Precise Bird's Eye View Map Construction (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2405.01016) 446 | - Is Your HD Map Constructor Reliable under Sensor Corruptions? (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2406.12214) [[Github]](https://github.com/mapbench/toolkit) [[Project]](https://mapbench.github.io/) 447 | - DuMapNet: An End-to-End Vectorization System for City-Scale Lane-Level Map Generation (KDD 2024) [[Paper]](https://arxiv.org/abs/2406.14255) 448 | - LGmap: Local-to-Global Mapping Network for Online Long-Range Vectorized HD Map Construction (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2406.13988) 449 | - Accelerating Online Mapping and Behavior Prediction via Direct BEV Feature Attention (ECCV 2024) [[Paper]](https://arxiv.org/pdf/2407.06683) [[Github]](https://github.com/alfredgu001324/MapBEVPrediction) 450 | - BLOS-BEV: Navigation Map Enhanced Lane Segmentation Network, Beyond Line of Sight (Arxiv 2024) [[Paper]](https://arxiv.org/pdf/2407.08526) 451 | - Map It Anywhere (MIA): Empowering Bird's Eye View Mapping using Large-scale Public Data (Arxiv 2024) [[Paper]](https://arxiv.org/pdf/2407.08726) 452 | - MapDistill: Boosting Efficient Camera-based HD Map Construction via Camera-LiDAR Fusion Model Distillation (ECCV 2024) [[Paper]](https://arxiv.org/abs/2407.11682) 453 | - Mask2Map: Vectorized HD
Map Construction Using Bird's Eye View Segmentation Masks (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2407.13517) [[Github]](https://github.com/SehwanChoi0307/Mask2Map) 454 | - Generation of Training Data from HD Maps in the Lanelet2 Framework (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2407.17409) 455 | - PrevPredMap: Exploring Temporal Modeling with Previous Predictions for Online Vectorized HD Map Construction (Arxiv 2024) [[paper]](https://arxiv.org/abs/2407.17378) [[Github]](https://github.com/pnnnnnnn/PrevPredMap) 456 | - CAMAv2: A Vision-Centric Approach for Static Map Element Annotation (Arxiv 2024) [[Paper]](https://arxiv.org/pdf/2407.21331) 457 | - HeightLane: BEV Heightmap guided 3D Lane Detection (Arxiv 2024) [[paper]](https://arxiv.org/pdf/2408.08270) 458 | - PriorMapNet: Enhancing Online Vectorized HD Map Construction with Priors (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2408.08802) 459 | - Local map Construction Methods with SD map: A Novel Survey (Arxiv 2024) [[Paper]](https://arxiv.org/pdf/2409.02415) 460 | - Enhancing Vectorized Map Perception with Historical Rasterized Maps (ECCV 2024) [[Paper]](https://arxiv.org/abs/2409.00620) [[Github]](https://github.com/HXMap/HRMapNet) 461 | - GenMapping: Unleashing the Potential of Inverse Perspective Mapping for Robust Online HD Map Construction (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2409.08688v1) [[Github]](https://github.com/lynn-yu/GenMapping?tab=readme-ov-file) 462 | - GlobalMapNet: An Online Framework for Vectorized Global HD Map Construction (Arxiv 2024) [[paper]](https://arxiv.org/abs/2409.10063) 463 | - MemFusionMap: Working Memory Fusion for Online Vectorized HD Map Construction (Arxiv 2024) [[Paper]](https://arxiv.org/pdf/2409.18737) 464 | - MGMapNet: Multi-Granularity Representation Learning for End-to-End Vectorized HD Map Construction (Arxiv 2024) [[paper]](https://arxiv.org/pdf/2410.07733) 465 | - Exploring Semi-Supervised Learning for Online Mapping (Arxiv 2024)
[[Paper]](https://arxiv.org/pdf/2410.10279) 466 | - OpenSatMap: A Fine-grained High-resolution Satellite Dataset for Large-scale Map Construction (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2410.23278) [[Github]](https://github.com/bjzhb666/get_google_maps_image) [[Project]](https://opensatmap.github.io/) 467 | - HeightMapNet: Explicit Height Modeling for End-to-End HD Map Learning (WACV 2025) [[Paper]](https://arxiv.org/abs/2411.01408) [[Github]](https://github.com/adasfag/HeightMapNet/) 468 | - M3TR: Generalist HD Map Construction with Variable Map Priors (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2411.10316) [[Github]](https://github.com/immel-f/m3tr) 469 | - TopoSD: Topology-Enhanced Lane Segment Perception with SDMap Prior (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2411.14751) 470 | - ImagineMap: Enhanced HD Map Construction with SD Maps (Arxiv 2024) [[paper]](https://arxiv.org/abs/2412.16938) 471 | - Anchor3DLane++: 3D Lane Detection via Sample-Adaptive Sparse 3D Anchor Regression (TPAMI 2025) [[paper]](https://arxiv.org/abs/2412.16889) [[Github]](https://github.com/tusen-ai/Anchor3DLane) 472 | - LDMapNet-U: An End-to-End System for City-Scale Lane-Level Map Updating (KDD 2025) [[Paper]](https://arxiv.org/pdf/2501.02763) 473 | - MapGS: Generalizable Pretraining and Data Augmentation for Online Mapping via Novel View Synthesis (Arxiv 2025) [[paper]](https://arxiv.org/pdf/2501.06660) 474 | - Topo2Seq: Enhanced Topology Reasoning via Topology Sequence Learning (Arxiv 2025) [[Paper]](https://arxiv.org/pdf/2502.08974) 475 | - Leveraging V2X for Collaborative HD Maps Construction Using Scene Graph Generation (Arxiv 2025) [[Paper]](https://arxiv.org/pdf/2502.10127) 476 | - FastMap: Fast Queries Initialization Based Vectorized HD Map Reconstruction Framework (Arxiv 2025) [[Paper]](https://arxiv.org/abs/2503.05492) [[Github]](https://github.com/hht1996ok/FastMap) 477 | - Chameleon: Fast-slow Neuro-symbolic Lane Topology Extraction (Arxiv 2025) 
[[Paper]](https://arxiv.org/abs/2503.07485) 478 | - HisTrackMap: Global Vectorized High-Definition Map Construction via History Map Tracking (Arxiv 2025) [[Paper]](https://arxiv.org/abs/2503.07168) 479 | - AugMapNet: Improving Spatial Latent Structure via BEV Grid Augmentation for Enhanced Vectorized Online HD Map Construction (Arxiv 2025) [[Paper]](https://arxiv.org/abs/2503.13430) 480 | ### LaneGraph 481 | #### Monocular 482 | - Lane Graph Estimation for Scene Understanding in Urban Driving (IEEE RAL 2021) [[Paper]](https://arxiv.org/abs/2105.00195) 483 | - AutoGraph: Predicting Lane Graphs from Traffic Observations (IEEE RAL 2023) [[Paper]](https://arxiv.org/abs/2306.15410) 484 | - Learning and Aggregating Lane Graphs for Urban Automated Driving (CVPR 2023) [[Paper]](https://arxiv.org/pdf/2302.06175.pdf) 485 | - TopoLogic: An Interpretable Pipeline for Lane Topology Reasoning on Driving Scenes (Arxiv 2024) [[Paper]](https://arxiv.org/pdf/2405.14747) 486 | - Enhancing 3D Lane Detection and Topology Reasoning with 2D Lane Priors (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2406.03105) 487 | - Learning Lane Graphs from Aerial Imagery Using Transformers (Arxiv 2024) [[Paper]](https://arxiv.org/pdf/2407.05687) 488 | - TopoMaskV2: Enhanced Instance-Mask-Based Formulation for the Road Topology Problem (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2409.11325) 489 | - LMT-Net: Lane Model Transformer Network for Automated HD Mapping from Sparse Vehicle Observations (ITSC 2024) [[Paper]](https://arxiv.org/abs/2409.12409) 490 | - Behavioral Topology (BeTop): A multi-agent behavior formulation for interactive motion prediction and planning (NeurIPS 2024) [[Paper]](https://arxiv.org/abs/2409.18031) [[Github]](https://github.com/OpenDriveLab/BeTop) 491 | - SMART: Advancing Scalable Map Priors for Driving Topology Reasoning (Arxiv 2025) [[Paper]](https://arxiv.org/pdf/2502.04329) [[Project]](https://jay-ye.github.io/smart/) 492 | ### Tracking 493 | - Exploring Point-BEV Fusion
for 3D Point Cloud Object Tracking with Transformer (Arxiv 2022) [[Paper]](https://arxiv.org/pdf/2208.05216.pdf) [[Github]](https://github.com/Jasonkks/PTTR) 494 | - EarlyBird: Early-Fusion for Multi-View Tracking in the Bird's Eye View (Arxiv 2023) [[paper]](https://arxiv.org/pdf/2310.13350.pdf) [[Github]](https://github.com/tteepe/EarlyBird) 495 | - Traj-MAE: Masked Autoencoders for Trajectory Prediction (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2303.06697) 496 | - Trajectory Forecasting through Low-Rank Adaptation of Discrete Latent Codes (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2405.20743) 497 | - MapsTP: HD Map Images Based Multimodal Trajectory Prediction for Automated Vehicles (Arxiv 2024) [[Paper]](https://arxiv.org/pdf/2407.05811) 498 | - Perception Helps Planning: Facilitating Multi-Stage Lane-Level Integration via Double-Edge Structures (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2407.11644) 499 | - Hierarchical and Decoupled BEV Perception Learning Framework for Autonomous Driving (Arxiv 2024) [[Paper]](https://arxiv.org/pdf/2407.12491) 500 | - VisionTrap: Vision-Augmented Trajectory Prediction Guided by Textual Descriptions (Arxiv 2024) [[Project]](https://moonseokha.github.io/VisionTrap/) 501 | ### Locate 502 | - BEV-Locator: An End-to-end Visual Semantic Localization Network Using Multi-View Images (Arxiv 2022) [[paper]](https://arxiv.org/pdf/2211.14927.pdf) 503 | - BEV-SLAM: Building a Globally-Consistent World Map Using Monocular Vision (IROS 2022) [[Paper]](https://cvssp.org/Personal/OscarMendez/papers/pdf/RossIROS2022.pdf) 504 | - U-BEV: Height-aware Bird’s-Eye-View Segmentation and Neural Map-based Relocalization (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2310.13766.pdf) 505 | - Monocular Localization with Semantics Map for Autonomous Vehicles (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2406.03835) 506 | ### Occupancy Prediction 507 | - Semantic Scene Completion from a Single Depth Image (CVPR 2017)
[[Paper]](https://arxiv.org/pdf/1611.08974.pdf) 508 | - Occupancy Networks: Learning 3D Reconstruction in Function Space (CVPR 2019) [[Paper]](https://arxiv.org/pdf/1812.03828.pdf) [[Github]](https://avg.is.mpg.de/publications/occupancy-networks) 509 | - S3CNet: A Sparse Semantic Scene Completion Network for LiDAR Point Clouds (CoRL 2020) [[Paper]](https://arxiv.org/pdf/2012.09242.pdf) 510 | - 3D Semantic Scene Completion: a Survey (IJCV 2021) [[Paper]](https://arxiv.org/pdf/2103.07466.pdf) 511 | - Semantic Scene Completion using Local Deep Implicit Functions on LiDAR Data (Arxiv 2021) [[Paper]](https://arxiv.org/pdf/2011.09141.pdf) 512 | - Sparse Single Sweep LiDAR Point Cloud Segmentation via Learning Contextual Shape Priors from Scene Completion (AAAI 2021) [[Paper]](https://arxiv.org/pdf/2012.03762.pdf) 513 | - Anisotropic Convolutional Networks for 3D Semantic Scene Completion (CVPR 2020) [[Paper]](https://arxiv.org/pdf/2004.02122.pdf) 514 | - Estimation of Appearance and Occupancy Information in Bird’s Eye View from Surround Monocular Images (Arxiv 2022) [[paper]](https://arxiv.org/pdf/2211.04557.pdf) [[Project]](https://uditsinghparihar.github.io/APP_OCC/) 515 | - Semantic Segmentation-assisted Scene Completion for LiDAR Point Clouds (IROS 2021) [[Paper]](https://arxiv.org/pdf/2109.11453.pdf) [[Github]](https://github.com/jokester-zzz/SSA-SC) 516 | - Grid-Centric Traffic Scenario Perception for Autonomous Driving: A Comprehensive Review (Arxiv 2023) [[paper]](https://arxiv.org/pdf/2303.01212.pdf) 517 | - LMSCNet: Lightweight Multiscale 3D Semantic Completion (3DV 2020) [[Paper]](https://arxiv.org/pdf/2008.10559.pdf) [[Github]](https://github.com/astra-vision/LMSCNet) 518 | - MonoScene: Monocular 3D Semantic Scene Completion (CVPR 2022) [[Paper]](https://arxiv.org/pdf/2112.00726.pdf) [[Github]](https://github.com/astra-vision/MonoScene) [[Project]](https://astra-vision.github.io/MonoScene/) 519 | - OccFormer: Dual-path Transformer for Vision-based 3D
Semantic Occupancy Prediction (ICCV 2023) [[Paper]](https://arxiv.org/pdf/2304.05316.pdf) [[Github]](https://github.com/zhangyp15/OccFormer) 520 | - A Simple Attempt for 3D Occupancy Estimation in Autonomous Driving (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2303.10076.pdf) [[Github]](https://github.com/GANWANSHUI/SimpleOccupancy) 521 | - OccDepth: A Depth-aware Method for 3D Semantic Occupancy Network (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2302.13540.pdf) [[Github]](https://github.com/megvii-research/OccDepth) 522 | - OpenOccupancy: A Large Scale Benchmark for Surrounding Semantic Occupancy Perception (Arxiv 2023) [[paper]](https://arxiv.org/pdf/2303.03991.pdf) [[Github]](https://github.com/JeffWang987/OpenOccupancy) 523 | - Occ3D: A Large-Scale 3D Occupancy Prediction Benchmark for Autonomous Driving (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2304.14365.pdf) [[Github]](https://github.com/Tsinghua-MARS-Lab/Occ3D) [[Project]](https://tsinghua-mars-lab.github.io/Occ3D/) 524 | - Occ-BEV: Multi-Camera Unified Pre-training via 3D Scene Reconstruction (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2305.18829.pdf) [[Github]](https://github.com/chaytonmin/Occ-BEV) 525 | - StereoScene: BEV-Assisted Stereo Matching Empowers 3D Semantic Scene Completion (Arxiv 2023) [[paper]](https://arxiv.org/pdf/2303.13959.pdf) [[Github]](https://github.com/Arlo0o/StereoScene) 526 | - Learning Occupancy for Monocular 3D Object Detection (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2305.15694.pdf) [[Github]](https://github.com/SPengLiang/OccupancyM3D) 527 | - OVO: Open-Vocabulary Occupancy (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2305.16133.pdf) 528 | - SurroundOcc: Multi-Camera 3D Occupancy Prediction for Autonomous Driving (Arxiv 2023) [[paper]](https://arxiv.org/pdf/2303.09551.pdf) [[Github]](https://github.com/weiyithu/SurroundOcc) [[Project]](https://weiyithu.github.io/SurroundOcc/) 529 | - Scene as Occupancy (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2306.02851.pdf)
[[Github]](https://github.com/OpenDriveLab/OccNet) 530 | - Diffusion Probabilistic Models for Scene-Scale 3D Categorical Data (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2301.00527.pdf) [[Github]](https://github.com/zoomin-lee/scene-scale-diffusion) 531 | - PanoOcc: Unified Occupancy Representation for Camera-based 3D Panoptic Segmentation (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2306.10013.pdf) [[Github]](https://github.com/Robertwyq/PanoOcc) 532 | - UniOcc: Unifying Vision-Centric 3D Occupancy Prediction with Geometric and Semantic Rendering (Arxiv 2023) [[paper]](https://arxiv.org/pdf/2306.09117.pdf) 533 | - SSCBench: A Large-Scale 3D Semantic Scene Completion Benchmark for Autonomous Driving (NeurIPS 2023 D&B track) [[paper]](https://arxiv.org/pdf/2306.09001.pdf) [[Github]](https://github.com/ai4ce/SSCBench) 534 | - StereoVoxelNet: Real-Time Obstacle Detection Based on Occupancy Voxels from a Stereo Camera Using Deep Neural Networks (ICRA 2023) [[Paper]](https://arxiv.org/pdf/2209.08459.pdf) [[Github]](https://github.com/RIVeR-Lab/stereovoxelnet) [[Project]](https://lhy.xyz/stereovoxelnet/) 535 | - Tri-Perspective View for Vision-Based 3D Semantic Occupancy Prediction (CVPR 2023) [[Paper]](https://arxiv.org/pdf/2302.07817.pdf) [[Github]](https://github.com/wzzheng/TPVFormer) 536 | - VoxFormer: a Cutting-edge Baseline for 3D Semantic Occupancy Prediction (CVPR 2023) [[paper]](https://arxiv.org/pdf/2302.12251.pdf) [[Github]](https://github.com/NVlabs/VoxFormer) 537 | - Point Cloud Forecasting as a Proxy for 4D Occupancy Forecasting (CVPR 2023) [[Paper]](https://arxiv.org/pdf/2302.13130.pdf) [[Project]](https://www.cs.cmu.edu/~tkhurana/ff4d/index.html) [[Github]](https://github.com/tarashakhurana/4d-occ-forecasting) 539 | - SSC-RS: Elevate LiDAR Semantic Scene
Completion with Representation Separation and BEV Fusion (IROS 2023) [[Paper]](https://arxiv.org/pdf/2306.15349.pdf) [[Github]](https://github.com/Jieqianyu/SSC-RS) 540 | - CVSformer: Cross-View Synthesis Transformer for Semantic Scene Completion (Arxiv 2023) [[paper]](https://arxiv.org/pdf/2307.07938.pdf) 541 | - Symphonize 3D Semantic Scene Completion with Contextual Instance Queries (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2306.15670.pdf) [[Github]](https://github.com/hustvl/Symphonies) 542 | - Occupancy-MAE: Self-supervised Pre-training Large-scale LiDAR Point Clouds with Masked Occupancy Autoencoders (Arxiv 2023) [[paper]](https://arxiv.org/pdf/2206.09900.pdf) 543 | - UniWorld: Autonomous Driving Pre-training via World Models (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2308.07234.pdf) [[Github]](https://github.com/chaytonmin/UniWorld) 544 | - PointOcc: Cylindrical Tri-Perspective View for Point-based 3D Semantic Occupancy Prediction (Arxiv 2023) [[paper]](https://arxiv.org/pdf/2308.16896.pdf) [[Github]](https://github.com/wzzheng/PointOcc) 545 | - SOGDet: Semantic-Occupancy Guided Multi-view 3D Object Detection (Arxiv 2023) [[paper]](https://arxiv.org/pdf/2308.13794.pdf) [[Github]](https://github.com/zhouqiu/SOGDet) 546 | - OccupancyDETR: Making Semantic Scene Completion as Straightforward as Object Detection (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2309.08504.pdf) [[Github]](https://github.com/jypjypjypjyp/OccupancyDETR) 547 | - PointSSC: A Cooperative Vehicle-Infrastructure Point Cloud Benchmark for Semantic Scene Completion (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2309.12708.pdf) 548 | - SPOT: Scalable 3D Pre-training via Occupancy Prediction for Autonomous Driving (Arxiv 2023) [[paper]](https://arxiv.org/pdf/2309.10527.pdf) 549 | - NDC-Scene: Boost Monocular 3D Semantic Scene Completion in Normalized Device Coordinates Space (Arxiv 2023) [[Github]](https://github.com/Jiawei-Yao0812/NDCScene) 550 | - Anisotropic Convolutional Networks for 3D
Semantic Scene Completion (CVPR 2020) [[Github]](https://github.com/waterljwant/SSC) [[Project]](https://waterljwant.github.io/SSC/) 551 | - RenderOcc: Vision-Centric 3D Occupancy Prediction with 2D Rendering Supervision (Arxiv 2023) [[paper]](https://arxiv.org/pdf/2309.09502.pdf) [[Github]](https://github.com/pmj110119/RenderOcc) 552 | - LiDAR-based 4D Occupancy Completion and Forecasting (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2310.11239.pdf) [[Github]](https://github.com/ai4ce/Occ4cast) 553 | - SOccDPT: Semi-Supervised 3D Semantic Occupancy from Dense Prediction Transformers trained under memory constraints (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2311.11371.pdf) 554 | - SelfOcc: Self-Supervised Vision-Based 3D Occupancy Prediction (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2311.12754.pdf) [[Github]](https://github.com/huang-yh/SelfOcc) 555 | - FlashOcc: Fast and Memory-Efficient Occupancy Prediction via Channel-to-Height Plugin (Arxiv 2023) [[paper]](https://arxiv.org/pdf/2311.12058.pdf) 556 | - Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications (Arxiv 2023) [[paper]](https://arxiv.org/pdf/2311.17663.pdf) [[Github]](https://github.com/haomo-ai/Cam4DOcc) 557 | - OccWorld: Learning a 3D Occupancy World Model for Autonomous Driving (Arxiv 2023) [[paper]](https://arxiv.org/pdf/2311.16038.pdf) [[Github]](https://github.com/wzzheng/OccWorld) 558 | - DepthSSC: Depth-Spatial Alignment and Dynamic Voxel Resolution for Monocular 3D Semantic Scene Completion (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2311.17084.pdf) 560 | - OctreeOcc: Efficient and Multi-Granularity Occupancy Prediction Using Octree Queries (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2312.03774.pdf) 561 | - COTR: Compact Occupancy TRansformer for
Vision-based 3D Occupancy Prediction (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2312.01919.pdf) 562 | - OccNeRF: Self-Supervised Multi-Camera Occupancy Prediction with Neural Radiance Fields (Arxiv 2023) [[paper]](https://arxiv.org/pdf/2312.09243.pdf) [[Github]](https://github.com/LinShan-Bin/OccNeRF) 563 | - RadOcc: Learning Cross-Modality Occupancy Knowledge through Rendering Assisted Distillation (Arxiv 2023) [[paper]](https://arxiv.org/pdf/2312.11829.pdf) 564 | - PaSCo: Urban 3D Panoptic Scene Completion with Uncertainty Awareness (Arxiv 2023) [[paper]](https://arxiv.org/pdf/2312.02158.pdf) [[Project]](https://astra-vision.github.io/PaSCo/) [[Github]](https://github.com/astra-vision/PaSCo) 565 | - POP-3D: Open-Vocabulary 3D Occupancy Prediction from Images (Arxiv 2024) [[Paper]](https://arxiv.org/pdf/2401.09413.pdf) 566 | - S2TPVFormer: Spatio-Temporal Tri-Perspective View for temporally coherent 3D Semantic Occupancy Prediction (Arxiv 2024) [[Paper]](https://arxiv.org/pdf/2401.13785.pdf) 567 | - InverseMatrixVT3D: An Efficient Projection Matrix-Based Approach for 3D Occupancy Prediction (Arxiv 2024) [[Paper]](https://arxiv.org/pdf/2401.12422.pdf) 568 | - V2VSSC: A 3D Semantic Scene Completion Benchmark for Perception with Vehicle to Vehicle Communication (Arxiv 2024) [[Paper]](https://arxiv.org/pdf/2402.04671.pdf) 569 | - OccFlowNet: Towards Self-supervised Occupancy Estimation via Differentiable Rendering and Occupancy Flow (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2402.12792) 570 | - OccFusion: A Straightforward and Effective Multi-Sensor Fusion Framework for 3D Occupancy Prediction (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2403.01644) 571 | - OccTransformer: Improving BEVFormer for 3D camera-only occupancy prediction (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2402.18140) 572 | - FastOcc: Accelerating 3D Occupancy Prediction by Fusing the 2D Bird's-Eye View and Perspective View (ICRA 2024)
[[Paper]](https://arxiv.org/abs/2403.02710) 573 | - OccFusion: Depth Estimation Free Multi-sensor Fusion for 3D Occupancy Prediction (Arxiv 2024) [[paper]](https://arxiv.org/abs/2403.05329) 574 | - PaSCo: Urban 3D Panoptic Scene Completion with Uncertainty Awareness (CVPR 2024) [[Paper]](https://arxiv.org/abs/2212.02501) [[Github]](https://github.com/astra-vision/PaSCo?tab=readme-ov-file) 575 | - Real-time 3D semantic occupancy prediction for autonomous vehicles using memory-efficient sparse convolution (Arxiv 2024) [[paper]](https://arxiv.org/abs/2403.08748) 576 | - OccFiner: Offboard Occupancy Refinement with Hybrid Propagation (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2403.08504) 577 | - MonoOcc: Digging into Monocular Semantic Occupancy Prediction (ICLR 2024) [[Paper]](https://arxiv.org/abs/2403.08766) 578 | - OpenOcc: Open Vocabulary 3D Scene Reconstruction via Occupancy Representation (Arxiv 2024) [[paper]](https://arxiv.org/abs/2403.11796) 579 | - Urban Scene Diffusion through Semantic Occupancy Map (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2403.11697) 580 | - Co-Occ: Coupling Explicit Feature Fusion with Volume Rendering Regularization for Multi-Modal 3D Semantic Occupancy Prediction (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2404.04561) [[Project]](https://rorisis.github.io/Co-Occ_project-page/) 581 | - SparseOcc: Rethinking Sparse Latent Representation for Vision-Based Semantic Occupancy Prediction (CVPR 2024) [[Paper]](https://arxiv.org/abs/2404.09502) 582 | - Not All Voxels Are Equal: Hardness-Aware Semantic Scene Completion with Self-Distillation (CVPR 2024) [[paper]](https://arxiv.org/abs/2404.11958) [[Github]](https://github.com/songw-zju/HASSC) 583 | - OccFeat: Self-supervised Occupancy Feature Prediction for Pretraining BEV Segmentation Networks (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2404.13046) 584 | - OccGen: Generative Multi-modal 3D Occupancy Prediction for Autonomous Driving (Arxiv 2024) [[paper]](https://arxiv.org/abs/2404.15014)
585 | - ViewFormer: Exploring Spatiotemporal Modeling for Multi-View 3D Occupancy Perception via View-Guided Transformers (Arxiv 2024) [[paper]](https://arxiv.org/abs/2405.04299) 586 | - A Survey on Occupancy Perception for Autonomous Driving: The Information Fusion Perspective (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2405.05173) 587 | - Vision-based 3D occupancy prediction in autonomous driving: a review and outlook (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2405.02595) 588 | - GEOcc: Geometrically Enhanced 3D Occupancy Network with Implicit-Explicit Depth Fusion and Contextual Self-Supervision (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2405.10591) 589 | - RadarOcc: Robust 3D Occupancy Prediction with 4D Imaging Radar (Arxiv 2024) [[paper]](https://arxiv.org/abs/2405.14014) 590 | - GaussianFormer: Scene as Gaussians for Vision-Based 3D Semantic Occupancy Prediction (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2405.17429) [[Github]](https://github.com/huang-yh/GaussianFormer) 591 | - OccSora: 4D Occupancy Generation Models as World Simulators for Autonomous Driving (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2405.20337) [[Github]](https://github.com/wzzheng/OccSora) 592 | - EFFOcc: A Minimal Baseline for EFficient Fusion-based 3D Occupancy Network (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2406.07042) [[Github]](https://github.com/synsin0/EFFOcc) 593 | - PanoSSC: Exploring Monocular Panoptic 3D Scene Reconstruction for Autonomous Driving (3DV 2024) [[paper]](https://arxiv.org/abs/2406.07037) 594 | - UnO: Unsupervised Occupancy Fields for Perception and Forecasting (Arxiv 2024) [[paper]](https://arxiv.org/abs/2406.08691) 595 | - Context and Geometry Aware Voxel Transformer for Semantic Scene Completion (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2405.13675) [[Github]](https://github.com/pkqbajng/CGFormer?tab=readme-ov-file) 596 | - Occupancy as Set of Points (ECCV 2024) [[Paper]](https://arxiv.org/abs/2407.04049) [[Github]](https://github.com/hustvl/osp) 
597 | - Lift, Splat, Map: Lifting Foundation Masks for Label-Free Semantic Scene Completion (Arxiv 2024) [[Paper]](https://arxiv.org/pdf/2407.03425) 598 | - Let Occ Flow: Self-Supervised 3D Occupancy Flow Prediction (Arxiv 2024) [[Paper]](https://arxiv.org/pdf/2407.07587) 599 | - Monocular Occupancy Prediction for Scalable Indoor Scenes (ECCV 2024) [[Paper]](https://arxiv.org/abs/2407.11730) [[Github]](https://github.com/hongxiaoy/ISO) 600 | - LangOcc: Self-Supervised Open Vocabulary Occupancy Estimation via Volume Rendering (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2407.17310) 601 | - VPOcc: Exploiting Vanishing Point for Monocular 3D Semantic Occupancy Prediction (Arxiv 2024) [[paper]](https://arxiv.org/pdf/2408.03551) 602 | - Vision-Language Guidance for LiDAR-based Unsupervised 3D Object Detection (Arxiv 2024) [[paper]](https://arxiv.org/pdf/2408.03790) [[Github]](https://github.com/chreisinger/ViLGOD) 603 | - OccMamba: Semantic Occupancy Prediction with State Space Models (Arxiv 2024) [[paper]](https://arxiv.org/pdf/2408.09859) 604 | - HybridOcc: NeRF Enhanced Transformer-based Multi-Camera 3D Occupancy Prediction (IEEE RAL 2024) [[paper]](https://arxiv.org/abs/2408.09104) 605 | - Semi-supervised 3D Semantic Scene Completion with 2D Vision Foundation Model Guidance (Arxiv 2024) [[Paper]](https://arxiv.org/pdf/2408.11559) 606 | - MambaOcc: Visual State Space Model for BEV-based Occupancy Prediction with Local Adaptive Reordering (Arxiv 2024) [[paper]](https://arxiv.org/abs/2408.11464) [[Github]](https://github.com/Hub-Tian/MambaOcc) 607 | - GaussianOcc: Fully Self-supervised and Efficient 3D Occupancy Estimation with Gaussian Splatting (Arxiv 2024) [[Project]](https://ganwanshui.github.io/GaussianOcc/) [[Github]](https://github.com/GANWANSHUI/GaussianOcc) 608 | - AdaOcc: Adaptive-Resolution Occupancy Prediction (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2408.13454) 609 | - Diffusion-Occ: 3D Point Cloud
Completion via Occupancy Diffusion (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2408.14846) 610 | - UltimateDO: An Efficient Framework to Marry Occupancy Prediction with 3D Object Detection via Channel2height (Arxiv 2024) [[paper]](https://arxiv.org/abs/2409.11160) 611 | - COCO-Occ: A Benchmark for Occluded Panoptic Segmentation and Image Understanding (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2409.12760) 612 | - CVT-Occ: Cost Volume Temporal Fusion for 3D Occupancy Prediction (ECCV 2024) [[Paper]](https://arxiv.org/abs/2409.13430) [[Github]](https://github.com/Tsinghua-MARS-Lab/CVT-Occ) 613 | - ReliOcc: Towards Reliable Semantic Occupancy Prediction via Uncertainty Learning (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2409.18026) 614 | - DiffSSC: Semantic LiDAR Scan Completion using Denoising Diffusion Probabilistic Models (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2409.18092) 615 | - SyntheOcc: Synthesize Geometric-Controlled Street View Images through 3D Semantic MPIs (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2410.00337) 616 | - OccRWKV: Rethinking Efficient 3D Semantic Occupancy Prediction with Linear Complexity (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2409.19987) [[Github]](https://github.com/jmwang0117/OccRWKV) [[Project]](https://jmwang0117.github.io/OccRWKV/) 617 | - DAOcc: 3D Object Detection Assisted Multi-Sensor Fusion for 3D Occupancy Prediction (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2409.19972) [[Github]](https://github.com/AlphaPlusTT/DAOcc) 618 | - OCC-MLLM:Empowering Multimodal Large Language Model For the Understanding of Occluded Objects (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2410.01261) 619 | - OccLoff: Learning Optimized Feature Fusion for 3D Occupancy Prediction (Arxiv 2024) [[paper]](https://arxiv.org/abs/2411.03696) 620 | - Spatiotemporal Decoupling for Efficient Vision-Based Occupancy Forecasting(Arxiv 2024) [[Paper]](https://arxiv.org/abs/2411.14169) 621 | - ReconDreamer: Crafting World Models for Driving Scene 
Reconstruction via Online Restoration (Arxiv 2024) [[paper]](https://arxiv.org/pdf/2411.19548) [[Github]](https://github.com/GigaAI-research/ReconDreamer) [[Project]](https://recondreamer.github.io/) 622 | - EmbodiedOcc: Embodied 3D Occupancy Prediction for Vision-based Online Scene Understanding (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2412.04380) [[Github]](https://github.com/YkiWu/EmbodiedOcc) 623 | - PVP: Polar Representation Boost for 3D Semantic Occupancy Prediction (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2412.07616) 624 | - Fast Occupancy Network (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2412.07163) 625 | - Lightweight Spatial Embedding for Vision-based 3D Occupancy Prediction (ARxiv 2024) [[Paper]](https://arxiv.org/abs/2412.05976) 626 | - doScenes: An Autonomous Driving Dataset with Natural Language Instruction for Human Interaction and Vision-Language Navigation (Arxiv 2024) [[paper]](https://arxiv.org/abs/2412.05893) [[Github]](https://github.com/rossgreer/doScenes) 627 | - LOMA: Language-assisted Semantic Occupancy Network via Triplane Mamba (AAAI 2025) [[paper]](https://arxiv.org/abs/2412.08388) 628 | - GaussianWorld: Gaussian World Model for Streaming 3D Occupancy Prediction (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2412.10373) [[Github]](https://github.com/zuosc19/GaussianWorld) 629 | - ViPOcc: Leveraging Visual Priors from Vision Foundation Models for Single-View 3D Occupancy Prediction (AAAI 2025) [[Paper]](https://arxiv.org/abs/2412.11210) [[Github]](https://github.com/fengyi233/ViPOcc) [[Github]](https://mias.group/ViPOcc/) 630 | - OccScene: Semantic Occupancy-based Cross-task Mutual Learning for 3D Scene Generation (Arxiv 2024) [[paper]](https://arxiv.org/abs/2412.11183) 631 | - ProtoOcc: Accurate, Efficient 3D Occupancy Prediction Using Dual Branch Encoder-Prototype Query Decoder (AAAI 2025 Arxiv 2024) [[paper]](https://arxiv.org/abs/2412.08774) [[Paper]](https://github.com/SPA-junghokim/ProtoOcc) 632 | - MR-Occ: Efficient 
Camera-LiDAR 3D Semantic Occupancy Prediction Using Hierarchical Multi-Resolution Voxel Representation (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2412.20480) 633 | - Skip Mamba Diffusion for Monocular 3D Semantic Scene Completion (AAAI 2025) [[Paper]](https://arxiv.org/pdf/2501.07260) 634 | - Doracamom: Joint 3D Detection and Occupancy Prediction with Multi-view 4D Radars and Cameras for Omnidirectional Perception (Arxiv 2025) [[Paper]](https://arxiv.org/abs/2501.15394) 635 | - MetaOcc: Surround-View 4D Radar and Camera Fusion Framework for 3D Occupancy Prediction with Dual Training Strategies (Arxiv 2025) [[Paper]](https://arxiv.org/abs/2501.15384) [[Github]](https://github.com/LucasYang567/MetaOcc) 636 | - OccGS: Zero-shot 3D Occupancy Reconstruction with Semantic and Geometric-Aware Gaussian Splatting (Arxiv 2025) [[Paper]](https://arxiv.org/pdf/2502.04981) 637 | - MC-BEVRO: Multi-Camera Bird Eye View Road Occupancy Detection for Traffic Monitoring (Arxiv 2025) [[paper]](https://arxiv.org/abs/2502.11287) 638 | - OG-Gaussian: Occupancy Based Street Gaussians for Autonomous Driving (ARxiv 2025) [[Paper]](https://arxiv.org/abs/2502.14235) 639 | - LEAP: Enhancing Vision-Based Occupancy Networks with Lightweight Spatio-Temporal Correlation (Arxiv 2025) [[Paper]]](https://arxiv.org/abs/2502.15438) 640 | - OccProphet: Pushing Efficiency Frontier of Camera-Only 4D Occupancy Forecasting with Observer-Forecaster-Refiner Framework (Arxiv 2025) [[paper]](https://arxiv.org/abs/2502.15180) 641 | - GaussianFlowOcc: Sparse and Weakly Supervised Occupancy Estimation using Gaussian Splatting and Temporal Flow (Arxiv 2025) [[Paper]](https://arxiv.org/abs/2502.17288) 642 | - H3O: Hyper-Efficient 3D Occupancy Prediction with Heterogeneous Supervision (ICRA 2025) [[Paper]](https://arxiv.org/abs/2503.04059) 643 | - TT-GaussOcc: Test-Time Compute for Self-Supervised Occupancy Prediction via Spatio-Temporal Gaussian Splatting (Arxiv 2025) [[Paper]](https://arxiv.org/abs/2503.08485) 
- OCCUQ: Exploring Efficient Uncertainty Quantification for 3D Occupancy Prediction (Arxiv 2025) [[Paper]](https://arxiv.org/abs/2503.10605) [[Github]](https://github.com/ika-rwth-aachen/OCCUQ)
- L2COcc: Lightweight Camera-Centric Semantic Scene Completion via Distillation of LiDAR Model (Arxiv 2025) [[Paper]](https://arxiv.org/abs/2503.12369) [[Project]](https://studyingfufu.github.io/L2COcc/) [[Github]](https://github.com/StudyingFuFu/L2COcc)
- 3D Occupancy Prediction with Low-Resolution Queries via Prototype-aware View Transformation (CVPR 2025) [[Paper]](https://arxiv.org/abs/2503.15185)
- SA-Occ: Satellite-Assisted 3D Occupancy Prediction in Real World (Arxiv 2025) [[Paper]](https://arxiv.org/abs/2503.16399)
#### Occupancy Challenge
- FB-OCC: 3D Occupancy Prediction based on Forward-Backward View Transformation (CVPR 2023 3D Occupancy Prediction Challenge Workshop) [[Paper]](https://arxiv.org/pdf/2307.01492.pdf) [[Github]](https://github.com/NVlabs/FB-BEV)
- Separated RoadTopoFormer (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2307.01557.pdf)
- OCTraN: 3D Occupancy Convolutional Transformer Network in Unstructured Traffic Scenarios (CVPR 2023 Workshop) [[Paper]](https://arxiv.org/pdf/2307.10934.pdf) [[Github]](https://drive.google.com/file/d/1IFUxbx1hI7iA7uXxilfq-Z0JXMGEU2Zb/view)
- AdaOcc: Adaptive Forward View Transformation and Flow Modeling for 3D Occupancy and Flow Prediction (CVPR 2024 Workshop) [[Paper]](https://arxiv.org/pdf/2407.01436)
- Real-Time 3D Occupancy Prediction via Geometric-Semantic Disentanglement (Arxiv 2024) [[Paper]](https://arxiv.org/pdf/2407.13155)
### Challenge
- The 1st-place Solution for CVPR 2023 OpenLane Topology in Autonomous Driving Challenge [[Paper]](https://arxiv.org/pdf/2306.09590.pdf)
- MapVision: CVPR 2024 Autonomous Grand Challenge Mapless Driving Tech Report (CVPR 2024 Challenge) [[Paper]](https://arxiv.org/abs/2406.10125)
### Dataset
- Are We Ready for Vision-Centric Driving Streaming Perception? The ASAP Benchmark (CVPR 2023) [[Paper]](https://arxiv.org/pdf/2212.08914.pdf) [[Github]](https://github.com/JeffWang987/ASAP)
- SemanticSpray++: A Multimodal Dataset for Autonomous Driving in Wet Surface Conditions (IV 2024) [[Paper]](https://arxiv.org/abs/2406.09945) [[Project]](https://semantic-spray-dataset.github.io/) [[Github]](https://github.com/uulm-mrm/semantic_spray_dataset)
- WayveScenes101: A Dataset and Benchmark for Novel View Synthesis in Autonomous Driving (Arxiv 2024) [[Paper]](https://arxiv.org/pdf/2407.08280) [[Project]](https://wayve.ai/science/wayvescenes101/) [[Github]](https://github.com/wayveai/wayve_scenes)
- WildOcc: A Benchmark for Off-Road 3D Semantic Occupancy Prediction (Arxiv 2024) [[Paper]](https://arxiv.org/pdf/2410.15792) [[Github]](https://github.com/LedKashmir/WildOcc)
### World Model
- End-to-end Autonomous Driving: Challenges and Frontiers (Arxiv 2024) [[Paper]](https://arxiv.org/pdf/2306.16927.pdf) [[Github]](https://github.com/OpenDriveLab/End-to-end-Autonomous-Driving)
- Talk2BEV: Language-enhanced Bird's-eye View Maps for Autonomous Driving (ICRA 2024) [[Paper]](https://arxiv.org/pdf/2310.02251.pdf) [[Github]](https://github.com/llmbev/talk2bev) [[Project]](https://llmbev.github.io/talk2bev/)
- Language Prompt for Autonomous Driving (Arxiv 2023) [[Paper]](https://arxiv.org/abs/2309.04379) [[Github]](https://github.com/wudongming97/Prompt4Driving)
- MotionLM: Multi-Agent Motion Forecasting as Language Modeling (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2309.16534.pdf)
- GAIA-1: A Generative World Model for Autonomous Driving (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2309.17080.pdf)
- DriveDreamer: Towards Real-world-driven World Models for Autonomous Driving (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2309.09777.pdf)
- Driving with LLMs: Fusing Object-Level Vector Modality for Explainable Autonomous Driving (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2310.01957.pdf) [[Github]](https://github.com/wayveai/driving-with-llms)
- Learning to Drive Anywhere (CoRL 2023) [[Paper]](https://arxiv.org/pdf/2309.12295.pdf)
- Language-Conditioned Path Planning (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2308.16893.pdf)
- DriveGPT4: Interpretable End-to-end Autonomous Driving via Large Language Model (Arxiv 2023) [[Paper]](https://browse.arxiv.org/pdf/2310.01412.pdf) [[Project]](https://tonyxuqaq.github.io/projects/DriveGPT4/)
- GPT-Driver: Learning to Drive with GPT (Arxiv 2023) [[Paper]](https://browse.arxiv.org/pdf/2310.01415.pdf)
- LanguageMPC: Large Language Models as Decision Makers for Autonomous Driving (Arxiv 2023) [[Paper]](https://arxiv.org/abs/2310.03026)
- Towards End-to-End Embodied Decision Making via Multi-modal Large Language Model: Explorations with GPT4-Vision and Beyond (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2310.02071.pdf)
- DrivingDiffusion: Layout-Guided Multi-View Driving Scene Video Generation with Latent Diffusion Model (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2310.07771.pdf)
- UniPAD: A Universal Pre-training Paradigm for Autonomous Driving (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2310.08370.pdf) [[Github]](https://github.com/Nightmare-n/UniPAD)
- PonderV2: Pave the Way for 3D Foundation Model with A Universal Pre-training Paradigm (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2310.08586.pdf)
- Uni3D: Exploring Unified 3D Representation at Scale (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2310.06773.pdf) [[Github]](https://github.com/baaivision/Uni3D)
- Video Language Planning (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2310.10625.pdf) [[Project]](https://video-language-planning.github.io/)
- RoboLLM: Robotic Vision Tasks Grounded on Multimodal Large Language Models (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2310.10221.pdf)
- DiagrammerGPT: Generating Open-Domain, Open-Platform Diagrams via LLM Planning (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2310.12128.pdf) [[Github]](https://github.com/aszala/DiagrammerGPT) [[Project]](https://diagrammergpt.github.io/)
- Vision Language Models in Autonomous Driving and Intelligent Transportation Systems (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2310.14414.pdf)
- ADAPT: Action-aware Driving Caption Transformer (ICRA 2023) [[Paper]](https://arxiv.org/pdf/2302.00673.pdf) [[Github]](https://github.com/jxbbb/ADAPT)
- Drive Anywhere: Generalizable End-to-end Autonomous Driving with Multi-modal Foundation Models (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2310.17642.pdf) [[Project]](https://drive-anywhere.github.io/)
- Learning Unsupervised World Models for Autonomous Driving via Discrete Diffusion (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2311.01017.pdf)
- ADriver-I: A General World Model for Autonomous Driving (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2311.13549.pdf)
- HiLM-D: Towards High-Resolution Understanding in Multimodal Large Language Models for Autonomous Driving (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2309.05186.pdf)
- On the Road with GPT-4V(ision): Early Explorations of Visual-Language Model on Autonomous Driving (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2311.05332.pdf)
- GPT4Motion: Scripting Physical Motions in Text-to-Video Generation via Blender-Oriented GPT Planning (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2311.12631.pdf)
- Applications of Large Scale Foundation Models for Autonomous Driving (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2311.12144v5.pdf)
- Dolphins: Multimodal Language Model for Driving (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2312.00438.pdf) [[Project]](https://vlm-driver.github.io/)
- Driving into the Future: Multiview Visual Forecasting and Planning with World Model for Autonomous Driving (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2311.17918.pdf) [[Github]](https://github.com/BraveGroup/Drive-WM) [[Project]](https://drive-wm.github.io/)
- Is Ego Status All You Need for Open-Loop End-to-End Autonomous Driving? (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2312.03031.pdf) [[Github]](https://github.com/NVlabs/BEV-Planner)
- NuScenes-MQA: Integrated Evaluation of Captions and QA for Autonomous Driving Datasets using Markup Annotations (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2312.06352.pdf) [[Github]](https://github.com/turingmotors/NuScenes-MQA)
- DriveMLM: Aligning Multi-Modal Large Language Models with Behavioral Planning States for Autonomous Driving (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2312.09245.pdf) [[Github]](https://github.com/OpenGVLab/DriveMLM)
- DrivingGaussian: Composite Gaussian Splatting for Surrounding Dynamic Autonomous Driving Scenes (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2312.07920.pdf) [[Project]](https://pkuvdig.github.io/DrivingGaussian/)
- Reason2Drive: Towards Interpretable and Chain-based Reasoning for Autonomous Driving (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2312.03661.pdf) [[Github]](https://github.com/fudan-zvg/Reason2Drive)
- Dialogue-based generation of self-driving simulation scenarios using Large Language Models (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2310.17372.pdf) [[Github]](https://github.com/avmb/dialogllmscenic)
- Panacea: Panoramic and Controllable Video Generation for Autonomous Driving (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2311.16813.pdf) [[Project]](https://panacea-ad.github.io/) [[Github]](https://github.com/wenyuqing/panacea)
- LingoQA: Video Question Answering for Autonomous Driving (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2312.14115.pdf) [[Github]](https://github.com/wayveai/LingoQA)
- DriveLM: Driving with Graph Visual Question Answering (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2312.14150.pdf) [[Github]](https://github.com/OpenDriveLab/DriveLM)
- LiDAR-LLM: Exploring the Potential of Large Language Models for 3D LiDAR Understanding (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2312.14074.pdf) [[Project]](https://sites.google.com/view/lidar-llm)
- LMDrive: Closed-Loop End-to-End Driving with Large Language Models (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2312.07488v2.pdf) [[Github]](https://github.com/opendilab/LMDrive)
- Visual Point Cloud Forecasting enables Scalable Autonomous Driving (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2312.17655.pdf) [[Github]](https://github.com/OpenDriveLab/ViDAR)
- WoVoGen: World Volume-aware Diffusion for Controllable Multi-camera Driving Scene Generation (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2312.02934.pdf) [[Github]](https://github.com/fudan-zvg/WoVoGen)
- Holistic Autonomous Driving Understanding by Bird's-Eye-View Injected Multi-Modal Large Models (Arxiv 2024) [[Paper]](https://arxiv.org/pdf/2401.00988.pdf) [[Github]](https://github.com/xmed-lab/NuInstruct)
- DME-Driver: Integrating Human Decision Logic and 3D Scene Perception in Autonomous Driving (Arxiv 2024) [[Paper]](https://arxiv.org/pdf/2401.03641.pdf)
- A Survey on Multimodal Large Language Models for Autonomous Driving (WACVW 2024) [[Paper]](https://openaccess.thecvf.com/content/WACV2024W/LLVM-AD/papers/Cui_A_Survey_on_Multimodal_Large_Language_Models_for_Autonomous_Driving_WACVW_2024_paper.pdf)
- VLP: Vision Language Planning for Autonomous Driving (Arxiv 2024) [[Paper]](https://arxiv.org/pdf/2401.05577.pdf)
- Forging Vision Foundation Models for Autonomous Driving: Challenges, Methodologies, and Opportunities (Arxiv 2024) [[Paper]](https://arxiv.org/pdf/2401.08045.pdf)
- MapGPT: Map-Guided Prompting for Unified Vision-and-Language Navigation (Arxiv 2024) [[Paper]](https://arxiv.org/pdf/2401.07314.pdf)
- Editable Scene Simulation for Autonomous Driving via Collaborative LLM-Agents (Arxiv 2024) [[Paper]](https://arxiv.org/pdf/2402.05746.pdf)
- DriveVLM: The Convergence of Autonomous Driving and Large Vision-Language Models (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2402.12289) [[Project]](https://tsinghua-mars-lab.github.io/DriveVLM/)
- GenAD: Generative End-to-End Autonomous Driving (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2402.11502)
- Generalized Predictive Model for Autonomous Driving (CVPR 2024) [[Paper]](https://arxiv.org/abs/2403.09630)
- AMP: Autoregressive Motion Prediction Revisited with Next Token Prediction for Autonomous Driving (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2403.13331)
- DriveCoT: Integrating Chain-of-Thought Reasoning with End-to-End Driving (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2403.16996)
- SubjectDrive: Scaling Generative Data in Autonomous Driving via Subject Control (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2403.19438) [[Project]](https://subjectdrive.github.io/)
- DriveDreamer-2: LLM-Enhanced World Models for Diverse Driving Video Generation (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2403.06845) [[Project]](https://drivedreamer2.github.io/) [[Github]](https://github.com/f1yfisher/DriveDreamer2)
- DiLu: A Knowledge-Driven Approach to Autonomous Driving with Large Language Models (ICLR 2024) [[Paper]](https://arxiv.org/abs/2309.16292) [[Project]](https://pjlab-adg.github.io/DiLu/)
- OmniDrive: A Holistic LLM-Agent Framework for Autonomous Driving with 3D Perception, Reasoning and Planning (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2405.01533)
- GAD-Generative Learning for HD Map-Free Autonomous Driving (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2405.00515)
- Guiding Attention in End-to-End Driving Models (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2405.00242)
- Probing Multimodal LLMs as World Models for Driving (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2405.05956)
- Traj-LLM: A New Exploration for Empowering Trajectory Prediction with Pre-trained Large Language Models (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2405.04909)
- Multi-Modal Data-Efficient 3D Scene Understanding for Autonomous Driving (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2405.05258)
- Unified End-to-End V2X Cooperative Autonomous Driving (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2405.03971)
- DriveWorld: 4D Pre-trained Scene Understanding via World Models for Autonomous Driving (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2405.04390)
- MaskFuser: Masked Fusion of Joint Multi-Modal Tokenization for End-to-End Autonomous Driving (Arxiv 2024) [[Paper]](https://arxiv.org/pdf/2405.07573)
- MagicDrive3D: Controllable 3D Generation for Any-View Rendering in Street Scenes (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2405.14475)
- Language-Image Models with 3D Understanding (Arxiv 2024) [[Paper]](https://arxiv.org/pdf/2405.03685) [[Project]](https://janghyuncho.github.io/Cube-LLM/)
- Is a 3D-Tokenized LLM the Key to Reliable Autonomous Driving? (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2405.18361)
- GFlow: Recovering 4D World from Monocular Video (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2405.18426) [[Project]](https://littlepure2333.github.io/GFlow/)
- Benchmarking and Improving Bird's Eye View Perception Robustness in Autonomous Driving (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2405.17426) [[Github]](https://github.com/Daniel-xsy/RoboBEV)
- Vista: A Generalizable Driving World Model with High Fidelity and Versatile Controllability (Arxiv 2024) [[Paper]](https://arxiv.org/pdf/2405.17398) [[Github]](https://github.com/OpenDriveLab/Vista) [[Project]](https://vista-demo.github.io/)
- OccSora: 4D Occupancy Generation Models as World Simulators for Autonomous Driving (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2405.20337) [[Github]](https://github.com/wzzheng/OccSora) [[Project]](https://wzzheng.net/OccSora/)
- DriVLMe: Enhancing LLM-based Autonomous Driving Agents with Embodied and Social Experiences (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2406.03008) [[Project]](https://sled-group.github.io/driVLMe/)
- AD-H: Autonomous Driving with Hierarchical Agents (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2406.03474)
- Bench2Drive: Towards Multi-Ability Benchmarking of Closed-Loop End-To-End Autonomous Driving (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2406.03877) [[Project]](https://thinklab-sjtu.github.io/Bench2Drive/)
- A Superalignment Framework in Autonomous Driving with Large Language Models (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2406.05651)
- Enhancing End-to-End Autonomous Driving with Latent World Model (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2406.08481)
- SimGen: Simulator-conditioned Driving Scene Generation (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2406.09386)
- Multiagent Multitraversal Multimodal Self-Driving: Open MARS Dataset (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2406.09383) [[Project]](https://ai4ce.github.io/MARS/)
- WonderWorld: Interactive 3D Scene Generation from a Single Image (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2406.09394)
- CarLLaVA: Vision Language Models for Camera-Only Closed-Loop Driving (Arxiv 2024) [[Paper]](https://arxiv.org/pdf/2406.10165)
- End-to-End Autonomous Driving without Costly Modularization and 3D Manual Annotation (Arxiv 2024) [[Paper]](https://arxiv.org/pdf/2406.17680)
- BEVWorld: A Multimodal World Model for Autonomous Driving via Unified BEV Latent Space (Arxiv 2024) [[Paper]](https://arxiv.org/pdf/2407.05679) [[Github]](https://github.com/zympsyche/BevWorld)
- Exploring the Causality of End-to-End Autonomous Driving (Arxiv 2024) [[Paper]](https://arxiv.org/pdf/2407.06546) [[Github]](https://github.com/bdvisl/DriveInsight)
- SimpleLLM4AD: An End-to-End Vision-Language Model with Graph Visual Question Answering for Autonomous Driving (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2407.21293)
- DriveArena: A Closed-loop Generative Simulation Platform for Autonomous Driving (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2408.00415) [[Github]](https://github.com/PJLab-ADG/DriveArena)
- Leveraging LLMs for Enhanced Open-Vocabulary 3D Scene Understanding in Autonomous Driving (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2408.03516)
- Open 3D World in Autonomous Driving (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2408.10880)
- CoVLA: Comprehensive Vision-Language-Action Dataset for Autonomous Driving (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2408.10845)
- Driving in the Occupancy World: Vision-Centric 4D Occupancy Forecasting and Planning via World Models for Autonomous Driving (Arxiv 2024) [[Paper]](https://arxiv.org/pdf/2408.14197)
- DriveGenVLM: Real-world Video Generation for Vision Language Model based Autonomous Driving (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2408.16647)
- OccLLaMA: An Occupancy-Language-Action Generative World Model for Autonomous Driving (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2409.03272)
- Can LVLMs Obtain a Driver's License? A Benchmark Towards Reliable AGI for Autonomous Driving (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2409.02914)
- ContextVLM: Zero-Shot and Few-Shot Context Understanding for Autonomous Driving using Vision Language Models (ITSC 2024) [[Paper]](https://arxiv.org/abs/2409.00301)
- MiniDrive: More Efficient Vision-Language Models with Multi-Level 2D Features as Text Tokens for Autonomous Driving (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2409.07267)
- RenderWorld: World Model with Self-Supervised 3D Label (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2409.11356)
- Video Token Sparsification for Efficient Multimodal LLMs in Autonomous Driving (Arxiv 2024) [[Paper]](https://arxiv.org/pdf/2409.11182)
- DrivingForward: Feed-forward 3D Gaussian Splatting for Driving Scene Reconstruction from Flexible Surround-view Input (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2409.12753) [[Project]](https://fangzhou2000.github.io/projects/drivingforward/) [[Github]](https://github.com/fangzhou2000/DrivingForward)
- METDrive: Multi-modal End-to-end Autonomous Driving with Temporal Guidance (Arxiv 2024) [[Paper]](https://arxiv.org/pdf/2409.12667)
- Does End-to-End Autonomous Driving Really Need Perception Tasks? (Arxiv 2024) [[Paper]](https://arxiv.org/pdf/2409.18341)
- Learning to Drive via Asymmetric Self-Play (Arxiv 2024) [[Paper]](https://arxiv.org/pdf/2409.18218)
- Uncertainty-Guided Enhancement on Driving Perception System via Foundation Models (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2410.01144)
- ScVLM: a Vision-Language Model for Driving Safety Critical Event Understanding (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2410.00982)
- HE-Drive: Human-Like End-to-End Driving with Vision Language Models (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2410.05051) [[Project]](https://jmwang0117.github.io/HE-Drive/) [[Github]](https://github.com/jmwang0117/HE-Drive)
- UniDrive: Towards Universal Driving Perception Across Camera Configurations (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2410.13864) [[Github]](https://github.com/ywyeli/UniDrive)
- DriveDreamer4D: World Models Are Effective Data Machines for 4D Driving Scene Representation (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2410.13571) [[Github]](https://github.com/GigaAI-research/DriveDreamer4D) [[Project]](https://drivedreamer4d.github.io/)
- DrivingDojo Dataset: Advancing Interactive and Knowledge-Enriched Driving World Model (NeurIPS 2024) [[Paper]](https://arxiv.org/abs/2410.10738) [[Project]](https://drivingdojo.github.io/) [[Github]](https://github.com/Robertwyq/Drivingdojo)
- EMMA: End-to-End Multimodal Model for Autonomous Driving (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2410.23262) [[Project]](https://waymo.com/blog/2024/10/introducing-emma/)
- Senna: Bridging Large Vision-Language Models and End-to-End Autonomous Driving (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2410.22313) [[Github]](https://github.com/hustvl/Senna)
- MagicDriveDiT: High-Resolution Long Video Generation for Autonomous Driving with Adaptive Control (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2411.13807) [[Github]](https://github.com/flymin/MagicDriveDiT) [[Project]](https://gaoruiyuan.com/magicdrivedit/)
- DiffusionDrive: Truncated Diffusion Model for End-to-End Autonomous Driving (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2411.15139) [[Github]](https://github.com/hustvl/DiffusionDrive)
- VisionPAD: A Vision-Centric Pre-training Paradigm for Autonomous Driving (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2411.14716)
- Language Driven Occupancy Prediction (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2411.16072) [[Github]](https://github.com/pkqbajng/LOcc)
- Seeing Beyond Views: Multi-View Driving Scene Video Generation with Holistic Attention (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2412.03520) [[Project]](https://luhannan.github.io/CogDrivingPage/)
- InfiniCube: Unbounded and Controllable Dynamic 3D Driving Scene Generation with World-Guided Video Models (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2412.03934) [[Project]](https://research.nvidia.com/labs/toronto-ai/infinicube/)
- Stag-1: Towards Realistic 4D Driving Simulation with Video Generation Model (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2412.05280) [[Github]](https://github.com/wzzheng/Stag)
- UniMLVG: Unified Framework for Multi-view Long Video Generation with Comprehensive Control Capabilities for Autonomous Driving (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2412.04842)
- Driv3R: Learning Dense 4D Reconstruction for Autonomous Driving (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2412.06777) [[Github]](https://github.com/Barrybarry-Smith/Driv3R)
- DriveMM: All-in-One Large Multimodal Model for Autonomous Driving (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2412.07689) [[Github]](https://github.com/zhijian11/DriveMM)
- Physical Informed Driving World Model (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2412.08410) [[Project]](https://metadrivescape.github.io/papers_project/DrivePhysica/page.html)
791 | - GPD-1: Generative Pre-training for Driving (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2412.08643) [[Github]](https://github.com/wzzheng/GPD) 792 | - Doe-1: Closed-Loop Autonomous Driving with Large World Model (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2412.09627) [[Github]](https://github.com/wzzheng/Doe) 793 | - GaussianAD: Gaussian-Centric End-to-End Autonomous Driving (Arxiv 2024) [[paper]](https://arxiv.org/abs/2412.10371) [[Github]](https://github.com/wzzheng/GaussianAD) 794 | - DrivingRecon: Large 4D Gaussian Reconstruction Model For Autonomous Driving (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2412.09043) [[Github]](https://github.com/EnVision-Research/DriveRecon) 795 | - SceneDiffuser: Efficient and Controllable Driving Simulation Initialization and Rollout (NeurIPS 2024) [[paper]](https://arxiv.org/pdf/2412.12129) 796 | - Towards Generalist Robot Policies: What Matters in Building Vision-Language-Action Models (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2412.14058) [[Github]](https://robovlms.github.io/) 797 | - An Efficient Occupancy World Model via Decoupled Dynamic Flow and Image-assisted Training (Arxiv 2024) [[paper]](https://arxiv.org/abs/2412.13772) 798 | - OpenEMMA: Open-Source Multimodal Model for End-to-End Autonomous Driving (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2412.15208) [[paper]](https://github.com/taco-group/OpenEMMA) 799 | - AutoTrust: Benchmarking Trustworthiness in Large Vision Language Models for Autonomous Driving (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2412.15206) [[Github]](https://github.com/taco-group/AutoTrust) 800 | - VLM-AD: End-to-End Autonomous Driving through Vision-Language Model Supervision (Arxiv 2024) [[paper]](https://arxiv.org/abs/2412.14446) 801 | - DriveGPT: Scaling Autoregressive Behavior Models for Driving (Arxiv 2024) [[paper]](https://arxiv.org/abs/2412.14415) 802 | - DrivingGPT: Unifying Driving World Modeling and Planning with Multi-modal Autoregressive Transformers (Arxiv 2024) 
[[paper]](https://arxiv.org/abs/2412.18607) 803 | - UniPLV: Towards Label-Efficient Open-World 3D Scene Understanding by Regional Visual Language Supervision (Arxiv 2024) [[paper]](https://arxiv.org/pdf/2412.18131) 804 | - DrivingWorld: Constructing World Model for Autonomous Driving via Video GPT (Arxiv 2024) [[paper]](https://arxiv.org/abs/2412.19505) [[Github]](https://github.com/YvanYin/DrivingWorld) 805 | - AD-L-JEPA: Self-Supervised Spatial World Models with Joint Embedding Predictive Architecture for Autonomous Driving with LiDAR Data (Arxiv 2025) [[Paper]](https://arxiv.org/pdf/2501.04969) [[Github]](https://github.com/HaoranZhuExplorer/AD-L-JEPA-Release) 806 | - DRIVINGVQA: Analyzing Visual Chain-of-Thought Reasoning of Vision Language Models in Real-World Scenarios with Driving Theory Tests (Arxiv 2025) [[Paper]](https://arxiv.org/pdf/2501.04671) 807 | - Are VLMs Ready for Autonomous Driving? An Empirical Study from the Reliability, Data, and Metric Perspectives (Arxiv 2025) [[paper]](https://arxiv.org/abs/2501.04003) [[Project]](https://drive-bench.github.io/) 808 | - Application of Vision-Language Model to Pedestrians Behavior and Scene Understanding in Autonomous Driving (Arxiv 2025) [[paper]](https://arxiv.org/abs/2501.06680) 809 | - Distilling Multi-modal Large Language Models for Autonomous Driving (Arxiv 2025) [[Paper]](https://arxiv.org/abs/2501.09757) 810 | - A Survey of World Models for Autonomous Driving (Arxiv 2025) [[Paper]](https://arxiv.org/abs/2501.11260) 811 | - HERMES: A Unified Self-Driving World Model for Simultaneous 3D Scene Understanding and Generation (Arxiv 2025) [[paper]](https://arxiv.org/abs/2501.14729) [[Github]](https://github.com/LMD0311/HERMES) 812 | - SSF: Sparse Long-Range Scene Flow for Autonomous Driving (Arxiv 2025) [[Paper]](https://arxiv.org/pdf/2501.17821) 813 | - V2V-LLM: Vehicle-to-Vehicle Cooperative Autonomous Driving with Multi-Modal Large Language Models (Arxiv 2025) [[Paper]](https://arxiv.org/abs/2502.09980) 
814 | - MaskGWM: A Generalizable Driving World Model with Video Mask Reconstruction (Arxiv 2025) [[paper]](https://arxiv.org/abs/2502.11663) 815 | - The Role of World Models in Shaping Autonomous Driving: A Comprehensive Survey (Arxiv 2025) [[Paper]](https://arxiv.org/abs/2502.10498) 816 | - Sce2DriveX: A Generalized MLLM Framework for Scene-to-Drive Learning (Arxiv 2025) [[Paper]](https://arxiv.org/abs/2502.14917) 817 | - VLM-E2E: Enhancing End-to-End Autonomous Driving with Multimodal Driver Attention Fusion (Arxiv 2025) [[Paper]](https://arxiv.org/abs/2502.18042) 818 | - VDT-Auto: End-to-end Autonomous Driving with VLM-Guided Diffusion Transformers (Arxiv 2025) [[Paper]](https://arxiv.org/abs/2502.20108) 819 | - FlexDrive: Toward Trajectory Flexibility in Driving Scene Reconstruction and Rendering (Arxiv 2025) [[Paper]](FlexDrive: Toward Trajectory Flexibility in Driving Scene Reconstruction and Rendering) 820 | - BEVDriver: Leveraging BEV Maps in LLMs for Robust Closed-Loop Driving (Arxiv 2025) [[Paper]](https://arxiv.org/pdf/2503.03074) 821 | - GoalFlow: Goal-Driven Flow Matching for Multimodal Trajectories Generation in End-to-End Autonomous Driving (Arxiv 2025) [[Paper]](https://arxiv.org/abs/2503.05689) [[Github]](https://github.com/YvanYin/GoalFlow) 822 | - AlphaDrive: Unleashing the Power of VLMs in Autonomous Driving via Reinforcement Learning and Reasoning (Arxiv 2025) [[Paper]](https://arxiv.org/abs/2503.07608) [[Github]](https://github.com/hustvl/AlphaDrive) 823 | - CoT-Drive: Efficient Motion Forecasting for Autonomous Driving with LLMs and Chain-of-Thought Prompting (Arxiv 2025) [[Paper]](https://arxiv.org/abs/2503.07234) 824 | - CoLMDriver: LLM-based Negotiation Benefits Cooperative Autonomous Driving (Arxiv 2025) [[Paper]](https://arxiv.org/abs/2503.08683) [[Github]](https://github.com/cxliu0314/CoLMDriver) 825 | - HiP-AD: Hierarchical and Multi-Granularity Planning with Deformable Attention for Autonomous Driving in a Single Decoder (Arxiv 2025) 
[[paper]](https://arxiv.org/abs/2503.08612) 826 | - DriveTransformer: Unified Transformer for Scalable End-to-End Autonomous Driving (Arxiv 2025) [[Paper]](https://arxiv.org/abs/2503.07656) 827 | - SimLingo: Vision-Only Closed-Loop Autonomous Driving with Language-Action Alignment (Arxiv 2025) [[Paper]](https://arxiv.org/abs/2503.09594) 828 | - Post-interactive Multimodal Trajectory Prediction for Autonomous Driving (Arxiv 2025) [[Paper]](https://arxiv.org/abs/2503.09366) 829 | - Hybrid Rendering for Multimodal Autonomous Driving: Merging Neural and Physics-Based Simulation (Arxiv 2025) [[Paper]](https://arxiv.org/abs/2503.09464) 830 | - DriveLMM-o1: A Step-by-Step Reasoning Dataset and Large Multimodal Model for Driving Scenario Understanding (Arxiv 2025) [[Paper]](https://arxiv.org/abs/2503.10621) [[Github]](https://github.com/ayesha-ishaq/DriveLMM-o1) 831 | - Unlock the Power of Unlabeled Data in Language Driving Model (Arxiv 2025) [[Paper]](https://arxiv.org/abs/2503.10586) 832 | - Finetuning Generative Trajectory Model with Reinforcement Learning from Human Feedback (Arxiv 2025) [[Paper]](https://arxiv.org/pdf/2503.10434) 833 | - DynRsl-VLM: Enhancing Autonomous Driving Perception with Dynamic Resolution Vision-Language Models (Arxiv 2025) [[Paper]](https://arxiv.org/abs/2503.11265) 834 | - Active Learning from Scene Embeddings for End-to-End Autonomous Driving (Arxiv 2025) [[Paper]](https://arxiv.org/abs/2503.11062) 835 | - Centaur: Robust End-to-End Autonomous Driving with Test-Time Training (Arxiv 2025) [[Paper]](https://arxiv.org/abs/2503.11650) 836 | - InsightDrive: Insight Scene Representation for End-to-End Autonomous Driving (Arxiv 2025) [[Paper]](https://arxiv.org/abs/2503.13047) [[Github]](https://github.com/songruiqi/InsightDrive) 837 | - Hydra-MDP++: Advancing End-to-End Driving via Expert-Guided Hydra-Distillation (Arxiv 2025) [[Paper]](https://arxiv.org/abs/2503.12820) 838 | - Tracking Meets Large Multimodal Models for Driving Scenario 
Understanding (Arxiv 2025) [[Paper]](https://arxiv.org/pdf/2503.14498) 839 | - ChatBEV: A Visual Language Model that Understands BEV Maps (Arxiv 2025) [[paper]](https://arxiv.org/abs/2503.13938) 840 | - RAD: Retrieval-Augmented Decision-Making of Meta-Actions with Vision-Language Models in Autonomous Driving (Arxiv 2025) [[paper]](https://arxiv.org/abs/2503.13861) 841 | - Bridging Past and Future: End-to-End Autonomous Driving with Historical Prediction and Planning (CVPR 2025) [[Paper]](https://arxiv.org/abs/2503.14182) 842 | - GAIA-2: A Controllable Multi-View Generative World Model for Autonomous Driving (Arxiv 2025) [[Paper]](https://arxiv.org/abs/2503.20523) 843 | ### Other 844 | - LargeAD: Large-Scale Cross-Sensor Data Pretraining for Autonomous Driving (Arxiv 2025) [[Paper]](https://arxiv.org/abs/2501.04005) [[Project]](https://ldkong.com/LargeAD) 845 | - X-Drive: Cross-modality consistent multi-sensor data synthesis for driving scenarios (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2411.01123) 846 | - Semantic MapNet: Building Allocentric Semantic Maps and Representations from Egocentric Views (AAAI 2021) [[Paper]](https://arxiv.org/pdf/2010.01191.pdf) [[Github]](https://github.com/vincentcartillier/Semantic-MapNet) [[Project]](https://vincentcartillier.github.io/smnet.html) 847 | - Trans4Map: Revisiting Holistic Bird’s-Eye-View Mapping from Egocentric Images to Allocentric Semantics with Vision Transformers (WACV 2023) [[Paper]](Trans4Map: Revisiting Holistic Bird’s-Eye-View Mapping from Egocentric Images to Allocentric Semantics with Vision Transformers) 848 | - ViewBirdiformer: Learning to recover ground-plane crowd trajectories and ego-motion from a single ego-centric view (Arxiv 2022) [[paper]](https://arxiv.org/pdf/2210.06332.pdf) 849 | - 360BEV: Panoramic Semantic Mapping for Indoor Bird's-Eye View (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2303.11910.pdf) [[Github]](https://github.com/jamycheung/360BEV) 
[[Project]](https://jamycheung.github.io/360BEV.html) 850 | - F2BEV: Bird's Eye View Generation from Surround-View Fisheye Camera Images for Automated Driving (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2303.03651.pdf) 851 | - NVAutoNet: Fast and Accurate 360° 3D Visual Perception For Self Driving (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2303.12976.pdf) 852 | - FedBEVT: Federated Learning Bird's Eye View Perception Transformer in Road Traffic Systems (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2304.01534.pdf) 853 | - Aligning Bird-Eye View Representation of Point Cloud Sequences using Scene Flow (IEEE IV 2023) [[Paper]](https://arxiv.org/pdf/2305.02909.pdf) [[Github]](https://github.com/quan-dao/pc-corrector) 854 | - MotionBEV: Attention-Aware Online LiDAR Moving Object Segmentation with Bird’s Eye View based Appearance and Motion Features (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2305.07336.pdf) 855 | - WEDGE: A multi-weather autonomous driving dataset built from generative vision-language models (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2305.07528.pdf) [[Github]](https://github.com/Infernolia/WEDGE) [[Project]](https://infernolia.github.io/WEDGE/) 856 | - Leveraging BEV Representation for 360-degree Visual Place Recognition (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2305.13814.pdf) 857 | - NMR: Neural Manifold Representation for Autonomous Driving (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2205.05551.pdf) 858 | - V2X-ViT: Vehicle-to-Everything Cooperative Perception with Vision Transformer (ECCV 2022) [[Paper]](https://www.ecva.net/papers/eccv_2022/papers_ECCV/papers/136990106.pdf) [[Github]](https://github.com/DerrickXuNu/v2x-vit) 859 | - DAIR-V2X: A Large-Scale Dataset for Vehicle-Infrastructure Cooperative 3D Object Detection (CVPR 2022) [[Paper]](https://openaccess.thecvf.com/content/CVPR2022/papers/Yu_DAIR-V2X_A_Large-Scale_Dataset_for_Vehicle-Infrastructure_Cooperative_3D_Object_Detection_CVPR_2022_paper.pdf) 
[[Github]](https://github.com/AIR-THU/DAIR-V2X) 860 | - Rope3D: The Roadside Perception Dataset for Autonomous Driving and Monocular 3D Object Detection Task (CVPR 2022) [[Paper]](https://arxiv.org/pdf/2203.13608.pdf) [[Github]](https://github.com/liyingying0113/rope3d-dataset-tools) [[Project]](https://thudair.baai.ac.cn/rope) 861 | - A Motion and Accident Prediction Benchmark for V2X Autonomous Driving (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2304.01168.pdf) [[Project]](https://deepaccident.github.io/) 862 | - BEVBert: Multimodal Map Pre-training for Language-guided Navigation (ICCV 2023) [[Paper]](https://arxiv.org/pdf/2212.04385.pdf) 863 | - V2X-Seq: A Large-Scale Sequential Dataset for Vehicle-Infrastructure Cooperative Perception and Forecasting (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2305.05938.pdf) [[Github]](https://github.com/AIR-THU/DAIR-V2X-Seq) [[Project]](https://thudair.baai.ac.cn/index) 864 | - BUOL: A Bottom-Up Framework with Occupancy-aware Lifting for Panoptic 3D Scene Reconstruction From A Single Image (CVPR 2023) [[paper]](https://arxiv.org/pdf/2306.00965.pdf) [[Github]](https://github.com/chtsy/buol) 865 | - BEVScope: Enhancing Self-Supervised Depth Estimation Leveraging Bird’s-Eye-View in Dynamic Scenarios (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2306.11598.pdf) 866 | - Bird’s-Eye-View Scene Graph for Vision-Language Navigation (Arxiv 2023) [[paper]](https://arxiv.org/pdf/2308.04758.pdf) 867 | - OpenAnnotate3D: Open-Vocabulary Auto-Labeling System for Multi-modal 3D Data (Arxiv 2023) [[paper]](https://arxiv.org/pdf/2310.13398.pdf) 868 | - Hidden Biases of End-to-End Driving Models (ICCV 2023) [[Paper]](https://arxiv.org/pdf/2306.07957.pdf) [[Github]](https://github.com/autonomousvision/carla_garage) 869 | - EgoVM: Achieving Precise Ego-Localization using Lightweight Vectorized Maps (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2307.08991.pdf) 870 | - End-to-end Autonomous Driving: Challenges and Frontiers (Arxiv 2023) 
[[paper]](https://arxiv.org/pdf/2306.16927.pdf) [[Github]](https://github.com/OpenDriveLab/End-to-end-Autonomous-Driving) 871 | - BEVPlace: Learning LiDAR-based Place Recognition using Bird’s Eye View Images (ICCV 2023) [[paper]](https://arxiv.org/pdf/2302.14325.pdf) 872 | - I2P-Rec: Recognizing Images on Large-scale Point Cloud Maps through Bird’s Eye View Projections (IROS 2023) [[Paper]](https://arxiv.org/pdf/2303.01043.pdf) 873 | - Implicit Occupancy Flow Fields for Perception and Prediction in Self-Driving (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2308.01471.pdf) [[Project]](https://waabi.ai/research/implicito) 874 | - BEV-DG: Cross-Modal Learning under Bird’s-Eye View for Domain Generalization of 3D Semantic Segmentation (ICCV 2023) [[paper]](https://arxiv.org/pdf/2308.06530.pdf) 875 | - MapPrior: Bird’s-Eye View Map Layout Estimation with Generative Models (ICCV 2023) [[Paper]](https://arxiv.org/pdf/2308.12963.pdf) [[Github]](https://github.com/xiyuez2/MapPrior) [[Project]](https://mapprior.github.io/) 876 | - Sat2Graph: Road Graph Extraction through Graph-Tensor Encoding (ECCV 2020) [[Paper]](https://arxiv.org/pdf/2007.09547.pdf) [[Github]](https://github.com/songtaohe/Sat2Graph) 877 | - Occ2Net: Robust Image Matching Based on 3D Occupancy Estimation for Occluded Regions (ICCV 2023) [[Paper]](https://arxiv.org/pdf/2308.16160.pdf) 878 | - QUEST: Query Stream for Vehicle-Infrastructure Cooperative Perception (Arxiv 2023) [[paper]](https://arxiv.org/pdf/2308.01804.pdf) 879 | - Complementing Onboard Sensors with Satellite Map: A New Perspective for HD Map Construction (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2308.15427.pdf) 880 | - SyntheWorld: A Large-Scale Synthetic Dataset for Land Cover Mapping and Building Change Detection (Arxiv 2023) [[paper]](https://arxiv.org/pdf/2309.01907.pdf) 881 | - Rethinking Integration of Prediction and Planning in Deep Learning-Based Automated Driving Systems: A Review (Arxiv 2023) 
[[Paper]](https://arxiv.org/pdf/2308.05731.pdf) 882 | - BEV-CLIP: Multi-modal BEV Retrieval Methodology for Complex Scene in Autonomous Driving (Arxiv 2023) [[paper]](https://arxiv.org/pdf/2401.01065.pdf) 883 | - BerfScene: Bev-conditioned Equivariant Radiance Fields for Infinite 3D Scene Generation (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2312.02136.pdf) 884 | - Towards Vehicle-to-everything Autonomous Driving: A Survey on Collaborative Perception (Arxiv 2023) [[paper]](https://arxiv.org/pdf/2308.16714.pdf) 885 | - PRED: Pre-training via Semantic Rendering on LiDAR Point Clouds (Arxiv 2023) [[paper]](https://arxiv.org/pdf/2311.04501.pdf) 886 | - BEVTrack: A Simple Baseline for 3D Single Object Tracking in Bird's-Eye-View (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2309.02185.pdf) [[Github]](https://github.com/xmm-prio/BEVTrack) 887 | - BEV-CV: Birds-Eye-View Transform for Cross-View Geo-Localisation (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2312.15363.pdf) 888 | - UC-NeRF: Neural Radiance Field for Under-Calibrated Multi-View Cameras in Autonomous Driving (Arxiv 2023) [[paper]](https://arxiv.org/pdf/2311.16945.pdf) [[Project]](https://kcheng1021.github.io/ucnerf.github.io/) [[Github]](https://github.com/kcheng1021/UC-NeRF) 889 | - All for One, and One for All: UrbanSyn Dataset, the third Musketeer of Synthetic Driving Scenes (Arxiv 2023) [[paper]](https://arxiv.org/pdf/2312.12176.pdf) 890 | - BEVSeg2TP: Surround View Camera Bird’s-Eye-View Based Joint Vehicle Segmentation and Ego Vehicle Trajectory Prediction (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2312.13081.pdf) 891 | - BEVControl: Accurately Controlling Street-view Elements with Multi-perspective Consistency via BEV Sketch Layout (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2308.01661.pdf) 892 | - EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI (Arxiv 2023) [[Paper]](https://arxiv.org/pdf/2312.16170.pdf) [[Github]](https://github.com/OpenRobotLab/EmbodiedScan) 893 | - 
A Vision-Centric Approach for Static Map Element Annotation (Arxiv 2023) [[paper]](https://arxiv.org/pdf/2309.11754.pdf) 894 | - C-BEV: Contrastive Bird’s Eye View Training for Cross-View Image Retrieval and 3-DoF Pose Estimation (Arxiv 2023) [[paper]](https://arxiv.org/pdf/2312.08060.pdf) 895 | - Self-Supervised Bird's Eye View Motion Prediction with Cross-Modality Signals (Arxiv 2024) [[Paper]](https://arxiv.org/pdf/2401.11499.pdf) 896 | - GeoDecoder: Empowering Multimodal Map Understanding (Arxiv 2024) [[Paper]](https://arxiv.org/pdf/2401.15118.pdf) 897 | - Fisheye Camera and Ultrasonic Sensor Fusion For Near-Field Obstacle Perception in Bird’s-Eye-View (Arxiv 2024) [[Paper]](https://arxiv.org/pdf/2402.00637.pdf) 898 | - Text2Street: Controllable Text-to-image Generation for Street Views (Arxiv 2024) [[paper]](https://arxiv.org/pdf/2402.04504.pdf) 899 | - Zero-BEV: Zero-shot Projection of Any First-Person Modality to BEV Maps (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2402.13848) 900 | - BEV2PR: BEV-Enhanced Visual Place Recognition with Structural Cues (Arxiv 2024) [[paper]](https://arxiv.org/abs/2403.06600) 901 | - OpenOcc: Open Vocabulary 3D Scene Reconstruction via Occupancy Representation (Arxiv 2024) [[paper]](https://arxiv.org/abs/2403.11796) 902 | - Bosch Street Dataset: A Multi-Modal Dataset with Imaging Radar for Automated Driving (Arxiv 2024) [[paper]](https://arxiv.org/pdf/2407.12803) 903 | - Streetscapes: Large-scale Consistent Street View Generation Using Autoregressive Video Diffusion (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2407.13759) [[Project]](https://boyangdeng.com/streetscapes/) 904 | - M2DA: Multi-Modal Fusion Transformer Incorporating Driver Attention for Autonomous Driving (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2403.12552) 905 | - MiniGPT-3D: Efficiently Aligning 3D Point Clouds with Large Language Models using 2D Priors (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2405.01413) 906 | - Window-to-Window BEV Representation 
Learning for Limited FoV Cross-View Geo-localization (Arxiv 2024) [[Paper]](https://arxiv.org/pdf/2407.06861) 907 | - MapLocNet: Coarse-to-Fine Feature Registration for Visual Re-Localization in Navigation Maps (Arxiv 2024) [[Paper]](https://arxiv.org/pdf/2407.08561) 908 | - Neural Semantic Map-Learning for Autonomous Vehicles (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2410.07780) 909 | - AutoSplat: Constrained Gaussian Splatting for Autonomous Driving Scene Reconstruction (Arxiv 2024) [[Paper]](https://arxiv.org/pdf/2407.02598) [[Project]](https://autosplat.github.io/) 910 | - MVPbev: Multi-view Perspective Image Generation from BEV with Test-time Controllability and Generalizability (Arxiv 2024) [[paper]](https://arxiv.org/abs/2407.19468) [[Github]](https://github.com/kkaiwwana/MVPbev) 911 | - SkyDiffusion: Street-to-Satellite Image Synthesis with Diffusion Models and BEV Paradigm (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2408.01812) [[Project]](https://opendatalab.github.io/skydiffusion/) 912 | - UrbanWorld: An Urban World Model for 3D City Generation (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2407.11965) 913 | - From Bird's-Eye to Street View: Crafting Diverse and Condition-Aligned Images with Latent Diffusion Model (ICRA 2024) [[Paper]](https://arxiv.org/abs/2409.01014) 914 | - Learning Content-Aware Multi-Modal Joint Input Pruning via Bird's-Eye-View Representation (Arxiv 2024) [[paper]](https://arxiv.org/pdf/2410.07268) 915 | - DriveScape: Towards High-Resolution Controllable Multi-View Driving Video Generation (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2409.05463) [[Project]](https://metadrivescape.github.io/papers_project/drivescapev1/index.html) 916 | - BEVal: A Cross-dataset Evaluation Study of BEV Segmentation Models for Autonomous Driving (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2408.16322) 917 | - Bench2Drive-R: Turning Real World Data into Reactive Closed-Loop Autonomous Driving Benchmark by Generative Model (Arxiv 2024) 
[[Paper]](https://arxiv.org/pdf/2412.09647) 918 | - RAC3: Retrieval-Augmented Corner Case Comprehension for Autonomous Driving with Vision-Language Models (Arxiv 2024) [[Paper]](https://arxiv.org/pdf/2412.10734) 919 | - OmniHD-Scenes: A Next-Generation Multimodal Dataset for Autonomous Driving (Arxiv 2024) [[paper]](https://arxiv.org/pdf/2412.10734) 920 | - Hidden Biases of End-to-End Driving Datasets (Arxiv 2024) [[paper]](https://arxiv.org/abs/2412.09602) [[Github]](https://github.com/autonomousvision/carla_garage) 921 | - Video2BEV: Transforming Drone Videos to BEVs for Video-based Geo-localization (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2411.13626) 922 | - VLM-RL: A Unified Vision Language Models and Reinforcement Learning Framework for Safe Autonomous Driving (Arxiv 2024) [[Paper]](https://arxiv.org/pdf/2412.15544) [[Project]](https://www.huang-zilin.com/VLM-RL-website/) [[Github]](https://github.com/zihaosheng/VLM-RL) 923 | - OLiDM: Object-aware LiDAR Diffusion Models for Autonomous Driving (Arxiv 2024) [[Paper]](https://arxiv.org/pdf/2412.17226) [[Project]](https://yanty123.github.io/OLiDM/) [[Github]](https://github.com/yanty123/OLiDM) 924 | - DriveEditor: A Unified 3D Information-Guided Framework for Controllable Object Editing in Driving Scenes (Arxiv 2024) [[paper]](https://arxiv.org/abs/2412.19458) [[Project]](https://yvanliang.github.io/DriveEditor/) [[Github]](https://github.com/yvanliang/DriveEditor) 925 | - HoloDrive: Holistic 2D-3D Multi-Modal Street Scene Generation for Autonomous Driving (Arxiv 2024) [[Paper]](https://arxiv.org/abs/2412.01407) 926 | - Joint Perception and Prediction for Autonomous Driving: A Survey (Arxiv 2024) [[paper]](Joint Perception and Prediction for Autonomous Driving: A Survey) 927 | - 3DLabelProp: Geometric-Driven Domain Generalization for LiDAR Semantic Segmentation in Autonomous Driving (Arxiv 2025) [[Paper]](https://arxiv.org/pdf/2501.14605) 928 | - Range and Bird’s Eye View Fused Cross-Modal Visual Place 
Recognition (Arxiv 2025) [[paper]](https://arxiv.org/pdf/2502.11742) 929 | - Aerial Vision-and-Language Navigation with Grid-based View Selection and Map Construction (Arxiv 2025) [[Paper]](https://arxiv.org/pdf/2503.11091) 930 | - BEVDiffLoc: End-to-End LiDAR Global Localization in BEV View based on Diffusion Model (Arxiv 2025) [[Paper]](https://arxiv.org/abs/2503.11372) [[Github]](https://github.com/nubot-nudt/BEVDiffLoc) 931 | - Scenario Dreamer: Vectorized Latent Diffusion for Generating Driving Simulation Environments (Arxiv 2025) [[Paper]](https://arxiv.org/pdf/2503.22496) [[Project]](https://princeton-computational-imaging.github.io/scenario-dreamer/) [[Github]](https://github.com/princeton-computational-imaging/scenario-dreamer) 932 | - RendBEV: Semantic Novel View Synthesis for Self-Supervised Bird's Eye View Segmentation (Arxiv 2025) [[paper]](https://arxiv.org/abs/2502.14792) 933 | --------------------------------------------------------------------------------