└── README.md /README.md: -------------------------------------------------------------------------------- 1 | # Collaborative / Cooperative Perception Datasets - Updating 2 | ## Overview 3 | 4 | This repository consolidates **Collaborative Perception (CP)** datasets for autonomous driving, covering a wide range of collaboration paradigms, including pure **roadside perception**, **Vehicle-to-Vehicle (V2V)**, **Vehicle-to-Infrastructure (V2I)**, **Vehicle-to-Everything (V2X)**, and **Infrastructure-to-Infrastructure (I2I)** scenarios. It includes nearly all publicly available **CP** datasets and provides links to relevant publications, source code, and dataset downloads, offering researchers an efficient and centralized resource to aid their research and development in this field. 5 | 6 | First, the repository introduces commonly used **autonomous driving simulation tools**, followed by categorizing **CP datasets** based on collaboration paradigms, presented in a tabular format. Each dataset is then described in detail, helping readers better understand the characteristics and applicable scenarios of each dataset. In addition, the repository also consolidates classic methods and cutting-edge research in **collaborative perception**, providing valuable insights into current trends and future directions in the field. 7 | 8 | ### :link:Jump to: 9 | - [Simulator](#simulator-anchor) 10 | - [Roadside Datasets](#roadside-datasets-anchor) 11 | - [V2V Datasets](#v2v-datasets-anchor) 12 | - [V2I Datasets](#v2i-datasets-anchor) 13 | - [V2X Datasets](#v2x-datasets-anchor) 14 | - [I2I Datasets](#i2i-datasets-anchor) 15 | - [Methods](#methods-anchor) 16 | - [Citation](#citation-anchor) 17 | 18 | 19 | ## :bookmark:Simulator 20 | 21 | - Simulators for Collaborative Perception (CP) Research 22 | 23 | **Simulators** play a critical role in **Collaborative Perception (CP)** research for **Autonomous Driving (AD)**, offering cost-effective, safe, and annotation-efficient alternatives to real-world data collection. They enable the generation of **accurate object attributes** and provide **automated annotations**, crucial for training and evaluating perception algorithms. 24 | 25 | Several **open-source platforms** are widely used for data synthesis: 26 | 27 | | **Simulator** | **Year** | **Venue** | 28 | |---------------|----------|-----------| 29 | | CARLA | 2017 | Conference on Robot Learning (CoRL) | 30 | | SUMO | 2002 | Proceedings of the 4th Middle East Symposium on Simulation and Modelling (MESM2002) | 31 | | AirSim | 2018 | Field and Service Robotics (FSR) | 32 | | OpenCDA | 2021 | IEEE International Conference on Intelligent Transportation Systems (ITSC) | 33 | 34 | ### CARLA: An Open Urban Driving Simulator [[paper](https://arxiv.org/abs/1711.03938)] [[code](https://github.com/carla-simulator/carla)] [[project](https://carla.org)] 35 | 36 | - **Background and Motivation** 37 | The paper introduces CARLA, an open-source simulator specifically designed for autonomous driving research. Traditional testing for autonomous vehicles in urban environments is costly and logistically challenging. Simulations offer an affordable alternative, enabling testing of various autonomous driving models, including perception, control, and navigation in complex urban settings. CARLA was developed to address the challenges in simulating real-world scenarios such as dynamic traffic, pedestrians, weather conditions, and other urban obstacles, making it essential for advancing autonomous driving research.
38 | 39 | - **Key Contributions** 40 | - **Open-Source Platform**: CARLA is an open-source urban driving simulator, offering free access to both the code and digital assets. 41 | - **Flexible Sensor and Environmental Setup**: It allows for customizable sensor suites (e.g., RGB cameras, LiDAR) and a variety of environmental conditions (weather, time of day). 42 | - **Realistic Urban Environment**: CARLA provides a highly detailed and dynamic urban environment with realistic traffic, pedestrians, and infrastructure. 43 | - **Simulation of Three Approaches**: The paper tests and compares three approaches to autonomous driving: a modular pipeline, imitation learning, and reinforcement learning, providing valuable insights into their performance in controlled urban scenarios. 44 | - **Evaluation Metrics**: The simulation framework provides comprehensive metrics for performance evaluation, facilitating in-depth analysis of driving policies. 45 | 46 | 47 | ### SUMO: Simulation of Urban MObility [[paper](https://elib.dlr.de/6661/2/dkrajzew_MESM2002.pdf)] [~~code~~] [[project](https://elib.dlr.de/6661/)] 48 | 49 | - **Background and Motivation** 50 | Traffic simulation plays a crucial role in understanding and improving urban mobility, given the complexity and unpredictability of real-world traffic. Existing traffic models often fail to account for the variability introduced by individual driver behavior, diverse transportation modes, and changing traffic conditions. To address these challenges, open-source simulation tools are essential to enable researchers to model and analyze traffic behavior more effectively. 51 | 52 | - **Key Contributions** 53 | - **Open-Source Traffic Simulation**: SUMO is an open-source, microscopic, and multi-modal traffic simulation platform that enables detailed simulation of urban mobility, including car movements, public transportation, and pedestrian traffic. 54 | - **Comprehensive Traffic Flow Modeling**: The platform uses continuous modeling to represent vehicle movements and interactions, incorporating factors like driver behavior and road network conditions. 55 | - **Extensibility and Flexibility**: SUMO offers a customizable framework where researchers can integrate their own models and traffic scenarios, allowing for an adaptable and scalable research tool. 56 | - **Support for Various Modalities**: SUMO supports multi-modal simulations, including cars, buses, trams, and pedestrian pathways, to simulate complex urban environments. 57 | - **Simulation of Large-Scale Networks**: SUMO is capable of simulating large urban areas and can handle high volumes of traffic data, making it suitable for analyzing traffic management strategies in large cities. 58 | 59 | ### AirSim: High-Fidelity Visual and Physical Simulation for Autonomous Vehicles [[paper](https://link.springer.com/chapter/10.1007/978-3-319-67361-5_40)] [~~code~~] [~~project~~] 60 | 61 | - **Background and Motivation** 62 | Testing autonomous vehicles in real-world environments is both costly and time-consuming. Additionally, collecting the large datasets required for training machine learning algorithms can be challenging and impractical. AirSim, built on Unreal Engine, aims to address these issues by offering a high-fidelity simulation platform that supports real-time, hardware-in-the-loop simulations. It is designed to provide realistic physical and visual simulations to aid the development of autonomous vehicles, enabling safe and cost-effective testing in various environments. 
63 | 64 | - **Key Contributions** 65 | - **High-Fidelity Simulation**: AirSim offers a realistic platform for simulating both the visual and physical environments of autonomous vehicles. 66 | - **Extensibility**: The platform is highly modular and extensible, supporting a variety of vehicles, sensors, and hardware platforms. 67 | - **Realistic Physics and Sensor Models**: AirSim simulates complex environmental factors such as gravity, air pressure, and magnetic fields, along with advanced sensor models for IMU, GPS, and barometers. 68 | - **Integration with Unreal Engine**: By leveraging Unreal Engine, AirSim supports realistic rendering, including photorealistic graphics, object segmentation, and depth sensing. 69 | - **Vehicle Model and Simulation**: AirSim includes a vehicle model capable of simulating a range of vehicle types, from ground vehicles to aerial drones, with detailed dynamics and control mechanisms. 70 | 71 | ### OpenCDA: An Open Cooperative Driving Automation Framework Integrated with Co-Simulation [[paper](https://ieeexplore.ieee.org/abstract/document/9564825)] [[code](https://github.com/ucla-mobility/OpenCDA)] [[doc](https://opencda-documentation.readthedocs.io/en/latest)] 72 | 73 | - **Background and Motivation** 74 | Cooperative Driving Automation (CDA) is gaining attention but faces significant challenges, particularly the lack of simulation platforms that support multi-vehicle cooperation. Current simulators primarily focus on single-vehicle automation, hindering the evaluation and comparison of CDA algorithms in a collaborative setting. OpenCDA was developed to bridge this gap, providing a flexible and modular tool for testing CDA algorithms in both traffic-level and individual vehicle scenarios. 75 | 76 | - **Key Contributions** 77 | - **Co-Simulation Platform**: OpenCDA integrates various simulators (e.g., CARLA, SUMO) to support both vehicle and traffic-level simulations for cooperative driving tasks. 78 | - **Modular Design**: The framework is highly modular, allowing users to replace default algorithms with their own customized designs for different CDA applications. 79 | - **Full-Stack CDA System**: OpenCDA provides a complete system including sensing, computation, and actuation modules, along with cooperative features like vehicle communication and information sharing. 80 | - **Benchmarking and Testing**: It includes a benchmark testing database, offering standard scenarios and evaluation metrics for comparing CDA algorithms. 81 | - **Platooning Example**: The paper demonstrates the capabilities of OpenCDA through a platooning implementation, showcasing its flexibility and effectiveness in CDA research. 
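Several of the simulated CP datasets listed below (e.g., OPV2V) were produced by scripting CARLA, often through OpenCDA. As a concrete point of reference, the following is a minimal sketch of that kind of data-generation loop. It assumes a CARLA 0.9.x server already running on `localhost:2000`; the sensor attributes, output paths, and two-agent setup are illustrative placeholders, not the configuration of any dataset in this list.

```python
"""Minimal sketch: two LiDAR-equipped vehicles logging point clouds in CARLA."""
import carla

client = carla.Client("localhost", 2000)
client.set_timeout(10.0)
world = client.get_world()

bp_lib = world.get_blueprint_library()
spawn_points = world.get_map().get_spawn_points()

vehicles, lidars = [], []
for i in range(2):  # two connected agents -> a toy V2V-style recording setup
    vehicle = world.spawn_actor(bp_lib.filter("vehicle.*")[0], spawn_points[i])
    vehicle.set_autopilot(True)

    lidar_bp = bp_lib.find("sensor.lidar.ray_cast")
    lidar_bp.set_attribute("range", "100")               # metres (placeholder value)
    lidar_bp.set_attribute("rotation_frequency", "10")   # Hz (placeholder value)
    lidar = world.spawn_actor(
        lidar_bp,
        carla.Transform(carla.Location(z=2.4)),          # roof-mounted sensor
        attach_to=vehicle,
    )
    # Each agent dumps its own sweeps; a CP dataset would also log poses and calibration.
    lidar.listen(lambda data, idx=i: data.save_to_disk(f"_out/agent{idx}/{data.frame:06d}.ply"))

    vehicles.append(vehicle)
    lidars.append(lidar)
```

Ground-truth 3D boxes come for free in simulation (e.g., by querying `world.get_actors()` each tick), which is exactly why the platforms above are so widely used for annotation-efficient CP data synthesis.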
82 | 83 | 84 | ## :bookmark:Roadside Datasets 85 | ### Table 86 | 87 | | **Dataset** | **Year** | **Venue** | **Sensors** | **Source** | **Tasks** | **download** | 88 | |-------------|----------|-----------|-------------|------------|-----------|----------| 89 | | Ko-PER | 2014 | ITSC | C, L | Real | 3DOD, MOT | [download](https://www.uni-ulm.de/in/mrm/forschung/datensaetze.html) | 90 | | CityFlow | 2019 | CVPR | C | Real | MTSCT/MTMCT, ReID | [download](https://cityflow-project.github.io/) | 91 | | BAAI-VANJEE | 2021 | arXiv | C, L | Real | 2D/3D OD | [download](https://paperswithcode.com/dataset/baai-vanjee) | 92 | | WIBAM | 2021 | arXiv | C | Real | 2D/3D OD | [download](https://github.com/MatthewHowe/WIBAM) | 93 | | INTERACTION | 2019 | IROS | C, L | Real | 2DOD, TP | [download](https://interaction-dataset.com/) | 94 | | A9-Dataset | 2022 | IV | C, L | Real | 3DOD | [download](https://a9-dataset.com/) | 95 | | IPS300+ | 2022 | ICRA | C, L | Real | 2DOD, 3DOD | [download](http://www.openmpd.com/column/IPS300) | 96 | | Rope3D | 2022 | CVPR | C, L | Real | 2DOD, 3DOD | [download](https://thudair.baai.ac.cn/rope) | 97 | | LUMPI | 2022 | IV | C, L | Real | 3DOD | [download](https://data.uni-hannover.de/cs_CZ/dataset/lumpi) | 98 | | TUMTraf-I | 2023 | ITSC | C, L | Real | 3DOD | [download](https://innovation-mobility.com/en/project-providentia/a9-dataset/) | 99 | | RoScenes | 2024 | ECCV | C | Real | 3DOD | [download](https://roscenes.github.io./) | 100 | | H-V2X | 2024 | ECCV | C, R | Real | BEV Det, MOT, TP | [download](https://pan.quark.cn/s/86d19da10d18) | 101 | 102 | 103 | Note: Sensors: Camera (C), LiDAR (L), Radar (R). Source: Real = collected in the real world; Sim = generated via simulation. Tasks: 2DOD = 2D Object Detection, 3DOD = 3D Object Detection, MOT = Multi-Object Tracking, MTSCT = Multi-target Single-camera Tracking, MTMCT = Multi-target Multi-camera Tracking, SS = Semantic Segmentation, TP = Trajectory Prediction, VPR = Visual Place Recognition, NR = Neural Reconstruction, Re-ID = Re-Identification, S2R = Sim2Real, MF = Motion Forecasting, PQA = Planning Q&A. 104 | 105 | Note: {Real} denotes that the sensor data is obtained by real-world collection instead of simulation. 106 | 107 | ### The Ko-PER Intersection Laserscanner and Video Dataset [[paper](https://ieeexplore.ieee.org/abstract/document/6957976)] [~~code~~] [[project](https://www.uni-ulm.de/in/mrm/forschung/datensaetze.html)] 108 | 109 | - **Background and Motivation** 110 | Intersections are critical areas for traffic safety, as they are often the sites of accidents. Traditional methods of assessing and improving intersection safety are limited by real-world testing challenges. The Ko-PER project addresses this by equipping a public intersection with laserscanners and video cameras, enabling comprehensive data collection for traffic perception tasks. The primary motivation is to develop better models for road user detection, classification, and tracking, thus improving intersection safety. 111 | 112 | - **Key Contributions** 113 | - **Dataset for Multi-Object Detection and Tracking**: The dataset includes data from 14 laserscanners and 8 video cameras, designed to improve object detection and classification at intersections. 114 | - **Reference Data for Evaluation**: It provides highly accurate reference data for vehicle positions using RTK-GPS, offering a benchmark for evaluating perception algorithms. 
115 | - **Rich Sensor Data**: The dataset features synchronized laserscanner measurements and camera images, facilitating research in multi-object tracking and classification. 116 | - **Real-World Intersection Setup**: Data was collected from a complex intersection, providing a naturalistic environment for testing and validating algorithms for intersection collision avoidance systems. 117 | - **Public Availability**: The dataset is publicly available for use by the research community, promoting further advancements in cooperative perception and road user detection technologies. 118 | 119 | ### CityFlow: A City-Scale Benchmark for Multi-Target Multi-Camera Vehicle Tracking and Re-Identification [[paper](https://openaccess.thecvf.com/content_CVPR_2019/papers/Tang_CityFlow_A_City-Scale_Benchmark_for_Multi-Target_Multi-Camera_Vehicle_Tracking_and_CVPR_2019_paper.pdf)] [[code](https://github.com/cityflow-project/CityFlow/)] [[project](https://cityflow-project.github.io/)] [[doc](https://cityflow.readthedocs.io/en/latest/)] 120 | 121 | - **Background and Motivation** 122 | The CityFlow dataset addresses the challenges of tracking vehicles across large urban areas using multiple traffic cameras. Traditional tracking methods face limitations due to the small spatial coverage and limited camera setups of existing benchmarks. CityFlow aims to solve this by providing a large-scale, multi-camera dataset designed for multi-target multi-camera (MTMC) tracking and vehicle re-identification (ReID). 123 | 124 | - **Key Contributions** 125 | - **City-Scale Dataset**: CityFlow is the first dataset at a city scale with 40 cameras spanning 10 intersections, providing synchronized HD videos over a 2.5 km area. 126 | - **Comprehensive Annotations**: Over 200K bounding boxes across various urban scenes, with camera calibration and GPS data for precise spatio-temporal analysis. 127 | - **Support for MTMC Tracking and ReID**: CityFlow supports both MTMC tracking and image-based vehicle ReID, providing a benchmark for these tasks. 128 | - **Real-World Challenges**: The dataset covers diverse environments and traffic conditions, incorporating issues such as motion blur and overlapping camera views. 129 | - **Evaluation Server**: An online platform for continuous performance comparison, providing a fair and transparent benchmarking process. 130 | 131 | ### BAAI-VANJEE Roadside Dataset: Towards the Connected Automated Vehicle Highway Technologies in Challenging Environments of China [[paper](https://arxiv.org/abs/2105.14370)] [~~code~~] [[project](https://data.baai.ac.cn/data-set)] [~~project~~] 132 | 133 | - **Background and Motivation** 134 | This paper introduces the BAAI-VANJEE dataset, aimed at enhancing roadside perception for Connected Automated Vehicle Highway (CAVH) technologies. The dataset was created in response to the limitations of vehicle-based technologies that are difficult to scale. It provides high-quality LiDAR and RGB data collected from roadside sensors, addressing the need for datasets that can help improve detection tasks, including 2D/3D object detection and multi-sensor fusion in complex traffic environments. 135 | 136 | - **Key Contributions** 137 | - **Challenging Roadside Dataset**: The dataset includes 2500 frames of LiDAR data and 5000 frames of RGB images, with annotations for 12 object classes. 138 | - **High-Quality Annotations**: 74K 3D object annotations and 105K 2D object annotations, collected under varying weather conditions (sunny, cloudy, rainy) and times of day (day, night). 
139 | - **Real-World Application**: Focuses on complex urban intersections and highway scenes, providing real-world data for CAVH research. 140 | - **Three Core Tasks**: Supports tasks including 2D object detection, 3D object detection, and multi-sensor fusion. 141 | - **Diverse Scenarios**: Includes data from diverse traffic conditions, providing a more comprehensive view of roadside perception. 142 | - **Public Availability**: The dataset is available online to support research in intelligent transportation and big data-driven innovation. 143 | 144 | ### WIBAM: Weakly Supervised Training of Monocular 3D Object Detectors Using Wide Baseline Multi-view Traffic Camera Data [[paper](https://arxiv.org/abs/2110.10966)] [~~code~~] [~~project~~] 145 | 146 | 147 | ### INTERACTION: An INTERnational, Adversarial and Cooperative moTION Dataset in Interactive Driving Scenarios with Semantic Maps [[paper](https://arxiv.org/abs/1910.03088)] [~~code~~] [~~project~~] 148 | 149 | - **Background and Motivation** 150 | The paper introduces the INTERACTION dataset, designed to support the development of autonomous driving systems in complex, interactive scenarios. Existing datasets have limitations in terms of diversity, criticality, and the inclusion of driving behaviors from different cultures. This dataset addresses these gaps by providing a variety of challenging, real-world driving scenarios, capturing interactions among vehicles with different behaviors. 151 | 152 | - **Key Contributions** 153 | - **Diverse Driving Scenarios**: The dataset includes roundabouts, intersections, ramp merging, lane changes, and other highly interactive driving scenarios from multiple countries. 154 | - **International Scope**: Data is collected from different continents (USA, China, Germany, Bulgaria), providing insights into various driving cultures. 155 | - **Complex and Critical Situations**: It includes aggressive, irrational behaviors, near-collisions, and critical driving situations, which are crucial for developing robust autonomous driving systems. 156 | - **Semantic Maps**: The dataset comes with detailed HD maps, which include traffic rules, lanelets, and road features, essential for motion prediction and planning tasks. 157 | - **Complete Interaction Data**: The dataset captures the interactions of all entities influencing vehicle behavior, enabling better modeling and prediction. 158 | 159 | ### A9-Dataset: Multi-Sensor Infrastructure-Based Dataset for Mobility Research [[paper](https://ieeexplore.ieee.org/abstract/document/9827401)] [[code](https://github.com/providentia-project/a9-dev-kit)] [[project](https://a9-dataset.com)] 160 | 161 | - **Background and Motivation** 162 | The paper introduces the A9-Dataset, collected from the Providentia++ test field in Germany, which aims to provide high-quality, real-world data for autonomous driving and mobility research. The dataset addresses the scarcity of diverse road scenarios captured by stationary multi-modal sensors, especially in infrastructure-based perception systems. 163 | 164 | - **Key Contributions** 165 | - **Multi-Sensor Setup**: The dataset includes data from cameras and LiDAR sensors mounted on overhead gantry bridges along the A9 autobahn. 166 | - **Diverse Traffic Scenarios**: It captures traffic on a variety of road segments including highways, rural roads, and urban intersections.
167 | - **High-Quality Labeled Data**: The dataset provides 3D bounding boxes for over 14,000 objects, with high-resolution camera and LiDAR frames. 168 | - **Real-World Traffic**: The A9-Dataset features dense traffic data recorded under real-world conditions, making it useful for training and testing perception models for autonomous vehicles. 169 | - **Open Access**: The dataset is publicly available, promoting further research in autonomous driving and infrastructure-based perception systems. 170 | 171 | ### IPS300+: A Challenging Multi-Modal Data Sets for Intersection Perception System [[paper](https://ieeexplore.ieee.org/abstract/document/9811699)] [~~code~~] [[project](http://www.openmpd.com/column/IPS300)] 172 | 173 | - **Background and Motivation** 174 | The paper addresses the complexity and occlusion issues at urban intersections, which pose a significant challenge for autonomous driving systems. While on-board perception is limited in crowded, obstructed urban environments, the introduction of Cooperative Vehicle Infrastructure Systems (CVIS) offers a solution. However, there was a lack of open-source, multi-modal datasets for intersection perception, motivating the creation of the IPS300+ dataset. 175 | 176 | - **Key Contributions** 177 | - **First Open-Source Multi-Modal Dataset**: IPS300+ is the first multi-modal dataset for roadside perception in large-scale urban intersection scenes, providing data on both point clouds and images. 178 | - **High Label Density**: The dataset includes 14,198 frames with an average of 319.84 labels per frame, a label density far higher than that of existing datasets such as KITTI. 179 | - **Dense 3D Bounding Box Annotations**: Labels are provided at 5Hz, offering rich ground truth data for 3D object detection and tracking tasks. 180 | - **Feasible Solution for IPS Construction**: The dataset also presents an affordable approach to building Intersection Perception Systems (IPS), including a wireless solution for time synchronization and spatial calibration. 181 | - **Challenges in CVIS Algorithms**: The dataset introduces unique challenges for algorithms, particularly in multi-modal fusion and spatial coordination across different roadside units (RSUs), contributing valuable research avenues for CVIS and intersection perception tasks. 182 | 183 | ### Rope3D: The Roadside Perception Dataset for Autonomous Driving [[paper](https://openaccess.thecvf.com/content/CVPR2022/papers/Ye_Rope3D_The_Roadside_Perception_Dataset_for_Autonomous_Driving_and_Monocular_CVPR_2022_paper.pdf)] [~~code~~] [[project](https://thudair.baai.ac.cn/rope)] 184 | 185 | - **Background and Motivation** 186 | Current autonomous driving perception systems mainly focus on frontal-view data from vehicle-mounted sensors. However, this perspective creates limitations, such as blind spots and occlusions. Roadside perception systems, which provide a more comprehensive view of traffic scenarios, could enhance safety and prediction accuracy. The motivation behind this work is to address these challenges by introducing a roadside perception dataset, Rope3D, designed to improve 3D localization and object detection tasks for autonomous driving. 187 | 188 | - **Key Contributions** 189 | - **Introduction of Rope3D**: This paper presents Rope3D, the first large-scale, high-diversity roadside perception dataset for autonomous driving, containing over 50k images and 1.5M 3D objects.
190 | - **Challenging Environment**: The dataset is collected under varied weather conditions, times of day, and camera specifications, introducing complexities such as ambiguous camera positions and viewpoints. 191 | - **Joint Annotation**: The dataset features joint 2D-3D annotations, improving the ability to perform monocular 3D detection in roadside scenarios. 192 | - **New Evaluation Metrics**: Rope3D establishes a new benchmark for 3D roadside perception, proposing unique evaluation metrics and a devkit to measure task performance. 193 | - **Adapting Existing Detection Models**: The paper customizes monocular 3D detection methods for roadside perception, overcoming challenges like varied camera viewpoints and increasing object density. 194 | 195 | ### LUMPI: The Leibniz University Multi-Perspective Intersection Dataset [[paper](https://ieeexplore.ieee.org/abstract/document/9827157)] [~~code~~] [[project](https://data.uni-hannover.de/cs_CZ/dataset/lumpi)] 196 | 197 | - **Background and Motivation** 198 | The paper introduces the LUMPI dataset, designed to address the limitations of single-view datasets used for autonomous driving. Traditional datasets often suffer from occlusions, making precise pose estimation difficult. LUMPI provides a multi-view dataset to improve accuracy in object detection and tracking by using multiple cameras and LiDAR sensors. 199 | 200 | - **Key Contributions** 201 | - **Multi-View Dataset**: LUMPI introduces a multi-perspective dataset combining 2D video and 3D LiDAR point clouds from multiple cameras and sensors, enhancing pose estimation and tracking accuracy. 202 | - **Varied Weather Conditions**: The dataset was recorded under different weather conditions, providing diverse data for more robust perception systems. 203 | - **Collaborative Data Processing**: It supports the development of collaborative algorithms by providing multi-sensor data that can be used to validate and compare single sensor results. 204 | - **Traffic Participant Labels**: Precise labels for road users are included, along with a high-density reference point cloud, aiding in accurate trajectory generation and collaboration in data processing. 205 | - **Use Cases**: The dataset is valuable for research in traffic forecasting, anomaly detection, intent prediction, and junction mapping. 206 | 207 | 208 | 209 | ### TUMTraf Intersection Dataset: All You Need for Urban 3D Camera-LiDAR Roadside Perception [[paper](https://ieeexplore.ieee.org/abstract/document/10422289)] [~~code~~] [[project](https://innovation-mobility.com/en/project-providentia/a9-dataset/)] 210 | 211 | - **Background and Motivation** 212 | The TUMTraf Intersection Dataset addresses the need for high-quality 3D object detection in roadside infrastructure systems. Traditional vehicle-mounted sensors fail to cover complex intersection scenarios. This dataset was developed to enhance autonomous vehicle systems by providing labeled LiDAR point clouds and synchronized camera images from elevated roadside sensors, supporting 3D object detection tasks. 213 | 214 | - **Key Contributions** 215 | - **Comprehensive Dataset**: 4.8k labeled LiDAR point clouds and 4.8k synchronized camera images with 57.4k high-quality 3D bounding boxes. 216 | - **Diverse Traffic Scenarios**: The dataset includes complex maneuvers like left and right turns, overtaking, and U-turns, across varied weather and lighting conditions. 217 | - **Calibration Data**: Provides extrinsic calibration data for accurate sensor fusion between cameras and LiDARs. 
218 | - **High-Class Diversity**: Includes ten object classes with a broad range of road users, including vulnerable pedestrians. 219 | - **Evaluation Baselines**: Offers multiple baselines for 3D object detection with monocular, LiDAR, and multi-modal setups, demonstrating robust performance in urban traffic perception tasks. 220 | 221 | ### RoScenes: A Large-scale Multi-view 3D Dataset for Roadside Perception [[paper](https://link.springer.com/chapter/10.1007/978-3-031-72940-9_19)] [[paper](https://arxiv.org/abs/2405.09883)] [[code](https://github.com/roscenes/RoScenes)] [[project](https://roscenes.github.io/)] 222 | 223 | - **Background and Motivation** 224 | The paper introduces RoScenes, the largest multi-view 3D dataset designed for roadside perception tasks. Traditional roadside perception datasets face limitations due to small-scale sensing areas and insufficient camera setups. RoScenes aims to address these challenges by offering a large-scale dataset with full scene coverage and crowded traffic, designed for 3D object detection and other advanced roadside perception tasks. 225 | 226 | - **Key Contributions** 227 | - **Large-Scale Multi-View Dataset**: RoScenes provides 21.13 million 3D annotations across 64,000 m² of highway scenes, making it the largest roadside perception dataset. 228 | - **Innovative Annotation Pipeline**: The paper introduces a novel BEV-to-3D joint annotation pipeline that efficiently produces accurate 3D annotations without the need for expensive LiDAR sensors. 229 | - **Roadside Configuration**: The dataset features high-coverage camera setups, with 6-12 cameras mounted on 4-6 poles, which effectively eliminate occlusions and provide a broad perception range. 230 | - **RoBEV Model**: The paper presents RoBEV, a BEV detection method that uses feature-guided 3D position embedding to improve performance for 3D detection, outperforming state-of-the-art methods. 231 | - **Benchmark for BEV Architectures**: RoScenes serves as a benchmark for evaluating BEV architectures, offering a comprehensive study of various detection methods under real-world roadside conditions. 232 | 233 | ### H-V2X: A Large Scale Highway Dataset for BEV Perception [[paper](https://eccv2024.ecva.net/virtual/2024/poster/126)] [~~code~~] [~~project~~] 234 | 235 | - **Background and Motivation** 236 | The paper introduces H-V2X, a large-scale dataset for highway roadside perception, addressing the gap in current datasets primarily focused on urban environments. Existing datasets often lack coverage of highway scenarios, particularly those related to vehicle-to-everything (V2X) technology. H-V2X was created to advance research on highway perception by providing real-world, high-quality data from multi-sensor setups in highway settings. 237 | 238 | - **Key Contributions** 239 | - **First Large-Scale Highway Dataset**: H-V2X is the first large-scale dataset for highway roadside perception, incorporating data from real-world sensors on highways, covering over 100 km. 240 | - **Multi-Sensor Integration**: The dataset includes synchronized data from cameras and radars, with vector map information, ensuring comprehensive BEV-based perception. 241 | - **Three Key Tasks**: Introduces three critical tasks for highway perception: BEV detection, MOT (Multi-Object Tracking), and trajectory prediction, supported by ground truth data and benchmarks.
242 | - **New Benchmark Methods**: The paper presents innovative methods incorporating HDMap data for improved BEV detection and trajectory prediction in highway scenarios. 243 | 244 | 245 | 246 | 247 | 248 | ## :bookmark:V2V Datasets 249 | 250 | - **V2V Datasets**: Vehicle-to-vehicle datasets capture collaboration between vehicles, facilitating research on cooperative perception under occlusion, sparse observations, or dynamic driving scenarios. 251 | 252 | ### Table 253 | 254 | | **Dataset** | **Year** | **Venue** | **Sensors** | **Source** | **Tasks** | **download** | 255 | |-------------|----------|-----------|-------------|------------|-----------|----------| 256 | | T & J | 2019 | ICDCS | C, L, R | Real | 3D OD | - | 257 | | V2V-Sim | 2020 | ECCV | L | Sim | 3D OD | - | 258 | | COMAP | 2021 | ISPRS | L, C | Sim | 3DOD, SS | [download](https://demuc.de/colmap/) | 259 | | CODD | 2021 | RA-L | L | Sim | Registration | [download](https://github.com/eduardohenriquearnold/fastreg) | 260 | | OPV2V | 2022 | ICRA | C, L, R | Sim | 3DOD | [download](https://mobility-lab.seas.ucla.edu/opv2v/) | 261 | | OPV2V+ | 2023 | CVPR | C, L, R | Sim | 3DOD | [download](https://siheng-chen.github.io/dataset/CoPerception+/) | 262 | | IRV2V | 2023 | NIPS | L, C | Sim | 3D OD | [download](https://paperswithcode.com/dataset/irv2v) | 263 | | V2V4Real | 2023 | CVPR | L, C | Real | 3DOD, MOT, S2R | [download](https://mobility-lab.seas.ucla.edu/v2v4real/) | 264 | | LUCOOP | 2023 | IV | L | Real | 3DOD | [download](https://data.uni-hannover.de/vault/icsens/axmann/lucoop-leibniz-university-cooperative-perception-and-urban-navigation-dataset/) | 265 | | MARS | 2024 | CVPR | L, C | Real | VPR, NR | [download](https://ai4ce.github.io/MARS/) | 266 | | OPV2V-H | 2024 | ICLR | C, L, R | Sim | 3DOD | [download](https://github.com/yifanlu0227/HEAL) | 267 | | V2V-QA | 2025 | arXiv | L, C | Real | 3DOD, PQA | [download](https://eddyhkchiu.github.io/v2vllm.github.io/) | 268 | | CP-UAV | 2022 | NIPS | L, C | Sim | 3DOD | [download](https://siheng-chen.github.io/dataset/coperception-uav/) | 269 | 270 | Note: Sensors: Camera (C), LiDAR (L), Radar (R). Source: Real = collected in the real world; Sim = generated via simulation. Tasks: 2DOD = 2D Object Detection, 3DOD = 3D Object Detection, MOT = Multi-Object Tracking, MTSCT = Multi-target Single-camera Tracking, MTMCT = Multi-target Multi-camera Tracking, SS = Semantic Segmentation, TP = Trajectory Prediction, VPR = Visual Place Recognition, NR = Neural Reconstruction, Re-ID = Re-Identification, S2R = Sim2Real, MF = Motion Forecasting, PQA = Planning Q&A. 271 | 272 | 273 | ### **T & J** Cooper: Cooperative Perception for Connected Autonomous Vehicles Based on 3D Point Clouds 274 | 275 | - **Background and Motivation** 276 | The paper introduces the Cooper system, aimed at enhancing the detection accuracy of autonomous vehicles through cooperative perception. Autonomous vehicles often suffer from sensor limitations, leading to potential detection failures. By enabling connected vehicles to share sensor data, particularly 3D LiDAR point clouds, the system aims to extend sensing areas, improve detection accuracy, and enhance safety in dynamic driving environments. 277 | 278 | - **Key Contributions** 279 | - **Sparse Point-Cloud Object Detection (SPOD)**: Introduces the SPOD method for detecting objects in sparse LiDAR point clouds, which improves detection even in low-density data. 
280 | - **Cooperative Perception System**: Demonstrates how multiple connected autonomous vehicles can share LiDAR data, merging point clouds from different vehicles to enhance object detection. 281 | - **Improved Detection Performance**: Shows that cooperative perception expands the sensing area, improves detection accuracy, and complements traditional object detection methods. 282 | - **Data Transmission Feasibility**: Demonstrates the feasibility of transmitting LiDAR point clouds for cooperative perception using existing vehicular network technologies, maintaining efficiency even with limited bandwidth. 283 | - **Evaluation on Real-World Datasets**: Evaluates the Cooper system on the KITTI and T&J datasets, showing significant improvements in detection performance, especially for objects that were previously undetected by individual vehicles. 284 | 285 | ### **V2V-Sim** V2VNet: Vehicle-to-Vehicle Communication for Joint Perception and Prediction 286 | 287 | - **Background and Motivation** 288 | This paper explores the use of Vehicle-to-Vehicle (V2V) communication to enhance the perception and motion forecasting of self-driving vehicles (SDVs). The key motivation is the challenge that SDVs face in detecting and forecasting the behavior of objects that are occluded or far away, which can be critical in safety-sensitive situations. By leveraging information shared from nearby vehicles, SDVs can overcome these limitations and improve overall safety and efficiency. 289 | 290 | - **Key Contributions** 291 | - **V2V Communication for Perception and Prediction**: Introduces a novel V2V approach, V2VNet, which integrates shared information from multiple vehicles to improve detection and motion forecasting accuracy. 292 | - **Compression of Intermediate Representations**: The model transmits compressed intermediate feature maps from the perception and prediction (P&P) neural network, balancing accuracy and bandwidth efficiency. 293 | - **Graph Neural Network (GNN)**: Utilizes a spatially aware GNN to aggregate information received from other SDVs, allowing intelligent fusion of data from different time points and viewpoints. 294 | - **V2V-Sim Dataset**: Proposes the creation of a new dataset, V2V-Sim, that simulates the real-world conditions where multiple SDVs share information, demonstrating the effectiveness of the V2VNet approach. 295 | 296 | ### COMAP: A Synthetic Dataset for Collective Multi-Agent Perception of Autonomous Driving [[paper]()] [[code]()] [[project]()] 297 | 298 | - **Background and Motivation** 299 | The paper addresses the challenges of single-agent perception in autonomous driving, particularly issues like occlusion and sensor noise. Traditional methods struggle with these challenges due to limited field-of-view (FOV) and imperfect data collection. Collective multi-agent perception (COMAP) enhances these systems by enabling data sharing across vehicles, improving the detection accuracy and robustness of autonomous driving. 300 | 301 | - **Key Contributions** 302 | - **Data Generator**: The paper introduces an efficient data generator for COMAP, capable of producing both image and point cloud data with ground truth for object detection and semantic segmentation. 303 | - **Scalable Simulation**: The generator allows for the creation of dense traffic scenarios without excessive computational resources, enabling scalability. 
304 | - **Enhanced Perception**: Through experiments, COMAP's performance is shown to surpass single-agent perception, improving object detection accuracy, bounding box localization, and sensor noise robustness. 305 | - **Data Fusion Techniques**: The paper discusses different stages of data fusion (raw data fusion, deep feature fusion, and fully processed data fusion) and compares their effectiveness in improving perception performance. 306 | 307 | ### Fast and Robust Registration of Partially Overlapping Point Clouds [[paper]()] [[code]()] [[project]()] 308 | 309 | - **Background and Motivation** 310 | The paper addresses the challenges faced in registering point clouds that partially overlap. Point cloud registration plays a vital role in several fields, including autonomous driving and 3D mapping, where accurate alignment of 3D sensor data from different viewpoints is critical. Existing methods often struggle with incomplete or noisy data, leading to suboptimal registration results. 311 | 312 | - **Key Contributions** 313 | - **Robust Registration Approach**: The paper proposes a novel technique for fast and robust registration of partially overlapping point clouds, improving accuracy in real-world applications. 314 | - **Handling Partial Overlap**: The method focuses on handling scenarios with partial overlap, which is common in practical applications like autonomous vehicle navigation and robotic mapping. 315 | - **Speed and Efficiency**: The proposed approach is designed to be computationally efficient, making it suitable for real-time applications in dynamic environments. 316 | - **Experimental Validation**: Extensive experiments demonstrate that the method outperforms traditional registration techniques in both accuracy and robustness, even in the presence of noise and partial overlap. 317 | 318 | ### OPV2V: An Open Benchmark Dataset and Fusion Pipeline for Perception with Vehicle-to-Vehicle Communication [[paper](https://arxiv.org/abs/2109.07644)] [[code](https://github.com/DerrickXuNu/OpenCOOD)] [[project](https://mobility-lab.seas.ucla.edu/opv2v)] 319 | 320 | - **Background and Motivation** 321 | Vehicle-to-Vehicle (V2V) communication offers the potential to improve perception performance in autonomous driving, but there is a lack of large-scale open datasets for V2V perception. This gap hampers the development and benchmarking of V2V algorithms, motivating the creation of the OPV2V dataset, the first large-scale open dataset for V2V perception. 322 | 323 | - **Key Contributions** 324 | - **First Open Dataset for V2V Perception**: OPV2V is the first large-scale open dataset specifically designed for Vehicle-to-Vehicle perception tasks. 325 | - **Benchmark with Multiple Fusion Strategies**: The paper introduces a benchmark that evaluates various fusion strategies (early, late, and intermediate fusion) combined with state-of-the-art 3D LiDAR detection algorithms. 326 | - **Proposed Attentive Intermediate Fusion Pipeline**: A new pipeline is proposed for aggregating information from multiple connected vehicles, which performs well even under high compression rates, improving the effectiveness of V2V communication. 327 | - **Open-source Availability**: The dataset, benchmark models, and code are publicly available, encouraging further research in V2V perception. 
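To make the fusion vocabulary used throughout these tables concrete, the sketch below shows where collaboration happens in each of the three strategies benchmarked by OPV2V. It is a schematic rather than the OpenCOOD implementation: `detector`, `encoder`, `fuse`, `head`, and `nms` are stand-ins for whatever single-agent components a pipeline already provides, and the pose convention (agent-to-ego transforms) is an assumption.

```python
"""Schematic of early, late, and intermediate fusion for cooperative 3D detection."""
import numpy as np

def early_fusion(point_clouds, poses, detector):
    """Share raw LiDAR: warp every cloud into the ego frame, then detect once."""
    merged = np.concatenate(
        [cloud @ pose[:3, :3].T + pose[:3, 3] for cloud, pose in zip(point_clouds, poses)]
    )
    return detector(merged)

def late_fusion(point_clouds, detector, nms):
    """Share detections: each agent detects locally, the ego merges boxes (e.g. via NMS)."""
    all_boxes = [box for cloud in point_clouds for box in detector(cloud)]
    return nms(all_boxes)

def intermediate_fusion(point_clouds, encoder, fuse, head):
    """Share compressed BEV features: encode locally, fuse feature maps, decode once."""
    features = [encoder(cloud) for cloud in point_clouds]
    return head(fuse(features))  # e.g. an attention-weighted aggregation
```

The bandwidth/accuracy trade-off reported by OPV2V (and by the V2I works later in this list) falls directly out of this structure: early fusion shares the most information but costs the most bandwidth, late fusion is cheap but discards context, and intermediate fusion transmits compressed features as a middle ground.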
328 | 329 | ### **OPV2V+** Collaboration Helps Camera Overtake LiDAR in 3D Detection [[paper](https://arxiv.org/abs/2303.13560)] [[code](https://github.com/MediaBrain-SJTU/CoCa3D)] [[project](https://siheng-chen.github.io/dataset/CoPerception+)] 330 | ### **CoPerception-UAV+** Collaboration Helps Camera Overtake LiDAR in 3D Detection [[paper](https://arxiv.org/abs/2303.13560)] [[code](https://github.com/MediaBrain-SJTU/CoCa3D)] [[project](https://siheng-chen.github.io/dataset/CoPerception+)] 331 | 332 | - **Background and Motivation** 333 | The paper addresses the challenge of improving camera-only 3D detection, which typically struggles with depth estimation compared to LiDAR. Traditional camera-based 3D detection has significant limitations, particularly in depth estimation, which is critical for accurate 3D object localization in autonomous driving. The authors propose a solution based on multi-agent collaboration to overcome these challenges. 334 | 335 | - **Key Contributions** 336 | - **CoCa3D Framework**: A novel collaborative camera-only 3D detection framework that utilizes multi-agent collaboration to improve depth estimation and 3D detection accuracy. 337 | - **Collaborative Depth Estimation**: The framework allows agents to share depth information, helping to resolve depth ambiguities and occlusions, improving overall detection accuracy. 338 | - **Improved Detection Performance**: With multiple agents working together, the framework significantly enhances detection performance, making camera-based systems competitive with LiDAR-based systems in certain scenarios. 339 | - **Efficient Communication**: The framework optimizes communication between agents by selecting and transmitting only the most informative cues, improving efficiency. 340 | - **Dataset Expansion**: The authors expanded existing datasets (OPV2V+, DAIR-V2X, and CoPerception-UAVs+) to include more collaborative agents and demonstrated that the collaborative camera system outperforms LiDAR in some cases, achieving state-of-the-art performance on multiple benchmarks. 341 | 342 | ### **IRV2V** Asynchrony-Robust Collaborative Perception via Bird’s Eye View Flow [[paper&review](https://openreview.net/forum?id=UHIDdtxmVS)] [~~code~~] [~~project~~] 343 | 344 | - **Background and Motivation** 345 | The paper addresses the issue of temporal asynchrony in collaborative perception systems. Asynchronous communication among vehicles due to network delays, interruptions, or misalignments can cause significant issues in multi-agent collaboration. The paper proposes CoBEVFlow, a system designed to handle these asynchronies by using a bird’s-eye view (BEV) flow map to align asynchronous messages, ensuring more reliable collaborative perception in real-world autonomous driving scenarios. 346 | 347 | - **Key Contributions** 348 | - **CoBEVFlow Framework**: Introduces CoBEVFlow, an asynchrony-robust collaborative perception system that compensates for temporal misalignments in data exchanges. 349 | - **BEV Flow**: Proposes BEV flow to model and compensate for motion in the scene, allowing asynchronous features to be realigned accurately without generating new features, avoiding extra noise. 350 | - **IRV2V Dataset**: Creates the IRV2V dataset, the first synthetic dataset with various temporal asynchronies, simulating real-world scenarios to test the effectiveness of the proposed approach. 
351 | - **Performance Validation**: Demonstrates that CoBEVFlow consistently outperforms existing methods under different latency conditions, improving detection performance by up to 30.3% compared to other state-of-the-art methods. 352 | - **Low Communication Cost**: CoBEVFlow is communication-efficient, transmitting only sparse features and ROI sets, reducing the overall communication bandwidth required for collaborative perception. 353 | 354 | ### V2V4Real: A Real-World Large-Scale Dataset for Vehicle-to-Vehicle Cooperative Perception [[paper](https://arxiv.org/abs/2303.07601)] [[code](https://github.com/ucla-mobility/V2V4Real)] [[project](https://mobility-lab.seas.ucla.edu/v2v4real)] 355 | 356 | - **Background and Motivation** 357 | The paper introduces V2V4Real, a real-world large-scale dataset designed to address the limitations of single-vehicle perception in autonomous driving. The lack of real-world datasets for Vehicle-to-Vehicle (V2V) cooperative perception has hindered progress in this area. This dataset aims to enhance the capabilities of V2V cooperative perception by providing multimodal data in real-world scenarios. 358 | 359 | - **Key Contributions** 360 | - **Real-World Large-Scale Dataset**: V2V4Real is the first large-scale real-world V2V dataset, collected over 410 km of road coverage. 361 | - **Multimodal Sensor Data**: The dataset includes 20K LiDAR frames and 40K RGB images with over 240K annotated 3D bounding boxes for five vehicle classes. 362 | - **Diverse Driving Scenarios**: It captures a variety of road types and driving scenarios, including intersections, highways, and city roads. 363 | - **Cooperative Perception Tasks**: It introduces three tasks for cooperative perception: 3D object detection, object tracking, and Sim2Real domain adaptation. 364 | - **Publicly Available**: The dataset and benchmarks will be made publicly available, encouraging further research and development in cooperative perception for autonomous driving. 365 | 366 | ### LUCOOP: Leibniz University Cooperative Perception and Urban Navigation Dataset [[paper]()] [[code]()] [[project]()] 367 | 368 | - **Background and Motivation** 369 | Recent autonomous driving datasets mainly involve data collected from a single vehicle. However, to enhance cooperative driving applications like object detection and urban navigation, multi-vehicle datasets are needed. The LUCOOP dataset aims to fill this gap by providing real-world, time-synchronized multi-modal data from three interacting vehicles in an urban environment, fostering research in cooperative applications. 370 | 371 | - **Key Contributions** 372 | - **Multi-Vehicle Setup**: The LUCOOP dataset includes data from three interacting vehicles, each equipped with LiDAR, GNSS, and IMU sensors. 373 | - **V2V and V2X Range Measurements**: The dataset offers UWB range measurements, both Vehicle-to-Vehicle (V2V) and Vehicle-to-Infrastructure (V2X). 374 | - **Ground Truth Trajectories**: Ground truth poses for each vehicle are provided, derived from GNSS/IMU integration, total station observations, and point cloud registration. 375 | - **3D Map Point Cloud**: A dense 3D map point cloud of the measurement area and an LOD2 city model are included for high-precision localization. 376 | - **Object Detection Annotations**: The dataset provides 3D bounding box annotations for static and dynamic vehicles, pedestrians, and other traffic participants. 
377 | - **Large-Scale Data**: It includes over 54,000 LiDAR frames, 700,000 IMU measurements, and more than 2.5 hours of GNSS data. 378 | 379 | ### **Open Mars Dataset** Multiagent Multitraversal Multimodal Self-Driving: Open MARS Dataset [[code](https://github.com/ai4ce/MARS)] [[paper](https://arxiv.org/abs/2406.09383)] [[project](https://ai4ce.github.io/MARS)] 380 | 381 | - **Background and Motivation** 382 | The paper introduces the MARS dataset, aimed at addressing the gap in existing datasets by incorporating multiagent and multitraversal elements. Traditional autonomous driving datasets often lack collaborative and repeated traversals, which limit advancements in perception, prediction, and planning. MARS was developed to fill this gap, enabling richer research in multiagent systems and enhanced 3D scene understanding. 383 | 384 | - **Key Contributions** 385 | - **Multiagent Collection**: MARS includes data from multiple autonomous vehicles operating in the same geographical region, enabling collaborative 3D perception. 386 | - **Multitraversal Data**: It captures multiple traversals of the same locations under various conditions, enhancing 3D scene understanding over time. 387 | - **Multimodal Sensor Setup**: The dataset features a full 360-degree sensor suite, including LiDAR and RGB cameras, for comprehensive scene analysis. 388 | - **Research Opportunities**: MARS opens new avenues for research in multiagent collaborative perception, unsupervised learning, and multitraversal 3D reconstruction. 389 | - **Real-World Data**: The dataset was collected using May Mobility's autonomous vehicles, ensuring high scalability and diversity across locations. 390 | 391 | ### **OPV2V-H** An Extensible Framework for Open Heterogeneous Collaborative Perception [[paper&review](https://openreview.net/forum?id=KkrDUGIASk)] [[code](https://github.com/yifanlu0227/HEAL)] [[project](https://huggingface.co/datasets/yifanlu/OPV2V-H)] 392 | 393 | - **Background and Motivation** 394 | Collaborative perception enhances the capabilities of individual agents by enabling data sharing to overcome perception limitations such as occlusion. However, existing models typically assume agents are homogeneous, while real-world scenarios often involve heterogeneous agents with different sensor modalities and models. This paper addresses the challenge of integrating new heterogeneous agents with minimal cost and high performance. 395 | 396 | - **Key Contributions** 397 | - **HEAL Framework**: The paper introduces HEAL, an extensible framework for open heterogeneous collaborative perception, designed to integrate new agent types seamlessly into existing systems. 398 | - **Unified Feature Space**: HEAL uses a novel multi-scale, foreground-aware Pyramid Fusion network to establish a unified feature space for all agents. 399 | - **Backward Alignment Mechanism**: New agents are integrated via a backward alignment process that minimizes training costs by aligning their features to the unified space without requiring retraining of the entire model. 400 | - **OPV2V-H Dataset**: The paper presents the OPV2V-H dataset, which includes more diverse sensor types to support heterogeneous collaborative perception research. 401 | - **Performance**: HEAL outperforms existing methods in collaborative detection, reducing training parameters by 91.5% when integrating new agent types, demonstrating its efficiency and scalability. 
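As a rough mental model of the "unified feature space" idea behind HEAL and heterogeneous collaboration in general, the toy PyTorch module below gives each agent its own modality-specific encoder but forces every encoder to emit BEV features of the same shape before fusion. This is only a sketch under that assumption: the module names and channel sizes are invented, and it does not reproduce HEAL's multi-scale, foreground-aware Pyramid Fusion or its backward alignment training scheme.

```python
"""Toy fusion of heterogeneous agents in a shared BEV feature space (not HEAL itself)."""
import torch
import torch.nn as nn

class UnifiedBEVFusion(nn.Module):
    def __init__(self, channels: int = 64):
        super().__init__()
        # One encoder per agent/modality; each must output (B, channels, H, W).
        self.encoders = nn.ModuleDict({
            "lidar_agent": nn.Conv2d(10, channels, 3, padding=1),   # e.g. pillarized LiDAR features
            "camera_agent": nn.Conv2d(80, channels, 3, padding=1),  # e.g. lifted image features
        })
        self.head = nn.Conv2d(channels, 7, 1)  # toy detection head (e.g. box parameters)

    def forward(self, bev_inputs: dict) -> torch.Tensor:
        # Project every agent into the shared feature space ...
        feats = [self.encoders[name](x) for name, x in bev_inputs.items()]
        # ... then fuse element-wise; HEAL instead uses multi-scale, foreground-aware fusion.
        fused = torch.stack(feats, dim=0).max(dim=0).values
        return self.head(fused)

# Usage with random stand-in features, assumed to be already warped onto the ego BEV grid:
model = UnifiedBEVFusion()
out = model({
    "lidar_agent": torch.randn(1, 10, 128, 128),
    "camera_agent": torch.randn(1, 80, 128, 128),
})
print(out.shape)  # torch.Size([1, 7, 128, 128])
```

In this picture, supporting a new sensor type means training only a new entry in `encoders` against the already-established fused space, which is the spirit of the backward alignment mechanism described above.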
402 | 403 | ### V2V-QA - V2V-LLM: Vehicle-to-Vehicle Cooperative Autonomous Driving with Multi-Modal Large Language Models [[paper](https://arxiv.org/abs/2502.09980)] [[code](https://github.com/eddyhkchiu/V2VLLM)] [[project](https://eddyhkchiu.github.io/v2vllm.github.io)] 404 | 405 | - **Background and Motivation** 406 | The paper addresses the limitations of autonomous vehicles relying solely on individual sensors for perception, especially when sensors are malfunctioning or occluded. Vehicle-to-vehicle (V2V) communication for cooperative perception is introduced to mitigate these issues, focusing on using Large Language Models (LLMs) for collaborative planning in autonomous driving. 407 | 408 | - **Key Contributions** 409 | - **V2V-QA Dataset**: A new dataset that includes grounding, notable object identification, and planning tasks designed for cooperative autonomous driving scenarios. 410 | - **V2V-LLM Model**: The proposed Vehicle-to-Vehicle Large Language Model integrates the perception data from multiple connected autonomous vehicles (CAVs) and answers driving-related questions. 411 | - **Improved Performance**: The V2V-LLM outperforms other baseline fusion methods in notable object identification and planning tasks, demonstrating its potential as a unified model for cooperative autonomous driving. 412 | - **New Research Direction**: Establishes a novel approach to cooperative autonomous driving using LLMs, improving safety and collaborative performance across multiple vehicles. 413 | 414 | ### CoPerception-UAV - Where2comm: Communication-Efficient Collaborative Perception via Spatial Confidence Maps [[paper&review](https://openreview.net/forum?id=dLL4KXzKUpS)] [[code](https://github.com/MediaBrain-SJTU/where2comm)] [[project](https://siheng-chen.github.io/dataset/coperception-uav)] 415 | 416 | - **Background and Motivation** 417 | Collaborative perception allows multiple agents to share complementary information, enhancing overall perception. However, there is a challenge in balancing the perception performance with the communication bandwidth. Previous methods have not addressed the issue of efficiently selecting which spatial areas to communicate, leading to excessive bandwidth usage. This paper proposes a solution by focusing on sharing only perceptually critical information, optimizing communication efficiency. 418 | 419 | - **Key Contributions** 420 | - **Spatial Confidence Map**: Introduces a spatial confidence map that highlights the perceptually critical areas, helping agents focus on the most important information. 421 | - **Efficient Communication**: Develops a communication-efficient framework, Where2comm, which uses spatial confidence maps to reduce bandwidth consumption by transmitting sparse but critical data. 422 | - **Dynamic Adaptation**: Where2comm adapts to varying communication bandwidths, dynamically adjusting which spatial areas are communicated based on their perceptual importance. 423 | - **Superior Performance**: Evaluates Where2comm on several datasets, showing it consistently outperforms existing methods, reducing communication volume by more than 100,000 times while improving perception performance. 424 | - **Real-World and Simulation Scenarios**: Demonstrates the framework’s robustness in both real-world and simulation environments, with multi-agent setups including cars and drones equipped with cameras and LiDAR sensors. 
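The core idea above — transmitting only the BEV cells that a spatial confidence map marks as informative — can be sketched in a few lines. The snippet below is an illustrative approximation rather than Where2comm's actual formulation: the top-k selection rule, the fixed budget, and all tensor shapes are assumptions.

```python
"""Sketch of confidence-guided sparse feature sharing between two agents."""
import torch

def select_messages(bev_feature: torch.Tensor, confidence: torch.Tensor, budget: int = 512):
    """Sender side: keep only the `budget` most confident BEV cells.

    bev_feature: (C, H, W) feature map of the sending agent.
    confidence:  (H, W) spatial confidence map (e.g. a detection heatmap).
    Returns flat cell indices and the corresponding feature vectors.
    """
    flat_conf = confidence.flatten()                      # (H*W,)
    k = min(budget, flat_conf.numel())
    _, top_idx = flat_conf.topk(k)                        # most informative cells
    flat_feat = bev_feature.flatten(start_dim=1)          # (C, H*W)
    return top_idx, flat_feat[:, top_idx]                 # sparse message

def scatter_messages(idx: torch.Tensor, feat: torch.Tensor, shape: tuple) -> torch.Tensor:
    """Receiver side: place the sparse features back onto an empty BEV grid."""
    grid = torch.zeros(feat.shape[0], shape[0] * shape[1])
    grid[:, idx] = feat
    return grid.view(feat.shape[0], *shape)

# Toy usage: one sender, one receiver sharing a 128x128 BEV grid.
conf = torch.rand(128, 128)
feat = torch.randn(64, 128, 128)
idx, msg = select_messages(feat, conf, budget=256)
recovered = scatter_messages(idx, msg, (128, 128))
```

In a real system the `(idx, msg)` pair would additionally be compressed and warped into the receiver's coordinate frame before being scattered back onto its BEV grid, but the bandwidth saving comes from exactly this kind of sparsification.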
425 | 426 | ## :bookmark:V2I Datasets 427 | 428 | - **V2I Datasets**: These datasets involve communication between vehicles and infrastructure, supporting cooperative tasks like object detection, tracking, and decision-making in connected environments. 429 | 430 | ### Table 431 | 432 | | **Dataset** | **Year** | **Venue** | **Sensors** | **Source** | **Tasks** | **download** | 433 | |-------------|----------|-----------|-------------|------------|-----------|----------| 434 | | CoopInf | 2020 | TITS | L, C | Sim | 3DOD | [download](https://github.com/eduardohenriquearnold/coop-3dod-infra?tab=readme-ov-file) | 435 | | CARTI | 2022 | ITSC | L | Sim | 3D OD | - | 436 | | DAIR-V2X-C | 2022 | CVPR | L, C | Real | 3DOD | [download](https://air.tsinghua.edu.cn/DAIR-V2X/index.html) | 437 | | V2X-Seq | 2023 | CVPR | L, C | Real | 3DOT, TP | [download](https://github.com/AIR-THU/DAIR-V2X-Seq) | 438 | | HoloVIC | 2024 | CVPR | L, C | Real | 3DOD, MOT | [download](https://holovic.net) | 439 | | OTVIC | 2024 | IROS | L, C | Real | 3DOD | [download](https://sites.google.com/view/otvic) | 440 | | DAIR-V2XReid | 2024 | TITS | L, C | Real | 3DOD, Re-ID | [download](https://github.com/Niuyaqing/DAIR-V2XReid) | 441 | | TUMTraf V2X | 2024 | CVPR | L, C | Real | 3DOD, MOT | [download](https://tum-traffic-dataset.github.io/tumtraf-v2x/) | 442 | | V2X-DSI | 2024 | IV | L, C | Sim | 3D OD | - | 443 | | V2X-Radar | 2024 | arxiv | L, C, R | Real | 3DOD | [download](http://openmpd.com/column/V2X-Radar) | 444 | 445 | Note: Sensors: Camera (C), LiDAR (L), Radar (R). Source: Real = collected in the real world; Sim = generated via simulation. Tasks: 2DOD = 2D Object Detection, 3DOD = 3D Object Detection, MOT = Multi-Object Tracking, MTSCT = Multi-target Single-camera Tracking, MTMCT = Multi-target Multi-camera Tracking, SS = Semantic Segmentation, TP = Trajectory Prediction, VPR = Visual Place Recognition, NR = Neural Reconstruction, Re-ID = Re-Identification, S2R = Sim2Real, MF = Motion Forecasting, PQA = Planning Q&A. 446 | 447 | ### **CoopInf** Cooperative Perception for 3D Object Detection in Driving Scenarios Using Infrastructure Sensors [[paper]()] [[code]()] [[project]()] 448 | 449 | - **Background and Motivation** 450 | The paper addresses the challenge of 3D object detection in autonomous driving scenarios, particularly in complex environments like T-junctions and roundabouts. Traditional single-sensor systems face limitations such as occlusion and restricted field-of-view, hindering reliable detection. Cooperative perception using multiple spatially diverse infrastructure sensors provides an effective solution to mitigate these issues and improve detection performance. 451 | 452 | - **Key Contributions** 453 | - **Cooperative 3D Object Detection Schemes**: Two fusion schemes—early and late fusion—are proposed for cooperative perception. Early fusion combines raw sensor data before detection, while late fusion combines detected objects post-detection. 454 | - **Evaluation of Fusion Approaches**: The paper compares both fusion schemes and their hybrid combination in terms of detection performance and communication costs. Early fusion outperforms late fusion but requires higher bandwidth. 455 | - **Novel Cooperative Dataset**: A synthetic dataset for cooperative 3D object detection in driving scenarios, including a T-junction and roundabout, is introduced for performance evaluation. 
456 | - **Impact of Sensor Configurations**: The study also evaluates how sensor number and positioning impact detection performance, offering practical insights for real-world deployment. 457 | 458 | ### **CARTI** PillarGrid: Deep Learning-Based Cooperative Perception for 3D Object Detection from Onboard-Roadside LiDAR 459 | 460 | - **Background and Motivation** 461 | The paper introduces PillarGrid, a deep learning-based cooperative perception method for 3D object detection using both onboard and roadside LiDAR data. Traditional 3D object detection methods rely on single onboard LiDARs, which suffer from limitations in range and occlusion, especially in dense traffic. The motivation is to enhance detection accuracy and range by combining point cloud data from both vehicle-mounted and infrastructure-based sensors, improving detection performance in real-world scenarios. 462 | 463 | - **Key Contributions** 464 | - **PillarGrid Method**: Introduces a novel cooperative perception approach for 3D object detection that fuses data from onboard and roadside LiDAR sensors through deep learning. 465 | - **Grid-wise Feature Fusion (GFF)**: Proposes GFF, a feature-level fusion technique that combines information from multiple sensors to improve detection accuracy and reduce occlusion effects. 466 | - **Cooperative Preprocessing and Geo-Fencing**: Introduces cooperative preprocessing of point clouds and geo-fencing to align and process data from different sensors effectively. 467 | - **CNN-based 3D Object Detection**: Uses a convolutional neural network (CNN) to detect and generate 3D bounding boxes for objects, improving detection of vehicles and pedestrians. 468 | - **Dataset and Evaluation**: A new dataset, CARTI, was created using a cooperative perception platform for model training and evaluation, showing significant improvements over state-of-the-art methods in both accuracy and range. 469 | 470 | ### DAIR-V2X: A Large-Scale Dataset for Vehicle-Infrastructure Cooperative 3D Object Detection [[paper](https://arxiv.org/abs/2204.05575)] [[code](https://github.com/AIR-THU/DAIR-V2X)] [[project](https://thudair.baai.ac.cn/index)] 471 | 472 | - **Background and Motivation** 473 | Autonomous driving still faces significant challenges, especially in terms of long-range perception and global awareness. While vehicle sensors have limitations, combining vehicle and infrastructure data can overcome these challenges. However, there was a lack of real-world datasets for vehicle-infrastructure cooperative problems. This paper introduces the DAIR-V2X dataset to facilitate research in this area. 474 | 475 | - **Key Contributions** 476 | - **First Vehicle-Infrastructure Cooperative Dataset**: DAIR-V2X is the first large-scale, multi-modality dataset captured from real-world scenarios for vehicle-infrastructure cooperation in 3D object detection. 477 | - **VIC3D Task Definition**: The paper defines the VIC3D task, which focuses on collaboratively detecting and identifying 3D objects using sensory inputs from both vehicle and infrastructure. 478 | - **VIC3D Benchmark and Fusion Framework**: It introduces benchmarks for VIC3D object detection and proposes the Time Compensation Late Fusion (TCLF) framework to handle temporal asynchrony. 479 | - **Real-World Data**: The dataset includes 71k frames of LiDAR and camera data, collected across various environments with 3D annotations. 
480 | - **Performance Improvement**: The results demonstrate that integrating vehicle and infrastructure data leads to better performance than using single-source data. 481 | 482 | ### **DAIR-V2X-Seq** V2X-Seq: A Large-Scale Sequential Dataset for Vehicle-Infrastructure Cooperative Perception and Forecasting [[paper](https://arxiv.org/abs/2305.05938)] [[code](https://github.com/AIR-THU/DAIR-V2X-Seq)] [[project](https://thudair.baai.ac.cn/index)] 483 | 484 | - **Background and Motivation** 485 | The paper introduces the V2X-Seq dataset, addressing the need for real-world sequential datasets in vehicle-infrastructure cooperative perception and forecasting. Current research mainly focuses on improving perception using infrastructure data for frame-by-frame 3D detection, but lacks datasets for tracking and forecasting, which are critical for decision-making in autonomous driving. 486 | 487 | - **Key Contributions** 488 | - **Release of V2X-Seq**: The first large-scale, real-world sequential V2X dataset with data on vehicle and infrastructure cooperation. 489 | - **Two Main Parts**: Includes the sequential perception dataset with 15,000 frames from 95 scenarios, and the trajectory forecasting dataset with 210,000 scenarios. 490 | - **Three New Tasks**: Introduces VIC3D Tracking, Online-VIC Forecasting, and Offline-VIC Forecasting, with benchmarks to evaluate these tasks. 491 | - **Proposed FF-Tracking Method**: A middle fusion framework that efficiently solves the VIC3D tracking problem, handling latency challenges effectively. 492 | 493 | ### HoloVIC: Large-scale Dataset and Benchmark for Multi-Sensor Holographic Intersection and Vehicle-Infrastructure Cooperative [[paper](https://arxiv.org/abs/2403.02640)] [~~code~~] [[project](https://holovic.net)] 494 | 495 | - **Background and Motivation** 496 | The increasing complexity of traffic environments, including occlusions and blind spots, limits the effectiveness of single-viewpoint roadside sensing systems. To enhance the perception capabilities of roadside systems, the paper presents HoloVIC, a large-scale multi-sensor holographic dataset, aiming to improve vehicle-infrastructure cooperation (VIC) by capturing synchronized data from various sensors installed at different intersections. 497 | 498 | - **Key Contributions** 499 | - **HoloVIC Dataset**: A comprehensive multi-sensor dataset for vehicle-infrastructure cooperation, featuring data collected from holographic intersections equipped with multiple sensor layouts (Cameras, Lidar, Fisheye). 500 | - **Synchronized Multi-Sensor Data**: The dataset includes 100k+ synchronized frames with annotated 3D bounding boxes, covering diverse sensor configurations (e.g., 4C+2L, 12C+4F+2L). 501 | - **Benchmark and Tasks**: The authors formulate five key tasks (e.g., Mono3D, Lidar 3D Detection, Multi-sensor Multi-object Tracking, VIC Perception) to promote research on roadside perception and vehicle-infrastructure cooperation. 502 | - **High-Quality Annotations**: The dataset provides 3D bounding boxes and object IDs associated with different sensors, enabling the study of multi-sensor fusion and trajectory tracking across various intersection layouts. 503 | 504 | ### OTVIC: A Dataset with Online Transmission for Vehicle-to-Infrastructure Cooperative 3D Object Detection [[paper]()] [[code]()] [[project]()] 505 | 506 | - **Background and Motivation** 507 | The paper introduces OTVIC, a real-world dataset designed for vehicle-to-infrastructure (V2I) cooperative 3D object detection. 
Traditional autonomous driving datasets often neglect the challenges of real-time perception data transmission from infrastructure to vehicles. OTVIC addresses these challenges by simulating the communication delays, bandwidth limitations, and high vehicle speeds encountered in real-world highway environments. 508 | 509 | - **Key Contributions** 510 | - **Real-Time Online Transmission**: OTVIC is the first multi-modality, multi-view dataset that includes online transmission from real-world scenarios, focusing on vehicle-to-infrastructure cooperative 3D object detection. 511 | - **Multi-Modal and Multi-View Dataset**: The dataset features synchronized data from multiple sensors, including images, LiDAR point clouds, and vehicle motion data, collected from highways at real-time speeds. 512 | - **Fusion Framework (LfFormer)**: The paper proposes LfFormer, a novel end-to-end multi-modality late fusion framework using transformer architecture, which proves to be effective and robust for cooperative 3D object detection. 513 | - **Real-World Relevance**: OTVIC emphasizes real-world challenges like varying transmission delays and environmental factors, offering a crucial resource for the development of real-time cooperative perception systems in autonomous driving. 514 | 515 | ### DAIR-V2XReid: A New Real-World Vehicle-Infrastructure Cooperative Re-ID Dataset and Cross-Shot Feature Aggregation Network Perception Method [[paper]()] [[code]()] [[project]()] 516 | 517 | - **Background and Motivation** 518 | Vehicle Re-Identification (Re-ID) has a critical role in enhancing Vehicle-Infrastructure Cooperative Autonomous Driving (VICAD). Existing datasets are insufficient for evaluating the performance of Re-ID algorithms in real-world scenarios, particularly for cross-shot vehicle identification. The lack of a comprehensive dataset and effective models for addressing large variations in vehicle appearance across different cameras has hindered progress in VICAD research. 519 | 520 | - **Key Contributions** 521 | - **DAIR-V2XReid Dataset**: The paper introduces the DAIR-V2XReid dataset, the first real-world VIC Re-ID dataset, constructed using both vehicle-mounted and roadside cameras for vehicle re-identification tasks. 522 | - **Cross-shot Feature Aggregation Network (CFA-Net)**: To address the issue of large appearance differences across different cameras, the CFA-Net model was proposed. It combines three key modules: a camera embedding module, a cross-stage feature fusion module, and a multi-directional attention module. 523 | - **State-of-the-art Performance**: The proposed CFA-Net achieves the highest reported performance on the DAIR-V2XReid dataset, significantly improving the accuracy of vehicle Re-ID across different camera views. 524 | - **Versatile Application**: The model demonstrates good generalization abilities, as evidenced by experiments on the VeRi776 dataset, further confirming its robustness and efficiency for real-world applications. 525 | 526 | ### TUMTraf-V2X: Cooperative Perception Dataset for 3D Object Detection in Driving Scenarios [[paper](https://arxiv.org/abs/2403.01316)] [[code](https://github.com/tum-traffic-dataset/tum-traffic-dataset-dev-kit)] [[project](https://tum-traffic-dataset.github.io/tumtraf-v2x)] 527 | 528 | - **Background and Motivation** 529 | This paper presents the TUMTraf-V2X dataset, which aims to enhance vehicle perception through cooperative sensing. 
Roadside sensors and onboard sensors are used to overcome the limitations of single-sensor systems, especially occlusion and limited field of view. The dataset focuses on 3D object detection and tracking to improve road safety and autonomous driving capabilities. 530 | 531 | - **Key Contributions** 532 | - **High-Quality V2X Dataset**: The dataset includes 2,000 labeled point clouds and 5,000 labeled images with 30,000 3D bounding boxes. It covers challenging traffic scenarios, including near-miss events and traffic violations. 533 | - **Cooperative 3D Object Detection Model (CoopDet3D)**: A new cooperative fusion model, CoopDet3D, which outperforms traditional camera-LiDAR fusion methods with a +14.3 3D mAP improvement. 534 | - **Open-Source Tools and Resources**: The dataset and related tools, such as the 3D BAT labeling tool and a development kit, are made publicly available to facilitate integration and model training. 535 | - **Benchmarking and Evaluation**: Extensive experiments show that cooperative perception models lead to better detection accuracy than vehicle-only systems, proving the benefit of V2X collaboration in object detection tasks. 536 | 537 | ### V2X-DSI: A Density-Sensitive Infrastructure LiDAR Benchmark for Economic Vehicle-to-Everything Cooperative Perception 538 | 539 | - **Background and Motivation** 540 | The paper addresses the high costs associated with the deployment of infrastructure LiDAR sensors in large-scale Vehicle-to-Everything (V2X) cooperative perception systems. It proposes a new benchmark, V2X-DSI, to explore the economic feasibility of using lower-beam infrastructure LiDAR sensors for cooperative perception. This is crucial as the deployment of high-beam LiDAR sensors on numerous infrastructures is prohibitively expensive, limiting the widespread adoption of V2X systems. 541 | 542 | - **Key Contributions** 543 | - **V2X-DSI Benchmark**: Introduces the first Density-Sensitive Infrastructure LiDAR benchmark, V2X-DSI, designed for economic V2X cooperative perception using LiDAR sensors with varying beam densities (16-beam, 32-beam, 64-beam, 128-beam). 544 | - **Performance Analysis**: Analyzes the impact of different beam densities on cooperative perception performance, using three state-of-the-art methods: OPV2V, V2X-ViT, and CoBEVT. 545 | - **Simulated Scenarios**: Utilizes a large-scale simulation in CARLA, including 56,984 frames from 57 diverse urban scenarios, to evaluate the performance of V2X cooperative perception systems. 546 | - **Fine-tuning for Low-Beam LiDAR**: Demonstrates that models trained on high-beam LiDAR can be fine-tuned to improve performance when deployed on low-beam LiDAR, mitigating the performance drop in real-world low-beam scenarios. 547 | - **Cost-Effective Deployment**: Provides a solution for reducing costs by using lower-beam LiDAR sensors without significantly compromising detection accuracy in urban traffic scenarios. 548 | 549 | ### V2X-Radar: A Multi-modal Dataset with 4D Radar for Cooperative Perception [[paper](https://arxiv.org/abs/2411.10962)] [[code](https://github.com/yanglei18/V2X-Radar)] [[project](http://openmpd.com/column/V2X-Radar)] 550 | 551 | - **Background and Motivation** 552 | The V2X-Radar dataset was developed to address the limitations in existing cooperative perception datasets, which often focus solely on camera and LiDAR data. The goal is to bridge the gap by incorporating 4D Radar, a sensor that excels in adverse weather conditions. 
The dataset aims to enhance the robustness of autonomous driving perception systems by addressing occlusions and limited range in single-vehicle systems. 553 | 554 | - **Key Contributions** 555 | - **First Real-World Multi-modal Dataset**: V2X-Radar is the first large, real-world dataset incorporating 4D Radar, alongside LiDAR and camera data. 556 | - **Comprehensive Data Coverage**: It includes 20K LiDAR frames, 40K camera images, and 20K 4D Radar data, with 350K annotated bounding boxes spanning five object categories. 557 | - **Three Specialized Sub-datasets**: The dataset is divided into V2X-Radar-C (cooperative perception), V2X-Radar-I (roadside perception), and V2X-Radar-V (single-vehicle perception). 558 | - **Extensive Benchmarking**: The dataset includes benchmarks for recent perception algorithms across these sub-datasets, supporting a wide range of research in cooperative perception. 559 | 560 | ## :bookmark:V2X Datasets 561 | 562 | - **V2X Datasets**: Covering vehicle-to-everything communication, these datasets integrate multiple agents such as vehicles, infrastructure, and other environmental elements like drones or pedestrians, enabling research in complex collaborative scenarios. 563 | 564 | ### Table 565 | 566 | | **Dataset** | **Year** | **Venue** | **Sensors** | **Source** | **Tasks** | **download** | 567 | |-------------|----------|-----------|-------------|------------|-----------|----------| 568 | | V2X-Sim | 2022 | RA-L | L, C | Sim | 3DOD, MOT, SS | [download](https://ai4ce.github.io/V2X-Sim/download.html) | 569 | | V2XSet | 2022 | ECCV | L, C | Sim | 3DOD | [download](https://paperswithcode.com/dataset/v2xset) | 570 | | DOLPHINS | 2022 | ACCV | L, C | Sim | 2DOD, 3DOD | [download](https://dolphins-dataset.net/) | 571 | | DeepAccident | 2024 | AAAI | L, C | Sim | 3DOD, MOT, SS, TP | [download](https://deepaccident.github.io/) | 572 | | V2X-Real | 2024 | ECCV | L, C | Real | 3DOD | [download](https://mobility-lab.seas.ucla.edu/v2x-real) | 573 | | Multi-V2X | 2024 | arxiv | L, C | Sim | 3DOD, MOT | [download](http://github.com/RadetzkyLi/Multi-V2X) | 574 | | Adver-City | 2024 | arxiv | L, C | Sim | 3DOD, MOT, SS | [download](https://labs.cs.queensu.ca/quarrg/datasets/adver-city/) | 575 | | DAIR-V2X-Traj | 2024 | NIPS | L, C | Real | MF | [download](https://github.com/AIR-THU/V2X-Graph) | 576 | | WHALES | 2024 | arxiv | L, C | Sim | 3DOD | [download](https://github.com/chensiweiTHU/WHALES) | 577 | | V2X-R | 2024 | arxiv | L, C, R | Sim | 3DOD | [download](https://github.com/ylwhxht/V2X-R) | 578 | | V2XPnP-Seq | 2024 | arxiv | L, C | Real | Perception and Prediction | [download](https://mobility-lab.seas.ucla.edu/v2xpnp/) | 579 | | Mixed Signals | 2025 | arxiv | L | Real | 3DOD | [download](https://mixedsignalsdataset.cs.cornell.edu/) | 580 | | SCOPE | 2024 | arxiv | C, L | Sim | 2DOD, 3DOD, SS, S2R | [download](https://ekut-es.github.io/scope) | 581 | 582 | Note: Sensors: Camera (C), LiDAR (L), Radar (R). Source: Real = collected in the real world; Sim = generated via simulation. Tasks: 2DOD = 2D Object Detection, 3DOD = 3D Object Detection, MOT = Multi-Object Tracking, MTSCT = Multi-target Single-camera Tracking, MTMCT = Multi-target Multi-camera Tracking, SS = Semantic Segmentation, TP = Trajectory Prediction, VPR = Visual Place Recognition, NR = Neural Reconstruction, Re-ID = Re-Identification, S2R = Sim2Real, MF = Motion Forecasting, PQA = Planning Q&A. 
583 | 584 | ### V2X-Sim: Multi-Agent Collaborative Perception Dataset and Benchmark for Autonomous Driving [[paper](https://arxiv.org/abs/2202.08449)] [[code](https://github.com/ai4ce/V2X-Sim)] [[project](https://ai4ce.github.io/V2X-Sim)] 585 | 586 | - **Background and Motivation** 587 | The paper addresses the limitations of single-vehicle perception in autonomous driving, particularly regarding occlusions and limited sensing range. Vehicle-to-Everything (V2X) communication enhances this by enabling collaboration among multiple agents, allowing for a broader and clearer understanding of the environment. This dataset and benchmark aim to fill the gap in the field by providing the first public multi-agent, multi-modality collaborative perception dataset, facilitating research in collaborative perception tasks. 588 | 589 | - **Key Contributions** 590 | - **V2X-Sim Dataset**: The paper introduces the V2X-Sim dataset, which supports multi-agent, multi-modality, and multi-task perception tasks, enabling research in collaborative perception for autonomous driving. 591 | - **Open-Source Testbed**: It provides an open-source testbed for testing collaborative perception methods, offering a benchmark for three critical tasks: collaborative detection, tracking, and segmentation. 592 | - **Comprehensive Sensor Suite**: The dataset includes various sensor modalities from both vehicles and road-side units (RSUs), enhancing perception across different environments. 593 | - **Collaborative Perception Strategies**: The study uses state-of-the-art collaboration strategies to evaluate the dataset and advance collaborative perception research. 594 | 595 | ### **V2XSet** V2X-ViT: Vehicle-to-Everything Cooperative Perception with Vision Transformer [[paper](https://arxiv.org/abs/2203.10638)] [[code](https://github.com/DerrickXuNu/v2x-vit)] [[project](https://drive.google.com/drive/folders/1r5sPiBEvo8Xby-nMaWUTnJIPK6WhY1B6)] 596 | 597 | - **Background and Motivation** 598 | The paper discusses the challenges of autonomous vehicle (AV) perception in complex driving environments, particularly when vehicles suffer from occlusions and limited sensor range. Vehicle-to-Everything (V2X) communication, including collaboration with infrastructure, is introduced as a solution to enhance perception. However, integrating information from heterogeneous agents like vehicles and infrastructure presents unique challenges, which this paper aims to address. 599 | 600 | - **Key Contributions** 601 | - **Unified V2X Vision Transformer (V2X-ViT)**: Introduces a novel Transformer architecture for cooperative perception that handles the challenges of heterogeneous V2X systems, such as sensor misalignment, noise, and asynchronous data sharing. 602 | - **Heterogeneous Multi-Agent Attention Module**: A customized attention mechanism designed to account for the different agent types (vehicles and infrastructure) and their connections during the feature fusion process; a much-simplified fusion sketch follows this entry. 603 | - **Multi-Scale Window Attention**: A multi-resolution attention mechanism that helps mitigate the effects of localization errors and time delays in the data from agents. 604 | - **V2XSet Dataset**: The creation of a large-scale, open dataset designed to simulate real-world communication conditions and evaluate the V2X-ViT framework. 
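As a rough illustration of attention-based multi-agent BEV fusion, the sketch below weights each agent's (already ego-aligned) feature map per BEV cell and sums them. It is a deliberately simplified, single-scale stand-in for the heterogeneous multi-agent attention described above; V2X-ViT itself additionally models agent types, pose noise, and delays, and uses multi-scale window attention. The class name, shapes, and 1x1-conv scoring head are assumptions made for this sketch, not the authors' implementation.

```python
import torch
import torch.nn as nn

class NaiveAgentFusion(nn.Module):
    """Fuse already-aligned per-agent BEV features with per-cell attention.

    Assumes every agent's feature map has been warped into the ego frame,
    so the same (h, w) cell refers to the same spot in the world.
    """

    def __init__(self, channels: int):
        super().__init__()
        # 1x1 conv producing one attention logit per agent per BEV cell
        self.score = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, agent_feats: torch.Tensor) -> torch.Tensor:
        # agent_feats: (num_agents, C, H, W), ego feature map first by convention
        logits = self.score(agent_feats)            # (num_agents, 1, H, W)
        weights = torch.softmax(logits, dim=0)      # normalize across agents
        return (weights * agent_feats).sum(dim=0)   # (C, H, W) fused BEV map

# Toy usage: ego vehicle plus two collaborators sharing 64-channel BEV features.
fusion = NaiveAgentFusion(channels=64)
shared = torch.randn(3, 64, 100, 100)
fused = fusion(shared)
```

The per-cell softmax lets the fused map lean on whichever agent sees a given region best, which is the basic intuition behind the attention-based fusion modules used in this line of work.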
605 | 606 | ### DOLPHINS: Dataset for Collaborative Perception enabled Harmonious and Interconnected Self-driving [[paper](https://arxiv.org/abs/2207.07609)] [[code](https://github.com/explosion5/Dolphins)] [[project](https://dolphins-dataset.net)] 607 | 608 | - **Background and Motivation** 609 | The paper addresses the limitations of standalone perception in autonomous driving, such as blind zones and occlusions. To overcome this, Vehicle-to-Everything (V2X) communication enables collaborative perception through data sharing between vehicles and road-side units (RSUs). However, a lack of large-scale, multi-view, and multi-modal datasets has hindered the development of collaborative perception algorithms, motivating the creation of the DOLPHINS dataset. 610 | 611 | - **Key Contributions** 612 | - **DOLPHINS Dataset**: A large-scale dataset featuring various autonomous driving scenarios with multi-view, multi-modality data (images and point clouds) from both vehicles and RSUs. 613 | - **Diverse Scenarios and High Resolution**: The dataset includes 6 typical driving scenarios, such as urban intersections, highways, and mountain roads, with high-resolution data (Full-HD images and 64-line LiDARs). 614 | - **Temporally-Aligned Data**: The data from multiple viewpoints (vehicles and RSUs) are temporally aligned, enabling comprehensive collaborative perception. 615 | - **Benchmark for Collaborative Perception**: The paper provides a benchmark for 2D and 3D object detection, as well as multi-view collaborative perception tasks, demonstrating the effectiveness of V2X communication in improving detection accuracy and reducing sensor costs. 616 | 617 | ### V2X-Real: A Large-Scale Dataset for Vehicle-to-Everything Cooperative Perception [[paper](https://arxiv.org/abs/2403.16034)] [~~code~~] [[project](https://mobility-lab.seas.ucla.edu/v2x-real)] 618 | 619 | - **Background and Motivation** 620 | The paper addresses the limitations in current autonomous driving datasets, specifically in Vehicle-to-Everything (V2X) cooperative perception. Existing datasets primarily focus on Vehicle-to-Vehicle (V2V) or Vehicle-to-Infrastructure (V2I) collaboration, but real-world V2X datasets with both multi-vehicle and infrastructure collaboration are scarce. The need for real-world data to improve V2X perception systems, particularly for handling occlusions and expanding the perception range, is the central motivation. 621 | 622 | - **Key Contributions** 623 | - **Introduction of V2X-Real**: The first open, large-scale real-world dataset for V2X cooperative perception, featuring multi-modal sensing data from LiDAR and cameras. 624 | - **Four Sub-Datasets**: Divided into Vehicle-Centric, Infrastructure-Centric, Vehicle-to-Vehicle, and Infrastructure-to-Infrastructure cooperative perception datasets, tailored for different collaboration modes. 625 | - **Large-Scale Annotations**: Includes over 1.2 million annotated bounding boxes for 10 object categories, providing detailed data for multi-agent, multi-class cooperative 3D object detection tasks. 626 | - **Comprehensive Benchmarks**: Provides benchmarks for state-of-the-art cooperative perception methods, enhancing research on V2X interaction in complex urban environments. 627 | - **High-Density Traffic Data**: Collected in challenging urban environments with dense traffic, making it suitable for testing cooperative perception algorithms in real-world scenarios. 
628 | 629 | ### DeepAccident: A Motion and Accident Prediction Benchmark for V2X Autonomous Driving [[paper](https://arxiv.org/abs/2304.01168)] [[code](https://github.com/tianqi-wang1996/DeepAccident)] [[project](https://deepaccident.github.io)] 630 | 631 | - **Background and Motivation** 632 | The paper introduces DeepAccident, the first large-scale V2X dataset designed for motion and accident prediction in autonomous driving. While existing datasets mainly focus on perception tasks, they lack real-world accident scenarios and safety evaluations. The motivation is to provide a comprehensive dataset that includes safety-critical scenarios and supports end-to-end motion and accident prediction tasks for autonomous driving. 633 | 634 | - **Key Contributions** 635 | - **DeepAccident Dataset**: The first V2X dataset supporting motion and accident prediction, with 57K annotated frames and 285K annotated samples, offering a diverse range of accident scenarios. 636 | - **End-to-End Accident Prediction**: A novel task that predicts the occurrence, timing, location, and involved vehicles or pedestrians in accidents using raw sensor data. 637 | - **V2XFormer Model**: A baseline model demonstrating superior performance over single-vehicle models in motion and accident prediction, and 3D object detection tasks. 638 | - **V2X Research for Safety**: Enables V2X-based research in perception and prediction, addressing the gap in safety-critical scenario evaluation for autonomous driving algorithms. 639 | 640 | ### Adver-City: Open-Source Multi-Modal Dataset for Collaborative Perception Under Adverse Weather Conditions [[paper](https://arxiv.org/abs/2410.06380)] [[code](https://github.com/QUARRG/Adver-City)] [[project](https://labs.cs.queensu.ca/quarrg/datasets/adver-city)] 641 | 642 | - **Background and Motivation** 643 | Adverse weather conditions like rain, fog, and glare challenge the performance of Autonomous Vehicles (AVs) and Collaborative Perception (CP) models. The lack of datasets focusing on these conditions limits the evaluation and improvement of CP under such scenarios. Adver-City aims to address this gap by providing the first open-source CP dataset under diverse adverse weather conditions, including glare, a first for synthetic CP datasets. 644 | 645 | - **Key Contributions** 646 | - **First Open-Source CP Dataset for Adverse Weather**: Adver-City is the first synthetic, open-source CP dataset focused on weather conditions like rain, fog, and glare. 647 | - **Varied Weather and Object Density Scenarios**: The dataset includes 110 scenarios across six weather conditions and varying object densities to simulate real-world challenges in adverse weather. 648 | - **Multi-Sensor Data**: It includes data from LiDARs, RGB cameras, semantic segmentation cameras, GNSS, and IMUs, supporting tasks such as 3D object detection, tracking, and semantic segmentation. 649 | - **Realistic Scenario Design**: Scenarios are based on real crash reports, improving relevance for autonomous driving research. 650 | - **Benchmarking**: Performance benchmarks highlight the challenges posed by adverse weather on multi-modal object detection models, with performance drops observed in certain conditions. 
651 | 652 | ### Multi-V2X: A Large Scale Multi-modal Multi-penetration-rate Dataset for Cooperative Perception [[paper](https://arxiv.org/abs/2409.04980)] [[code](https://github.com/RadetzkyLi/Multi-V2X)] [~~project~~] 653 | 654 | - **Background and Motivation** 655 | The paper introduces the Multi-V2X dataset to address limitations in existing datasets for cooperative perception, particularly the lack of sufficient communicating agents and consideration of CAV penetration rates. Existing real-world datasets offer limited interaction, and synthetic datasets omit vulnerable road users like cyclists and pedestrians, essential for safe autonomous driving. 656 | 657 | - **Key Contributions** 658 | - **First Multi-Penetration-Rate Dataset**: Multi-V2X is the first dataset that supports varying CAV penetration rates, providing a realistic training environment for cooperative perception systems with up to 86.21% CAV penetration. 659 | - **Large-Scale Data**: The dataset includes 549k RGB frames, 146k LiDAR frames, and 4.2 million annotated 3D bounding boxes across six categories, supporting diverse training scenarios. 660 | - **Co-Simulation of SUMO and CARLA**: By co-simulating SUMO for traffic flow and CARLA for sensor simulation, the dataset captures realistic data for autonomous driving research. 661 | - **Comprehensive Benchmarks**: The dataset includes benchmarks for cooperative 3D object detection tasks, enabling the development of algorithms that perform under various penetration rates and cooperative settings. 662 | 663 | ### **DAIR-V2X-Traj** Learning Cooperative Trajectory Representations for Motion Forecasting [[paper](https://arxiv.org/abs/2311.00371)] [[code](https://github.com/AIR-THU/V2X-Graph)] [[project](https://thudair.baai.ac.cn/index)] 664 | 665 | - **Background and Motivation** 666 | The paper addresses the challenge of enhancing motion forecasting for autonomous driving by incorporating information from external sources, such as connected vehicles and infrastructure. Traditional methods focus on single-frame cooperative data, often underutilizing the rich motion and interaction context available from cooperative trajectories. 667 | 668 | - **Key Contributions** 669 | - **V2X-Graph Framework**: Introduces a novel graph-based framework for cooperative motion forecasting that fuses motion and interaction features from different sources, improving prediction accuracy. 670 | - **Forecasting-Oriented Representation Paradigm**: Proposes a new representation paradigm that uses cooperative trajectory data to enhance forecasting capabilities by considering motion and interaction features. 671 | - **V2X-Traj Dataset**: Develops the first real-world V2X dataset for motion forecasting, which includes both vehicle-to-infrastructure (V2I) and vehicle-to-vehicle (V2V) cooperation in every scenario. 672 | - **State-of-the-Art Performance**: Demonstrates that the proposed method outperforms existing approaches on the V2X-Seq and V2X-Traj datasets, highlighting the effectiveness of cooperative data fusion in motion forecasting. 673 | 674 | ### WHALES: A Multi-Agent Scheduling Dataset for Enhanced Cooperation in Autonomous Driving [[paper](https://arxiv.org/abs/2411.13340)] [[code](https://github.com/chensiweiTHU/WHALES)] [[project](https://pan.baidu.com/s/1dintX-d1T-m2uACqDlAM9A)] 675 | 676 | - **Background and Motivation** 677 | The paper presents the WHALES dataset, aiming to address the limitations in current cooperative perception datasets for autonomous driving. 
Existing datasets often involve a limited number of agents, reducing the effectiveness of multi-agent cooperation. WHALES tackles this by simulating environments with an average of 8.4 agents per sequence, enhancing the scope for studying cooperative perception and agent scheduling. 678 | 679 | - **Key Contributions** 680 | - **Large-Scale Multi-Agent Dataset**: WHALES features 70K RGB images, 17K LiDAR frames, and 2.01M 3D bounding box annotations, capturing cooperative perception with multiple agents. 681 | - **Innovative Agent Scheduling**: Introduces the concept of agent scheduling in cooperative perception, a novel task not explored in previous datasets, optimizing cooperation for perception gains. 682 | - **Simulated Scenarios**: The dataset includes diverse road scenarios, including intersections and highway ramps, enabling more comprehensive research in cooperative driving. 683 | - **Benchmarking Tasks**: Provides benchmarks for 3D object detection and agent scheduling, supporting the development of algorithms for multi-agent cooperation in autonomous driving systems. 684 | 685 | ### **V2X-R** V2X-R: Cooperative LiDAR-4D Radar Fusion for 3D Object Detection with Denoising Diffusion [[paper](https://arxiv.org/abs/2411.08402)] [[code](https://github.com/ylwhxht/V2X-R)] [~~project~~] 686 | 687 | - **Background and Motivation** 688 | The paper introduces the V2X-R dataset, a novel contribution to Vehicle-to-Everything (V2X) cooperative perception. Current datasets, which mainly focus on LiDAR and camera data, struggle under adverse weather conditions. The addition of 4D radar data, known for its weather robustness, aims to enhance 3D object detection in such challenging environments. 689 | 690 | - **Key Contributions** 691 | - **First V2X-R Dataset**: The first simulated V2X dataset that integrates LiDAR, camera, and 4D radar data, addressing weather robustness in cooperative perception. 692 | - **Cooperative LiDAR-4D Radar Fusion Pipeline**: Proposes a novel fusion pipeline that improves 3D object detection by combining data from multiple agents, LiDAR, and 4D radar sensors. 693 | - **Multi-modal Denoising Diffusion (MDD)**: Introduces a denoising diffusion module that uses 4D radar to guide the denoising process of noisy LiDAR data in adverse weather conditions, improving the detection accuracy. 694 | - **Comprehensive Benchmarking**: Establishes a benchmark using various fusion strategies and models, demonstrating the superior performance of the proposed approach in both normal and adverse weather conditions. 695 | 696 | ### V2XPnP: Vehicle-to-Everything Spatio-Temporal Fusion for Multi-Agent Perception and Prediction [[paper](https://arxiv.org/abs/2412.01812)] [[code](https://github.com/Zewei-Zhou/V2XPnP)] [[project](https://mobility-lab.seas.ucla.edu/v2xpnp)] 697 | 698 | - **Background and Motivation** 699 | The paper addresses limitations in current Vehicle-to-Everything (V2X) cooperative perception systems that ignore temporal relationships across frames. While previous research focuses on single-frame perception, there is a lack of frameworks that handle temporal cues for perception and prediction. This work aims to integrate spatio-temporal information to improve cooperative perception and prediction in V2X environments, where vehicles and infrastructures share data to overcome occlusions and enhance safety. 
700 | 701 | - **Key Contributions** 702 | - **V2XPnP Framework**: Introduces a novel spatio-temporal fusion framework using a Transformer architecture for end-to-end perception and prediction tasks, integrating temporal, spatial, and map information. 703 | - **First Real-World V2X Sequential Dataset**: Presents the V2XPnP Sequential Dataset, supporting all V2X collaboration modes (V2V, V2I, I2I), including 100 scenarios with diverse agent interactions and temporal consistency. 704 | - **Spatio-Temporal Fusion Strategies**: Proposes advanced fusion strategies—early, late, and intermediate fusion—incorporating temporal information for enhanced perception and prediction performance. 705 | - **Superior Performance**: Demonstrates that the V2XPnP framework significantly outperforms existing methods in both perception and prediction tasks, with a notable improvement in the EPA (End-to-End Perception and Prediction Accuracy) metric. 706 | 707 | ### Mixed Signals: A Diverse Point Cloud Dataset for Heterogeneous LiDAR V2X Collaboration [[paper](https://arxiv.org/abs/2502.14156)] [[code](https://github.com/chinitaberrio/Mixed-Signals)] [[project](https://mixedsignalsdataset.cs.cornell.edu)] 708 | 709 | - **Background and Motivation** 710 | The paper introduces the Mixed Signals dataset, designed to address the gaps in existing V2X datasets. While many datasets focus on homogeneous sensor setups from identical vehicles, the Mixed Signals dataset aims to capture the complexities of real-world V2X collaboration by including heterogeneous LiDAR configurations and left-hand driving scenarios, providing data for robust multi-agent perception tasks. 711 | 712 | - **Key Contributions** 713 | - **Heterogeneous LiDAR Configurations**: First V2X dataset featuring three vehicles with two different LiDAR sensors and a roadside unit with dual LiDARs, adding complexity to real-world V2X collaboration. 714 | - **Diverse Traffic Participants**: Includes 10 classes of objects, including 4 categories of vulnerable road users (VRUs), with over 240K 3D bounding box annotations. 715 | - **High-Quality Data Collection**: Precise synchronization and localization techniques for accurate sensor alignment, ensuring high-quality, annotated data that can be used for perception training and benchmarking. 716 | - **Real-World Scenario**: Captures data from real-world scenarios in Sydney, Australia, with left-hand traffic, increasing the dataset's applicability globally. 717 | - **Benchmarking and Evaluation**: Provides extensive benchmarks for collaborative object detection, enabling the development of advanced V2X perception methods. 718 | 719 | ### SCOPE: A Synthetic Multi-Modal Dataset for Collective Perception Including Physical-Correct Weather Conditions 720 | 721 | - **Background and Motivation** 722 | The paper introduces the SCOPE dataset, designed to address gaps in current datasets for collective perception (CP) in autonomous driving. Existing datasets lack scenario diversity, realistic sensor models, and environmental conditions like adverse weather. SCOPE is created to support the development and testing of collective perception algorithms, especially in challenging real-world conditions. 723 | 724 | - **Key Contributions** 725 | - **First Synthetic Multi-Modal CP Dataset**: SCOPE is the first synthetic dataset that combines realistic LiDAR and camera models with physically-accurate weather simulations for both sensors. 
726 | - **Diverse Scenarios**: The dataset includes 17,600 frames from over 40 different scenarios, including edge cases like tunnels and roundabouts, with up to 24 collaborative agents. 727 | - **Weather Simulations**: SCOPE introduces realistic rain and fog simulations with parameterized intensities, improving robustness against environmental effects on perception. 728 | - **Wide Sensor Setup**: The dataset uses a variety of sensors, including RGB and SemSeg cameras, along with three different LiDAR models, to enable comprehensive object detection and semantic segmentation tasks. 729 | - **Novel Maps**: It includes two novel digital-twin maps from Karlsruhe and Tübingen, enhancing the dataset’s real-world applicability. 730 | 731 | ### **OPV2V-N** RCDN: Towards Robust Camera-Insensitivity Collaborative Perception via Dynamic Feature-based 3D Neural Modeling [[paper](https://arxiv.org/abs/2405.16868)] [~~code~~] [~~project~~] 732 | 733 | - **Background and Motivation** 734 | The paper introduces RCDN, a solution to overcome the issue of noisy or failed camera perspectives in multi-agent collaborative perception. In real-world settings, cameras can be blurred, noisy, or even fail, severely affecting the performance of collaborative perception systems. RCDN aims to recover perceptual messages from failed camera perspectives using dynamic feature-based 3D neural modeling, ensuring robust collaborative performance with low calibration costs. 735 | 736 | - **Key Contributions** 737 | - **Introduction of RCDN**: A robust camera-insensitivity collaborative perception system that uses dynamic feature-based 3D neural modeling to recover failed perceptual information. 738 | - **Collaborative Neural Rendering**: RCDN constructs collaborative neural rendering field representations to stabilize high collaborative performance even in the presence of noisy or failed cameras. 739 | - **Two Collaborative Fields**: Introduces time-invariant static and time-varying dynamic fields for collaborative perception, enhancing the system’s ability to handle camera failures and recover from noisy inputs. 740 | - **New Dataset - OPV2V-N**: Introduces OPV2V-N, a large-scale dataset with manually labeled data, simulating various camera failure scenarios for better research on camera-insensitive collaborative perception. 741 | - **Improved Robustness**: Demonstrates that RCDN significantly enhances the performance of existing collaborative perception methods, improving their robustness by up to 157.91% under extreme camera-insensitivity conditions. 742 | 743 | ### **CP-GuardBench** CP-Guard+: A New Paradigm for Malicious Agent Detection and Defense in Collaborative Perception [[paper&review](https://openreview.net/forum?id=9MNzHTSDgh)] [~~code~~] [~~project~~] 744 | 745 | - **Background and Motivation** 746 | The paper introduces CP-Guard+, a solution for enhancing the security of collaborative perception (CP) systems in autonomous driving. While CP systems enable vehicles to share sensory data for improved perception, they are vulnerable to malicious agents that may inject adversarial data, potentially compromising safety. The motivation behind CP-Guard+ is to create a robust, computationally efficient framework that can detect and defend against such malicious agents, mitigating risks in autonomous driving environments. 
747 | 748 | - **Key Contributions** 749 | - **Feature-Level Malicious Agent Detection**: Proposes a novel approach for detecting malicious agents directly at the feature level, eliminating the need for computationally expensive hypotheses and verifications. 750 | - **CP-GuardBench Dataset**: Introduces the first benchmark dataset, CP-GuardBench, designed specifically for training and evaluating malicious agent detection methods in CP systems. 751 | - **CP-Guard+ Defense Framework**: Develops CP-Guard+, a defense mechanism that improves malicious agent detection by enhancing the separability of benign and malicious features using a Dual-Centered Contrastive Loss (DCCLoss). 752 | - **Superior Performance**: Demonstrates through extensive experiments on both CP-GuardBench and V2X-Sim datasets that CP-Guard+ outperforms traditional defense methods, achieving high accuracy and low false positive rates while significantly reducing computational overhead. 753 | - **Efficiency and Scalability**: CP-Guard+ achieves a substantial increase in frames per second (FPS) compared to existing methods, proving its efficiency in real-time CP systems. 754 | 755 | ### V2X-ReaLO: An Open Online Framework and Dataset for Cooperative Perception in Reality [[paper](https://arxiv.org/abs/2503.10034)] [~~code~~] [~~project~~] 756 | 757 | - **Background and Motivation** 758 | The paper addresses the challenges of applying Vehicle-to-Everything (V2X) cooperative perception in real-world conditions. Previous research often focuses on simulations or static datasets, which fail to capture dynamic, real-time conditions such as communication latency and sensor misalignment. V2X-ReaLO aims to bridge this gap by providing a practical, real-world framework that demonstrates the feasibility of real-time intermediate fusion. 759 | 760 | - **Key Contributions** 761 | - **V2X-ReaLO Framework**: Introduces an open online framework for cooperative perception deployed on real vehicles and smart infrastructure, integrating early, late, and intermediate fusion methods. 762 | - **First Practical Demonstration**: Provides the first real-world demonstration of the feasibility and performance of intermediate fusion under real-world conditions. 763 | - **Open Online Benchmark Dataset**: Extends the V2X-Real dataset to dynamic, synchronized ROS bags with 25,028 frames, including 6,850 annotated key frames for real-time evaluation in challenging urban scenarios. 764 | - **Real-Time Evaluation**: Supports online evaluations, accounting for real-world challenges such as bandwidth limitations, latency, and asynchronous message arrival. 765 | - **Comprehensive Benchmarking**: Conducts extensive benchmarks to assess multi-class, multi-agent V2X cooperative perception performance, highlighting the system's effectiveness in various collaboration modes (V2V, V2I, and I2I). 766 | 767 | ## :bookmark:I2I Datasets 768 | 769 | - **I2I Datasets**: Focused on infrastructure-to-infrastructure collaboration, these datasets support research in scenarios with overlapping sensor coverage or distributed sensor fusion across intersections. 
770 | 771 | ### Table 772 | 773 | | **Dataset** | **Year** | **Venue** | **Sensors** | **Source** | **Tasks** | **download** | 774 | |-------------|----------|-----------|-------------|------------|-----------|----------| 775 | | Rcooper | 2024 | CVPR | C, L | Real | 3DOD, MOT | [download](https://github.com/AIR-THU/DAIR-Rcooper) | 776 | | InScope | 2024 | arxiv | L | Real | 3DOD, MOT | [download](https://github.com/xf-zh/InScope) | 777 | 778 | Note: Sensors: Camera (C), LiDAR (L), Radar (R). Source: Real = collected in the real world; Sim = generated via simulation. Tasks: 2DOD = 2D Object Detection, 3DOD = 3D Object Detection, MOT = Multi-Object Tracking, MTSCT = Multi-target Single-camera Tracking, MTMCT = Multi-target Multi-camera Tracking, SS = Semantic Segmentation, TP = Trajectory Prediction, VPR = Visual Place Recognition, NR = Neural Reconstruction, Re-ID = Re-Identification, S2R = Sim2Real, MF = Motion Forecasting, PQA = Planning Q&A. 779 | 780 | 781 | ### RCooper: A Real-world Large-scale Dataset for Roadside Cooperative Perception [[paper](https://arxiv.org/abs/2403.10145)] [[code](https://github.com/AIR-THU/DAIR-RCooper)] [[project](https://www.t3caic.com/qingzhen)] 782 | 783 | - **Background and Motivation** 784 | The paper introduces RCooper, the first real-world, large-scale dataset for roadside cooperative perception. Existing roadside perception systems focus on independent sensors, leading to limited sensing range and blind spots. Roadside cooperative perception (RCooper) aims to enhance traffic monitoring and autonomous driving by using data from multiple roadside sensors to overcome these limitations, providing more comprehensive coverage. 785 | 786 | - **Key Contributions** 787 | - **First Real-World RCooper Dataset**: RCooper is the first large-scale dataset dedicated to roadside cooperative perception, including 50k images and 30k point clouds with manual annotations. 788 | - **Two Main Traffic Scenes**: The dataset includes two representative traffic scenes: intersections and corridors, capturing diverse traffic flow and environmental conditions. 789 | - **Challenges in Roadside Perception**: The dataset addresses challenges such as data heterogeneity, sensor alignment, and cooperative representation for roadside systems. 790 | - **Cooperative Detection and Tracking Tasks**: RCooper provides benchmarks for two tasks—3D object detection and tracking—using multi-agent cooperation, with state-of-the-art methods included for comparison. 791 | - **Comprehensive Annotation and Data Diversity**: The dataset is annotated with 3D bounding boxes and trajectories across ten semantic classes and includes variations in weather and lighting conditions. 792 | 793 | ### InScope: A New Real-world 3D Infrastructure-side Collaborative Perception Dataset for Open Traffic Scenarios [[paper](https://arxiv.org/abs/2407.21581)] [[code](https://github.com/xf-zh/InScope)] [~~project~~] 794 | 795 | - **Background and Motivation** 796 | The paper introduces the InScope dataset, aiming to address the issue of occlusion in vehicle-centric perception systems. Infrastructure-side perception systems (IPS) are suggested to complement autonomous vehicles, providing broader coverage. However, the lack of real-world 3D infrastructure-side datasets limits the progress in V2X technologies. InScope aims to bridge this gap by capturing occlusion challenges and providing collaborative perception data. 
797 | 798 | - **Key Contributions** 799 | - **First Large-Scale Infrastructure-Side Collaborative Dataset**: InScope is the first 3D infrastructure-side dataset designed to handle occlusion challenges with multi-position LiDARs. 800 | - **Comprehensive Data Collection**: Includes 303 tracking trajectories and 187,787 3D bounding box annotations captured over 20 days in open traffic scenarios. 801 | - **Four Key Benchmarks**: The dataset provides benchmarks for 3D object detection, multi-source data fusion, data domain transfer, and 3D multi-object tracking tasks. 802 | - **Anti-Occlusion Evaluation Metric**: Introduces a new metric (ξ_D) to evaluate the anti-occlusion capabilities of detection methods, quantifying detection degradation ratios between single and multi-LiDAR setups. 803 | - **Enhanced Perception**: The dataset significantly enhances performance in detecting and tracking occluded, small, and distant objects, which are critical for real-world traffic safety. 804 | 805 | ## Other Collaborative Perception Dataset Papers 806 | - **Griffin** (Griffin: Aerial-Ground Cooperative Detection and Tracking Dataset and Benchmark) [[paper](https://arxiv.org/abs/2503.06983)] [[code](https://github.com/wang-jh18-SVM/Griffin)] [[project](https://pan.baidu.com/s/1NDgsuHB-QPRiROV73NRU5g)] 807 | - **RLS** (Analyzing Infrastructure LiDAR Placement with Realistic LiDAR Simulation Library) [[paper](https://arxiv.org/abs/2211.15975)] [[code](https://github.com/PJLab-ADG/LiDARSimLib-and-Placement-Evaluation)] [~~project~~] 808 | - **Roadside-Opt** (Optimizing the Placement of Roadside LiDARs for Autonomous Driving) [[paper](https://arxiv.org/abs/2310.07247)] [~~code~~] [~~project~~] 809 | - {Real} **DAIR-V2X-C Complemented** (Robust Collaborative 3D Object Detection in Presence of Pose Errors) [[paper](https://arxiv.org/abs/2211.07214)] [[code](https://github.com/yifanlu0227/CoAlign)] [[project](https://siheng-chen.github.io/dataset/dair-v2x-c-complemented)] 810 | - **V2XP-ASG** (V2XP-ASG: Generating Adversarial Scenes for Vehicle-to-Everything Perception) [[paper](https://arxiv.org/abs/2209.13679)] [[code](https://github.com/XHwind/V2XP-ASG)] [~~project~~] 811 | - **AutoCastSim** (COOPERNAUT: End-to-End Driving with Cooperative Perception for Networked Vehicles) [[paper](https://arxiv.org/abs/2205.02222)] [[code](https://github.com/hangqiu/AutoCastSim)] [[project](https://utexas.app.box.com/v/coopernaut-dataset)] 812 | - **Mono3DVLT-V2X** (Mono3DVLT: Monocular-Video-Based 3D Visual Language Tracking) [~~paper~~] [~~code~~] [~~project~~] 813 | - **RCP-Bench** (RCP-Bench: Benchmarking Robustness for Collaborative Perception Under Diverse Corruptions) [~~paper~~] [~~code~~] [~~project~~] 814 | - **TCLF** method (DAIR-V2X: A Large-Scale Dataset for Vehicle-Infrastructure Cooperative 3D Object Detection) [[paper](https://arxiv.org/abs/2204.05575)] [[code](https://github.com/AIR-THU/DAIR-V2X)] 815 | 816 | 817 | ## :bookmark:Methods 818 | 819 | The following content is adapted from [[Reference](https://github.com/Little-Podi/Collaborative_Perception?tab=readme-ov-file#bookmarkdataset-and-simulator)] and is pending revision. 820 | 821 | Note: {Related} denotes that it is not a pure collaborative perception paper but has related content. 
822 | 823 | Note: Sorted by time in descending order. 824 | 825 | ### Infrastructure - Roadside perception 826 | - {Related} **BEVHeight** CVPR 2023 (BEVHeight: A Robust Framework for Vision-Based Roadside 3D Object Detection) [[paper](https://arxiv.org/abs/2303.08498)] [[code](https://github.com/ADLab-AutoDrive/BEVHeight)] 827 | - **Infra-Centric CP** ECCV 2024 (Rethinking the Role of Infrastructure in Collaborative Perception) [[paper](https://arxiv.org/abs/2410.11259)] [~~code~~] 828 | 829 | 830 | ### Cooperative perception communication 831 | - **CPPC** ICLR 2025 (Point Cluster: A Compact Message Unit for Communication-Efficient Collaborative Perception) [[paper&review](https://openreview.net/forum?id=54XlM8Clkg)] [~~code~~] 832 | - **CoSDH** (CoSDH: Communication-Efficient Collaborative Perception via Supply-Demand Awareness and Intermediate-Late Hybridization) [[paper](https://arxiv.org/abs/2503.03430)] [[code](https://github.com/Xu2729/CoSDH)] 833 | - **V2VNet** ECCV 2020 (V2VNet: Vehicle-to-Vehicle Communication for Joint Perception and Prediction) [[paper](https://arxiv.org/abs/2008.07519)] [[code](https://github.com/DerrickXuNu/OpenCOOD)] 834 | - **Who2com** ICRA 2020 (Who2com: Collaborative Perception via Learnable Handshake Communication) [[paper](https://arxiv.org/abs/2003.09575)] [[code](https://github.com/GT-RIPL/MultiAgentPerception)] 835 | - **Robust V2V** CoRL 2020 (Learning to Communicate and Correct Pose Errors) [[paper](https://arxiv.org/abs/2011.05289)] [[code](https://github.com/yifanlu0227/CoAlign)] 836 | - **When2com** CVPR 2020 (When2com: Multi-Agent Perception via Communication Graph Grouping) [[paper](https://arxiv.org/abs/2006.00176)] [[code](https://github.com/GT-RIPL/MultiAgentPerception)] 837 | - **MASH** IROS 2021 (Overcoming Obstructions via Bandwidth-Limited Multi-Agent Spatial Handshaking) [[paper](https://arxiv.org/abs/2107.00771)] [[code](https://github.com/yifanlu0227/CoAlign)] 838 | - **IA-RCP** IJCAI 2022 (Robust Collaborative Perception against Communication Interruption) [[paper](https://learn-to-race.org/workshop-ai4ad-ijcai2022/papers.html)] [~~code~~] 839 | - **SyncNet** ECCV 2022 (Latency-Aware Collaborative Perception) [[paper](https://arxiv.org/abs/2207.08560)] [[code](https://github.com/MediaBrain-SJTU/SyncNet)] 840 | - **MATE** ICRA 2023 (Communication-Critical Planning via Multi-Agent Trajectory Exchange) [[paper](https://arxiv.org/abs/2303.06080)] [~~code~~] 841 | - **What2comm** MM 2023 (What2comm: Towards Communication-Efficient Collaborative Perception via Feature Decoupling) [[paper](https://dl.acm.org/doi/abs/10.1145/3581783.3611699)] [~~code~~] 842 | - **How2comm** NeurIPS 2023 (How2comm: Communication-Efficient and Collaboration-Pragmatic Multi-Agent Perception) [[paper&review](https://openreview.net/forum?id=Dbaxm9ujq6)] [[code](https://github.com/ydk122024/How2comm)] 843 | - **CoBEVFlow** NeurIPS 2023 (Robust Asynchronous Collaborative 3D Detection via Bird's Eye View Flow) [[paper&review](https://openreview.net/forum?id=UHIDdtxmVS)] [[code](https://github.com/MediaBrain-SJTU/CoBEVFlow)] 844 | - **CodeFilling** CVPR 2024 (Communication-Efficient Collaborative Perception via Information Filling with Codebook) [[paper](https://arxiv.org/abs/2405.04966)] [[code](https://github.com/PhyllisH/CodeFilling)] 845 | - **ERMVP** CVPR 2024 (ERMVP: Communication-Efficient and Collaboration-Robust Multi-Vehicle Perception in Challenging Environments) 
[[paper](https://openaccess.thecvf.com/content/CVPR2024/html/Zhang_ERMVP_Communication-Efficient_and_Collaboration-Robust_Multi-Vehicle_Perception_in_Challenging_Environments_CVPR_2024_paper.html)] [[code](https://github.com/Terry9a/ERMVP)] 846 | - **MRCNet** CVPR 2024 (Multi-Agent Collaborative Perception via Motion-Aware Robust Communication Network) [[paper](https://openaccess.thecvf.com/content/CVPR2024/html/Hong_Multi-agent_Collaborative_Perception_via_Motion-aware_Robust_Communication_Network_CVPR_2024_paper.html)] [[code](https://github.com/IndigoChildren/collaborative-perception-MRCNet)] 847 | - **UMC** ICCV 2023 (UMC: A Unified Bandwidth-Efficient and Multi-Resolution Based Collaborative Perception Framework) [[paper](https://arxiv.org/abs/2303.12400)] [[code](https://github.com/ispc-lab/UMC)] 849 | - **DSRC** AAAI 2025 (DSRC: Learning Density-Insensitive and Semantic-Aware Collaborative Representation against Corruptions) [[paper](https://arxiv.org/abs/2412.10739)] [[code](https://github.com/Terry9a/DSRC)] 850 | - **CoDynTrust** ICRA 2025 (CoDynTrust: Robust Asynchronous Collaborative Perception via Dynamic Feature Trust Modulus) [[paper](https://arxiv.org/abs/2502.08169)] [[code](https://github.com/CrazyShout/CoDynTrust)] 851 | - **Where2comm** NeurIPS 2022 (Where2comm: Efficient Collaborative Perception via Spatial Confidence Maps) [[paper&review](https://openreview.net/forum?id=dLL4KXzKUpS)] [[code](https://github.com/MediaBrain-SJTU/where2comm)] 852 | - **CoCMT** (CoCMT: Towards Communication-Efficient Cross-Modal Transformer For Collaborative Perception) [[paper&review](https://openreview.net/forum?id=S1NrbfMS7T)] [[code](https://github.com/taco-group/COCMT)] 853 | 854 | 855 | ### Spatial-temporal calibration and pose 856 | - **SparseAlign** CVPR 2025 (SparseAlign: A Fully Sparse Framework for Cooperative Object Detection) [[paper](https://arxiv.org/abs/2503.12982)] [~~code~~] 858 | - **FreeAlign** ICRA 2024 (Robust Collaborative Perception without External Localization and Clock Devices) [[paper](https://arxiv.org/abs/2405.02965)] [[code](https://github.com/MediaBrain-SJTU/FreeAlign)] 859 | - **CoAlign** ICRA 2023 (Robust Collaborative 3D Object Detection in Presence of Pose Errors) [[paper](https://arxiv.org/abs/2211.07214)] [[code](https://github.com/yifanlu0227/CoAlign)] 860 | - **MP-Pose** ICRA 2022 (Multi-Robot Collaborative Perception with Graph Neural Networks) [[paper](https://arxiv.org/abs/2201.01760)] [~~code~~] 861 | - **FeaCo** MM 2023 (FeaCo: Reaching Robust Feature-Level Consensus in Noisy Pose Conditions) [[paper](https://dl.acm.org/doi/abs/10.1145/3581783.3611880)] [[code](https://github.com/jmgu0212/FeaCo)] 862 | 863 | ### Multimodal Fusion Method 864 | - **CoHFF** CVPR 2024 (Collaborative Semantic Occupancy Prediction with Hybrid Feature Fusion in Connected Automated Vehicles) [[paper](https://arxiv.org/abs/2402.07635)] [~~code~~] 865 | - **BM2CP** (BM2CP: Efficient Collaborative Perception with LiDAR-Camera Modalities) [[paper&review](https://openreview.net/forum?id=uJqxFjF1xWp)] [[code](https://github.com/byzhaoAI/BM2CP)] 866 | - **V2X-R** CVPR 2025 (V2X-R: Cooperative LiDAR-4D Radar Fusion for 3D Object Detection with Denoising Diffusion) 
[[paper](https://arxiv.org/abs/2411.08402)] [[code](https://github.com/ylwhxht/V2X-R)] 867 | - **AttFuse** ICRA 2022 (OPV2V: An Open Benchmark Dataset and Fusion Pipeline for Perception with Vehicle-to-Vehicle Communication) [[paper](https://arxiv.org/abs/2109.07644)] [[code](https://github.com/DerrickXuNu/OpenCOOD)] 868 | - **AdaFusion** WACV 2023 (Adaptive Feature Fusion for Cooperative Perception Using LiDAR Point Clouds) [[paper](https://arxiv.org/abs/2208.00116)] [[code](https://github.com/DonghaoQiao/Adaptive-Feature-Fusion-for-Cooperative-Perception)] 869 | - **FFNet** NeurIPS 2023 (Flow-Based Feature Fusion for Vehicle-Infrastructure Cooperative 3D Object Detection) [[paper&review](https://openreview.net/forum?id=gsglrhvQxX)] [[code](https://github.com/haibao-yu/FFNet-VIC3D)] 870 | - **TransIFF** ICCV 2023 (TransIFF: An Instance-Level Feature Fusion Framework for Vehicle-Infrastructure Cooperative 3D Detection with Transformers) [[paper](https://openaccess.thecvf.com/content/ICCV2023/html/Chen_TransIFF_An_Instance-Level_Feature_Fusion_Framework_for_Vehicle-Infrastructure_Cooperative_3D_ICCV_2023_paper.html)] [~~code~~] 871 | - **Co-MTP** ICRA 2025 (Co-MTP: A Cooperative Trajectory Prediction Framework with Multi-Temporal Fusion for Autonomous Driving) [[paper](https://arxiv.org/abs/2502.16589)] [[code](https://github.com/xiaomiaozhang/Co-MTP)] 872 | - **ACCO** (Is Discretization Fusion All You Need for Collaborative Perception?) [[paper](https://arxiv.org/abs/2503.13946)] [[code](https://github.com/sidiangongyuan/ACCO)] 873 | - **CoBEVFusion** 2024 International Conference on Digital Image Computing: Techniques and Applications (DICTA) (CoBEVFusion: Cooperative Perception with LiDAR-Camera Bird's-Eye View Fusion) [[paper](https://arxiv.org/abs/2310.06008)] [~~code~~] [[paper](https://ieeexplore.ieee.org/abstract/document/10869530)] 874 | 875 | ### End-to-end collaborative perception 876 | - **CoopDETR** ICRA 2025 (CoopDETR: A Unified Cooperative Perception Framework for 3D Detection via Object Query) [[paper](https://arxiv.org/abs/2502.19313)] [~~code~~] 877 | - **UniV2X** AAAI 2025 (End-to-End Autonomous Driving through V2X Cooperation) [[paper](https://arxiv.org/abs/2404.00717)] [[code](https://github.com/AIR-THU/UniV2X)] 878 | - **Coopernaut** CVPR 2022 (COOPERNAUT: End-to-End Driving with Cooperative Perception for Networked Vehicles) [[paper](https://arxiv.org/abs/2205.02222)] [[code](https://github.com/UT-Austin-RPL/Coopernaut)] 879 | - **SCOPE** ICCV 2023 (Spatio-Temporal Domain Awareness for Multi-Agent Collaborative Perception) [[paper](https://arxiv.org/abs/2307.13929)] [[code](https://github.com/starfdu1418/SCOPE)] 880 | - **CoDriving** (Towards Collaborative Autonomous Driving: Simulation Platform and End-to-End System) [[paper](https://arxiv.org/abs/2404.09496)] [[code](https://github.com/CollaborativePerception/V2Xverse)] 881 | 882 | ### Framework - Hetero-Modal 883 | - **CoopDETR** ICRA 2025 (CoopDETR: A Unified Cooperative Perception Framework for 3D Detection via Object Query) [[paper](https://arxiv.org/abs/2502.19313)] [~~code~~] 884 | - **HEAL** ICLR 2024 (An Extensible Framework for Open Heterogeneous Collaborative Perception) [[paper&review](https://openreview.net/forum?id=KkrDUGIASk)] [[code](https://github.com/yifanlu0227/HEAL)] 885 | - **MACP** WACV 2024 (MACP: Efficient Model Adaptation for Cooperative Perception) [[paper](https://arxiv.org/abs/2310.16870)] [[code](https://github.com/PurdueDigitalTwin/MACP)] 886 | - **MAMP** ICRA 2023 (Model-Agnostic 
Multi-Agent Perception Framework) [[paper](https://arxiv.org/abs/2203.13168)] [[code](https://github.com/DerrickXuNu/model_anostic)] 887 | - **HM-ViT** ICCV 2023 (HM-ViT: Hetero-Modal Vehicle-to-Vehicle Cooperative Perception with Vision Transformer) [[paper](https://arxiv.org/abs/2304.10628)] [[code](https://github.com/XHwind/HM-ViT)] 888 | - **SCOPE** ICCV 2023 (Spatio-Temporal Domain Awareness for Multi-Agent Collaborative Perception) [[paper](https://arxiv.org/abs/2307.13929)] [[code](https://github.com/starfdu1418/SCOPE)] 889 | - **Hetecooper** ECCV 2024 (Hetecooper: Feature Collaboration Graph for Heterogeneous Collaborative Perception) [[paper](https://eccv.ecva.net/virtual/2024/poster/2467)] [~~code~~] 890 | - **STAMP** ICLR 2025 (STAMP: Scalable Task- And Model-Agnostic Collaborative Perception) [[paper&review](https://openreview.net/forum?id=8NdNniulYE)] [[code](https://github.com/taco-group/STAMP)] 891 | 892 | ### Transformer-related methods 893 | - **V2XFormer** AAAI 2024 (DeepAccident: A Motion and Accident Prediction Benchmark for V2X Autonomous Driving) [[paper](https://arxiv.org/abs/2304.01168)] [[code](https://github.com/tianqi-wang1996/DeepAccident)] 894 | - **CoBEVT** CoRL 2022 (CoBEVT: Cooperative Bird's Eye View Semantic Segmentation with Sparse Transformers) [[paper&review](https://openreview.net/forum?id=PAFEQQtDf8s)] [[code](https://github.com/DerrickXuNu/CoBEVT)] 895 | - **V2X-ViTv2** (V2X-ViTv2: Improved Vision Transformers for Vehicle-to-Everything Cooperative Perception) [~~paper~~] [~~code~~] 896 | - **V2X-ViT** ECCV 2022 (V2X-ViT: Vehicle-to-Everything Cooperative Perception with Vision Transformer) [[paper](https://arxiv.org/abs/2203.10638)] [[code](https://github.com/DerrickXuNu/v2x-vit)] 897 | - **HM-ViT** ICCV 2023 (HM-ViT: Hetero-Modal Vehicle-to-Vehicle Cooperative Perception with Vision Transformer) [[paper](https://arxiv.org/abs/2304.10628)] [[code](https://github.com/XHwind/HM-ViT)] 898 | - **CoCMT** (CoCMT: Towards Communication-Efficient Cross-Modal Transformer For Collaborative Perception) [[paper&review](https://openreview.net/forum?id=S1NrbfMS7T)] [[code](https://github.com/taco-group/COCMT)] 899 | 900 | ### 3D Detection 901 | - **R&B-POP** ICLR 2025 (Learning 3D Perception from Others' Predictions) [[paper&review](https://openreview.net/forum?id=Ylk98vWQuQ)] [[code](https://github.com/jinsuyoo/rnb-pop)] 902 | - **CoCa3D** CVPR 2023 (Collaboration Helps Camera Overtake LiDAR in 3D Detection) [[paper](https://arxiv.org/abs/2303.13560)] [[code](https://github.com/MediaBrain-SJTU/CoCa3D)] 903 | - **SCOPE** ICCV 2023 (Spatio-Temporal Domain Awareness for Multi-Agent Collaborative Perception) [[paper](https://arxiv.org/abs/2307.13929)] [[code](https://github.com/starfdu1418/SCOPE)] 904 | - **DI-V2X** AAAI 2024 (DI-V2X: Learning Domain-Invariant Representation for Vehicle-Infrastructure Collaborative 3D Object Detection) [[paper](https://arxiv.org/abs/2312.15742)] [[code](https://github.com/Serenos/DI-V2X)] 905 | - **DSRC** AAAI 2025 (DSRC: Learning Density-Insensitive and Semantic-Aware Collaborative Representation against Corruptions) [[paper](https://arxiv.org/abs/2412.10739)] [[code](https://github.com/Terry9a/DSRC)] 906 | - **CoopDet3D** CVPR 2024 (TUMTraf V2X Cooperative Perception Dataset) [[paper](https://arxiv.org/abs/2403.01316)] [[code](https://github.com/tum-traffic-dataset/coopdet3d)] 907 | 908 | ### Semantic Segmentation 909 | - **CoGMP** CVPR 2025 (Generative Map Priors for Collaborative BEV Semantic
Segmentation) [~~paper~~] [~~code~~] 910 | - **CoBEVT** CoRL 2022 (CoBEVT: Cooperative Bird's Eye View Semantic Segmentation with Sparse Transformers) [[paper&review](https://openreview.net/forum?id=PAFEQQtDf8s)] [[code](https://github.com/DerrickXuNu/CoBEVT)] 911 | 912 | ### Tracking 913 | - **FF-Tracking** CVPR 2023 (V2X-Seq: The Large-Scale Sequential Dataset for the Vehicle-Infrastructure Cooperative Perception and Forecasting) [[paper](https://arxiv.org/abs/2305.05938)] [[code](https://github.com/AIR-THU/DAIR-V2X-Seq)] 914 | - **DMSTrack** ICRA 2024 (Probabilistic 3D Multi-Object Cooperative Tracking for Autonomous Driving via Differentiable Multi-Sensor Kalman Filter) [[paper](https://arxiv.org/abs/2309.14655)] [[code](https://github.com/eddyhkchiu/DMSTrack)] 915 | - **MOT-CUP** (Collaborative Multi-Object Tracking with Conformal Uncertainty Propagation) [[paper](https://ieeexplore.ieee.org/abstract/document/10430224)] [[paper](https://arxiv.org/abs/2303.14346)] [[code](https://github.com/susanbao/mot_cup)] 916 | 917 | ### Forecasting 918 | - **V2X-Graph** NeurIPS 2024 (Learning Cooperative Trajectory Representations for Motion Forecasting) [[paper](https://arxiv.org/abs/2311.00371)] [[code](https://github.com/AIR-THU/V2X-Graph)] 919 | - **Co-MTP** ICRA 2025 (Co-MTP: A Cooperative Trajectory Prediction Framework with Multi-Temporal Fusion for Autonomous Driving) [[paper](https://arxiv.org/abs/2502.16589)] [[code](https://github.com/xiaomiaozhang/Co-MTP)] 920 | 921 | 922 | ### Scene Completion 923 | - **STAR** CoRL 2022 (Multi-Robot Scene Completion: Towards Task-Agnostic Collaborative Perception) [[paper&review](https://openreview.net/forum?id=hW0tcXOJas2)] [[code](https://github.com/coperception/star)] 924 | 925 | ### Bird's Eye View 926 | - **CP-Guard** AAAI 2025 (CP-Guard: Malicious Agent Detection and Defense in Collaborative Bird's Eye View Perception) [[paper](https://arxiv.org/abs/2412.12000)] [~~code~~] 927 | - **CoBEVFlow** NeurIPS 2023 (Robust Asynchronous Collaborative 3D Detection via Bird's Eye View Flow) [[paper&review](https://openreview.net/forum?id=UHIDdtxmVS)] [[code](https://github.com/MediaBrain-SJTU/CoBEVFlow)] 928 | - **CoBEVFusion** 2024 International Conference on Digital Image Computing: Techniques and Applications (DICTA) (CoBEVFusion: Cooperative Perception with LiDAR-Camera Bird's-Eye View Fusion) [[paper](https://arxiv.org/abs/2310.06008)] [~~code~~] [[paper](https://ieeexplore.ieee.org/abstract/document/10869530)] 929 | 930 | ### Identification - Selection 931 | - {Related} **DMGM** ICRA 2023 (Deep Masked Graph Matching for Correspondence Identification in Collaborative Perception) [[paper](https://arxiv.org/abs/2303.07555)] [[code](https://github.com/gaopeng5/DMGM)] 932 | - **WNT** ICRA 2023 (We Need to Talk: Identifying and Overcoming Communication-Critical Scenarios for Self-Driving) [[paper](https://arxiv.org/abs/2305.04352)] [~~code~~] 933 | - **CMiMC** AAAI 2024(What Makes Good Collaborative Views? 
Contrastive Mutual Information Maximization for Multi-Agent Perception) [[paper](https://arxiv.org/abs/2403.10068)] [[code](https://github.com/77SWF/CMiMC)] 934 | - **Direct-CP** ICRA 2025 (Direct-CP: Directed Collaborative Perception for Connected and Autonomous Vehicles via Proactive Attention) [[paper](https://arxiv.org/abs/2409.08840)] [~~code~~] 935 | 936 | ### Sim2Real 937 | - **DUSA** MM 2023 (DUSA: Decoupled Unsupervised Sim2Real Adaptation for Vehicle-to-Everything Collaborative Perception) [[paper](https://arxiv.org/abs/2310.08117)] [[code](https://github.com/refkxh/DUSA)] 938 | - **MPDA** ICRA 2023 (Bridging the Domain Gap for Multi-Agent Perception) [[paper](https://arxiv.org/abs/2210.08451)] [[code](https://github.com/DerrickXuNu/MPDA)] 939 | - **V2X-DG** ICRA 2025 (V2X-DG: Domain Generalization for Vehicle-to-Everything Cooperative Perception) [[paper](https://arxiv.org/abs/2503.15435)] [~~code~~] 940 | 941 | ### LLM 942 | - **Debrief** 2024 AAAI (Talking Vehicles: Cooperative Driving via Natural Language) [[paper&review](https://openreview.net/forum?id=VYlfoA8I6A)] [~~code~~] 943 | 944 | 945 | ### Uncertainty Quantification 946 | - **Double-M Quantification** ICRA 2023 (Uncertainty Quantification of Collaborative Detection for Self-Driving) [[paper](https://arxiv.org/abs/2209.08162)] [[code](https://github.com/coperception/double-m-quantification)] 947 | 948 | ### 3D Representation 949 | - {Related} **CO3** ICLR 2023 (CO3: Cooperative Unsupervised 3D Representation Learning for Autonomous Driving) [[paper&review](https://openreview.net/forum?id=QUaDoIdgo0)] [[code](https://github.com/Runjian-Chen/CO3)] 950 | 951 | ### Reconstruction 952 | - **CORE** ICCV 2023 (CORE: Cooperative Reconstruction for Multi-Agent Perception) [[paper](https://arxiv.org/abs/2307.11514)] [[code](https://github.com/zllxot/CORE)] 953 | 954 | ### Robustness 955 | - **CoDynTrust** ICRA 2025 (CoDynTrust: Robust Asynchronous Collaborative Perception via Dynamic Feature Trust Modulus) [[paper](https://arxiv.org/abs/2502.08169)] [[code](https://github.com/CrazyShout/CoDynTrust)] 956 | 957 | ### Adversarial Attacks 958 | - **Adversarial V2V** ICCV 2021 (Adversarial Attacks On Multi-Agent Communication) [[paper](https://arxiv.org/abs/2101.06560)] [~~code~~] 959 | - **ROBOSAC** ICCV 2023 (Among Us: Adversarially Robust Collaborative Perception by Consensus) [[paper](https://arxiv.org/abs/2303.09495)] [[code](https://github.com/coperception/ROBOSAC)] 960 | 961 | ### Privacy and security 962 | - **CP-Guard** AAAI 2025 (CP-Guard: Malicious Agent Detection and Defense in Collaborative Bird's Eye View Perception) [[paper](https://arxiv.org/abs/2412.12000)] [~~code~~] 963 | 964 | 965 | ## Selected Preprint 966 | 967 | 968 | ### Dec 2025 969 | ### Nov 2025 970 | ### Oct 2025 971 | ### Sep 2025 972 | ### Aug 2025 973 | ### Jul 2025 974 | ### Jun 2025 975 | ### May 2025 976 | ### Apr 2025 977 | ### Mar 2025 978 | - **V2X-ReaLO** (V2X-ReaLO: An Open Online Framework and Dataset for Cooperative Perception in Reality) [[paper](https://arxiv.org/abs/2503.10034)] [~~code~~] 979 | - **CoSDH** (CoSDH: Communication-Efficient Collaborative Perception via Supply-Demand Awareness and Intermediate-Late Hybridization) [[paper](https://arxiv.org/abs/2503.03430)] [[code](https://github.com/Xu2729/CoSDH)] 980 | - **VIMI** (VIMI: Vehicle-Infrastructure Multi-View Intermediate Fusion for Camera-Based 3D Object Detection) [[paper](https://arxiv.org/abs/2303.10975)] [[code](https://github.com/Bosszhe/VIMI)] 981 | - **RoCo-Sim** 
(RoCo-Sim: Enhancing Roadside Collaborative Perception through Foreground Simulation) [[paper](https://arxiv.org/abs/2503.10410)] [[code](https://github.com/duyuwen-duen/RoCo-Sim)] 982 | - **CoLMDriver** (CoLMDriver: LLM-Based Negotiation Benefits Cooperative Autonomous Driving) [[paper](https://arxiv.org/abs/2503.08683)] [[code](https://github.com/cxliu0314/CoLMDriver)] 983 | 984 | 985 | ### Feb 2025 986 | - **V2V-LLM** (V2V-LLM: Vehicle-to-Vehicle Cooperative Autonomous Driving with Multi-Modal Large Language Models) [[paper](https://arxiv.org/abs/2502.09980)] [[code](https://github.com/eddyhkchiu/V2VLLM)] 987 | - {Related} **TYP** (Transfer Your Perspective: Controllable 3D Generation from Any Viewpoint in a Driving Scene) [[paper](https://arxiv.org/abs/2502.06682)] [~~code~~] 988 | - **CP-Guard+** (CP-Guard+: A New Paradigm for Malicious Agent Detection and Defense in Collaborative Perception) [[paper&review](https://openreview.net/forum?id=9MNzHTSDgh)] [[paper](https://arxiv.org/abs/2502.07807)] [~~code~~] 989 | - **LCV2I** (LCV2I: Communication-Efficient and High-Performance Collaborative Perception Framework with Low-Resolution LiDAR) [[paper](https://arxiv.org/abs/2502.17039)] [~~code~~] 990 | - **CoDiff** (CoDiff: Conditional Diffusion Model for Collaborative 3D Object Detection) [[paper](https://arxiv.org/abs/2502.14891)] [~~code~~] 991 | 992 | ### Jan 2025 993 | - **mmCooper** (mmCooper: A Multi-Agent Multi-Stage Communication-Efficient and Collaboration-Robust Cooperative Perception Framework) [[paper](https://arxiv.org/abs/2501.12263)] [~~code~~] 994 | - **V2X-DGPE** (V2X-DGPE: Addressing Domain Gaps and Pose Errors for Robust Collaborative 3D Object Detection) [[paper](https://arxiv.org/abs/2501.02363)] [[code](https://github.com/wangsch10/V2X-DGPE)] 995 | - **RG-Attn** (RG-Attn: Radian Glue Attention for Multi-Modality Multi-Agent Cooperative Perception) [[paper](https://arxiv.org/abs/2501.16803)] [~~code~~] 996 | - **I2XTraj** (Knowledge-Informed Multi-Agent Trajectory Prediction at Signalized Intersections for Infrastructure-to-Everything) [[paper](https://arxiv.org/abs/2501.13461)] [~~code~~] 997 | 998 | ### Dec 2024 999 | - **V2XPnP** (V2XPnP: Vehicle-to-Everything Spatio-Temporal Fusion for Multi-Agent Perception and Prediction) [[paper](https://arxiv.org/abs/2412.01812)] [[code](https://github.com/Zewei-Zhou/V2XPnP)] 1000 | 1001 | ### Nov 2024 1002 | ### Oct 2024 1003 | ### Sep 2024 1004 | - **RopeBEV** (RopeBEV: A Multi-Camera Roadside Perception Network in Bird’s-Eye-View) [[paper](https://arxiv.org/abs/2409.11706)] [~~code~~] 1005 | - **CoMamba** (CoMamba: Real-Time Cooperative Perception Unlocked with State Space Models) [[paper](https://arxiv.org/abs/2409.10699)] [~~code~~] 1006 | - **CollaMamba** (CollaMamba: Efficient Collaborative Perception with Cross-Agent Spatial-Temporal State Space Model) [[paper](https://arxiv.org/abs/2409.07714)] [~~code~~] 1007 | - **LMMCoDrive** (LMMCoDrive: Cooperative Driving with Large Multimodal Model) [[paper](https://arxiv.org/abs/2409.11981)] [[code](https://github.com/henryhcliu/LMMCoDrive)] 1008 | - **CoDrivingLLM** (Towards Interactive and Learnable Cooperative Driving Automation: A Large Language Model-Driven Decision-making Framework) [[paper](https://arxiv.org/abs/2409.12812)] [[code](https://github.com/FanGShiYuu/CoDrivingLLM)] 1009 | - **DiffCP** (DiffCP: Ultra-Low Bit Collaborative Perception via Diffusion Model) [[paper](https://arxiv.org/abs/2409.19592)] [~~code~~] 1010 | 1011 | ### Aug 2024
1012 | - **CooPre** (CooPre: Cooperative Pretraining for V2X Cooperative Perception) [[paper](https://arxiv.org/abs/2408.11241)] 1013 | - **CTCE** (Leveraging Temporal Contexts to Enhance Vehicle-Infrastructure Cooperative Perception) [[paper](https://arxiv.org/abs/2408.10531)] [~~code~~] 1014 | 1015 | ### Jul 2024 1016 | - **V2X-M2C** (V2X-M2C: Efficient Multi-Module Collaborative Perception with Two Connections) [[paper](https://arxiv.org/abs/2407.11546)] [~~code~~] 1017 | Abstract: In this paper, we investigate improving the perception performance of autonomous vehicles through communication with other vehicles and road infrastructures. To this end, we introduce a collaborative perception model V2X-M2C, consisting of multiple modules, each generating inter-agent complementary information, spatial global context, and spatial local information. Inspired by the question of why most existing architectures are sequential, we analyze both the sequential and parallel connections of the modules. The sequential connection synergizes the modules, whereas the parallel connection independently improves each module. Extensive experiments demonstrate that V2X-M2C achieves state-of-the-art perception performance, increasing the detection accuracy by 8.00% to 10.87% and decreasing the FLOPs by 42.81% to 52.64%. 1018 | - **ParCon** (ParCon: Noise-Robust Collaborative Perception via Multi-Module Parallel Connection) [[paper](https://arxiv.org/abs/2407.11546)] [~~code~~] 1019 | 1020 | ### Jun 2024 1021 | - **CoBEVGlue** (Self-Localized Collaborative Perception) [[paper](https://arxiv.org/abs/2406.12712)] [[code](https://github.com/VincentNi0107/CoBEVGlue)] 1022 | 1023 | ### May 2024 1024 | - **RCDN** (RCDN: Towards Robust Camera-Insensitivity Collaborative Perception via Dynamic Feature-Based 3D Neural Modeling) [[paper](https://arxiv.org/abs/2405.16868)] [~~code~~] 1025 | 1026 | ### Apr 2024 1027 | ### Mar 2024 1028 | - **V2X-PC** (V2X-PC: Vehicle-to-Everything Collaborative Perception via Point Cluster) [[paper](https://arxiv.org/abs/2403.16635)] [~~code~~] 1029 | - **V2X-DGW** (V2X-DGW: Domain Generalization for Multi-Agent Perception under Adverse Weather Conditions) [[paper](https://arxiv.org/abs/2403.11371)] [~~code~~] 1030 | - **CMP** (CMP: Cooperative Motion Prediction with Multi-Agent Communication) [[paper](https://arxiv.org/abs/2403.17916)] [~~code~~] 1031 | 1032 | ### Feb 2024 1033 | ### Jan 2024 1034 | - **PragComm** (Pragmatic Communication in Multi-Agent Collaborative Perception) [[paper](https://arxiv.org/abs/2401.12694)] [[code](https://github.com/PhyllisH/PragComm)] 1035 | 1036 | ### Dec 2023 1037 | - **SiCP** (SiCP: Simultaneous Individual and Cooperative Perception for 3D Object Detection in Connected and Automated Vehicles) [[paper](https://arxiv.org/abs/2312.04822)] [[code](https://github.com/DarrenQu/SiCP)] 1038 | 1039 | ### Nov 2023 1040 | ### Oct 2023 1041 | - **AR2VP** (Dynamic V2X Autonomous Perception from Road-to-Vehicle Vision) [[paper](https://arxiv.org/abs/2310.19113)] [[code](https://github.com/tjy1423317192/AP2VP)] 1042 | 1043 | ### Sep 2023 1044 | ### Aug 2023 1045 | - **QUEST** (QUEST: Query Stream for Vehicle-Infrastructure Cooperative Perception) [[paper](https://arxiv.org/abs/2308.01804)] [~~code~~] 1046 | 1047 | ### Jul 2023 1048 | ### Jun 2023 1049 | ### May 2023 1050 | ### Apr 2023 1051 | ### Mar 2023 1052 | ### Feb 2023 1053 | ### Jan 2023 1054 | 1055 | ## :bookmark:Citation 1056 | 1057 | Our survey paper is at[Collaborative Perception Datasets for Autonomous 
Driving: A Review](https://arxiv.org/abs/2504.12696), which includes more detailed discussions and will be continuously updated. The latest version was updated on April 16, 2025. 1058 | 1059 | If you find our repository helpful, please consider citing our work: 1060 | 1061 | > Wang, Naibang, et al. **"Collaborative Perception Datasets for Autonomous Driving: A Review."** arXiv preprint arXiv:2504.12696, 2025. 1062 | ```BibTeX 1063 | @article{wang2025collaborative, 1064 | title={Collaborative Perception Datasets for Autonomous Driving: A Review}, 1065 | author={Wang, Naibang and Shang, Deyong and Gong, Yan and Hu, Xiaoxi and Song, Ziying and Yang, Lei and Huang, Yuhan and Wang, Xiaoyu and Lu, Jianli}, 1066 | journal={arXiv preprint arXiv:2504.12696}, 1067 | year={2025} 1068 | } 1069 | ``` 1070 | 1071 | ### Others 1072 | 1073 | 1074 | - {Related} **LAV** (Learning from All Vehicles) [[paper](https://arxiv.org/abs/2203.11934)] [[code](https://github.com/dotchen/LAV)] 1075 | 1076 | - **PolyInter** CVPR 2025 (One is Plenty: A Polymorphic Feature Interpreter for Immutable Heterogeneous Collaborative Perception) [[paper](https://arxiv.org/abs/2411.16799)] [~~code~~] 1077 | 1078 | - **CRCNet** MM 2022 (Complementarity-Enhanced and Redundancy-Minimized Collaboration Network for Multi-agent Perception) [[paper](https://dl.acm.org/doi/abs/10.1145/3503161.3548197)] [~~code~~] 1079 | 1080 | - **DiscoNet** NeurIPS 2021 (Learning Distilled Collaboration Graph for Multi-Agent Perception) [[paper&review](https://openreview.net/forum?id=ZRcjSOmYraB)] [[code](https://github.com/ai4ce/DiscoNet)] 1081 | 1082 | - **DSDNet** ECCV 2020 (DSDNet: Deep Structured Self-Driving Network) [[paper](https://arxiv.org/abs/2008.06041)] [~~code~~] 1083 | 1084 | - **MAIN** ICRA 2020 (Enhancing Multi-Robot Perception via Learned Data Association) [[paper](https://arxiv.org/abs/2107.00769)] [~~code~~] 1085 | 1086 | ## Reference 1087 | 1088 | [[Collaborative Perception](https://github.com/Little-Podi/Collaborative_Perception?tab=readme-ov-file#bookmarkdataset-and-simulator)] 1089 | 1094 | --------------------------------------------------------------------------------