├── RCBEVDet.JPG
├── rcbevdet-master.zip
├── RCBEVDet Application.docx
└── README.md


/RCBEVDet.JPG:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/VDIGPKU/RCBEVDet/HEAD/RCBEVDet.JPG


--------------------------------------------------------------------------------
/rcbevdet-master.zip:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/VDIGPKU/RCBEVDet/HEAD/rcbevdet-master.zip


--------------------------------------------------------------------------------
/RCBEVDet Application.docx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/VDIGPKU/RCBEVDet/HEAD/RCBEVDet%20Application.docx


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | # RCBEVDet
 2 | 
 3 | This is the official implementation of the CVPR 2024 paper [**RCBEVDet: Radar-camera Fusion in Bird’s Eye View for 3D Object Detection**](https://arxiv.org/abs/2403.16440) and of its extended version, RCBEVDet++.
 4 | 
 5 | ~~**Note: please sign the [application](https://github.com/VDIGPKU/RCBEVDet/blob/main/RCBEVDet%20Application.docx) to obtain the code** **of RCBEVDet.**~~
 6 | 
 7 | 
 8 | 
 9 | ## Introduction
10 | 
11 | We present RCBEVDet, a radar-camera fusion method for 3D object detection in the bird's eye view (BEV). First, we design RadarBEVNet to extract radar BEV features. RadarBEVNet consists of a dual-stream radar backbone and a Radar Cross-Section (RCS) aware BEV encoder. In the dual-stream radar backbone, a point-based encoder and a transformer-based encoder extract radar features, and an injection and extraction module facilitates communication between the two encoders. The RCS-aware BEV encoder uses RCS as an object-size prior when scattering point features into the BEV. In addition, we present the Cross-Attention Multi-layer Fusion module, which uses the deformable attention mechanism to automatically align the multi-modal BEV features from radar and camera, and then fuses them with channel and spatial fusion layers. Experimental results show that RCBEVDet achieves new state-of-the-art radar-camera fusion results on the nuScenes and View-of-Delft (VoD) 3D object detection benchmarks. Furthermore, RCBEVDet achieves better 3D detection results than all real-time camera-only and radar-camera 3D object detectors, while also running faster, at 21 to 28 FPS.
12 | 
13 | ![RCBEVDet](RCBEVDet.JPG)
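
To make the idea of an RCS object-size prior more concrete, here is a minimal, self-contained sketch of RCS-aware scattering under simple assumptions: each radar point spreads its feature over a Gaussian footprint whose radius grows linearly with its RCS, and overlapping contributions are averaged per BEV cell. The function `rcs_aware_scatter`, the linear radius rule (`base_radius + rcs_scale * rcs`), the Gaussian width, and all default values are hypothetical illustrations. They are not copied from the released RCBEVDet code, whose exact formulation may differ.

```python
import math
from typing import List, Tuple

# Hypothetical radar point layout: (x, y) in metres, per-point feature vector, RCS value.
RadarPoint = Tuple[float, float, List[float], float]


def rcs_aware_scatter(
    points: List[RadarPoint],
    grid_size: int = 8,
    cell_m: float = 1.0,
    base_radius: float = 0.5,
    rcs_scale: float = 0.1,
) -> List[List[List[float]]]:
    """Illustrative only: scatter point features into a BEV grid, spreading each
    point over a Gaussian footprint whose radius grows with its RCS."""
    channels = len(points[0][2]) if points else 0
    # Accumulators: weighted feature sums and total weight per BEV cell.
    feat_sum = [[[0.0] * channels for _ in range(grid_size)] for _ in range(grid_size)]
    weight_sum = [[0.0] * grid_size for _ in range(grid_size)]

    for x, y, feat, rcs in points:
        # Assumption: larger RCS -> larger object footprint (radius in metres).
        radius = base_radius + rcs_scale * max(rcs, 0.0)
        sigma = radius / 2.0  # assumed Gaussian width
        ci, cj = int(x // cell_m), int(y // cell_m)  # cell containing the point
        reach = int(math.ceil(radius / cell_m))
        for i in range(ci - reach, ci + reach + 1):
            for j in range(cj - reach, cj + reach + 1):
                if not (0 <= i < grid_size and 0 <= j < grid_size):
                    continue
                # Distance from the point to the centre of cell (i, j).
                d = math.hypot((i + 0.5) * cell_m - x, (j + 0.5) * cell_m - y)
                # Always keep the point's own cell; otherwise stay within the radius.
                if d > radius and (i, j) != (ci, cj):
                    continue
                w = math.exp(-(d * d) / (2.0 * sigma * sigma))
                weight_sum[i][j] += w
                for c in range(channels):
                    feat_sum[i][j][c] += w * feat[c]

    # Each non-empty cell holds a weighted average of the features scattered into it.
    return [
        [
            [v / weight_sum[i][j] for v in feat_sum[i][j]]
            if weight_sum[i][j] > 0
            else [0.0] * channels
            for j in range(grid_size)
        ]
        for i in range(grid_size)
    ]


bev = rcs_aware_scatter([
    (1.5, 1.5, [1.0, 0.0], 0.0),   # low-RCS point
    (5.5, 5.5, [0.0, 1.0], 20.0),  # high-RCS point
])
print(sum(1 for row in bev for cell in row if any(cell)))  # prints 22
```

In this toy example, the low-RCS point occupies a single BEV cell, while the high-RCS point is spread over 21 cells, which is the qualitative behaviour an object-size prior is meant to produce.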
14 | 
15 | ## Update
16 | 
17 | * 2024/06/28 - RCBEVDet++ achieves SOTA 3D object detection, BEV semantic segmentation, and 3D multi-object tracking results on the nuScenes benchmark. **The paper and code for RCBEVDet++ are coming soon~**
18 | * 2024/06/01 - Code for RCBEVDet is released as a zip file (`rcbevdet-master.zip`).
19 | 
20 | 
21 | 
22 | ## Weight & Code
23 | 
24 | * Model weights for RCBEVDet are released on [Google Drive](https://drive.google.com/drive/folders/1VhOBcJ7wT71R8Dqyr5MlQUKv7lVcjfrz?usp=sharing).
25 | 
26 | 
27 | 
28 | ## Results
29 | 
30 | ##### 3D Object Detection (nuScenes Validation)
31 | 
32 | | Method     | Input | Backbone  | NDS  | mAP  |
33 | | :--------- | ----- | --------- | ---- | :--- |
34 | | BEVDepth4D | C     | ResNet-50 | 51.9 | 40.5 |
35 | | RCBEVDet   | C+R   | ResNet-50 | 56.8 | 45.3 |
36 | | SparseBEV  | C     | ResNet-50 | 54.5 | 43.2 |
37 | | RCBEVDet++ | C+R   | ResNet-50 | 60.4 | 51.9 |
38 | 
39 | ##### 3D Object Detection (nuScenes Test)
40 | 
41 | | Method     | Input | Backbone | Future frame | NDS  | mAP  |
42 | | :--------- | ----- | -------- | ------------ | ---- | :--- |
43 | | BEVDepth4D | C     | V2-99    | No           | 60.5 | 51.5 |
44 | | RCBEVDet   | C+R   | V2-99    | No           | 63.9 | 55.0 |
45 | | SparseBEV  | C     | V2-99    | No           | 63.6 | 55.6 |
46 | | RCBEVDet++ | C+R   | V2-99    | No           | 68.7 | 62.6 |
47 | | SparseBEV  | C     | ViT-L    | Yes          | 70.2 | -    |
48 | | RCBEVDet++ | C+R   | ViT-L    | Yes          | 72.7 | 67.3 |
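
For reference, the NDS columns follow the standard nuScenes Detection Score, which combines mAP with five true-positive error metrics (translation, scale, orientation, velocity, and attribute errors): `NDS = (5 * mAP + sum(1 - min(1, mTP))) / 10`. The sketch below illustrates that formula. The function name `nuscenes_nds` is hypothetical and not part of this repository. Inputs are fractions in [0, 1], whereas the tables report percentages. The example inputs are made-up values, not results from the tables.

```python
from typing import Mapping

# The five nuScenes true-positive error metrics (lower is better).
TP_METRICS = ("mATE", "mASE", "mAOE", "mAVE", "mAAE")


def nuscenes_nds(m_ap: float, tp_errors: Mapping[str, float]) -> float:
    """nuScenes Detection Score: NDS = (5 * mAP + sum(1 - min(1, mTP))) / 10."""
    # Each error is clipped at 1 so that a very poor metric contributes 0, never a negative score.
    tp_score = sum(1.0 - min(1.0, tp_errors[name]) for name in TP_METRICS)
    return (5.0 * m_ap + tp_score) / 10.0


# Made-up example: mAP of 0.5 and every TP error at 0.5.
errors = {name: 0.5 for name in TP_METRICS}
print(nuscenes_nds(0.5, errors))  # prints 0.5
```

Because mAP carries half of the total weight, NDS can move differently from mAP when the localization, orientation, or velocity errors change.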
49 | 
50 | ##### BEV Semantic Segmentation (nuScenes Validation)
51 | 
52 | | Method     | Input | Backbone   | mIoU |
53 | | :--------- | ----- | ---------- | ---- |
54 | | RCBEVDet++ | C+R   | ResNet-101 | 62.8 |
55 | 
56 | ##### 3D Multi-object Tracking (nuScenes Test)
57 | 
58 | | Method     | Input | Backbone | AMOTA | AMOTP |
59 | | :--------- | ----- | -------- | ----- | ----- |
60 | | RCBEVDet++ | C+R   | ViT-L    | 59.6  | 0.713 |
61 | 
62 | 
63 | 
64 | ## Acknowledgements
65 | 
66 | The overall code is based on [mmdetection3D](https://github.com/open-mmlab/mmdetection3d), [BEVDet](https://github.com/HuangJunJie2017/BEVDet), and [SparseBEV](https://github.com/MCG-NJU/SparseBEV/tree/main). We sincerely thank the authors for their great work.
67 | 
68 | 
69 | 
70 | ## License
71 | 
72 | This project is free for academic research purposes only; commercial use requires authorization. For commercial permission, please contact wyt@pku.edu.cn.
73 | 
74 | 


--------------------------------------------------------------------------------