Research Article | Open Access

Camera–Radar Fusion with Modality Interaction and Radar Gaussian Expansion for 3D Object Detection

Xiang Liu [1], Zhenglin Li [1,2], Yang Zhou [1], Yan Peng [1,2], Jun Luo [1,3]
[1] Institute of Artificial Intelligence, Shanghai University, Shanghai, China
[2] School of Future Technology, Shanghai University, Shanghai, China
[3] State Key Laboratory of Mechanical Transmission, Chongqing University, Chongqing, China

Abstract

Fusing millimeter-wave radar and camera data is crucial for improving the accuracy and completeness of 3-dimensional (3D) object detection. Most existing methods extract features from each modality separately and fuse them with purpose-built modules, which can lose information during modality transformation. To address this issue, we propose a 3D object detection framework that iteratively updates radar and camera features through an interaction module. The module fuses the multi-modal data while preserving the original features: radar and image features are sampled and aggregated with a set of sparse 3D object queries, and the original radar features are kept intact to prevent information loss. We also propose a radar augmentation technique named Radar Gaussian Expansion, which distributes the radar measurements in each voxel to neighboring voxels according to a Gaussian distribution, reducing association errors during projection and improving detection accuracy. Together, these components improve the accuracy and completeness of camera–radar 3D object detection. On the nuScenes test benchmark, our camera–radar fusion method achieves state-of-the-art 3D object detection results, with a mean average precision of 41.6% and a nuScenes detection score of 52.5%.
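
The idea behind Radar Gaussian Expansion can be illustrated with a minimal sketch. The abstract does not give the kernel size, the standard deviation, or the grid layout, so everything below is an assumption made for illustration: a 2D bird's-eye-view grid stands in for the voxel grid, and the names `gaussian_kernel`, `expand_radar_grid`, `radius`, and `sigma` are hypothetical. This is not the authors' implementation.

```python
import math


def gaussian_kernel(radius: int, sigma: float) -> list[list[float]]:
    """Build a (2*radius+1) x (2*radius+1) Gaussian kernel that sums to 1."""
    size = 2 * radius + 1
    raw = [
        [
            math.exp(-((dx - radius) ** 2 + (dy - radius) ** 2) / (2.0 * sigma ** 2))
            for dx in range(size)
        ]
        for dy in range(size)
    ]
    total = sum(sum(row) for row in raw)
    return [[value / total for value in row] for row in raw]


def expand_radar_grid(
    grid: list[list[float]], radius: int = 1, sigma: float = 1.0
) -> list[list[float]]:
    """Spread each occupied cell's value over its neighbours with Gaussian weights.

    `grid` is an assumed 2D stand-in for the radar voxel grid; cells holding
    0.0 are treated as empty. Weights falling outside the grid are dropped.
    """
    height, width = len(grid), len(grid[0])
    kernel = gaussian_kernel(radius, sigma)
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            value = grid[y][x]
            if value == 0.0:
                continue  # nothing to distribute from an empty cell
            for ky in range(-radius, radius + 1):
                for kx in range(-radius, radius + 1):
                    ny, nx = y + ky, x + kx
                    if 0 <= ny < height and 0 <= nx < width:
                        out[ny][nx] += value * kernel[ky + radius][kx + radius]
    return out


# A single radar measurement in the centre of a 5 x 5 grid.
grid = [[0.0] * 5 for _ in range(5)]
grid[2][2] = 1.0
expanded = expand_radar_grid(grid, radius=1, sigma=1.0)
```

After expansion the single measurement occupies a 3 x 3 neighbourhood, with the largest weight remaining on the original cell. A projected image location that lands one cell away from the true radar return can therefore still pick up a radar response, which is the kind of association error the abstract says the technique reduces.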

Cyborg and Bionic Systems
Article number: 0079
Cite this article:
Liu X, Li Z, Zhou Y, et al. Camera–Radar Fusion with Modality Interaction and Radar Gaussian Expansion for 3D Object Detection. Cyborg and Bionic Systems, 2024, 5: 0079. https://doi.org/10.34133/cbsystems.0079