Research Article | Open Access

Camera–Radar Fusion with Modality Interaction and Radar Gaussian Expansion for 3D Object Detection

Xiang Liu^1, Zhenglin Li^{1,2}, Yang Zhou^1, Yan Peng^{1,2}, Jun Luo^{1,3}

1 Institute of Artificial Intelligence, Shanghai University, Shanghai, China

2 School of Future Technology, Shanghai University, Shanghai, China

3 State Key Laboratory of Mechanical Transmission, Chongqing University, Chongqing, China


Abstract

The fusion of millimeter-wave radar and camera modalities is crucial for improving the accuracy and completeness of 3-dimensional (3D) object detection. Most existing methods extract features from each modality separately and conduct fusion with specifically designed modules, potentially resulting in information loss during modality transformation. To address this issue, we propose a novel framework for 3D object detection that iteratively updates radar and camera features through an interaction module. This module serves a dual purpose by facilitating the fusion of multi-modal data while preserving the original features. Specifically, radar and image features are sampled and aggregated with a set of sparse 3D object queries, while retaining the integrity of the original radar features to prevent information loss. Additionally, an innovative radar augmentation technique named Radar Gaussian Expansion is proposed. This module allocates radar measurements within each voxel to neighboring ones as a Gaussian distribution, reducing association errors during projection and enhancing detection accuracy. Our proposed framework offers a comprehensive solution to the fusion of radar and camera data, ultimately leading to heightened accuracy and completeness in 3D object detection processes. On the nuScenes test benchmark, our camera–radar fusion method achieves state-of-the-art 3D object detection results with a 41.6% mean average precision and 52.5% nuScenes detection score.
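The idea behind Radar Gaussian Expansion, spreading each radar measurement onto neighboring cells with Gaussian weights, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the kernel size, the standard deviation, or the grid layout, so the 2D bird's-eye-view grid (instead of 3D voxels), the `radius` and `sigma` parameters, and the function names `gaussian_kernel` and `expand_radar_grid` are all assumptions made for illustration.

```python
import numpy as np


def gaussian_kernel(radius: int, sigma: float) -> np.ndarray:
    """Return a (2*radius+1, 2*radius+1) Gaussian kernel that sums to 1."""
    offsets = np.arange(-radius, radius + 1)
    dy, dx = np.meshgrid(offsets, offsets, indexing="ij")
    kernel = np.exp(-(dx ** 2 + dy ** 2) / (2.0 * sigma ** 2))
    return kernel / kernel.sum()


def expand_radar_grid(grid: np.ndarray, radius: int = 1, sigma: float = 1.0) -> np.ndarray:
    """Spread each occupied cell of an (H, W, C) radar grid onto its neighbors.

    Hypothetical simplification: a 2D grid stands in for the voxel grid,
    and `radius` / `sigma` are illustrative values, not taken from the paper.
    """
    height, width, _ = grid.shape
    kernel = gaussian_kernel(radius, sigma)
    expanded = np.zeros(grid.shape, dtype=np.float64)

    # Only cells that actually contain a radar measurement are expanded.
    occupied = np.argwhere(np.any(grid != 0, axis=-1))
    for y, x in occupied:
        for ky in range(-radius, radius + 1):
            for kx in range(-radius, radius + 1):
                ny, nx = y + ky, x + kx
                # Weights that would fall outside the grid are dropped.
                if 0 <= ny < height and 0 <= nx < width:
                    expanded[ny, nx] += kernel[ky + radius, kx + radius] * grid[y, x]
    return expanded


# Example: one radar return with two feature channels in the grid center.
demo = np.zeros((5, 5, 2))
demo[2, 2] = [1.0, 3.0]
spread = expand_radar_grid(demo, radius=1, sigma=1.0)
print(spread[..., 0].round(3))
```

Because the kernel is normalized, a measurement away from the grid border keeps its total feature mass while becoming visible in adjacent cells, which is one plausible way such an expansion could tolerate small radar-to-image association errors during projection.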


Cyborg and Bionic Systems

Article number: 0079

DOI: 10.34133/cbsystems.0079

Cite this article:

Liu X, Li Z, Zhou Y, et al. Camera–Radar Fusion with Modality Interaction and Radar Gaussian Expansion for 3D Object Detection. Cyborg and Bionic Systems, 2024, 5: 0079. https://doi.org/10.34133/cbsystems.0079
