The fusion of millimeter-wave radar and camera modalities is crucial for improving the accuracy and completeness of 3-dimensional (3D) object detection. Most existing methods extract features from each modality separately and conduct fusion with specifically designed modules, potentially resulting in information loss during modality transformation. To address this issue, we propose a novel framework for 3D object detection that iteratively updates radar and camera features through an interaction module. This module serves a dual purpose by facilitating the fusion of multi-modal data while preserving the original features. Specifically, radar and image features are sampled and aggregated with a set of sparse 3D object queries, while retaining the integrity of the original radar features to prevent information loss. Additionally, an innovative radar augmentation technique named Radar Gaussian Expansion is proposed. This module allocates radar measurements within each voxel to neighboring ones as a Gaussian distribution, reducing association errors during projection and enhancing detection accuracy. Our proposed framework offers a comprehensive solution to the fusion of radar and camera data, ultimately leading to heightened accuracy and completeness in 3D object detection processes. On the nuScenes test benchmark, our camera–radar fusion method achieves state-of-the-art 3D object detection results with a 41.6% mean average precision and 52.5% nuScenes detection score.
Zhou Y, Liu L, Zhao H, López-Benítez M, Yu L, Yue Y. Towards deep radar perception for autonomous driving: Datasets, methods, and challenges. Sensors. 2022;22(11):4208.
Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez A, Kaiser L, Polosukhin I. Attention is all you need. Adv Neural Inf Proces Syst. 2017(30):5998–6008.
Liang T, Xie H, Yu K, Xia Z, Lin Z, Wang Y, Tang T, Wang B, Tang Z. BEVFusion: A simple and robust lidar-camera fusion framework. Adv Neural Inf Proces Syst. 2022(35):10421–10434.
Kim Y, Kim S, Choi JW, Kum D. CRAFT: camera-radar 3D object detection with spatio-contextual fusion transformer. Proc AAAI Conf Artif Intell. 2023;37(1):1160–1168.
Kuhn HW. The Hungarian method for the assignment problem. Nav Res Logist Q. 1955;2(1–2):83–97.
Lin T-Y, Goyal P, Girshick R, He K, Dollár P. Focal loss for dense object detection. IEEE Trans Pattern Anal Mach Intell. 2020;42(2):318–327.