Abstract
As a typical application of edge intelligence, 3D object detection in autonomous driving often requires multimodal information fusion to perceive the environment accurately. With images and point clouds serving as the critical sensory data sources, 3D object detection fuses these modalities to improve detection accuracy. In general, fusion algorithms that leverage attention mechanisms can intelligently extract and integrate multimodal sensing information, overcoming the limitations imposed by sensor calibration. However, attention mechanisms may also introduce challenges such as slow model convergence and high false positive rates. Therefore, in this paper, we propose the Deformable Denoising (DefDeN) model, which integrates a gated information fusion network, a multi-scale deformable attention mechanism, a noise-addition and denoising scheme, and contrastive learning for multi-sensor feature fusion. Experimental results on the nuScenes dataset demonstrate the superior detection accuracy of DefDeN and its effectiveness in providing precise and stable perception for complex scenarios in autonomous driving systems.