Original Research | Open Access

LWD-3D: Lightweight Detector Based on Self-Attention for 3D Object Detection

Shuo Yang¹, Huimin Lu¹,² (✉), Tohru Kamiya¹, Yoshihisa Nakatoh¹, Seiichi Serikawa¹
¹ School of Engineering, Kyushu Institute of Technology, Fukuoka 804-8550, Japan
² School of Information Engineering, Yangzhou University, Yangzhou 225127, China

Abstract

Lightweight modules play a key role in 3D object detection for autonomous driving and are necessary for the practical deployment of 3D object detectors. At present, research still focuses on constructing complex models and computations that improve detection precision at the expense of running speed. However, building a lightweight model that learns global features from point cloud data for 3D object detection remains a significant challenge. In this paper, we focus on combining convolutional neural networks with self-attention-based vision transformers to realize lightweight, high-speed computing for 3D object detection. We propose lightweight detection 3D (LWD-3D), a point cloud conversion and lightweight vision transformer for autonomous driving. LWD-3D utilizes a one-shot regression framework in 2D space and generates 3D object bounding boxes from point cloud data, providing a new feature representation method based on a vision transformer for 3D detection applications. Experimental results on the KITTI 3D dataset show that LWD-3D achieves real-time detection (time per image < 20 ms). LWD-3D obtains a mean average precision (mAP) of 75%, higher than that of another real-time 3D detector, with half the number of parameters. Our research extends the application of vision transformers to 3D object detection tasks.

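For readers who want a concrete picture of the pipeline the abstract describes (a point cloud converted to a 2D representation, a lightweight convolutional backbone, self-attention for global features, and one-shot box regression), the following is a minimal illustrative PyTorch sketch. It is an assumption-laden rendering, not the authors' implementation: the BEV channel layout, module sizes, class names, and the 7-parameter box encoding (x, y, z, w, l, h, yaw) are all hypothetical choices made for this example.

import torch
import torch.nn as nn

class SelfAttentionBlock(nn.Module):
    # One transformer-style block over flattened feature-map tokens.
    def __init__(self, dim, heads=4):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, dim * 2), nn.GELU(),
                                 nn.Linear(dim * 2, dim))

    def forward(self, x):                      # x: (B, N, dim)
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]  # global self-attention
        return x + self.mlp(self.norm2(x))

class TinyLWDetector(nn.Module):
    # BEV pseudo-image in, dense one-shot 3D box regression out.
    # Hypothetical sizes; not the published LWD-3D architecture.
    def __init__(self, in_ch=3, dim=64, out_params=1 + 7):  # objectness + (x,y,z,w,l,h,yaw)
        super().__init__()
        self.backbone = nn.Sequential(         # lightweight conv feature extractor
            nn.Conv2d(in_ch, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, dim, 3, stride=2, padding=1), nn.ReLU())
        self.attn = SelfAttentionBlock(dim)    # global context over the whole map
        self.head = nn.Conv2d(dim, out_params, 1)  # one-shot regression per cell

    def forward(self, bev):                    # bev: (B, in_ch, H, W)
        f = self.backbone(bev)                 # (B, dim, H/4, W/4)
        b, c, h, w = f.shape
        t = self.attn(f.flatten(2).transpose(1, 2))  # tokens: (B, h*w, dim)
        f = t.transpose(1, 2).reshape(b, c, h, w)
        return self.head(f)

bev = torch.randn(1, 3, 128, 128)              # toy BEV (e.g., height/intensity/density)
print(TinyLWDetector()(bev).shape)             # torch.Size([1, 8, 32, 32])

Projecting the point cloud to a bird's-eye-view pseudo-image keeps every downstream operation 2D, which is what makes a one-shot regression head (and the detector as a whole) cheap enough for real-time use.
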
CAAI Artificial Intelligence Research
Pages 137-143
Cite this article:
Yang S, Lu H, Kamiya T, et al. LWD-3D: Lightweight Detector Based on Self-Attention for 3D Object Detection. CAAI Artificial Intelligence Research, 2022, 1(2): 137-143. https://doi.org/10.26599/AIR.2022.9150009


Received: 05 December 2022
Revised: 01 January 2023
Accepted: 08 January 2023
Published: 10 March 2023
© The author(s) 2022

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
