LWD-3D: Lightweight Detector Based on Self-Attention for 3D Object Detection

Shuo Yang1, Huimin Lu1,2, Tohru Kamiya1, Yoshihisa Nakatoh1, Seiichi Serikawa1
1 School of Engineering, Kyushu Institute of Technology, Fukuoka 804-8550, Japan
2 School of Information Engineering, Yangzhou University, Yangzhou 225127, China

Abstract

Lightweight modules play a key role in 3D object detection for autonomous driving and are necessary for the practical deployment of 3D object detectors. At present, research still focuses on constructing complex models and computations that improve detection precision at the expense of running speed. However, building a lightweight model that learns global features from point cloud data for 3D object detection remains a significant challenge. In this paper, we focus on combining convolutional neural networks with self-attention-based vision transformers to realize lightweight, high-speed computation for 3D object detection. We propose lightweight detection 3D (LWD-3D), which combines point cloud conversion with a lightweight vision transformer for autonomous driving. LWD-3D utilizes a one-shot regression framework in 2D space to generate 3D object bounding boxes from point cloud data, providing a new vision-transformer-based feature representation for 3D detection applications. Experimental results on the KITTI 3D dataset show that LWD-3D achieves real-time detection (less than 20 ms per image). LWD-3D obtains a mean average precision (mAP) 75% higher than that of another real-time 3D detector while using half the number of parameters. Our research extends the application of vision transformers to 3D object detection tasks.

Keywords: point clouds, real-time, 3D object detection, vision transformer, one-shot regression
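The abstract describes the pipeline only at a high level. The sketch below is not the authors' implementation; it is a minimal, hypothetical illustration of the three ingredients the abstract names: converting a point cloud into a 2D bird's-eye-view (BEV) representation, mixing a small convolutional backbone with one self-attention block for global features, and regressing 3D boxes in a single shot. The names (points_to_bev, LightweightDetector), the BEV channel encoding, all layer sizes, and the 7-parameter box encoding are illustrative assumptions, not taken from the paper; the example is written in PyTorch.

# Minimal sketch (not the authors' code): point cloud -> BEV pseudo-image ->
# small CNN + one self-attention block -> one-shot dense box regression.
import torch
import torch.nn as nn

def points_to_bev(points, grid=(256, 256), x_range=(0.0, 70.4), y_range=(-40.0, 40.0)):
    """Rasterize an (N, 4) LiDAR cloud [x, y, z, intensity] into a 3-channel
    bird's-eye-view map: max height, mean intensity, point density."""
    bev = torch.zeros(3, grid[0], grid[1])
    xs = ((points[:, 0] - x_range[0]) / (x_range[1] - x_range[0]) * grid[0]).long().clamp(0, grid[0] - 1)
    ys = ((points[:, 1] - y_range[0]) / (y_range[1] - y_range[0]) * grid[1]).long().clamp(0, grid[1] - 1)
    for x, y, p in zip(xs, ys, points):
        bev[0, x, y] = torch.maximum(bev[0, x, y], p[2])  # tallest point in the cell
        bev[1, x, y] += p[3]                              # accumulated intensity
        bev[2, x, y] += 1.0                               # point count (density)
    bev[1] = bev[1] / bev[2].clamp(min=1.0)               # accumulated -> mean intensity
    return bev

class LightweightDetector(nn.Module):
    """Small CNN for local features, one self-attention block for global context
    over BEV cells, and a 1x1 conv head that regresses boxes in one shot."""
    def __init__(self, dim=64, heads=4, anchors=2):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(3, dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(dim, dim, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        # Per BEV cell and anchor: 7 box parameters (x, y, z, w, l, h, yaw) + objectness.
        self.head = nn.Conv2d(dim, anchors * (7 + 1), 1)

    def forward(self, bev):                            # bev: (B, 3, H, W)
        f = self.cnn(bev)                              # (B, C, H/4, W/4) local features
        b, c, h, w = f.shape
        tokens = f.flatten(2).transpose(1, 2)          # (B, H*W/16, C), one token per cell
        tokens, _ = self.attn(tokens, tokens, tokens)  # global self-attention
        f = tokens.transpose(1, 2).reshape(b, c, h, w)
        return self.head(f)                            # dense one-shot regression map

# Usage on random points standing in for a LiDAR sweep:
points = torch.rand(5000, 4) * torch.tensor([70.0, 80.0, 3.0, 1.0]) - torch.tensor([0.0, 40.0, 1.5, 0.0])
pred = LightweightDetector()(points_to_bev(points).unsqueeze(0))  # (1, anchors * 8, 64, 64)

In a sketch like this, collapsing BEV cells into tokens before a single attention layer is what keeps the global-feature step cheap: the heavy work on raw points is reduced to a rasterization pass, so the detector itself can stay small and fast.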


Publication history

Received: 05 December 2022
Revised: 01 January 2023
Accepted: 08 January 2023
Published: 10 March 2023
Issue date: December 2022

Copyright

© The author(s) 2022

Acknowledgements

This work was partially supported by the National Natural Science Foundation of China (No. 62206237), the Japan Society for the Promotion of Science (Nos. 22K12093 and 22K12094), and the Japan Science and Technology Agency (No. JPMJST2281).

Rights and permissions

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
