Article | Open Access

3D Single Object Tracking with Multi-View Unsupervised Center Uncertainty Learning

Chengpeng Zhong¹, Hui Shuai¹, Jiaqing Fan², Kaihua Zhang¹, and Qingshan Liu¹ (corresponding author)
¹ School of Computer Science, Nanjing University of Information Science and Technology, Nanjing 210044, China
² School of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China

Abstract

Center point localization is a major factor affecting the performance of 3D single object tracking. A point cloud is a set of discrete samples of an object's local surface, and its center annotations carry considerable noise, so directly regressing the center coordinates is unreliable. Existing methods are typically volumetric-based, point-based, or view-based, and thus rely on a single modality. In addition, commonly used sampling strategies tend to discard object information, whereas both holistic and detailed information are beneficial for object localization. To address these challenges, we propose a novel Multi-view unsupervised center Uncertainty 3D single object Tracker (MUT). MUT models the potential uncertainty of center coordinate localization in an unsupervised manner, allowing the model to learn the true center distribution. By projecting the point cloud, MUT obtains multi-view depth map features, enabling efficient knowledge transfer from 2D to 3D and providing the tracker with an additional modality. We also propose a former attraction probability sampling strategy that preserves object information. By using both holistic and detailed descriptors of the point cloud, the tracker gains a more comprehensive understanding of the tracking environment. Experimental results show that the proposed MUT network outperforms the baseline models by 0.8% in precision and 0.6% in success rate on the KITTI dataset, and by 1.4% in precision and 6.1% in success rate on the NuScenes dataset. The code is available at https://github.com/abchears/MUT.git.
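The abstract's central idea, learning the uncertainty of the regressed center without any uncertainty labels, is most commonly realized as a Gaussian negative log-likelihood over the predicted coordinates, in the style of Nix & Weigend's variance networks and Kendall & Gal's aleatoric uncertainty. The PyTorch sketch below illustrates that general formulation only; the module name CenterUncertaintyLoss, the tensor shapes, and the loss weighting are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn


class CenterUncertaintyLoss(nn.Module):
    """Gaussian negative log-likelihood for center regression.

    A minimal sketch of label-free ("unsupervised") uncertainty learning:
    the network predicts a mean center and a per-coordinate log-variance,
    and the variance is learned purely from the regression residuals.
    The actual MUT loss may differ in detail.
    """

    def forward(self, pred_center: torch.Tensor,
                pred_log_var: torch.Tensor,
                gt_center: torch.Tensor) -> torch.Tensor:
        # NLL of gt_center under N(pred_center, exp(pred_log_var)),
        # dropping the constant term:
        #   0.5 * (residual^2 / sigma^2 + log sigma^2)
        residual_sq = (pred_center - gt_center) ** 2
        nll = 0.5 * (residual_sq * torch.exp(-pred_log_var) + pred_log_var)
        return nll.mean()


# Usage on a batch of 8 predicted 3D centers (hypothetical shapes).
loss_fn = CenterUncertaintyLoss()
pred_center = torch.randn(8, 3, requires_grad=True)   # (batch, xyz)
pred_log_var = torch.zeros(8, 3, requires_grad=True)  # log sigma^2
gt_center = torch.randn(8, 3)                         # noisy labels
loss = loss_fn(pred_center, pred_log_var, gt_center)
loss.backward()
```

Predicting the log-variance avoids a positivity constraint and keeps the objective numerically stable; with noisy center labels, the model can raise the learned variance instead of overfitting the mean to label noise, which is why this style of objective suits the noisy annotations the abstract describes.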

CAAI Artificial Intelligence Research
Article number: 9150016
Cite this article:
Zhong C, Shuai H, Fan J, et al. 3D Single Object Tracking with Multi-View Unsupervised Center Uncertainty Learning. CAAI Artificial Intelligence Research, 2023, 2: 9150016. https://doi.org/10.26599/AIR.2023.9150016


Received: 13 March 2023
Revised: 26 April 2023
Accepted: 14 June 2023
Published: 08 October 2023
© The author(s) 2023.

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
