Article | Open Access

3D Single Object Tracking with Multi-View Unsupervised Center Uncertainty Learning

Chengpeng Zhong¹, Hui Shuai¹, Jiaqing Fan², Kaihua Zhang¹, and Qingshan Liu¹ (corresponding author)
¹ School of Computer Science, Nanjing University of Information Science and Technology, Nanjing 210044, China
² School of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China

Abstract

Center point localization is a major factor affecting the performance of 3D single object tracking. A point cloud is a set of discrete samples of an object's local surface, and its center annotations carry considerable noise, so directly regressing the center coordinates is unreliable. Existing methods are typically volumetric-based, point-based, or view-based, and thus rely on a single modality. In addition, commonly used sampling strategies tend to discard object information, whereas both holistic and detailed information are beneficial for object localization. To address these challenges, we propose a novel Multi-view unsupervised center Uncertainty 3D single object Tracker (MUT). MUT models the potential uncertainty of center coordinate localization in an unsupervised manner, allowing the model to learn the true center distribution. By projecting the point cloud, MUT obtains multi-view depth map features, enabling efficient knowledge transfer from 2D to 3D and providing the tracker with an additional modality. We also propose a former attraction probability sampling strategy that preserves object information. By using both holistic and detailed descriptors of the point cloud, the tracker gains a more comprehensive understanding of the tracking environment. Experimental results show that the proposed MUT network outperforms the baseline models by 0.8% in precision and 0.6% in success rate on the KITTI dataset, and by 1.4% in precision and 6.1% in success rate on the NuScenes dataset. The code is available at https://github.com/abchears/MUT.git.
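The abstract's central idea, learning the uncertainty of the regressed center without any uncertainty labels, is most commonly realized as a Gaussian negative log-likelihood over the predicted coordinates, in the style of Nix & Weigend's variance networks and Kendall & Gal's aleatoric uncertainty. The PyTorch sketch below illustrates that general formulation only; the module name CenterUncertaintyLoss, the tensor shapes, and the loss weighting are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn


class CenterUncertaintyLoss(nn.Module):
    """Gaussian negative log-likelihood for center regression.

    A minimal sketch of label-free ("unsupervised") uncertainty learning:
    the network predicts a mean center and a per-coordinate log-variance,
    and the variance is learned purely from the regression residuals.
    The actual MUT loss may differ in detail.
    """

    def forward(self, pred_center: torch.Tensor,
                pred_log_var: torch.Tensor,
                gt_center: torch.Tensor) -> torch.Tensor:
        # NLL of gt_center under N(pred_center, exp(pred_log_var)),
        # dropping the constant term:
        #   0.5 * (residual^2 / sigma^2 + log sigma^2)
        residual_sq = (pred_center - gt_center) ** 2
        nll = 0.5 * (residual_sq * torch.exp(-pred_log_var) + pred_log_var)
        return nll.mean()


# Usage on a batch of 8 predicted 3D centers (hypothetical shapes).
loss_fn = CenterUncertaintyLoss()
pred_center = torch.randn(8, 3, requires_grad=True)   # (batch, xyz)
pred_log_var = torch.zeros(8, 3, requires_grad=True)  # log sigma^2
gt_center = torch.randn(8, 3)                         # noisy labels
loss = loss_fn(pred_center, pred_log_var, gt_center)
loss.backward()
```

Predicting the log-variance avoids a positivity constraint and keeps the objective numerically stable; with noisy center labels, the model can raise the learned variance instead of overfitting the mean to label noise, which is why this style of objective suits the noisy annotations the abstract describes.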

CAAI Artificial Intelligence Research
Article number: 9150016
Cite this article:
Zhong C, Shuai H, Fan J, et al. 3D Single Object Tracking with Multi-View Unsupervised Center Uncertainty Learning. CAAI Artificial Intelligence Research, 2023, 2: 9150016. https://doi.org/10.26599/AIR.2023.9150016


Received: 13 March 2023
Revised: 26 April 2023
Accepted: 14 June 2023
Published: 08 October 2023
© The author(s) 2023.

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
