AI Chat Paper
Note: Please note that the following content is generated by AMiner AI. SciOpen does not take any responsibility related to this content.
{{lang === 'zh_CN' ? '文章概述' : 'Summary'}}
{{lang === 'en_US' ? '中' : 'Eng'}}
Chat more with AI
PDF (9.6 MB)
Collect
Submit Manuscript AI Chat Paper
Show Outline
Outline
Show full outline
Hide outline
Outline
Show full outline
Hide outline
Research Article | Open Access

SiamCPN: Visual tracking with the Siamese center-prediction network

School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100040, China
NLPR, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
School of Artificial Intelligence, Jilin University, Changchun 130012, China
CASIA-LLVISION Joint Lab, Beijing 100190, China
LLVISION Technology Co., LTD., Beijing 100190, China
Show Author Information

Abstract

Object detection is widely used in objecttracking; anchor-free object tracking provides an end-to-end single-object-tracking approach. In thisstudy, we propose a new anchor-free network, the Siamese center-prediction network (SiamCPN). Given the presence of referenced object features in the initial frame, we directly predict the center point and size of the object in subsequent frames in a Siamese-structure network without the need for per-frame post-processing operations. Unlike other anchor-free tracking approaches that are based on semantic segmentation and achieve anchor-free tracking by pixel-level prediction, SiamCPN directly obtains all information required for tracking, greatly simplifying the model. A center-prediction sub-network is applied to multiple stages of the backbone to adaptively learn from the experience of different branches of the Siamese net. The model can accurately predict object location, implement appropriate corrections, and regress the size of the target bounding box. Compared to other leading Siamese networks, SiamCPN is simpler, faster, and more efficient as it uses fewer hyperparameters. Experiments demonstrate that our method outperforms other leading Siamese networks on GOT-10K and UAV123 benchmarks, and is comparable to other excellent trackers on LaSOT, VOT2016, and OTB-100 while improving inference speed 1.5 to 2 times.

References

[1]
Danelljan, M.; Häger, G.; Shahbaz Khan, F.; Felsberg, M. Accurate scale estimation for robust visual tracking. In: Proceedings of the British Machine Vision Conference, 2014.
[2]
Henriques, J. F.; Caseiro, R.; Martins, P.; Batista, J. High-speed tracking with kernelized correlation filters. IEEE Transactions on Pattern Analysis and Machine Intelligence Vol. 37, No. 3, 583-596, 2015.
[3]
Kalal, Z.; Mikolajczyk, K.; Matas, J. Tracking-learning-detection. IEEE Transactions on Pattern Analysis and Machine Intelligence Vol. 34, No. 7, 1409-1422, 2012.
[4]
Fan, R. C.; Zhang, F. L.; Zhang, M.; Martin, R. R. Robust tracking-by-detection using a selection and completion mechanism. Computational Visual Media Vol. 3, No. 3, 285-294, 2017.
[5]
Bertinetto, L.; Valmadre, J.; Henriques, J. F.; Vedaldi, A.; Torr, P. H. S. Fully-convolutional Siamese networks for object tracking. In: Computer Vision - ECCV 2016 Workshops. Lecture Notes in Computer Science, Vol. 9914. Hua, G.; Jégou, H. Eds. Springer Cham, 850-865, 2016.
[6]
Li, B.; Wu, W.; Wang, Q.; Zhang, F. Y.; Xing, J. L.; Yan, J. J. SiamRPN++: Evolution of Siamese visual tracking with very deep networks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 4277-4286, 2019.
[7]
Li, B.; Yan, J. J.; Wu, W.; Zhu, Z.; Hu, X. L.High performance visual tracking with Siamese region proposal network. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 8971-8980, 2018.
[8]
Tao, R.; Gavves, E.; Smeulders, A. W. M. Siamese instance search for tracking. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1420-1429, 2016.
[9]
Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence Vol. 39, No. 6, 1137-1149, 2017.
[10]
Zhu, Z.; Wang, Q.; Li, B.; Wu, W.; Yan, J. J.; Hu, W. M. Distractor-aware Siamese networks for visual object tracking. In: Computer Vision - ECCV 2018. Lecture Notes in Computer Science, Vol. 11213. Ferrari, V.; Hebert, M.; Sminchisescu, C.; Weiss, Y. Eds. Springer Cham, 103-119, 2018.
[11]
Zhang, Z. P.; Peng, H. W. Deeper and wider Siamese networks for real-time visual tracking. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 4586-4595, 2019.
[12]
Guo, D. Y.; Wang, J.; Cui, Y.; Wang, Z. H.; Chen, S. Y. SiamCAR: Siamese fully convolutional classification and regression for visual tracking. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 6268-6276, 2020.
[13]
Zhang, Z. P.; Peng, H. W.; Fu, J. L.; Li, B.; Hu, W. M. Ocean: Object-aware anchor-free tracking. In: Computer Vision - ECCV 2020. Lecture Notes in Computer Science, Vol. 12366. Vedaldi, A.; Bischof, H.; Brox, T.; Frahm, J. M. Eds. Springer Cham, 771-787, 2020.
[14]
Han, G.; Du, H.; Liu, J. X.; Sun, N.; Li, X. F. Fully conventional anchor-free Siamese networks for object tracking. IEEE Access Vol. 7, 123934-123943, 2019.
[15]
Wang, Q.; Zhang, L.; Bertinetto, L.; Hu, W. M.; Torr, P. H. S. Fast online object tracking and segmentation: A unifying approach. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 1328-1338, 2019.
[16]
Xu, Y. D.; Wang, Z. Y.; Li, Z. X.; Yuan, Y.; Yu, G. SiamFC++: Towards robust and accurate visual tracking with target estimation guidelines. In: Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 34, No. 7, 12549-12556, 2020.
[17]
Peng, S. Y.; Yu, Y. X.; Wang, K.; He, L. Accurate anchor free tracking. arXiv preprint arXiv: 2006.07560, 2020.
[18]
Krizhevsky, A.; Sutskever, I.; Hinton, G. E. ImageNet classification with deep convolutional neural networks. Communications of the ACM Vol. 60, No. 6, 84-90, 2017.
[19]
He, K. M.; Zhang, X. Y.; Ren, S. Q.; Sun, J. Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 770-778, 2016.
[20]
Cao, Z.; Hidalgo, G.; Simon, T.; Wei, S.; Sheikh, Y. OpenPose: Realtime multi-person 2D pose estimation using part affinity fields. IEEE Transactions on Pattern Analysis and Machine Intelligence Vol. 43, No. 1, 172-186, 2018.
[21]
Newell, A.; Yang, K. Y.; Deng, J. Stacked hourglass networks for human pose estimation. In: Computer Vision - ECCV 2016. Lecture Notes in Computer Science, Vol. 9912. Leibe, B.; Matas, J.; Sebe, N.; Welling, M. Eds. Springer Cham, 483-499, 2016.
[22]
Papandreou, G.; Zhu, T.; Kanazawa, N.; Toshev, A.; Tompson, J.; Bregler, C.; Murphy, K. Towards accurate multi-person pose estimation in the wild. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 3711-3719, 2017.
[23]
Zhou, X. Y.; Wang, D. Q.; Krähenbühl, P. Objects as points. arXiv preprint arXiv: 1904.07850, 2019.
[24]
Law, H.; Deng, J. CornerNet: Detecting objects as paired keypoints. In: Computer Vision - ECCV 2018. Lecture Notes in Computer Science, Vol. 11218. Ferrari, V.; Hebert, M.; Sminchisescu, C.; Weiss, Y. Eds. Springer Cham, 765-781, 2018.
[25]
Lin, T. Y.; Goyal, P.; Girshick, R.; He, K. M.; Dollár, P. Focal loss for dense object detection. In: Proceedings of the IEEE International Conference on Computer Vision, 2999-3007, 2017.
[26]
Russakovsky, O.; Deng, J.; Su, H.; Krause, J.; Satheesh, S.; Ma, S.; Huang, Z.; Karpathy, A.; Khosla, A.; Bernstein, M. ImageNet large scale visual recognition challenge. International Journal of Computer Vision Vol. 115, No. 3, 211-252, 2015.
[27]
Huang, L. H.; Zhao, X.; Huang, K. Q. GOT-10k: A large high-diversity benchmark for generic object tracking in the wild. IEEE Transactions on Pattern Analysis and Machine Intelligence, , 2019.
[28]
Fan, H.; Lin, L. T.; Yang, F.; Chu, P.; Deng, G.; Yu, S. J.; Bai, H.; Xu, Y.; Liao, C.; Ling, H. LaSOT: A high-quality benchmark for large-scale single object tracking. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 5369-5378, 2019.
[29]
Lin, T. Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C. L. Microsoft COCO: Common objects in context. In: Computer Vision - ECCV 2014. Lecture Notes in Computer Science, Vol. 8693. Fleet, D.; Pajdla, T.; Schiele, B.; Tuytelaars, T. Eds. Springer Cham, 740-755, 2014.
[30]
Real, E.; Shlens, J.; Mazzocchi, S.; Pan, X.; Vanhoucke, V. YouTube-BoundingBoxes: A large high-precision human-annotated data set for object detection in video. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 7464-7473, 2017.
[31]
Wu, Y.; Lim, J.; Yang, M. H. Online object tracking: A benchmark. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2411-2418, 2013.
[32]
Wu, Y.; Lim, J.; Yang, M. H. Object tracking benchmark. IEEE Transactions on Pattern Analysis and Machine Intelligence Vol. 37, No. 9, 1834-1848, 2015.
[33]
Kristan, M.; Leonardis, A.; Matas, J.; Felsberg, M.; Pflugfelder, R.; Čehovin, L.; Vojír̃, T.; Häger, G.; Lukežič, A.; Fernández, G. et al. The visual object tracking VOT2016 challenge results. In: Computer Vision - ECCV 2016 Workshops. Lecture Notes in Computer Science, Vol. 9914. Hua, G.; Jégou, H. Eds. Springer Cham, 777-823, 2016.
[34]
Mueller, M.; Smith, N.; Ghanem, B. A benchmark and simulator for UAV tracking. In: Computer Vision -ECCV 2016. Lecture Notes in Computer Science, Vol. 9905. Leibe, B.; Matas, J.; Sebe, N.; Welling, M. Eds. Springer Cham, 445-461, 2016.
[35]
Henriques, J. F.; Caseiro, R.; Martins, P.; Batista, J. High-speed tracking with kernelized correlation filters. IEEE Transactions on Pattern Analysis and Machine Intelligence Vol. 37, No. 3, 583-596, 2015.
[36]
Danelljan, M.; Hager, G.; Khan, F. S.; Felsberg, M. Discriminative scale space tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence Vol. 39, No. 8, 1561-1575, 2017.
[37]
Danelljan, M.; Häger, G.; Khan, F. S.; Felsberg, M. Learning spatially regularized correlation filters for visual tracking. In: Proceedings of the IEEE International Conference on Computer Vision, 4310-4318, 2015.
[38]
Bertinetto, L.; Valmadre, J.; Golodetz, S.; Miksik, O.; Torr, P. H. S. Staple: Complementary learners for real-time tracking. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1401-1409, 2016.
[39]
Valmadre, J.; Bertinetto, L.; Henriques, J.; Vedaldi, A.; Torr, P. H. S. End-to-end representation learning for correlation filter based tracking. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 5000-5008, 2017.
[40]
Nam, H.; Han, B. Learning multi-domain convolutional neural networks for visual tracking. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 4293-4302, 2016.
[41]
Danelljan, M.; Bhat, G.; Khan, F. S.; Felsberg, M. ECO: Efficient convolution operators for tracking. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 6931-6939, 2017.
[42]
Danelljan, M.; Robinson, A.; Shahbaz Khan, F.; Felsberg, M. Beyond correlation filters: Learning continuous convolution operators for visual tracking. In: Computer Vision - ECCV 2016. Lecture Notes in Computer Science, Vol. 9909, Leibe, B.; Matas, J.;Sebe, N.; Welling, M. Eds. Springer Cham, 472-488, 2016.
[43]
Wang, G. T.; Luo, C.; Xiong, Z. W.; Zeng, W. J. SPM-tracker: Series-parallel matching for real-time visual object tracking. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 3638-3647, 2019.
[44]
Danelljan, M.; Bhat, G.; Khan, F. S.; Felsberg, M. ATOM: Accurate tracking by overlap maximization. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 4655-4664, 2019.
[45]
Miller, G. A. WordNet. Communications of the ACM Vol. 38, No. 11, 39-41, 1995.
[46]
Guo, Q.; Feng, W.; Zhou, C.; Huang, R.; Wan, L.; Wang, S. Learning dynamic Siamese network for visual object tracking. In: Proceedings of the IEEE International Conference on Computer Vision, 1781-1789, 2017.
[47]
Hong, Z. B.; Zhe, C.; Wang, C. H.; Mei, X.; Prokhorov, D.; Tao, D. C. MUlti-Store Tracker (MUSTer): A cognitive psychology inspired approach to object tracking. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 749-758, 2015.
[48]
Zhang, J. M.; Ma, S. G.; Sclaroff, S. MEEM: Robust tracking via multiple experts using entropy minimization. In: Computer Vision - ECCV 2014. Lecture Notes in Computer Science, Vol. 8694. Fleet, D.; Pajdla, T.; Schiele, B.; Tuytelaars, T. Eds. Springer Cham, 188-203, 2014.
[49]
Hare, S.; Golodetz, S.; Saffari, A.; Vineet, V.; Cheng, M. M.; Hicks, S. L.; Torr, P. H. S. Struck: Structured output tracking with kernels. IEEE Transactions on Pattern Analysis and Machine Intelligence Vol. 38, No. 10, 2096-2109, 2016.
Computational Visual Media
Pages 253-265
Cite this article:
Chen D, Tang F, Dong W, et al. SiamCPN: Visual tracking with the Siamese center-prediction network. Computational Visual Media, 2021, 7(2): 253-265. https://doi.org/10.1007/s41095-021-0212-1

835

Views

95

Downloads

5

Crossref

5

Web of Science

6

Scopus

3

CSCD

Altmetrics

Received: 10 January 2021
Accepted: 04 February 2021
Published: 05 April 2021
© The Author(s) 2021

This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduc-tion in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.

The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Other papers from this open access journal are available free of charge from http://www.springer.com/journal/41095. To submit a manuscript, please go to https://www. editorialmanager.com/cvmj.

Return