SiamCPN: Visual tracking with the Siamese center-prediction network

Dong Chen; Fan Tang; Weiming Dong; Hanxing Yao; Changsheng Xu

doi:10.1007/s41095-021-0212-1

Research Article | Open Access

Dong Chen^{1,2,4}, Fan Tang^{3}, Weiming Dong^{1,2,4}, Hanxing Yao^{4,5}, Changsheng Xu^{1,2,4}

1. School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100040, China

2. NLPR, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China

3. School of Artificial Intelligence, Jilin University, Changchun 130012, China

4. CASIA-LLVISION Joint Lab, Beijing 100190, China

5. LLVISION Technology Co., LTD., Beijing 100190, China

Abstract

Object detection is widely used in object tracking; anchor-free object tracking provides an end-to-end single-object-tracking approach. In this study, we propose a new anchor-free network, the Siamese center-prediction network (SiamCPN). Given the presence of referenced object features in the initial frame, we directly predict the center point and size of the object in subsequent frames in a Siamese-structure network without the need for per-frame post-processing operations. Unlike other anchor-free tracking approaches that are based on semantic segmentation and achieve anchor-free tracking by pixel-level prediction, SiamCPN directly obtains all information required for tracking, greatly simplifying the model. A center-prediction sub-network is applied to multiple stages of the backbone to adaptively learn from the experience of different branches of the Siamese net. The model can accurately predict object location, implement appropriate corrections, and regress the size of the target bounding box. Compared to other leading Siamese networks, SiamCPN is simpler, faster, and more efficient as it uses fewer hyperparameters. Experiments demonstrate that our method outperforms other leading Siamese networks on the GOT-10K and UAV123 benchmarks, and is comparable to other excellent trackers on LaSOT, VOT2016, and OTB-100 while improving inference speed by a factor of 1.5 to 2.
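
The abstract names three outputs: an object location, a correction to it, and a regressed box size. The sketch below illustrates how such outputs could be decoded into a bounding box. It is a minimal illustration of generic center-point decoding, not the authors' implementation. The function name `decode_center_prediction`, the map layout, and the `stride` value are all assumptions introduced here and do not come from the paper.

```python
from typing import List, Tuple

Grid = List[List[float]]  # one value per cell of a rows x cols prediction map


def decode_center_prediction(
    heatmap: Grid,
    offset_x: Grid,
    offset_y: Grid,
    width: Grid,
    height: Grid,
    stride: int = 8,  # assumed downsampling factor; not taken from the paper
) -> Tuple[float, float, float, float]:
    """Turn center-style prediction maps into one (x1, y1, x2, y2) box.

    All five maps are assumed to share the same rows x cols layout, and
    width/height are assumed to be expressed in search-image pixels.
    """
    # 1. Locate the peak of the center heatmap (the most likely center cell).
    best_row, best_col = 0, 0
    for r, row in enumerate(heatmap):
        for c, score in enumerate(row):
            if score > heatmap[best_row][best_col]:
                best_row, best_col = r, c

    # 2. Correct the coarse cell position with the predicted sub-cell
    #    offset, then map it back to search-image pixels.
    cx = (best_col + offset_x[best_row][best_col]) * stride
    cy = (best_row + offset_y[best_row][best_col]) * stride

    # 3. Read the regressed box size at the same cell.
    w = width[best_row][best_col]
    h = height[best_row][best_col]

    # Convert center/size to corner coordinates.
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)
```

Because the box is read from a single peak, this kind of decoding needs no anchor boxes and no candidate ranking step, which is in the spirit of the abstract's claim that per-frame post-processing is unnecessary.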

Keywords

Siamese network; single object tracking; anchor-free; center point detection

Computational Visual Media

Volume 7, Issue 2, June 2021

Pages 253-265

DOI: 10.1007/s41095-021-0212-1

Cite this article:

Chen D, Tang F, Dong W, et al. SiamCPN: Visual tracking with the Siamese center-prediction network. Computational Visual Media, 2021, 7(2): 253-265. https://doi.org/10.1007/s41095-021-0212-1

Received: 10 January 2021

Accepted: 04 February 2021

Published: 05 April 2021

This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.

The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Other papers from this open access journal are available free of charge from http://www.springer.com/journal/41095. To submit a manuscript, please go to https://www.editorialmanager.com/cvmj.