SiamCPN: Visual tracking with the Siamese center-prediction network

Dong Chen; Fan Tang; Weiming Dong; Hanxing Yao; Changsheng Xu

doi:10.1007/s41095-021-0212-1

Research Article | Open Access

Dong Chen^{1,2,4}, Fan Tang^{3}, Weiming Dong^{1,2,4}, Hanxing Yao^{4,5}, Changsheng Xu^{1,2,4}

1. School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100040, China

2. NLPR, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China

3. School of Artificial Intelligence, Jilin University, Changchun 130012, China

4. CASIA-LLVISION Joint Lab, Beijing 100190, China

5. LLVISION Technology Co., LTD., Beijing 100190, China

Abstract

Object detection is widely used in object tracking; anchor-free object tracking provides an end-to-end single-object-tracking approach. In this study, we propose a new anchor-free network, the Siamese center-prediction network (SiamCPN). Given the features of the reference object in the initial frame, we directly predict the center point and size of the object in subsequent frames with a Siamese-structure network, without the need for per-frame post-processing operations. Unlike other anchor-free tracking approaches, which are based on semantic segmentation and achieve anchor-free tracking by pixel-level prediction, SiamCPN directly obtains all the information required for tracking, greatly simplifying the model. A center-prediction sub-network is applied to multiple stages of the backbone to adaptively learn from the experience of different branches of the Siamese net. The model can accurately predict the object location, apply appropriate corrections, and regress the size of the target bounding box. Compared to other leading Siamese networks, SiamCPN is simpler, faster, and more efficient, as it uses fewer hyperparameters. Experiments demonstrate that our method outperforms other leading Siamese networks on the GOT-10K and UAV123 benchmarks, and is comparable to other excellent trackers on LaSOT, VOT2016, and OTB-100, while improving inference speed by a factor of 1.5 to 2.
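To make the idea of predicting a center point and a box size concrete, the following is a minimal sketch of how such outputs could be decoded into a bounding box. It is not the authors' implementation: the function name, the layout of the three maps (center scores, a sub-cell offset used as the "correction", and a width/height map), and the stride of 8 are all assumptions chosen for illustration, in the general style of center-point detectors.

```python
def decode_center_prediction(heatmap, offset, size, stride=8):
    """Decode a single box from center-style outputs (illustrative only).

    heatmap: H x W nested list of center scores.
    offset:  2 x H x W nested list, sub-cell correction (dx, dy) in cells.
    size:    2 x H x W nested list, box (width, height) in image pixels.
    stride:  assumed downsampling factor between image and feature map.
    """
    # Find the feature-map cell with the highest center score.
    best_row, best_col, best_score = 0, 0, float("-inf")
    for r, row_scores in enumerate(heatmap):
        for c, score in enumerate(row_scores):
            if score > best_score:
                best_row, best_col, best_score = r, c, score

    # Read the correction and the regressed size at that cell.
    dx, dy = offset[0][best_row][best_col], offset[1][best_row][best_col]
    w, h = size[0][best_row][best_col], size[1][best_row][best_col]

    # Map the corrected cell position back to image coordinates.
    cx = (best_col + dx) * stride
    cy = (best_row + dy) * stride

    # Return the box as (x_min, y_min, x_max, y_max) plus its score.
    box = (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)
    return box, best_score


# Hypothetical 5 x 5 outputs with a single confident center at row 2, col 3.
H, W = 5, 5
heatmap = [[0.0] * W for _ in range(H)]
offset = [[[0.0] * W for _ in range(H)] for _ in range(2)]
size = [[[0.0] * W for _ in range(H)] for _ in range(2)]
heatmap[2][3] = 0.9
offset[0][2][3], offset[1][2][3] = 0.5, 0.25
size[0][2][3], size[1][2][3] = 40.0, 20.0

box, score = decode_center_prediction(heatmap, offset, size, stride=8)
print(box, score)
```

Because the box comes from a single arg-max lookup, a decoder of this kind needs no anchor boxes and no per-frame candidate ranking, which is the kind of simplification the abstract points to.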

Keywords

Siamese network; single object tracking; anchor-free; center point detection

Computational Visual Media

Volume 7, Issue 2, June 2021

Pages 253-265

DOI: 10.1007/s41095-021-0212-1

Cite this article:

Chen D, Tang F, Dong W, et al. SiamCPN: Visual tracking with the Siamese center-prediction network. Computational Visual Media, 2021, 7(2): 253-265. https://doi.org/10.1007/s41095-021-0212-1