Review Article | Open Access

Multi-modal visual tracking: Review and experimental comparison

Pengyu Zhang, Dong Wang, and Huchuan Lu
Faculty of Electronic Information and Electrical Engineering, Dalian University of Technology, Dalian 116024, China


Abstract

As a fundamental task in computer vision, visual object tracking has drawn increasing attention in recent years. To extend the range of tracking applications, researchers have introduced information from additional modalities to handle specific scenes, and the emerging methods and benchmarks show promising research prospects. To provide a thorough review of multi-modal tracking, this survey summarizes different aspects of multi-modal tracking algorithms under a unified taxonomy, with a specific focus on visible-depth (RGB-D) and visible-thermal (RGB-T) tracking. Subsequently, it provides a detailed description of the related benchmarks and challenges. Extensive experiments were conducted to analyze the effectiveness of trackers on five datasets: PTB, VOT19-RGBD, GTOT, RGBT234, and VOT19-RGBT. Finally, various future directions, including model design and dataset construction, are discussed from different perspectives to guide further research.
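For context on how trackers are scored on these benchmarks, RGB-T datasets such as GTOT and RGBT234 commonly report a precision rate (the fraction of frames whose predicted box center lies within a pixel threshold of the ground truth) and a success rate (the fraction of frames whose bounding-box overlap exceeds an IoU threshold). The minimal Python sketch below illustrates these two metrics for axis-aligned (x, y, w, h) boxes; the function names and the 20-pixel and 0.5 thresholds are illustrative conventions, not the authors' exact evaluation code.

```python
import numpy as np

def center_error(pred, gt):
    """Euclidean distance between box centers; boxes are rows of (x, y, w, h)."""
    pred_centers = pred[:, :2] + pred[:, 2:] / 2
    gt_centers = gt[:, :2] + gt[:, 2:] / 2
    return np.linalg.norm(pred_centers - gt_centers, axis=1)

def iou(pred, gt):
    """Per-frame intersection-over-union of axis-aligned (x, y, w, h) boxes."""
    x1 = np.maximum(pred[:, 0], gt[:, 0])
    y1 = np.maximum(pred[:, 1], gt[:, 1])
    x2 = np.minimum(pred[:, 0] + pred[:, 2], gt[:, 0] + gt[:, 2])
    y2 = np.minimum(pred[:, 1] + pred[:, 3], gt[:, 1] + gt[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    union = pred[:, 2] * pred[:, 3] + gt[:, 2] * gt[:, 3] - inter
    return inter / np.maximum(union, 1e-8)

def precision_rate(pred, gt, threshold=20.0):
    """Fraction of frames whose center error is within `threshold` pixels."""
    return float(np.mean(center_error(pred, gt) <= threshold))

def success_rate(pred, gt, threshold=0.5):
    """Fraction of frames whose overlap with the ground truth exceeds `threshold`."""
    return float(np.mean(iou(pred, gt) > threshold))
```

On GTOT a tighter 5-pixel precision threshold is often used because its targets are small, while the VOT challenges (VOT19-RGBD, VOT19-RGBT) evaluate accuracy, robustness, and expected average overlap instead.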

Electronic Supplementary Material

41095_0345_ESM.pdf (291.9 KB)

Cite this article:
Zhang P, Wang D, Lu H. Multi-modal visual tracking: Review and experimental comparison. Computational Visual Media, 2024, 10(2): 193-214. https://doi.org/10.1007/s41095-023-0345-5

Received: 09 January 2023
Accepted: 25 March 2023
Published: 03 January 2024
© The Author(s) 2023.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.

The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Other papers from this open access journal are available free of charge from http://www.springer.com/journal/41095. To submit a manuscript, please go to https://www.editorialmanager.com/cvmj.
