Review Article | Open Access

Multi-modal visual tracking: Review and experimental comparison

Pengyu Zhang, Dong Wang, and Huchuan Lu
Faculty of Electronic Information and Electrical Engineering, Dalian University of Technology, Dalian 116024, China


Abstract

As a fundamental task in computer vision, visual object tracking has drawn increasing attention in recent years. To extend the range of tracking applications, researchers have introduced information from additional modalities to handle specific scenes, and the emerging methods and benchmarks show promising research prospects. To provide a thorough review of multi-modal tracking, this survey summarizes different aspects of multi-modal tracking algorithms under a unified taxonomy, with a specific focus on visible-depth (RGB-D) and visible-thermal (RGB-T) tracking. Subsequently, it provides a detailed description of the related benchmarks and challenges. Extensive experiments were conducted to analyze the effectiveness of trackers on five datasets: PTB, VOT19-RGBD, GTOT, RGBT234, and VOT19-RGBT. Finally, various future directions, including model design and dataset construction, are discussed from different perspectives to guide further research.
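For context on how trackers are scored on these benchmarks, RGB-T datasets such as GTOT and RGBT234 commonly report a precision rate (the fraction of frames whose predicted box center lies within a pixel threshold of the ground truth) and a success rate (the fraction of frames whose bounding-box overlap exceeds an IoU threshold). The minimal Python sketch below illustrates these two metrics for axis-aligned (x, y, w, h) boxes; the function names and the 20-pixel and 0.5 thresholds are illustrative conventions, not the authors' exact evaluation code.

```python
import numpy as np

def center_error(pred, gt):
    """Euclidean distance between box centers; boxes are rows of (x, y, w, h)."""
    pred_centers = pred[:, :2] + pred[:, 2:] / 2
    gt_centers = gt[:, :2] + gt[:, 2:] / 2
    return np.linalg.norm(pred_centers - gt_centers, axis=1)

def iou(pred, gt):
    """Per-frame intersection-over-union of axis-aligned (x, y, w, h) boxes."""
    x1 = np.maximum(pred[:, 0], gt[:, 0])
    y1 = np.maximum(pred[:, 1], gt[:, 1])
    x2 = np.minimum(pred[:, 0] + pred[:, 2], gt[:, 0] + gt[:, 2])
    y2 = np.minimum(pred[:, 1] + pred[:, 3], gt[:, 1] + gt[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    union = pred[:, 2] * pred[:, 3] + gt[:, 2] * gt[:, 3] - inter
    return inter / np.maximum(union, 1e-8)

def precision_rate(pred, gt, threshold=20.0):
    """Fraction of frames whose center error is within `threshold` pixels."""
    return float(np.mean(center_error(pred, gt) <= threshold))

def success_rate(pred, gt, threshold=0.5):
    """Fraction of frames whose overlap with the ground truth exceeds `threshold`."""
    return float(np.mean(iou(pred, gt) > threshold))
```

On GTOT a tighter 5-pixel precision threshold is often used because its targets are small, while the VOT challenges (VOT19-RGBD, VOT19-RGBT) evaluate accuracy, robustness, and expected average overlap instead.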

Electronic Supplementary Material

41095_0345_ESM.pdf (291.9 KB)

Cite this article:
Zhang P, Wang D, Lu H. Multi-modal visual tracking: Review and experimental comparison. Computational Visual Media, 2024, 10(2): 193-214. https://doi.org/10.1007/s41095-023-0345-5

Received: 09 January 2023
Accepted: 25 March 2023
Published: 03 January 2024
© The Author(s) 2023.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.

The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Other papers from this open access journal are available free of charge from http://www.springer.com/journal/41095. To submit a manuscript, please go to https://www.editorialmanager.com/cvmj.
