Open Access

Proxy-Based Embedding Alignment for RGB-Infrared Person Re-Identification

Department of Electronic Engineering, Tsinghua University, Beijing 100084, China
Baidu Inc., Beijing 100084, China

Abstract

RGB-Infrared person re-identification (re-ID) aims to match RGB and infrared (IR) images of the same person. The modality discrepancy between RGB and IR images, however, poses a significant challenge. To address this issue, this paper proposes a Proxy-based Embedding Alignment (PEA) method that aligns the RGB and IR modalities in the embedding space. PEA introduces modality-specific identity proxies and leverages sample-to-proxy relations to learn the model. Specifically, PEA enforces three types of alignment: intra-modality alignment, inter-modality alignment, and cycle alignment. Intra-modality alignment pulls together sample features and proxies of the same identity within a modality. Inter-modality alignment pulls together sample features and proxies of the same identity across modalities. Cycle alignment requires that a proxy return to itself after being traced along a cross-modality cycle (e.g., IR→RGB→IR). By integrating these alignments into training, PEA effectively mitigates the modality discrepancy and learns features that are discriminative across modalities. Extensive experiments on several RGB-IR re-ID datasets show that PEA outperforms current state-of-the-art methods. Notably, on the SYSU-MM01 dataset, PEA achieves 71.0% mAP under the multi-shot setting of the indoor-search protocol, surpassing the best-performing method by 7.2%.
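
The three alignments can be made concrete with a short sketch. The PyTorch module below is a minimal, hypothetical rendering of the abstract's description, not the paper's actual implementation: the proxy parameterization, the softmax temperature, and the soft-assignment form of the cycle loss are assumptions made here for illustration.

```python
# Minimal sketch of the three PEA alignments described above (PyTorch).
# All design choices below (proxy parameterization, temperature, the
# soft-assignment cycle loss) are illustrative assumptions, not the
# paper's verified formulation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class PEALoss(nn.Module):
    def __init__(self, num_ids: int, dim: int, temperature: float = 0.1):
        super().__init__()
        # Modality-specific identity proxies: one learnable vector per
        # (identity, modality) pair.
        self.proxies_rgb = nn.Parameter(torch.randn(num_ids, dim))
        self.proxies_ir = nn.Parameter(torch.randn(num_ids, dim))
        self.t = temperature

    def _proxy_ce(self, feats, proxies, labels):
        # Proxy-based softmax: pull each sample toward its identity's proxy
        # and push it away from all other proxies.
        logits = F.normalize(feats, dim=1) @ F.normalize(proxies, dim=1).t()
        return F.cross_entropy(logits / self.t, labels)

    def forward(self, feats_rgb, labels_rgb, feats_ir, labels_ir):
        # Intra-modality alignment: samples vs. proxies of the same modality.
        l_intra = (self._proxy_ce(feats_rgb, self.proxies_rgb, labels_rgb)
                   + self._proxy_ce(feats_ir, self.proxies_ir, labels_ir))
        # Inter-modality alignment: samples vs. proxies of the other modality.
        l_inter = (self._proxy_ce(feats_rgb, self.proxies_ir, labels_rgb)
                   + self._proxy_ce(feats_ir, self.proxies_rgb, labels_ir))
        # Cycle alignment, read here as a soft assignment IR -> RGB -> IR
        # whose composition should be the identity over proxy indices.
        p_ir = F.normalize(self.proxies_ir, dim=1)
        p_rgb = F.normalize(self.proxies_rgb, dim=1)
        ir2rgb = F.softmax(p_ir @ p_rgb.t() / self.t, dim=1)  # [C, C]
        rgb2ir = F.softmax(p_rgb @ p_ir.t() / self.t, dim=1)  # [C, C]
        cycle = ir2rgb @ rgb2ir  # row i: where IR proxy i lands after the cycle
        target = torch.arange(cycle.size(0), device=cycle.device)
        l_cycle = F.nll_loss(torch.log(cycle + 1e-8), target)
        return l_intra + l_inter + l_cycle


# Toy usage: 8 RGB and 8 IR features (dim 256) drawn from 100 identities.
loss_fn = PEALoss(num_ids=100, dim=256)
f_rgb, f_ir = torch.randn(8, 256), torch.randn(8, 256)
y_rgb, y_ir = torch.randint(0, 100, (8,)), torch.randint(0, 100, (8,))
loss = loss_fn(f_rgb, y_rgb, f_ir, y_ir)
loss.backward()
```

A full training pipeline would combine such a loss with a backbone feature extractor and standard re-ID objectives; the sketch only shows how sample-to-proxy relations could realize the three alignments.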

Tsinghua Science and Technology
Pages 1112-1124
Cite this article:
Dou Z, Sun Y, Li Y, et al. Proxy-Based Embedding Alignment for RGB-Infrared Person Re-Identification. Tsinghua Science and Technology, 2025, 30(3): 1112-1124. https://doi.org/10.26599/TST.2023.9010113

Received: 06 April 2023
Revised: 26 September 2023
Accepted: 28 September 2023
Published: 08 April 2024
© The Author(s) 2025.

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
