This paper explores the Vision Transformer (ViT) backbone for Unsupervised Domain Adaptive (UDA) person Re-Identification (Re-ID). While some recent studies have validated ViT for supervised Re-ID, no study has yet used ViT for UDA Re-ID. We observe that the ViT structure provides a unique advantage for UDA Re-ID: it has a prompt (the learnable class token) at its bottom layer that can be used to efficiently condition the deep model on the underlying domain. To exploit this advantage, we propose a novel two-stage UDA pipeline named Prompting And Tuning (PAT), which consists of a prompt learning stage and a subsequent fine-tuning stage. In the first stage, PAT roughly adapts the model from the source to the target domain by learning prompts for the two domains, while in the second stage, PAT fine-tunes the entire backbone for further adaptation to increase accuracy. Although both stages adopt pseudo labels for training, we show that they have different data preferences. With these two preferences respected, prompt learning and fine-tuning integrate well with each other and jointly yield a competitive PAT method for UDA Re-ID.
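As a rough illustration of the prompting idea (a minimal sketch, not the authors' released implementation), the PyTorch snippet below shows how a learnable per-domain prompt token could be prepended to ViT patch embeddings alongside the class token; the module name `DomainPromptedViT` and its interface are hypothetical.

```python
import torch
import torch.nn as nn

class DomainPromptedViT(nn.Module):
    """Minimal sketch: a ViT encoder conditioned on a per-domain prompt token."""

    def __init__(self, encoder, embed_dim=768, num_domains=2):
        super().__init__()
        self.encoder = encoder  # any stack of transformer blocks mapping (B, N, D) -> (B, N, D)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        # One learnable prompt per domain (e.g., 0 = source, 1 = target),
        # injected at the bottom layer as an extra token.
        self.domain_prompts = nn.Parameter(torch.zeros(num_domains, 1, embed_dim))

    def forward(self, patch_embeddings, domain_id):
        # patch_embeddings: (B, N, D) tokens from the patch-embedding layer
        b = patch_embeddings.size(0)
        cls = self.cls_token.expand(b, -1, -1)
        prompt = self.domain_prompts[domain_id].expand(b, -1, -1)
        tokens = torch.cat([cls, prompt, patch_embeddings], dim=1)  # (B, N + 2, D)
        tokens = self.encoder(tokens)
        return tokens[:, 0]  # class-token output as the person descriptor
```

In a two-stage scheme like PAT, stage one would update only `domain_prompts` (with the backbone frozen) to roughly adapt the model, and stage two would unfreeze the encoder for full fine-tuning.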
The person re-identification (re-ID) community has witnessed an explosion in the scale of data it has to handle. On the one hand, from the viewpoint of efficiency, it is important for large-scale re-ID to provide constant or sublinear search time and to dramatically reduce the storage cost of data points. On the other hand, the semantic affinity existing in the original space should be preserved, because it greatly boosts re-ID accuracy. To this end, we adopt deep hashing, which utilizes pairwise similarities and classification labels to learn deep hash mapping functions that provide discriminative representations. More importantly, considering the great advantage of asymmetric hashing over its symmetric counterpart, we propose an asymmetric deep hashing (ADH) method for large-scale re-ID. Specifically, a two-stream asymmetric convolutional neural network is constructed to learn the similarity between image pairs. An asymmetric pairwise loss is then formulated to capture the similarity between the binary hash codes and the real-valued representations derived from the deep hash mapping functions, so that the binary hash codes in the Hamming space preserve the semantic structure of the original space. The image labels are further exploited to directly influence hash function learning through a classification loss. Furthermore, an efficient alternating algorithm is carefully designed to jointly optimize the asymmetric deep hash functions and high-quality binary codes, by optimizing one set of parameters with the others fixed. Experiments on four benchmarks, i.e., DukeMTMC-reID, Market-1501, Market-1501+500k, and CUHK03, substantiate the competitive accuracy and superior efficiency of the proposed ADH over state-of-the-art methods for large-scale re-ID.
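To make the asymmetric formulation concrete, here is a minimal sketch of one plausible asymmetric pairwise loss between real-valued network outputs and binary database codes; the exact loss used by ADH may differ, and all names here are illustrative.

```python
import torch

def asymmetric_pairwise_loss(u, b, s, code_len):
    """Sketch of an asymmetric pairwise loss (illustrative, not ADH's exact form).

    u: (m, k) real-valued, tanh-activated outputs of the deep hash function
    b: (n, k) binary codes in {-1, +1} kept for the database images
    s: (m, n) pairwise similarity matrix, +1 for same identity, -1 otherwise
    """
    # Push the inner product between relaxed and binary codes toward
    # code_len * s so that Hamming distances mirror semantic similarity.
    inner = u @ b.t()  # (m, n)
    return ((inner - code_len * s) ** 2).mean()
```

An alternating scheme like the one described above would then update the network with `b` fixed, and update the binary codes (e.g., via a sign step) with the network fixed.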
A Brain-Computer Interface (BCI) aims to provide a new way for people to communicate with computers. Brain signal classification is a challenging problem owing to high-dimensional data and a low Signal-to-Noise Ratio (SNR). In this paper, a novel method is proposed to cope with this problem through sparse representation for the P300 speller paradigm. This work makes two key contributions. First, we investigate sparse coding and its feasibility for brain signal classification: training signals are used to learn the dictionaries, and test signals are classified according to their sparse representations and reconstruction errors. Second, sample selection and a channel-aware dictionary are proposed to reduce the effect of noise, which improves performance and computational efficiency simultaneously. A novel classification method from the sample-set perspective is proposed to exploit channel correlations: the brain signal of each channel is classified jointly with its spatially neighboring channels, and a novel weighted regulation strategy is proposed to suppress outliers within the group. Experimental results demonstrate that our methods are highly effective: we achieve state-of-the-art recognition rates of 72.5%, 88.5%, and 98.5% at 5, 10, and 15 epochs, respectively, on BCI Competition III Dataset II.
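The first contribution follows the classic sparse-representation-based classification recipe: code a test signal over each class dictionary and pick the class with the smallest reconstruction residual. A minimal sketch using scikit-learn's orthogonal matching pursuit is shown below; the dictionary layout and parameter values are assumptions, not the paper's exact configuration.

```python
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

def classify_by_residual(signal, dictionaries, n_nonzero=10):
    """Sketch: classify a brain signal by its sparse reconstruction error.

    signal: (d,) preprocessed single-trial signal
    dictionaries: {label: (d, n_atoms) array} learned from training signals
    """
    residuals = {}
    for label, dictionary in dictionaries.items():
        omp = OrthogonalMatchingPursuit(n_nonzero_coefs=n_nonzero, fit_intercept=False)
        omp.fit(dictionary, signal)            # sparse code of the signal over this dictionary
        reconstruction = dictionary @ omp.coef_
        residuals[label] = np.linalg.norm(signal - reconstruction)
    return min(residuals, key=residuals.get)   # label with the smallest residual wins
```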
In this study, we address the problems encountered in incremental face clustering. Without the benefit of having observed the entire data distribution, incremental face clustering is more challenging than clustering a static dataset. Conventional methods rely on statistical information from previous clusters to improve the efficiency of incremental clustering; thus, errors may accumulate. Therefore, this study proposes to predict summaries of previous data directly from the data distribution via supervised learning. Moreover, an efficient framework that clusters previous summaries together with new data is explored. Although learning summaries from the original data costs more than deriving them from previous clusters, the entire framework consumes only slightly more time because clustering the current data and generating summaries for new data share most of the computation. Experiments show that the proposed approach significantly outperforms existing incremental face clustering methods, improving the average F-score from 0.644 to 0.762. Compared with state-of-the-art static face clustering methods, our method yields comparable accuracy while consuming much less time.
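A minimal sketch of the overall loop (clustering previous summaries together with newly arrived features, then producing summaries for the next round) might look like the following; here summaries are simple cluster means, whereas the paper predicts them with a supervised model, and the clustering algorithm and threshold are assumptions.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

def incremental_step(summaries, new_feats, dist_threshold=1.0):
    """Sketch: one round of incremental clustering over summaries + new data.

    summaries: (m, d) summary vectors of previously seen data (may be empty)
    new_feats: (n, d) face features arriving in the current batch
    """
    pool = np.vstack([summaries, new_feats]) if len(summaries) else new_feats
    labels = AgglomerativeClustering(
        n_clusters=None, distance_threshold=dist_threshold
    ).fit_predict(pool)
    # One summary per cluster; a mean stands in for the learned summary model.
    next_summaries = np.stack(
        [pool[labels == c].mean(axis=0) for c in np.unique(labels)]
    )
    return labels, next_summaries
```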
Person re-IDentification (re-ID) is an important research topic in the computer vision community, with significance for a range of applications. Pedestrians are well-structured objects that can be partitioned into parts, although detection errors cause slightly misaligned bounding boxes, which lead to mismatches. In this paper, we study person re-identification performance using variously designed pedestrian parts instead of the horizontal partitioning routinely applied in previous hand-crafted part-based works, and thereby obtain more effective feature descriptors. Specifically, we benchmark the accuracy of individual part matching with discriminatively trained Convolutional Neural Network (CNN) descriptors on the Market-1501 dataset. We also investigate the complementarity among different parts through combination and ablation studies, and provide novel insights into this issue. Compared with the state-of-the-art, our method yields competitive accuracy when the best part combination is used, on two large-scale datasets (Market-1501 and CUHK03) and one small-scale dataset (VIPeR).
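Conceptually, the part-based descriptor amounts to cropping predefined regions, encoding each with its own discriminatively trained CNN, and concatenating the selected complementary parts. The PyTorch sketch below illustrates this; the crop boxes and network interfaces are placeholders rather than the paper's exact design.

```python
import torch

def part_descriptor(image, part_boxes, part_cnns):
    """Sketch: concatenated CNN descriptor over designed pedestrian parts.

    image: (3, H, W) pedestrian image tensor
    part_boxes: list of (top, bottom, left, right) crops, one per part design
    part_cnns: list of part-specific CNNs, each returning a (1, d) embedding
    """
    feats = []
    for (t, b, l, r), net in zip(part_boxes, part_cnns):
        crop = image[:, t:b, l:r].unsqueeze(0)  # (1, 3, h, w) part crop
        feats.append(net(crop).squeeze(0))
    return torch.cat(feats)  # matching is then a distance between descriptors
```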
Gender classification is an important task in automated face analysis. Most existing approaches for gender classification use only raw or aligned face images after face detection as input. These methods exhibit fair classification ability under constrained conditions, in which face images are acquired under similar illumination and with similar poses. Their performance may deteriorate when face images exhibit drastic variations in pose and occlusion, as routinely encountered in real-world data. This reduction in performance may be attributed to the sensitivity of features to image translations. This work proposes to alleviate this sensitivity by introducing a majority voting procedure that involves multiple face patches. Specifically, this work utilizes a deep learning method based on multiple large patches. Several Convolutional Neural Networks (CNNs) are trained on individual, predefined patches that reflect various image resolutions and partial cropping. The decisions of the individual CNNs are aggregated through majority voting to obtain an accurate final gender classification. Extensive experiments are conducted on four gender classification databases: Labeled Faces in the Wild (LFW), CelebA, ColorFeret, and the All-Age Faces database, a novel database collected by our group. Each individual patch is evaluated, and complementary patches are selected for voting. We show that the classification accuracy of our method is comparable with that of state-of-the-art systems, validating the effectiveness of the proposed method.
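The aggregation step itself is simple majority voting over per-patch CNN decisions, as in the sketch below (function and variable names are illustrative):

```python
import torch

def vote_gender(patches, patch_cnns):
    """Sketch: majority vote over patch-specific CNN gender decisions.

    patches: list of face-patch tensors, aligned index-wise with patch_cnns
    patch_cnns: list of trained CNNs, each returning logits over {female, male}
    """
    votes = [net(patch.unsqueeze(0)).argmax(dim=1).item()
             for patch, net in zip(patches, patch_cnns)]
    # The final label is the class predicted by the majority of the networks.
    return max(set(votes), key=votes.count)
```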
Person re-identification (person re-id) aims to match observations of pedestrians across different cameras. It is a challenging task in real-world surveillance systems and draws extensive attention from the community. Most existing methods are based on supervised learning, which requires a large amount of labeled data. In this paper, we develop a robust unsupervised learning approach for person re-id. We propose an improved Bag-of-Words (iBoW) model to describe and match pedestrians under different camera views. The proposed descriptor does not require any re-id labels and is robust against pedestrian variations. Experiments show the proposed iBoW descriptor outperforms other unsupervised methods. Combined with efficient metric learning algorithms, it achieves accuracy competitive with existing state-of-the-art methods on person re-identification benchmarks, including VIPeR, PRID450S, and Market-1501.
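At its core, a Bag-of-Words descriptor quantizes local features against a learned codebook and pools them into a normalized histogram; the sketch below shows this baseline step (codebook size and feature choice are assumptions, and the iBoW improvements are not reproduced here).

```python
import numpy as np
from sklearn.cluster import KMeans

def bow_descriptor(local_feats, codebook):
    """Sketch: Bag-of-Words image descriptor from local pedestrian features.

    local_feats: (n, d) local features (e.g., color/texture patches) of one image
    codebook: a fitted KMeans model whose centers act as visual words
    """
    words = codebook.predict(local_feats)  # assign each local feature to a word
    hist, _ = np.histogram(words, bins=np.arange(codebook.n_clusters + 1))
    hist = hist.astype(np.float32)
    return hist / (np.linalg.norm(hist) + 1e-12)  # L2-normalized histogram

# Codebook learned once from unlabeled training patches (size is illustrative):
# codebook = KMeans(n_clusters=350, n_init=10).fit(all_training_patches)
```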