Improving Semantic Part Features for Person Re-identification with Supervised Non-local Similarity

Yifan Sun; Zhaopeng Dou; Yali Li; Shengjin Wang

doi:10.26599/TST.2019.9010024

Open Access

Department of Electronic Engineering, Tsinghua University, Beijing 100084, China.

Abstract

In the person re-identification (re-ID) task, part-level feature learning benefits from fine-grained information. Part alignment is a prerequisite for learning part-level features, and a popular way to achieve it is to detect semantic parts using human parsing or pose estimation. Such semantic partition methods do offer cues for good part alignment, but they are prone to noisy part detection, especially when employed in an off-the-shelf manner. In response, this paper proposes a novel part feature learning method for re-ID that suppresses the impact of noisy semantic part detection through Supervised Non-local Similarity (SNS) learning. Given several detected semantic parts, SNS first locates their center points on the convolutional feature maps to serve as a set of anchors, and then evaluates the similarity between each anchor and every pixel on the feature maps. The non-local similarity learning is supervised so that each anchor is similar to itself and simultaneously dissimilar to every other anchor, which yields the SNS. Finally, each anchor absorbs features from all similar pixels on the convolutional feature maps to generate the corresponding part feature (the SNS feature). We evaluate our method with extensive experiments under both holistic and partial re-ID scenarios. The experimental results confirm that SNS consistently improves re-ID accuracy with either human parsing or pose estimation, and that our results are on par with state-of-the-art methods.
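
The pipeline described in the abstract (anchors at part centers, anchor-to-pixel similarity, supervision among anchors, feature absorption) can be illustrated with a minimal pure-Python sketch. The abstract does not specify the similarity function, the normalization, or the loss, so the dot-product similarity, the softmax normalization, the cross-entropy form of the supervision, and all function names here (`sns_part_features`, `sns_loss`) are assumptions made for illustration, not the authors' implementation.

```python
import math


def dot(u, v):
    """Dot product of two equal-length feature vectors."""
    return sum(a * b for a, b in zip(u, v))


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def sns_part_features(pixels, anchor_idx):
    """Hypothetical SNS-style aggregation.

    pixels:     list of C-dim feature vectors (a flattened H*W feature map).
    anchor_idx: index of the center pixel of each detected semantic part.
    Returns (part_features, anchor_sim).
    """
    anchors = [pixels[i] for i in anchor_idx]
    part_features = []
    for a in anchors:
        # Similarity between this anchor and every pixel (assumed: dot
        # product followed by a softmax over all pixels).
        weights = softmax([dot(a, p) for p in pixels])
        # The anchor absorbs features from similar pixels: a weighted sum.
        feat = [sum(w * p[c] for w, p in zip(weights, pixels))
                for c in range(len(a))]
        part_features.append(feat)
    # Anchor-to-anchor similarity, normalized per row, used for supervision.
    anchor_sim = [softmax([dot(a, b) for b in anchors]) for a in anchors]
    return part_features, anchor_sim


def sns_loss(anchor_sim):
    """Assumed supervision: cross-entropy that rewards each anchor for being
    most similar to itself and dissimilar to the other anchors."""
    n = len(anchor_sim)
    return -sum(math.log(anchor_sim[i][i]) for i in range(n)) / n


# Toy feature map: four pixels with two channels; pixels 0 and 1 act as anchors.
pixels = [[4.0, 0.0], [0.0, 4.0], [3.0, 0.0], [0.0, 3.0]]
parts, sim = sns_part_features(pixels, anchor_idx=[0, 1])
loss = sns_loss(sim)
```

In this toy case each anchor draws almost all of its weight from pixels that share its dominant channel, so a slightly misplaced part mask would matter less than with hard mask pooling. That is the intuition the abstract gives for suppressing noisy part detection.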

Keywords

feature learning; person re-identification; non-local similarity; semantic parts

Tsinghua Science and Technology

Volume 25, Issue 5, October 2020

Pages 636-646

DOI: 10.26599/TST.2019.9010024

Cite this article:

Sun Y, Dou Z, Li Y, et al. Improving Semantic Part Features for Person Re-identification with Supervised Non-local Similarity. Tsinghua Science and Technology, 2020, 25(5): 636-646. https://doi.org/10.26599/TST.2019.9010024

Received: 22 January 2019

Accepted: 23 May 2019

Published: 16 March 2020

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).