Research Article | Open Access

Transferring pose and augmenting background for deep human-image parsing and its applications

Takazumi Kikuchi¹, Yuki Endo¹, Yoshihiro Kanamori¹, Taisuke Hashimoto¹, Jun Mitani¹

¹ University of Tsukuba, 1-1-1 Tennohdai, Tsukuba City, Ibaraki, Japan.

Abstract

Parsing of human images is a fundamental task for determining semantic parts such as the face, arms, and legs, as well as a hat or a dress. Recent deep-learning-based methods have achieved significant improvements, but collecting training datasets with pixel-wise annotations is labor-intensive. In this paper, we propose two solutions to cope with limited datasets. Firstly, to handle various poses, we incorporate a pose estimation network into an end-to-end human-image parsing network, in order to transfer common features across the domains. The pose estimation network can be trained using rich datasets and can feed valuable features to the human-image parsing network. Secondly, to handle complicated backgrounds, we increase the variation in image backgrounds automatically by replacing the original backgrounds of human images with others obtained from large-scale scenery image datasets. Individually, each solution is versatile and beneficial to human-image parsing, while their combination yields further improvement. We demonstrate the effectiveness of our approach through comparisons and various applications such as garment recoloring, garment texture transfer, and visualization for fashion analysis.
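The background augmentation described above can be illustrated with a minimal sketch. It assumes that the pixel-wise annotation marks background pixels with a dedicated label (here `0`, a hypothetical encoding) and that the scenery image has already been resized to match the human image. The function name `replace_background` and the nested-list image representation are illustrative choices to keep the example dependency-free; this is not the authors' implementation, which may use a softer foreground extraction than this hard mask.

```python
def replace_background(image, labels, scenery, background_label=0):
    """Return a copy of `image` whose background pixels come from `scenery`.

    image, scenery: H x W nested lists of (r, g, b) tuples, same size.
    labels: H x W nested list of integer part labels (hypothetical encoding,
            with `background_label` marking background pixels).
    """
    if len(image) != len(labels) or len(image) != len(scenery):
        raise ValueError("image, labels, and scenery must have the same height")
    output = []
    for img_row, lab_row, bg_row in zip(image, labels, scenery):
        if not (len(img_row) == len(lab_row) == len(bg_row)):
            raise ValueError("image, labels, and scenery must have the same width")
        # Keep annotated person pixels; take all other pixels from the scenery.
        output.append([
            bg_px if lab == background_label else img_px
            for img_px, lab, bg_px in zip(img_row, lab_row, bg_row)
        ])
    return output


# Toy 2 x 2 example: label 0 is background, label 1 is a body part.
image = [[(10, 10, 10), (200, 150, 120)],
         [(12, 12, 12), (210, 160, 130)]]
labels = [[0, 1],
          [0, 1]]
scenery = [[(0, 0, 255), (0, 0, 255)],
           [(0, 255, 0), (0, 255, 0)]]
augmented = replace_background(image, labels, scenery)
```

Because only background pixels change, the original label map remains valid for the augmented image, so each annotated photo can be paired with many scenery images without any new annotation effort.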

Keywords

image segmentation; semantic segmentation; human-image parsing; deep convolutional neural network

Electronic Supplementary Material

41095_2017_98_MOESM1_ESM.pdf (1.1 MB)

Computational Visual Media

Volume 4, Issue 1, March 2018

Pages 43-54

DOI: 10.1007/s41095-017-0098-0

Cite this article:

Kikuchi T, Endo Y, Kanamori Y, et al. Transferring pose and augmenting background for deep human-image parsing and its applications. Computational Visual Media, 2018, 4(1): 43-54. https://doi.org/10.1007/s41095-017-0098-0
