JMNet: A joint matting network for automatic human matting

Xian Wu; Xiao-Nan Fang; Tao Chen; Fang-Lue Zhang

doi:10.1007/s41095-020-0168-6

Research Article | Open Access

Xian Wu¹, Xiao-Nan Fang¹, Tao Chen², Fang-Lue Zhang³

1 Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China.

2 AI Center at Visual China Group, Burlingame, CA 94010, USA.

3 School of Engineering and Computer Science, Victoria University of Wellington, New Zealand.

Abstract

We propose a novel end-to-end deep learning framework, the Joint Matting Network (JMNet), to automatically generate alpha mattes for human images. We utilize the intrinsic structures of the human body as seen in images by introducing a pose estimation module, which can provide both global structural guidance and a local attention focus for the matting task. Our network model includes a pose network, a trimap network, a matting network, and a shared encoder to extract features for the above three networks. We also append a trimap refinement module and utilize gradient loss to provide a sharper alpha matte. Extensive experiments have shown that our method outperforms state-of-the-art human matting techniques; the shared encoder leads to better performance and lower memory costs. Our model can process real images downloaded from the Internet for use in composition applications.
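The abstract mentions a gradient loss but does not give its form. As a minimal sketch, assuming the common choice of an L1 distance between the spatial gradients of the predicted and ground-truth alpha mattes, the idea can be illustrated in plain Python. The function names `forward_diff`, `mean_abs_diff`, and `gradient_loss` are hypothetical and are not taken from the paper.

```python
def forward_diff(alpha):
    """Return horizontal and vertical forward differences of a 2D matte.

    `alpha` is a list of equal-length rows with values in [0, 1].
    """
    h, w = len(alpha), len(alpha[0])
    dx = [[alpha[y][x + 1] - alpha[y][x] for x in range(w - 1)] for y in range(h)]
    dy = [[alpha[y + 1][x] - alpha[y][x] for x in range(w)] for y in range(h - 1)]
    return dx, dy


def mean_abs_diff(a, b):
    """Mean absolute difference between two equally shaped 2D lists."""
    total, count = 0.0, 0
    for row_a, row_b in zip(a, b):
        for va, vb in zip(row_a, row_b):
            total += abs(va - vb)
            count += 1
    return total / count if count else 0.0


def gradient_loss(pred, target):
    """Assumed form: L1 distance between the gradients of the two mattes."""
    pdx, pdy = forward_diff(pred)
    tdx, tdy = forward_diff(target)
    return mean_abs_diff(pdx, tdx) + mean_abs_diff(pdy, tdy)


# A sharp edge as ground truth and a blurred prediction of the same edge.
target = [[0.0, 0.0, 1.0, 1.0],
          [0.0, 0.0, 1.0, 1.0]]
blurry = [[0.0, 0.25, 0.75, 1.0],
          [0.0, 0.25, 0.75, 1.0]]

print(gradient_loss(target, target))  # identical mattes give zero loss
print(gradient_loss(blurry, target))  # a blurred edge is penalized
```

Under this assumed form, the loss depends only on how the matte changes between neighboring pixels, so a prediction that smears a hard edge is penalized even when its per-pixel error is small. This is consistent with the stated goal of producing a sharper alpha matte.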

Keywords

alpha matting; human images; deep learning; pose estimation

Computational Visual Media

Volume 6, Issue 2, June 2020

Pages 215-224

DOI: 10.1007/s41095-020-0168-6

Cite this article:

Wu X, Fang X-N, Chen T, et al. JMNet: A joint matting network for automatic human matting. Computational Visual Media, 2020, 6(2): 215-224. https://doi.org/10.1007/s41095-020-0168-6