Research Article | Open Access

JMNet: A joint matting network for automatic human matting

Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China.
AI Center at Visual China Group, Burlingame, CA 94010, USA.
School of Engineering and Computer Science, Victoria University of Wellington, New Zealand.

Abstract

We propose a novel end-to-end deep learning framework, the Joint Matting Network (JMNet), to automatically generate alpha mattes for human images. We utilize the intrinsic structures of the human body as seen in images by introducing a pose estimation module, which can provide both global structural guidance and a local attention focus for the matting task. Our network model includes a pose network, a trimap network, a matting network, and a shared encoder to extract features for the above three networks. We also append a trimap refinement module and utilize gradient loss to provide a sharper alpha matte. Extensive experiments have shown that our method outperforms state-of-the-art human matting techniques; the shared encoder leads to better performance and lower memory costs. Our model can process real images downloaded from the Internet for use in composition applications.
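As a rough illustration of the gradient loss mentioned above, the sketch below compares the spatial gradients of a predicted alpha matte with those of a ground-truth matte. The abstract does not state the exact formulation, so the forward-difference operator, the L1 penalty, and the function names `image_gradients` and `gradient_loss` are assumptions made for illustration, not the authors' implementation.

```python
def image_gradients(alpha):
    """Forward differences of a 2-D matte given as a list of rows.

    Assumed operator: the last column (for dx) and last row (for dy)
    have no forward neighbour and are set to 0.
    """
    h, w = len(alpha), len(alpha[0])
    dx = [[alpha[y][x + 1] - alpha[y][x] if x + 1 < w else 0.0
           for x in range(w)] for y in range(h)]
    dy = [[alpha[y + 1][x] - alpha[y][x] if y + 1 < h else 0.0
           for x in range(w)] for y in range(h)]
    return dx, dy


def gradient_loss(pred_alpha, true_alpha):
    """Mean L1 distance between the gradients of two mattes (hypothetical form)."""
    pdx, pdy = image_gradients(pred_alpha)
    tdx, tdy = image_gradients(true_alpha)
    h, w = len(true_alpha), len(true_alpha[0])
    total = 0.0
    for y in range(h):
        for x in range(w):
            total += abs(pdx[y][x] - tdx[y][x]) + abs(pdy[y][x] - tdy[y][x])
    return total / (h * w)


# A sharp ground-truth edge versus a blurred prediction of the same edge.
true_matte = [[0.0, 0.0, 1.0, 1.0]]
blurred = [[0.0, 0.5, 0.5, 1.0]]
loss_blurred = gradient_loss(blurred, true_matte)
loss_exact = gradient_loss(true_matte, true_matte)
```

Under these assumptions, a blurred edge is penalized while an exact match costs nothing, which is the sense in which such a term can encourage a sharper alpha matte.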

Computational Visual Media
Pages 215-224
Cite this article:
Wu X, Fang X-N, Chen T, et al. JMNet: A joint matting network for automatic human matting. Computational Visual Media, 2020, 6(2): 215-224. https://doi.org/10.1007/s41095-020-0168-6