Research Article | Open Access

Mask-aware photorealistic facial attribute manipulation

Shanghai Jiao Tong University, Shanghai 200240, China
Robotics Institute, Carnegie Mellon University, Pittsburgh, PA, 15213, USA

Abstract

Facial attribute manipulation has found increasing application, but it remains challenging to restrict edits so that a face's unique details are preserved. In this paper, we introduce a mask-adversarial autoencoder (M-AAE), which combines a variational autoencoder (VAE) and a generative adversarial network (GAN) for photorealistic image generation. We use partial dilated layers to modify a few pixels in the feature maps of an encoder, changing the attribute strength continuously without disturbing global information. The training objectives of the VAE and GAN are reinforced by a face recognition loss and a cycle consistency loss, to faithfully preserve facial details. Moreover, we generate facial masks to enforce background consistency, which allows training to focus on the foreground face rather than the background. Experimental results demonstrate that our method can generate high-quality images with varying attributes, and outperforms existing methods in detail preservation.
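The mask-based background-consistency idea can be illustrated with a minimal numpy sketch. This is not the paper's implementation (the abstract does not give loss weights or layer details); the function and variable names below are illustrative assumptions. The key point is that the reconstruction penalty is restricted to non-face pixels, so the edit is free to change the face while the background stays anchored to the input.

```python
import numpy as np

def background_consistency_loss(original, edited, face_mask):
    """Mean L1 difference restricted to the background (non-face) region.

    original, edited: float arrays of shape (H, W, C), values in [0, 1].
    face_mask: binary array of shape (H, W); 1 on the face, 0 elsewhere.
    Penalizing only background differences lets training focus the edit
    on the foreground face while keeping background pixels unchanged.
    """
    bg = 1.0 - face_mask[..., None]              # broadcast mask over channels
    diff = np.abs(original - edited) * bg        # zero out face-region differences
    n_bg = bg.sum() * original.shape[-1]         # number of background elements
    return diff.sum() / max(n_bg, 1.0)

# Toy example: editing only the face region incurs no background penalty.
H, W = 4, 4
mask = np.zeros((H, W))
mask[1:3, 1:3] = 1.0                             # hypothetical face region
x = np.zeros((H, W, 3))
x_face_edit = x.copy()
x_face_edit[1:3, 1:3] += 0.5                     # change only face pixels
print(background_consistency_loss(x, x_face_edit, mask))  # 0.0
```

In the full method, a term like this would be weighted and added to the VAE, GAN, face recognition, and cycle consistency losses; changing any background pixel makes the loss positive, pushing the generator to reproduce the background exactly.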

Computational Visual Media
Pages 363-374
Cite this article:
Sun R, Huang C, Zhu H, et al. Mask-aware photorealistic facial attribute manipulation. Computational Visual Media, 2021, 7(3): 363-374. https://doi.org/10.1007/s41095-021-0219-7


Received: 30 December 2020
Accepted: 25 February 2021
Published: 28 April 2021
© The Author(s) 2021

This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.

The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Other papers from this open access journal are available free of charge from http://www.springer.com/journal/41095. To submit a manuscript, please go to https://www.editorialmanager.com/cvmj.
