Research Article | Open Access

DeepFaceReshaping: Interactive deep face reshaping via landmark manipulation

Beijing Key Laboratory of Mobile Computing and Pervasive Device, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
University of Chinese Academy of Sciences, Beijing 100049, China
School of Creative Media, City University of Hong Kong, Hong Kong, China
TAL Education Group, Beijing 100081, China
Guangdong Institute of Smart Education, Jinan University, Guangzhou 510632, China
Zhejiang Lab Nanhu Headquarters, Hangzhou 310023, China

Abstract

Deep generative models allow the synthesis of realistic human faces from freehand sketches or semantic maps. However, although flexible, sketches and semantic maps give users too many degrees of freedom, and are thus difficult for novice users to control. In this study, we present DeepFaceReshaping, a novel landmark-based deep generative framework for interactive face reshaping. To edit the shape of a face realistically by manipulating a small number of face landmarks, we employ neural shape deformation to reshape individual face components. Furthermore, we propose a novel Transformer-based partial refinement network to synthesize the reshaped face components conditioned on the edited landmarks, and fuse the components to generate the entire face using a local-to-global approach. In this manner, we limit the possible reshaping effects to a feasible component-based face space. Thus, our interface is intuitive even for novice users, as confirmed by a user study. Our experiments demonstrate that our method outperforms traditional warping-based approaches and recent deep generative techniques.
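The abstract describes a three-stage pipeline: edit a few landmarks, deform each face component with neural shape deformation, refine each reshaped component conditioned on its edited landmarks, and fuse the components local-to-global. The sketch below illustrates only the control flow of such a pipeline; every function body, component name, and data shape is an illustrative stand-in, not the authors' implementation (the paper's networks would replace `deform_component`, `refine_component`, and `fuse_components`).

```python
# Hypothetical control-flow sketch of a landmark-driven component-wise
# reshaping pipeline. All names and bodies are illustrative placeholders.

COMPONENTS = ["left_eye", "right_eye", "nose", "mouth", "face_contour"]

def deform_component(landmarks, edited):
    """Stand-in for neural shape deformation: move each landmark
    halfway toward its user-edited target position."""
    return [(0.5 * (x0 + x1), 0.5 * (y0 + y1))
            for (x0, y0), (x1, y1) in zip(landmarks, edited)]

def refine_component(name, landmarks):
    """Stand-in for the partial refinement network: synthesize a
    component image conditioned on its edited landmarks (here we
    merely tag the landmarks with their component name)."""
    return {"component": name, "landmarks": landmarks}

def fuse_components(parts):
    """Stand-in for local-to-global fusion: merge the per-component
    results into one whole-face result."""
    return {p["component"]: p["landmarks"] for p in parts}

def reshape_face(face_landmarks, edits):
    parts = []
    for name in COMPONENTS:
        original = face_landmarks[name]
        target = edits.get(name, original)  # unedited parts pass through
        deformed = deform_component(original, target)
        parts.append(refine_component(name, deformed))
    return fuse_components(parts)

# Example: raise only the nose landmarks; other components stay put.
face = {name: [(0.0, 0.0), (1.0, 0.0)] for name in COMPONENTS}
result = reshape_face(face, {"nose": [(0.0, 0.2), (1.0, 0.2)]})
```

The key property the sketch preserves is that edits stay local: only the components whose landmarks were touched are deformed and refined, which is how a component-based face space keeps novice edits within feasible bounds.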

Electronic Supplementary Material

Video
cvm-10-5-949-ESM1.mp4
Download File(s)
cvm-10-5-949-ESM2.pdf (19 MB)

Computational Visual Media
Pages 949-963
Cite this article:
Chen S-Y, Jiang Y-R, Fu H, et al. DeepFaceReshaping: Interactive deep face reshaping via landmark manipulation. Computational Visual Media, 2024, 10(5): 949-963. https://doi.org/10.1007/s41095-023-0373-1

Received: 16 March 2023
Accepted: 17 August 2023
Published: 07 October 2024
© The Author(s) 2024.

This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.

The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Other papers from this open access journal are available free of charge from http://www.springer.com/journal/41095. To submit a manuscript, please go to https://www.editorialmanager.com/cvmj.