Research Article | Open Access

Taming diffusion model for exemplar-based image translation

Visual Computing Research Center, College of Computer Science and Software Engineering, Shenzhen University, Shenzhen 518060, China

Abstract

Exemplar-based image translation converts a semantic mask into a photorealistic image that adopts the style of a given exemplar. However, most existing GAN-based translation methods fail to produce truly photorealistic results. In this study, we propose a new diffusion model-based approach for generating high-quality images that are semantically aligned with the input mask and resemble the exemplar in style. The proposed method trains a conditional denoising diffusion probabilistic model (DDPM) with a SPADE module to integrate the semantic map. We then use a novel contextual loss and an auxiliary color loss to guide the optimization process, yielding images that are visually pleasing and semantically accurate. Experiments demonstrate that our method outperforms state-of-the-art approaches in both visual quality and quantitative metrics.
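To make the conditioning mechanism concrete, below is a minimal PyTorch sketch of a SPADE block of the kind the method uses to inject the semantic map into the denoising network. This is an illustration under stated assumptions, not the authors' implementation: the channel sizes, hidden width, and batch-norm backbone are placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SPADE(nn.Module):
    """Spatially-adaptive normalization conditioned on a semantic map."""

    def __init__(self, feat_channels: int, label_channels: int, hidden: int = 128):
        super().__init__()
        # Parameter-free normalization; the spatial modulation below
        # replaces the usual learned affine terms.
        self.norm = nn.BatchNorm2d(feat_channels, affine=False)
        self.shared = nn.Sequential(
            nn.Conv2d(label_channels, hidden, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
        # Per-pixel scale and shift predicted from the semantic map.
        self.gamma = nn.Conv2d(hidden, feat_channels, kernel_size=3, padding=1)
        self.beta = nn.Conv2d(hidden, feat_channels, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor, segmap: torch.Tensor) -> torch.Tensor:
        # Resize the one-hot semantic map to the feature resolution.
        segmap = F.interpolate(segmap, size=x.shape[-2:], mode="nearest")
        h = self.shared(segmap)
        return self.norm(x) * (1 + self.gamma(h)) + self.beta(h)
```

In a diffusion U-Net, such a block would stand in for ordinary normalization layers, so that each call, e.g., `block(features, onehot_mask)`, re-injects the spatial layout of the mask at every denoising step.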

Computational Visual Media
Pages 1031–1043
Cite this article:
Ma H, Yang J, Huang H. Taming diffusion model for exemplar-based image translation. Computational Visual Media, 2024, 10(6): 1031–1043. https://doi.org/10.1007/s41095-023-0371-3


Received: 13 March 2023
Accepted: 20 July 2023
Published: 24 July 2024
© The Author(s) 2024.

This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.

The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Other papers from this open access journal are available free of charge from http://www.springer.com/journal/41095. To submit a manuscript, please go to https://www.editorialmanager.com/cvmj.
