Research Article | Open Access

Taming diffusion model for exemplar-based image translation

Visual Computing Research Center, College of Computer Science and Software Engineering, Shenzhen University, Shenzhen 518060, China

Abstract

Exemplar-based image translation converts a semantic mask into a photorealistic image that adopts the style of a given exemplar. However, most existing GAN-based translation methods fail to produce truly photorealistic results. In this study, we propose a new diffusion model-based approach for generating high-quality images that are semantically aligned with the input mask and resemble the exemplar in style. The proposed method trains a conditional denoising diffusion probabilistic model (DDPM) with a SPADE module to integrate the semantic map. We then use a novel contextual loss and an auxiliary color loss to guide the optimization process, yielding images that are visually pleasing and semantically accurate. Experiments demonstrate that our method outperforms state-of-the-art approaches in both visual quality and quantitative metrics.
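To make the conditioning mechanism concrete, below is a minimal PyTorch sketch of a SPADE block of the kind the method uses to inject the semantic map into the denoising network. This is an illustration under stated assumptions, not the authors' implementation: the channel sizes, hidden width, and batch-norm backbone are placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SPADE(nn.Module):
    """Spatially-adaptive normalization conditioned on a semantic map."""

    def __init__(self, feat_channels: int, label_channels: int, hidden: int = 128):
        super().__init__()
        # Parameter-free normalization; the spatial modulation below
        # replaces the usual learned affine terms.
        self.norm = nn.BatchNorm2d(feat_channels, affine=False)
        self.shared = nn.Sequential(
            nn.Conv2d(label_channels, hidden, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
        # Per-pixel scale and shift predicted from the semantic map.
        self.gamma = nn.Conv2d(hidden, feat_channels, kernel_size=3, padding=1)
        self.beta = nn.Conv2d(hidden, feat_channels, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor, segmap: torch.Tensor) -> torch.Tensor:
        # Resize the one-hot semantic map to the feature resolution.
        segmap = F.interpolate(segmap, size=x.shape[-2:], mode="nearest")
        h = self.shared(segmap)
        return self.norm(x) * (1 + self.gamma(h)) + self.beta(h)
```

In a diffusion U-Net, such a block would stand in for ordinary normalization layers, so that each call, e.g., `block(features, onehot_mask)`, re-injects the spatial layout of the mask at every denoising step.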

Computational Visual Media
Pages 1031–1043
Cite this article:
Ma H, Yang J, Huang H. Taming diffusion model for exemplar-based image translation. Computational Visual Media, 2024, 10(6): 1031–1043. https://doi.org/10.1007/s41095-023-0371-3


Received: 13 March 2023
Accepted: 20 July 2023
Published: 24 July 2024
© The Author(s) 2024.

This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.

The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Other papers from this open access journal are available free of charge from http://www.springer.com/journal/41095. To submit a manuscript, please go to https://www.editorialmanager.com/cvmj.
