Research Article | Open Access

Controllable multi-domain semantic artwork synthesis

Department of Computer Science, University of Tsukuba, Tsukuba 305-8577, Japan
Department of Computer Science and Engineering, Waseda University, Tokyo 169-8050, Japan

Abstract

We present a novel framework for the multi-domain synthesis of artworks from semantic layouts. A major obstacle for this challenging task is the lack of publicly available segmentation datasets for art synthesis. To address this problem, we propose a dataset, ArtSem, that contains 40,000 images of artwork from four different domains with their corresponding semantic label maps. We first extract semantic maps from landscape photographs and use a conditional generative adversarial network (GAN)-based approach to generate high-quality artwork from semantic maps without requiring paired training data. Furthermore, we propose an artwork-synthesis model that uses domain-dependent variational encoders for high-quality multi-domain synthesis. We then improve and complement the model with a simple yet effective normalization method that jointly normalizes semantics and style, which we call spatially style-adaptive normalization (SSTAN). Unlike previous methods, which take only the semantic layout as input, our model jointly learns representations of style and semantic information, improving the quality of the generated artistic images. Our analysis shows that the model learns to separate the domains in the latent space; thus, by identifying the hyperplanes that separate the different domains, we can perform fine-grained control of the synthesized artwork. Moreover, by combining the proposed dataset and approach, we generate user-controllable artworks of higher quality than those of existing approaches, as corroborated by quantitative metrics and a user study.
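To make the SSTAN idea concrete, the sketch below shows one plausible form of a layer that jointly normalizes semantics and style, in the spirit of spatially-adaptive normalization (SPADE): the spatial modulation parameters are predicted from the semantic map concatenated with a spatially broadcast style code. This is a minimal, hypothetical PyTorch rendering; the class name, layer sizes, and exact conditioning scheme are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a spatially style-adaptive normalization
# (SSTAN) layer: SPADE-like modulation conditioned on both the
# semantic map and a style code. Not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SSTAN(nn.Module):
    def __init__(self, num_features, label_channels, style_dim, hidden=128):
        super().__init__()
        # Parameter-free normalization of the incoming activations.
        self.norm = nn.InstanceNorm2d(num_features, affine=False)
        # Shared trunk over the semantic map concatenated with the
        # spatially broadcast style code.
        self.shared = nn.Sequential(
            nn.Conv2d(label_channels + style_dim, hidden, 3, padding=1),
            nn.ReLU(inplace=True),
        )
        # Spatially varying scale and shift, so each position is
        # modulated according to its semantic class and the style.
        self.gamma = nn.Conv2d(hidden, num_features, 3, padding=1)
        self.beta = nn.Conv2d(hidden, num_features, 3, padding=1)

    def forward(self, x, segmap, style):
        # Resize the one-hot semantic map to the feature resolution.
        segmap = F.interpolate(segmap, size=x.shape[2:], mode="nearest")
        # Broadcast the style vector over all spatial positions.
        style = style[:, :, None, None].expand(-1, -1, *x.shape[2:])
        h = self.shared(torch.cat([segmap, style], dim=1))
        # Modulate the normalized activations jointly by semantics and style.
        return self.norm(x) * (1 + self.gamma(h)) + self.beta(h)

# Example: a 64-channel feature map, 10 semantic classes, and a
# 256-dimensional style code from a domain-dependent encoder.
layer = SSTAN(num_features=64, label_channels=10, style_dim=256)
x = torch.randn(2, 64, 32, 32)
seg = torch.randn(2, 10, 128, 128)  # one-hot label maps in practice
z = torch.randn(2, 256)
out = layer(x, seg, z)              # -> (2, 64, 32, 32)
```

The fine-grained control described at the end of the abstract follows the same idea as linear latent-space editing methods such as InterFaceGAN: fit a linear classifier that separates the latent codes of two domains, then shift a code along the normal of the resulting hyperplane to interpolate between domains.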

Electronic Supplementary Material

Video
41095_0356_ESM(1).mp4
41095_0356_ESM(2).pdf (70.4 MB)

Computational Visual Media
Pages 355–373
Cite this article:
Huang Y, Iizuka S, Simo-Serra E, et al. Controllable multi-domain semantic artwork synthesis. Computational Visual Media, 2024, 10(2): 355-373. https://doi.org/10.1007/s41095-023-0356-2


Received: 01 February 2023
Accepted: 08 May 2023
Published: 03 January 2024
© The Author(s) 2023.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.

The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Other papers from this open access journal are available free of charge from http://www.springer.com/journal/41095. To submit a manuscript, please go to https://www.editorialmanager.com/cvmj.
