Research Article | Open Access | Online First

Architectural facade design with style and structural features using stable diffusion model

Minghao Wen (a), Dong Liang (a, ✉), Haibo Ye (a), Huawei Tu (b)
(a) College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
(b) Department of Computer Science and Information Technology, La Trobe University, Melbourne 3086, Australia

Abstract

With advancements in digital technology, architectural design has increasingly embraced data and algorithms to enhance design efficiency and quality. Recent text-to-image (T2I) generation models can create images that correspond to textual descriptions; however, textual descriptions struggle to capture the essential characteristics of style images. In this study, we propose a method for architectural facade design based on the stable diffusion model (SDM) that combines style images or keywords with the structural conditions of content images to generate images exhibiting both stylistic and architectural features. The method extracts stylistic features from style images and converts them into corresponding embeddings: the contrastive language-image pre-training (CLIP) image encoder converts the style image into an initial image embedding, which is then refined through feature extraction from multilayer cross-attention and training optimization to yield a pretrained image embedding. This process enables the generated images to embody stylistic features and artistic semantic information. Furthermore, the T2I adapter model uses the architectural structure of content images as conditional guidance, ensuring that the generated images exhibit the corresponding structural features. By combining these two aspects, the proposed method decorates architecture with stylistic features drawn from style images while preserving the structural features of content images, yielding images that reflect the content images after style transformation. Our method targets architectural design applications: it generates facade images from flat design drawings, three-dimensional (3D) architectural models, and hand-drawn sketches, and has achieved commendable results.
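The abstract describes two conditioning paths: a style path that turns a reference image into a CLIP image embedding, and a structure path that guides generation with the content image's architecture via a T2I adapter. The following Python sketch illustrates how such a pipeline could be wired together using the Hugging Face transformers and diffusers libraries. It is a minimal sketch, not the authors' implementation: the model identifiers, file names, and the use of a Canny edge map as the structure condition are illustrative assumptions, and the paper's cross-attention training loop for optimizing the style embedding is omitted.

# Minimal sketch of the two conditioning paths described in the abstract.
# Assumes Hugging Face transformers/diffusers; model IDs and file names
# are illustrative placeholders, not the paper's released artifacts.
import torch
from PIL import Image
from transformers import CLIPImageProcessor, CLIPVisionModelWithProjection
from diffusers import StableDiffusionAdapterPipeline, T2IAdapter

device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32

# Style path: encode the style image into an initial CLIP image embedding.
processor = CLIPImageProcessor.from_pretrained("openai/clip-vit-large-patch14")
encoder = CLIPVisionModelWithProjection.from_pretrained(
    "openai/clip-vit-large-patch14"
).to(device)
style_image = Image.open("style.jpg").convert("RGB")
with torch.no_grad():
    inputs = processor(images=style_image, return_tensors="pt").to(device)
    style_embed = encoder(**inputs).image_embeds  # shape: (1, 768)
# The paper further optimizes this initial embedding against the SDM's
# multilayer cross-attention; that training loop is not exposed by the
# stock pipeline, so the text prompt below stands in for the learned
# style embedding in this sketch.

# Structure path: condition generation on the content image's structure
# via a T2I adapter (here an edge-conditioned adapter, as an assumption).
adapter = T2IAdapter.from_pretrained(
    "TencentARC/t2iadapter_canny_sd15v2", torch_dtype=dtype
)
pipe = StableDiffusionAdapterPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", adapter=adapter, torch_dtype=dtype
).to(device)

structure_map = Image.open("facade_edges.png")  # edge map of the content image
result = pipe(
    prompt="an architectural facade in the reference style",
    image=structure_map,
    num_inference_steps=30,
).images[0]
result.save("stylized_facade.png")

In the paper's full method, the optimized style embedding would carry the stylistic conditioning that the placeholder prompt provides here, while the adapter keeps the facade's structural layout fixed across a flat drawing, 3D model render, or sketch used as the content image.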

Journal of Intelligent Construction
Cite this article:
Wen M, Liang D, Ye H, et al. Architectural facade design with style and structural features using stable diffusion model. Journal of Intelligent Construction, 2024, https://doi.org/10.26599/JIC.2024.9180034


Received: 19 March 2024
Revised: 03 May 2024
Accepted: 09 May 2024
Published: 09 August 2024
© The Author(s) 2024. Published by Tsinghua University Press.

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
