Research Article | Open Access | Online First

Architectural facade design with style and structural features using stable diffusion model

Minghao Wen (a), Dong Liang (a, ✉), Haibo Ye (a), Huawei Tu (b)
(a) College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
(b) Department of Computer Science and Information Technology, La Trobe University, Melbourne 3086, Australia

Abstract

With advancements in digital technology, architectural design has increasingly embraced data and algorithms to enhance design efficiency and quality. Recent text-to-image (T2I) generation models can create images that correspond to textual descriptions; however, textual descriptions struggle to capture the essential characteristics of style images. In this study, we propose a method for architectural facade design based on the stable diffusion model (SDM) that combines style images or keywords with the structural conditions of content images to generate images exhibiting both stylistic and architectural features. The method extracts stylistic features from style images and converts them into corresponding embeddings: the contrastive language-image pre-training (CLIP) image encoder converts the style image into an initial image embedding, which is then refined through feature extraction from multilayer cross-attention and training optimization to yield a pretrained image embedding. This process enables the generated images to embody stylistic features and artistic semantic information. Furthermore, the T2I adapter model uses the architectural structure of content images as conditional guidance, ensuring that the generated images exhibit the corresponding structural features. By combining these two aspects, the proposed method decorates architecture with stylistic features drawn from style images while preserving the structural features of content images, yielding images that reflect the content images after style transformation. Our method targets architectural design applications: it generates facade images from flat design drawings, three-dimensional (3D) architectural models, and hand-drawn sketches, and has achieved commendable results.
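The abstract describes two conditioning paths: a style path that turns a reference image into a CLIP image embedding, and a structure path that guides generation with the content image's architecture via a T2I adapter. The following Python sketch illustrates how such a pipeline could be wired together using the Hugging Face transformers and diffusers libraries. It is a minimal sketch, not the authors' implementation: the model identifiers, file names, and the use of a Canny edge map as the structure condition are illustrative assumptions, and the paper's cross-attention training loop for optimizing the style embedding is omitted.

# Minimal sketch of the two conditioning paths described in the abstract.
# Assumes Hugging Face transformers/diffusers; model IDs and file names
# are illustrative placeholders, not the paper's released artifacts.
import torch
from PIL import Image
from transformers import CLIPImageProcessor, CLIPVisionModelWithProjection
from diffusers import StableDiffusionAdapterPipeline, T2IAdapter

device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32

# Style path: encode the style image into an initial CLIP image embedding.
processor = CLIPImageProcessor.from_pretrained("openai/clip-vit-large-patch14")
encoder = CLIPVisionModelWithProjection.from_pretrained(
    "openai/clip-vit-large-patch14"
).to(device)
style_image = Image.open("style.jpg").convert("RGB")
with torch.no_grad():
    inputs = processor(images=style_image, return_tensors="pt").to(device)
    style_embed = encoder(**inputs).image_embeds  # shape: (1, 768)
# The paper further optimizes this initial embedding against the SDM's
# multilayer cross-attention; that training loop is not exposed by the
# stock pipeline, so the text prompt below stands in for the learned
# style embedding in this sketch.

# Structure path: condition generation on the content image's structure
# via a T2I adapter (here an edge-conditioned adapter, as an assumption).
adapter = T2IAdapter.from_pretrained(
    "TencentARC/t2iadapter_canny_sd15v2", torch_dtype=dtype
)
pipe = StableDiffusionAdapterPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", adapter=adapter, torch_dtype=dtype
).to(device)

structure_map = Image.open("facade_edges.png")  # edge map of the content image
result = pipe(
    prompt="an architectural facade in the reference style",
    image=structure_map,
    num_inference_steps=30,
).images[0]
result.save("stylized_facade.png")

In the paper's full method, the optimized style embedding would carry the stylistic conditioning that the placeholder prompt provides here, while the adapter keeps the facade's structural layout fixed across a flat drawing, 3D model render, or sketch used as the content image.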

Journal of Intelligent Construction
Cite this article:
Wen M, Liang D, Ye H, et al. Architectural facade design with style and structural features using stable diffusion model. Journal of Intelligent Construction, 2024, https://doi.org/10.26599/JIC.2024.9180034


Received: 19 March 2024
Revised: 03 May 2024
Accepted: 09 May 2024
Published: 09 August 2024
© The Author(s) 2024. Published by Tsinghua University Press.

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
