With advancements in digital technology, architectural design has increasingly embraced data and algorithms to improve design efficiency and quality. Recent text-to-image (T2I) generation models can create images that correspond to textual descriptions; however, textual descriptions struggle to capture the essential characteristics of style images. In this study, we propose a method for architectural facade design based on the stable diffusion model (SDM) that combines style images or keywords with the structural conditions of content images to generate images exhibiting both stylistic and architectural features. The method extracts stylistic features from a style image and converts them into corresponding embeddings: the contrastive language-image pre-training (CLIP) image encoder first converts the style image into an initial image embedding, which is then refined through feature extraction from multilayer cross-attention and training optimization to obtain a pretrained image embedding. This process enables the generated images to embody stylistic features and artistic semantic information. Furthermore, the T2I-Adapter model uses the architectural structure of the content image as conditional guidance, ensuring that the generated images exhibit the corresponding structural features. By combining these two components, the proposed method decorates architecture with features drawn from style images while preserving the structural features of content images, producing images that reflect the content images after style transformation. Our method is intended for architectural design applications: it generates facade images from flat design drawings, three-dimensional (3D) architectural models, and hand-drawn sketches, and has achieved commendable results.
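The conditioning mechanism described above, in which spatial features of the image being generated attend to a style embedding through cross-attention, can be illustrated with a minimal sketch. This is not the authors' implementation: it is a single-head, NumPy-only toy with assumed dimensions (a 64-token flattened feature map of width 320, and a 77-token conditioning embedding, as in CLIP-style encoders), showing only how queries from spatial features and keys/values from the style embedding combine.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(image_feats, style_emb):
    """One cross-attention step (single head, no learned projections).

    image_feats: (n_tokens, d)  -- flattened spatial features (queries)
    style_emb:   (m_tokens, d)  -- style-image embedding (keys and values)
    Returns an (n_tokens, d) array: each spatial token becomes a
    weighted mixture of style-embedding tokens.
    """
    d = image_feats.shape[-1]
    # Attention weights: each row sums to 1 over the style tokens.
    weights = softmax(image_feats @ style_emb.T / np.sqrt(d))
    return weights @ style_emb

# Toy shapes (assumed for illustration, not taken from the paper).
feats = np.random.randn(64, 320)   # 8x8 feature grid, flattened
style = np.random.randn(77, 320)   # style embedding tokens
out = cross_attention(feats, style)
print(out.shape)  # (64, 320)
```

In the full SDM pipeline this operation is repeated at multiple U-Net layers with learned query/key/value projections; the sketch keeps only the core attention arithmetic.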
The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, distribution and reproduction in any medium, provided the original work is properly cited.