Perspective

Large language models for building energy applications: Opportunities and challenges

Mingzhe Liu¹, Liang Zhang², Jianli Chen³, Wei-An Chen⁴, Zhiyao Yang¹, L. James Lo⁵, Jin Wen⁵, Zheng O'Neill¹ (✉)
1 J. Mike Walker '66 Department of Mechanical Engineering, Texas A&M University, College Station, TX 77843, USA
2 Department of Civil and Architectural Engineering and Mechanics, The University of Arizona, Tucson, AZ 85719, USA
3 College of Civil Engineering, Tongji University, Shanghai 200092, China
4 Department of Multidisciplinary Engineering, Texas A&M University, College Station, TX 77843, USA
5 Department of Civil, Architectural and Environmental Engineering, Drexel University, Philadelphia, PA 19104, USA

Abstract

Large language models (LLMs) are gaining attention for their potential to enhance efficiency and sustainability in the building domain, a critical sector for reducing global carbon emissions. Built on transformer architectures, LLMs excel at text generation and data analysis, enabling applications such as automated energy model generation, energy management optimization, and fault detection and diagnosis. These models can potentially streamline complex workflows, enhance decision-making, and improve energy efficiency. However, integrating LLMs into building energy systems poses challenges, including high computational demands, data preparation costs, and the need for domain-specific customization. This perspective paper explores the role of LLMs in the building energy sector, highlighting their potential applications and limitations. We propose a development roadmap built on in-context learning, domain-specific fine-tuning, retrieval-augmented generation, and multimodal integration to enhance the customization and practical use of LLMs in this field. This paper aims to spark ideas for bridging the gap between LLM capabilities and practical building applications, offering insights into the future of LLM-driven methods in building energy applications.
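To make the roadmap's first step concrete, the sketch below illustrates in-context learning for a building energy task: a few labeled HVAC sensor snapshots are embedded in the prompt so a general-purpose LLM can diagnose a new fault without any fine-tuning. This is a minimal illustration under stated assumptions, not the paper's method; it assumes the OpenAI Python client, and the model name, sensor readings, and fault labels are hypothetical placeholders.

```python
# Minimal sketch of the roadmap's in-context learning step: few-shot prompting
# a general-purpose LLM for HVAC fault diagnosis. Assumes the OpenAI Python
# client (pip install openai) with an API key in OPENAI_API_KEY; the model
# name, sensor snapshots, and fault labels are hypothetical placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# A few labeled examples embed domain knowledge directly in the prompt,
# so no fine-tuning or retraining is required.
FEW_SHOT_PROMPT = """You are an HVAC fault-diagnosis assistant.

Example 1:
Readings: supply air temp = 18.2 C, setpoint = 12.8 C, cooling valve = 100%
Diagnosis: insufficient cooling capacity (likely stuck cooling coil valve)

Example 2:
Readings: supply air temp = 12.9 C, setpoint = 12.8 C, cooling valve = 45%
Diagnosis: normal operation
"""

def diagnose(readings: str) -> str:
    """Classify a new sensor snapshot using only the in-prompt examples."""
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        temperature=0,   # deterministic output for diagnostic use
        messages=[
            {"role": "system", "content": FEW_SHOT_PROMPT},
            {"role": "user", "content": f"Readings: {readings}\nDiagnosis:"},
        ],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    print(diagnose("supply air temp = 19.6 C, setpoint = 12.8 C, cooling valve = 100%"))
```

The same pattern extends to the later roadmap stages: retrieval-augmented generation would prepend retrieved equipment documentation or building metadata to the prompt, while domain-specific fine-tuning would replace the in-prompt examples with labeled training data.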

Cite this article:
Liu M, Zhang L, Chen J, et al. Large language models for building energy applications: Opportunities and challenges. Building Simulation, 2025, 18(2): 225-234. https://doi.org/10.1007/s12273-025-1235-9