
Federated Transfer Learning for On-Device LLMs Efficient Fine Tuning Optimization

Key Laboratory of Computing Power Network and Information Security, Shandong Computer Science Center (National Supercomputer Center in Jinan), Qilu University of Technology (Shandong Academy of Sciences), Jinan 250000, China, and also with Shandong Provincial Key Laboratory of Computer Networks, Jinan 250000, China
Key Laboratory of Computing Power Network and Information Security, Shandong Computer Science Center (National Supercomputer Center in Jinan), Qilu University of Technology (Shandong Academy of Sciences), Jinan 250000, China, and also with Department of Computer Science, Fudan University, Shanghai 200000, China
TelChina Group Co. Ltd., Jinan 250000, China
Key Laboratory of Computing Power Network and Information Security, Shandong Computer Science Center (National Supercomputer Center in Jinan), Qilu University of Technology (Shandong Academy of Sciences), Jinan 250000, China, and also with College of Oceanography and Space Informatics, China University of Petroleum (East China), Qingdao 266000, China


Abstract

The proliferation of Large Language Models (LLMs) has catalyzed growth across many industries. It is therefore imperative to adapt LLMs to specific domains and downstream tasks through transfer learning in a controlled and beneficial manner, while preserving their general capabilities. We propose a novel, efficient on-device fine-tuning optimization algorithm for LLMs based on federated transfer learning. Specifically, from a micro perspective, we introduce the Fusion of Low-Rank Adaptation (FoRA) optimization algorithm, which enhances multi-dimensional feature aggregation by adding a small number of efficient parameters. From a meso perspective, we extend the FoRA algorithm to all linear layers within the Transformer architecture to improve downstream task performance. Finally, from a macro perspective and with a focus on the medical domain, we incorporate quantization techniques into the federated learning framework to achieve efficient on-device fine-tuning, thereby offering dual protection for both data and model. Our results indicate that, compared to existing state-of-the-art methods, our algorithm significantly improves LLM performance while ensuring dual privacy protection of both data and models.
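The sketch below is a minimal, illustrative outline of the general pattern the abstract describes, not the authors' FoRA implementation: a generic LoRA-style low-rank adapter is wrapped around every linear layer of a model, and only the small adapter weights are averaged across federated clients. Names such as LowRankAdapter, inject_adapters, and fedavg_adapters are our own illustrative choices.

```python
# Illustrative sketch only (assumed API and names); it does not reproduce the
# paper's FoRA algorithm, only the generic "LoRA on all linear layers +
# federated averaging of adapter weights" pattern.
import copy
import torch
import torch.nn as nn


class LowRankAdapter(nn.Module):
    """Freezes a pre-trained linear layer and adds a trainable low-rank update W + (alpha/r) * B A."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():          # keep the pre-trained weights frozen
            p.requires_grad = False
        self.lora_a = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, r))
        self.scaling = alpha / r

    def forward(self, x):
        # frozen base output plus the scaled low-rank correction
        return self.base(x) + (x @ self.lora_a.T @ self.lora_b.T) * self.scaling


def inject_adapters(model: nn.Module, r: int = 8) -> nn.Module:
    """Replace every nn.Linear in the model with an adapter-wrapped copy."""
    for name, child in model.named_children():
        if isinstance(child, nn.Linear):
            setattr(model, name, LowRankAdapter(child, r=r))
        else:
            inject_adapters(child, r=r)
    return model


def fedavg_adapters(client_states, weights):
    """FedAvg-style weighted average over the adapter-only state dicts sent by clients."""
    avg = copy.deepcopy(client_states[0])
    for key in avg:
        avg[key] = sum(w * s[key] for w, s in zip(weights, client_states)) / sum(weights)
    return avg
```

In a setup like this, only the small low-rank matrices would leave each device for aggregation, while the raw data and the frozen pre-trained weights stay local; this mirrors the dual data-and-model protection the abstract emphasizes, and a quantized base model could be substituted to reduce on-device memory further.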

Big Data Mining and Analytics
Pages 430-446
Cite this article:
Li C, Gu B, Zhao Z, et al. Federated Transfer Learning for On-Device LLMs Efficient Fine Tuning Optimization. Big Data Mining and Analytics, 2025, 8(2): 430-446. https://doi.org/10.26599/BDMA.2024.9020068