Open Access

Ensemble Knowledge Distillation for Federated Semi-Supervised Image Classification

School of Computer Science and Technology, Xidian University, Xi’an 710126, China
Engineering Research Center of Blockchain Technology Application and Evaluation, Ministry of Education, Xi’an 710126, China, and Key Laboratory of Smart Human-Computer Interaction and Wearable Technology of Shaanxi Province, Xi’an 710126, China

Abstract

Federated learning is an emerging privacy-preserving distributed learning paradigm in which many clients collaboratively train a shared global model under the orchestration of a remote server. Most existing work on federated learning focuses on fully supervised settings, assuming that all data are annotated with ground-truth labels. This work instead considers a more realistic and challenging setting, Federated Semi-Supervised Learning (FSSL), in which clients hold large amounts of unlabeled data and only the server hosts a small number of labeled samples. The core challenge in this setting is how to effectively exploit the server-side labeled data and the client-side unlabeled data. In this paper, we propose a new FSSL algorithm for image classification based on consistency regularization and ensemble knowledge distillation, called EKDFSSL. Our algorithm uses the global model as the teacher in consistency regularization to improve both the accuracy and stability of client-side unsupervised learning on unlabeled data. In addition, we introduce an ensemble knowledge distillation loss to mitigate model overfitting during server-side retraining on labeled data. Extensive experiments on several image classification datasets show that EKDFSSL outperforms current baseline methods.
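The two loss terms described in the abstract can be sketched as follows. This is a minimal, illustrative NumPy sketch, not the paper's implementation: the function names, the use of mean-squared error for the consistency term, the averaging of client predictions for the ensemble, and the temperature value are all assumptions for exposition.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Numerically stable softmax, optionally softened by a temperature."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def consistency_loss(teacher_logits, student_logits):
    """Client-side consistency regularization on unlabeled data:
    penalize disagreement between the global (teacher) model's
    predictions and the local (student) model's predictions."""
    p_teacher = softmax(teacher_logits)
    p_student = softmax(student_logits)
    return float(np.mean((p_teacher - p_student) ** 2))

def ensemble_kd_loss(client_logits_list, server_logits, temperature=2.0):
    """Server-side ensemble knowledge distillation on labeled data:
    KL divergence from the averaged (ensemble) client predictions to
    the server model's predictions, both softened by a temperature,
    acting as a regularizer against overfitting during retraining."""
    p_ensemble = np.mean(
        [softmax(l, temperature) for l in client_logits_list], axis=0
    )
    p_server = softmax(server_logits, temperature)
    kl = np.sum(p_ensemble * (np.log(p_ensemble + 1e-12)
                              - np.log(p_server + 1e-12)))
    return float(kl / len(p_server))  # average over the batch
```

When teacher and student agree exactly, both losses vanish; as their predictions diverge, the losses grow, pulling the student toward the teacher (client side) and the retrained server model toward the client ensemble (server side).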

Tsinghua Science and Technology
Pages 112-123
Cite this article:
Shang E, Liu H, Zhang J, et al. Ensemble Knowledge Distillation for Federated Semi-Supervised Image Classification. Tsinghua Science and Technology, 2025, 30(1): 112-123. https://doi.org/10.26599/TST.2023.9010156


Received: 27 October 2023
Revised: 10 December 2023
Accepted: 19 December 2023
Published: 11 September 2024
© The Author(s) 2025.

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
