[1]
A. Krizhevsky, I. Sutskever, and G. E. Hinton, ImageNet classification with deep convolutional neural networks, in Proc. 25th Int. Conf. Neural Information Processing Systems, Lake Tahoe, NV, USA, 2012, pp. 1097–1105.
[2]
K. Simonyan and A. Zisserman, Very deep convolutional networks for large-scale image recognition, in Proc. 3rd Int. Conf. Learning Representations, San Diego, CA, USA, 2015.
[3]
C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, Going deeper with convolutions, in 2015 IEEE Conf. Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 2015, pp. 1–9.
[5]
K. He, X. Zhang, S. Ren, and J. Sun, Deep residual learning for image recognition, in 2016 IEEE Conf. Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 2016, pp. 770–778.
[9]
A. Oliver, A. Odena, C. Raffel, E. D. Cubuk, and I. J. Goodfellow, Realistic evaluation of deep semi-supervised learning algorithms, in Proc. 32nd Int. Conf. Neural Information Processing Systems, Montreal, Canada, 2018, pp. 3239–3250.
[10]
V. Verma, A. Lamb, J. Kannala, Y. Bengio, and D. Lopez-Paz, Interpolation consistency training for semi-supervised learning, in Proc. 28th Int. Joint Conf. Artificial Intelligence, Macao, China, 2019, pp. 3635–3641.
[11]
X. Zhu, Z. Ghahramani, and J. Lafferty, Semi-supervised learning using Gaussian fields and harmonic functions, in Proc. 20th Int. Conf. Machine Learning, Washington, DC, USA, 2003, pp. 912–919.
[12]
O. Chapelle, B. Schölkopf, and A. Zien, Semi-Supervised Learning (Adaptive Computation and Machine Learning), Cambridge, MA, USA: MIT Press, 2006.
[13]
J. Turian, L. A. Ratinov, and Y. Bengio, Word representations: A simple and general method for semi-supervised learning, in Proc. 48th Ann. Meeting of the Association for Computational Linguistics, Uppsala, Sweden, 2010, pp. 384–394.
[14]
S. Laine and T. Aila, Temporal ensembling for semi-supervised learning, arXiv preprint arXiv: 1610.02242, 2016.
[15]
A. Tarvainen and H. Valpola, Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results, in Proc. 31st Int. Conf. Neural Information Processing Systems, Long Beach, CA, USA, 2017, pp. 1195–1204.
[17]
D. Berthelot, N. Carlini, I. Goodfellow, A. Oliver, N. Papernot, and C. Raffel, MixMatch: A holistic approach to semi-supervised learning, in Proc. 33rd Int. Conf. Neural Information Processing Systems, Vancouver, Canada, 2019, p. 454.
[23]
T. Clanuwat, M. Bober-Irizar, A. Kitamoto, A. Lamb, K. Yamamoto, and D. Ha, Deep learning for classical Japanese literature, arXiv preprint arXiv: 1812.01718, 2018.
[25]
M. Sajjadi, M. Javanmardi, and T. Tasdizen, Regularization with stochastic transformations and perturbations for deep semi-supervised learning, in Proc. 30th Int. Conf. Neural Information Processing Systems, Barcelona, Spain, 2016, pp. 1171–1179.
[26]
P. Bachman, O. Alsharif, and D. Precup, Learning with pseudo-ensembles, in Proc. 27th Int. Conf. Neural Information Processing Systems, Montreal, Canada, 2014, pp. 3365–3373.
[27]
S. Park, J. Park, S. J. Shin, and I. C. Moon, Adversarial dropout for supervised and semi-supervised learning, in Proc. 32nd AAAI Conf. Artificial Intelligence and Thirtieth Innovative Applications of Artificial Intelligence Conf. and Eighth AAAI Symp. Educational Advances in Artificial Intelligence, New Orleans, LA, USA, 2018, p. 480.
[28]
D. Berthelot, N. Carlini, E. D. Cubuk, A. Kurakin, K. Sohn, H. Zhang, and C. Raffel, ReMixMatch: Semi-supervised learning with distribution alignment and augmentation anchoring, arXiv preprint arXiv: 1911.09785, 2020.
[29]
T. Joachims, Transductive learning via spectral graph partitioning, in Proc. 20th Int. Conf. Machine Learning, Washington, DC, USA, 2003, pp. 290–297.
[30]
T. Joachims, Transductive inference for text classification using support vector machines, in Proc. 16th Int. Conf. Machine Learning, San Francisco, CA, USA, 1999, pp. 200–209.
[31]
Y. Bengio, O. Delalleau, and N. Le Roux, Label propagation and quadratic criterion, in Semi-Supervised Learning, O. Chapelle, B. Schölkopf, and A. Zien, eds. Cambridge, MA, USA: MIT Press, 2006, pp. 192–216.
[32]
D. P. Kingma, D. J. Rezende, S. Mohamed, and M. Welling, Semi-supervised learning with deep generative models, in Proc. 27th Int. Conf. Neural Information Processing Systems, Montreal, Canada, 2014, pp. 3581–3589.
[33]
A. Odena, Semi-supervised learning with generative adversarial networks, arXiv preprint arXiv: 1606.01583, 2016.
[34]
D. H. Lee, Pseudo-label: The simple and efficient semi-supervised learning method for deep neural networks, in Proc. 30th Int. Conf. Machine Learning, Atlanta, GA, USA, 2013, p. 2.
[35]
Y. Grandvalet and Y. Bengio, Semi-supervised learning by entropy minimization, in Proc. 17th Int. Conf. Neural Information Processing Systems, Vancouver, Canada, 2005, pp. 529–536.
[41]
M. Andrychowicz, M. Denil, S. G. Colmenarejo, M. W. Hoffman, D. Pfau, T. Schaul, B. Shillingford, and N. De Freitas, Learning to learn by gradient descent by gradient descent, in Proc. 30th Int. Conf. Neural Information Processing Systems, Barcelona, Spain, 2016, pp. 3981–3989.
[42]
S. Ravi and H. Larochelle, Optimization as a model for few-shot learning, in Proc. 5th Int. Conf. Learning Representations, Toulon, France, 2017, pp. 1–11.
[43]
M. Ren, E. Triantafillou, S. Ravi, J. Snell, K. Swersky, J. B. Tenenbaum, H. Larochelle, and R. S. Zemel, Meta-learning for semi-supervised few-shot classification, in Proc. 6th Int. Conf. Learning Representations, Vancouver, Canada, 2018.
[44]
C. Finn, P. Abbeel, and S. Levine, Model-agnostic meta-learning for fast adaptation of deep networks, in Proc. 34th Int. Conf. Machine Learning, Sydney, Australia, 2017, pp. 1126–1135.
[45]
M. Ren, W. Zeng, B. Yang, and R. Urtasun, Learning to reweight examples for robust deep learning, in Proc. 35th Int. Conf. Machine Learning, Stockholmsmässan, Sweden, 2018, pp. 4334–4343.
[47]
Y. Liu, J. Lee, M. Park, S. Kim, E. Yang, S. Hwang, and Y. Yang, Learning to propagate labels: Transductive propagation network for few-shot learning, in Proc. 7th Int. Conf. Learning Representations, New Orleans, LA, USA, 2019.
[48]
X. Li, Q. Sun, Y. Liu, S. Zheng, Q. Zhou, T. S. Chua, and B. Schiele, Learning to self-train for semi-supervised few-shot classification, in Proc. 33rd Int. Conf. Neural Information Processing Systems, Vancouver, Canada, 2019, p. 922.
[49]
Z. Yu, L. Chen, Z. Cheng, and J. Luo, TransMatch: A transfer-learning scheme for semi-supervised few-shot learning, in 2020 IEEE/CVF Conf. Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 2020, pp. 12853–12861.
[50]
P. Rodríguez, I. Laradji, A. Drouin, and A. Lacoste, Embedding propagation: Smoother manifold for few-shot classification, in Proc. 16th European Conf. Computer Vision, Glasgow, UK, 2020, pp. 121–138.
[51]
A. A. Rusu, D. Rao, J. Sygnowski, O. Vinyals, R. Pascanu, S. Osindero, and R. Hadsell, Meta-learning with latent embedding optimization, in Proc. 7th Int. Conf. Learning Representations, New Orleans, LA, USA, 2019.
[52]
Q. Sun, Y. Liu, T. S. Chua, and B. Schiele, Meta-transfer learning for few-shot learning, in 2019 IEEE/CVF Conf. Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 2019, pp. 403–412.
[53]
H. Zhang, M. Cisse, Y. N. Dauphin, and D. Lopez-Paz, mixup: Beyond empirical risk minimization, in Proc. 6th Int. Conf. Learning Representations, Vancouver, Canada, 2018.
[54]
S. J. Reddi, A. Hefny, S. Sra, B. Póczós, and A. Smola, Stochastic variance reduction for nonconvex optimization, in Proc. 33rd Int. Conf. Machine Learning, New York, NY, USA, 2016, pp. 314–323.
[55]
A. Krizhevsky, Learning multiple layers of features from tiny images, https://www.cs.toronto.edu/~kriz/learning-features-2009-TR.pdf, 2009.
[56]
Y. Netzer, T. Wang, A. Coates, A. Bissacco, B. Wu, and A. Y. Ng, Reading digits in natural images with unsupervised feature learning, presented at the 25th Conf. Neural Information Processing Systems Workshop on Deep Learning and Unsupervised Feature Learning, Granada, Spain, 2011.
[57]
A. Coates, A. Ng, and H. Lee, An analysis of single-layer networks in unsupervised feature learning, in Proc. 14th Int. Conf. Artificial Intelligence and Statistics, Fort Lauderdale, FL, USA, 2011, pp. 215–223.
[58]
B. Athiwaratkun, M. Finzi, P. Izmailov, and A. G. Wilson, There are many consistent explanations of unlabeled data: Why you should average, in Proc. 7th Int. Conf. Learning Representations, New Orleans, LA, USA, 2019.
[59]
T. Chen, S. Kornblith, M. Norouzi, and G. Hinton, A simple framework for contrastive learning of visual representations, in Proc. 37th Int. Conf. Machine Learning, Virtual Event, 2020, p. 149.
[60]
K. He, H. Fan, Y. Wu, S. Xie, and R. Girshick, Momentum contrast for unsupervised visual representation learning, in 2020 IEEE/CVF Conf. Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 2020, pp. 9726–9735.
[61]
X. Chen and K. He, Exploring simple Siamese representation learning, in 2021 IEEE/CVF Conf. Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 2021, pp. 15745–15753.
[62]
Y. Luo, J. Zhu, M. Li, Y. Ren, and B. Zhang, Smooth neighbors on teacher graphs for semi-supervised learning, in 2018 IEEE Conf. Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 2018, pp. 8896–8905.
[63]
I. Loshchilov and F. Hutter, SGDR: Stochastic gradient descent with warm restarts, in Proc. 5th Int. Conf. Learning Representations, Toulon, France, 2017.
[64]
G. Huang, Y. Li, G. Pleiss, Z. Liu, J. E. Hopcroft, and K. Q. Weinberger, Snapshot ensembles: Train 1, get M for free, in Proc. 5th Int. Conf. Learning Representations, Toulon, France, 2017.
[65]
J. Zhao, M. Mathieu, R. Goroshin, and Y. LeCun, Stacked what-where auto-encoders, arXiv preprint arXiv: 1506.02351, 2015.
[66]
E. Denton, S. Gross, and R. Fergus, Semi-supervised learning with context-conditional generative adversarial networks, arXiv preprint arXiv: 1611.06430, 2016.
[67]
K. Lee, S. Maji, A. Ravichandran, and S. Soatto, Meta-learning with differentiable convex optimization, in 2019 IEEE/CVF Conf. Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 2019, pp. 10649–10657.
[68]
Z. Wu, Y. Xiong, S. X. Yu, and D. Lin, Unsupervised feature learning via non-parametric instance discrimination, in 2018 IEEE/CVF Conf. Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 2018, pp. 3733–3742.
[69]
W. Huang, M. Yi, and X. Zhao, Towards the generalization of contrastive self-supervised learning, arXiv preprint arXiv: 2111.00743, 2021.
[70]
J. Li, C. Xiong, and S. C. H. Hoi, CoMatch: Semi-supervised learning with contrastive graph regularization, in 2021 IEEE/CVF Int. Conf. Computer Vision (ICCV), Montreal, Canada, 2021, pp. 9455–9464.
[71]
D. P. Bertsekas, Nonlinear Programming, 2nd ed. Belmont, MA, USA: Athena Scientific, 1999.