GPT-NAS: Neural Architecture Search Meets Generative Pre-Trained Transformer Model

College of Computer Science, and Engineering Research Center of Machine Learning and Industry Intelligence (affiliated with the Ministry of Education), Sichuan University, Chengdu 610065, China
Department of Mechanical Engineering, Stevens Institute of Technology, Hoboken, NJ 07030, USA

Abstract

The pursuit of optimal neural network architectures is foundational to Neural Architecture Search (NAS). However, existing NAS methods that rely on traditional search strategies struggle in large, complex search spaces: effective architectures are hard to discover within a reasonable time, which leads to inferior search results. This work introduces Generative Pre-trained Transformer NAS (GPT-NAS), an approach designed to overcome these limitations by integrating a GPT model into the search process, improving search efficiency and yielding better architectures. Specifically, we design a reconstruction strategy in which the trained GPT model reorganizes the architectures produced by the search. To equip the GPT model with the ability to design neural architectures, we further train it on a dataset of neural architectures: for each architecture, the structural information of the preceding layers is used to predict the next layer, iterating over the entire architecture. In this way, the GPT model efficiently learns the key features of well-performing neural architectures. Extensive experiments show that GPT-NAS outperforms both manually designed architectures and architectures generated by existing NAS methods. We also validate the benefit of introducing the GPT model in several ways, finding that it improves the accuracy of the searched architectures on image datasets by up to about 9%.

Cite this article:
Yu C, Liu X, Wang Y, et al. GPT-NAS: Neural Architecture Search Meets Generative Pre-Trained Transformer Model. Big Data Mining and Analytics, 2025, 8(1): 45-64. https://doi.org/10.26599/BDMA.2024.9020036