The pursuit of optimal neural network architectures is foundational to the progression of Neural Architecture Search (NAS). However, existing NAS methods that rely on traditional search strategies struggle to mine effective architectures from large and complex search spaces within a reasonable time, which leads to inferior search results. This research introduces Generative Pre-trained Transformer NAS (GPT-NAS), an approach designed to overcome these limitations inherent in traditional NAS strategies. GPT-NAS improves search efficiency and discovers better architectures by integrating a GPT model into the search process. Specifically, we design a reconstruction strategy in which the trained GPT reorganizes the architectures obtained during the search. In addition, to equip the GPT model with neural architecture design capabilities, we pre-train it on a dataset of neural architectures: for each architecture, the structural information of the preceding layers is used to predict the structure of the next layer, iterating over the entire architecture. In this way, the GPT model efficiently learns the key features of neural architectures. Extensive experimental validation shows that GPT-NAS outperforms both manually designed neural architectures and architectures generated automatically by NAS. Furthermore, we validate the benefit of introducing the GPT model from several perspectives, and find that it improves the image-classification accuracy of the searched architectures by up to about 9%.
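The next-layer prediction objective described above can be illustrated with a toy sketch. This is not the paper's implementation: the layer vocabulary and example architectures are hypothetical, and a simple count-based model stands in for the actual GPT, purely to show how previous-layer context predicts the next layer and how a searched architecture could be reorganized by those predictions.

```python
from collections import defaultdict

# Hypothetical architecture corpus: each architecture is a sequence of
# layer tokens (the real GPT-NAS encoding is more detailed).
ARCHS = [
    ["conv3x3", "conv3x3", "pool", "conv3x3", "pool", "fc"],
    ["conv3x3", "pool", "conv5x5", "pool", "fc"],
    ["conv5x5", "conv3x3", "pool", "conv3x3", "fc"],
]

def train_next_layer_model(archs):
    """Count-based stand-in for GPT pre-training: estimate
    P(next layer | previous layer) from the architecture corpus."""
    counts = defaultdict(lambda: defaultdict(int))
    for arch in archs:
        seq = ["<start>"] + arch
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts

def predict_next_layer(model, prev_layer):
    """Return the most frequent next layer after `prev_layer`."""
    candidates = model.get(prev_layer)
    if not candidates:
        return None
    return max(candidates, key=candidates.get)

def reconstruct(model, arch):
    """Reconstruction sketch: replace each layer with the model's
    prediction from its predecessor, keeping the original length."""
    seq = ["<start>"] + arch
    return [predict_next_layer(model, prev) or original
            for prev, original in zip(seq, arch)]

model = train_next_layer_model(ARCHS)
print(predict_next_layer(model, "pool"))   # most common layer after pooling
print(reconstruct(model, ["conv5x5", "pool", "fc"]))
```

A real GPT replaces the bigram counts with attention over the full prefix of layers, so predictions condition on the whole architecture seen so far rather than only the immediately preceding layer.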