🤖 AI Summary
To address the inefficiency of neural architecture search (NAS) caused by its enormous search space, this paper proposes GPT-NAS, a search framework that combines a Generative Pre-Trained (GPT) model with an evolutionary algorithm (EA). Assuming that a generative model pre-trained on a large-scale corpus can learn the fundamental rules of building neural architectures, GPT-NAS uses the GPT model to propose reasonable architecture components given a basic one, substantially shrinking the effective search space, and then applies the EA as the search strategy to find the optimal solution. Across multiple benchmarks, GPT-NAS outperforms seven manually designed architectures and thirteen architectures produced by competing NAS methods. Compared to a GPT-free variant, finely tuned architectures gain up to about 12% in accuracy, demonstrating that the generative prior improves both the efficiency and quality of neural architecture search.
📝 Abstract
Neural Architecture Search (NAS) has emerged as an effective method for automatically designing optimal neural network architectures. Although neural architectures have achieved human-level performance on several tasks, few of them were obtained by NAS methods. The main reason is the huge search space of neural architectures, which makes NAS algorithms inefficient. This work presents a novel architecture search algorithm, called GPT-NAS, that optimizes neural architectures with a Generative Pre-Trained (GPT) model and an evolutionary algorithm (EA) as the search strategy. In GPT-NAS, we assume that a generative model pre-trained on a large-scale corpus can learn the fundamental rules of building neural architectures. GPT-NAS therefore leverages the GPT model to propose reasonable architecture components given a basic one, and then uses the EA to search for the optimal solution. Such an approach largely reduces the search space by introducing prior knowledge into the search process. Extensive experimental results show that our GPT-NAS method significantly outperforms seven manually designed neural architectures and thirteen architectures provided by competing NAS methods. In addition, our experiments indicate that the proposed algorithm improves the performance of finely tuned neural architectures by up to about 12% compared to those without GPT, further demonstrating its effectiveness in searching neural architectures.
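To make the idea concrete, the following is a minimal sketch of a GPT-guided evolutionary search loop. Everything here is hypothetical: `propose_component` is a uniform-random stand-in for querying a pre-trained GPT model, `proxy_score` is a toy placeholder for the paper's actual fitness evaluation, and the component vocabulary and encoding are invented for illustration; the paper's real architecture representation is not reproduced here.

```python
import random

# Hypothetical component vocabulary; the paper's actual encoding differs.
COMPONENTS = ["conv3x3", "conv5x5", "sep_conv", "skip", "pool"]

def propose_component(context):
    # Stand-in for the GPT prior: given the architecture built so far,
    # return a plausible next component. A real GPT model would rank
    # components by likelihood; here we sample uniformly.
    return random.choice(COMPONENTS)

def generate_architecture(depth=6):
    # Build an architecture component by component, conditioning each
    # proposal on the prefix generated so far.
    arch = []
    for _ in range(depth):
        arch.append(propose_component(arch))
    return arch

def proxy_score(arch):
    # Toy placeholder for fitness evaluation (e.g. a short training run);
    # here it simply rewards component diversity plus noise.
    return len(set(arch)) + random.random()

def evolve(pop_size=8, generations=5):
    # Evolutionary search: keep the fittest half, then create children
    # by mutating one position with a new GPT-guided proposal.
    population = [generate_architecture() for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=proxy_score, reverse=True)
        parents = population[: pop_size // 2]
        children = []
        for parent in parents:
            child = list(parent)
            i = random.randrange(len(child))
            child[i] = propose_component(child[:i])  # GPT-guided mutation
            children.append(child)
        population = parents + children
    return max(population, key=proxy_score)

best = evolve()
print(best)
```

The key point the sketch illustrates is where the prior enters: both initialization and mutation draw candidates from the generative model rather than from the full unconstrained space, which is how the search space is effectively reduced.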