🤖 AI Summary
This work proposes a large language model (LLM)-driven neural architecture search method to address the challenge of channel configuration optimization in deep neural networks, which is constrained by tensor shape compatibility and computational budgets. The approach formulates channel configuration as a conditional code generation task guided by performance feedback and leverages abstract syntax tree (AST) mutation to synthesize a large corpus of structurally valid architectures, mitigating data scarcity. For the first time, an LLM is employed to learn priors over non-standard channel configurations from this large-scale synthetic data, moving beyond conventional heuristics toward a deeper grasp of architectural design patterns. Evaluated on CIFAR-100, the method achieves statistically significant accuracy improvements, demonstrating that LLMs can effectively acquire domain-specific architectural priors and outperform baseline strategies such as random search.
📝 Abstract
Channel configuration search, the optimization of layer specifications such as channel widths in deep neural networks, presents a complex combinatorial challenge constrained by tensor shape compatibility and computational budgets. We posit that Large Language Models (LLMs) offer a transformative approach to Neural Architecture Search (NAS), capable of reasoning about architectural code structure in ways that traditional heuristics cannot. In this paper, we investigate the application of an LLM-driven NAS framework to the problem of channel configuration. We formulate the search as a sequence of conditional code generation tasks, where an LLM refines architectural specifications based on performance telemetry. Crucially, we address the data scarcity problem by generating a vast corpus of valid, shape-consistent architectures via Abstract Syntax Tree (AST) mutations. While these mutated networks are not necessarily high-performing, they provide the critical volume of structural data required for the LLM to learn the latent relationship between channel configurations and model performance. This allows the LLM to internalize complex design patterns and apply them to optimize feature extraction strategies. Experimental results on CIFAR-100 validate the efficacy of this approach, demonstrating that the model yields statistically significant improvements in accuracy. Our analysis confirms that the LLM successfully acquires domain-specific architectural priors, distinguishing this method from random search and highlighting the immense potential of language-driven design in deep learning.
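The shape-consistent AST mutation described in the abstract can be sketched in a few lines of Python. The sketch below is illustrative only: the `Sequential`/`Conv2d` names, the fixed positional-argument convention, and the single-width mutation rule are assumptions for the example, not the paper's actual implementation. The key idea it demonstrates is that when one layer's output width is mutated, the next layer's input width must be rewritten in the same AST pass so the generated architecture remains tensor-shape compatible.

```python
import ast

# A toy architectural specification as source code (assumed format).
SRC = """
model = Sequential(
    Conv2d(3, 32, 3),
    Conv2d(32, 64, 3),
    Conv2d(64, 128, 3),
)
"""

class ChannelMutator(ast.NodeTransformer):
    """Mutate one layer's out_channels and propagate the new width to
    the next layer's in_channels, preserving shape consistency."""

    def __init__(self, layer_idx: int, new_width: int):
        self.layer_idx = layer_idx
        self.new_width = new_width
        self._seen = 0  # running count of Conv2d calls, in source order

    def visit_Call(self, node: ast.Call) -> ast.Call:
        self.generic_visit(node)
        if isinstance(node.func, ast.Name) and node.func.id == "Conv2d":
            i = self._seen
            self._seen += 1
            if i == self.layer_idx:
                # Rewrite out_channels of the mutated layer.
                node.args[1] = ast.Constant(self.new_width)
            elif i == self.layer_idx + 1:
                # Rewrite in_channels of the following layer to match.
                node.args[0] = ast.Constant(self.new_width)
        return node

tree = ast.parse(SRC)
mutated = ChannelMutator(layer_idx=1, new_width=96).visit(tree)
ast.fix_missing_locations(mutated)
print(ast.unparse(mutated))
```

Sampling many such mutations (varying `layer_idx` and `new_width`) yields a corpus of structurally valid, if not high-performing, architectures of the kind the paper uses as LLM training data.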