🤖 AI Summary
This work addresses the lack of convergence guarantees for iterative neural architecture search (NAS) methods that use large language models (LLMs) as generators. It formulates iterative LLM-NAS for the first time as a parametric cross-entropy method over the space of executable programs and builds a convergence theory on that basis: expected architecture quality is monotonically non-decreasing across cycles, and the elite-set probability converges to a fixed point at a geometric rate. It also shows that δ-based (delta) code generation achieves a strictly higher valid-generation rate than full-code generation under a first-order Markov token-error model and that MinHash-Jaccard novelty filtering prevents mode collapse, and it derives a closed-form expression for proxy reliability. A 22-cycle experiment spanning three LLMs, six datasets, and 3,300 generated architectures confirms two theoretical predictions quantitatively and two at the direction-of-effect level, and the theory explains the previously reported ceiling effect in proxy reliability.
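To make the MinHash-Jaccard novelty filter concrete, here is a minimal sketch of how such a filter might look. The summary does not describe the paper's implementation, so everything here is an assumption for illustration: the token-level shingling, the 64 hash functions, the 0.8 similarity threshold, and the helper names `shingles`, `minhash_signature`, `estimated_jaccard`, and `is_novel` are all hypothetical.

```python
from __future__ import annotations

import hashlib
import re


def shingles(code: str, k: int = 3) -> set[str]:
    """Split source code into overlapping k-token shingles (assumed tokenization)."""
    tokens = re.findall(r"\w+|[^\w\s]", code)
    if len(tokens) < k:
        return {" ".join(tokens)}
    return {" ".join(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def minhash_signature(items: set[str], num_perm: int = 64) -> list[int]:
    """For each salted hash function, keep the minimum hash over the set."""
    signature = []
    for seed in range(num_perm):
        salt = seed.to_bytes(8, "little")
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(s.encode("utf-8"), digest_size=8, salt=salt).digest(),
                "little",
            )
            for s in items
        ))
    return signature


def estimated_jaccard(sig_a: list[int], sig_b: list[int]) -> float:
    """The fraction of matching signature slots estimates Jaccard similarity."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


def is_novel(candidate: str, accepted: list[list[int]], threshold: float = 0.8) -> bool:
    """Accept a candidate only if it is not too similar to any accepted program.

    The threshold value is hypothetical, not taken from the paper.
    """
    sig = minhash_signature(shingles(candidate))
    return all(estimated_jaccard(sig, other) < threshold for other in accepted)
```

In an iterative loop, a filter of this kind would be applied to each generated architecture before it joins the pool of accepted programs, so that near-duplicates of existing programs are rejected.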
📝 Abstract
Large language models (LLMs) are increasingly used as generators in iterative neural architecture search (NAS), yet no formal convergence theory exists for this class of algorithms. We model iterative LLM-NAS as a parametric Cross-Entropy (CE) method over executable programs and prove six results: (1) iterative LLM fine-tuning on elite architectures is equivalent to the CE update restricted to the LLM parametric family; (2) expected architecture quality is monotonically non-decreasing across cycles; (3) elite-set probability converges to a fixed point at a geometric rate C_t >= 1-(1-rho_0)^t; (4) delta-based generation achieves a strictly higher valid-generation rate than full-code generation under a first-order Markov token-error model; (5) the MinHash-Jaccard novelty filter prevents mode collapse; (6) proxy reliability admits the closed-form rho_S = (6/pi) arcsin(rho_P(SNR)/2), yielding the practical diagnostic sigma^2_arch >> sigma^2_noise as a necessary condition for trustworthy proxy-based rankings. Testing against a 22-cycle, three-LLM, six-dataset experiment with 3,300 generated architectures confirms two predictions quantitatively, two at direction-of-effect level, and explains the proxy-reliability ceiling effect previously reported empirically but left unexplained.
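The two closed-form expressions in the abstract are easy to evaluate numerically. The sketch below is illustrative only. The abstract does not define rho_P(SNR), so the code assumes a simple additive-noise model (proxy score = true quality + independent noise, with SNR = sigma^2_arch / sigma^2_noise), which gives rho_P = sqrt(SNR / (1 + SNR)). That mapping and the function names are assumptions, not the paper's definitions.

```python
import math


def pearson_from_snr(snr: float) -> float:
    """ASSUMED model, not from the source: proxy = quality + independent noise.

    Under that assumption the Pearson correlation between proxy and true
    quality is sqrt(SNR / (1 + SNR)), with SNR = sigma^2_arch / sigma^2_noise.
    """
    return math.sqrt(snr / (1.0 + snr))


def spearman_from_pearson(rho_p: float) -> float:
    """Closed form from the abstract: rho_S = (6/pi) * arcsin(rho_P / 2)."""
    return (6.0 / math.pi) * math.asin(rho_p / 2.0)


def elite_probability_lower_bound(rho_0: float, t: int) -> float:
    """Geometric bound from the abstract: C_t >= 1 - (1 - rho_0)^t."""
    return 1.0 - (1.0 - rho_0) ** t


if __name__ == "__main__":
    # Rank reliability of the proxy as architecture variance grows relative to noise.
    for snr in (0.1, 1.0, 10.0, 100.0):
        rho_s = spearman_from_pearson(pearson_from_snr(snr))
        print(f"SNR={snr:>6}: rho_S={rho_s:.3f}")

    # Lower bound on elite-set probability per cycle; rho_0 = 0.1 is a made-up value.
    for t in (1, 5, 10, 22):
        print(f"t={t:>2}: C_t >= {elite_probability_lower_bound(0.1, t):.3f}")
```

Under the assumed noise model, rho_S approaches 1 only as the SNR grows without bound. This is consistent with the diagnostic sigma^2_arch >> sigma^2_noise: when architecture variance is comparable to evaluation noise, the proxy's rank correlation stays well below 1 however many architectures are ranked.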