Convergence Theory for Iterative LLM-Based Neural Architecture Search: A Parametric Cross-Entropy Framework with Closed-Form Proxy Reliability

📅 2026-05-28

📈 Citations: 0

✨ Influential: 0


🤖 AI Summary

This work addresses the lack of theoretical convergence guarantees for iterative neural architecture search (NAS) methods based on large language models (LLMs). It is the first to formulate LLM-NAS as a parametric cross-entropy method over the space of executable programs, and it proves that expected architecture quality is monotonically non-decreasing across cycles and that the elite-set probability converges geometrically. It also derives a closed-form formula for proxy reliability. The analysis covers iterative LLM fine-tuning, delta-based (δ) code generation, and MinHash-Jaccard novelty filtering. Experiments spanning 22 cycles, three LLMs, six datasets, and 3,300 generated architectures confirm two theoretical predictions quantitatively and two at the direction-of-effect level, and explain the previously reported ceiling effect in proxy reliability.
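To make the MinHash-Jaccard novelty filtering mentioned above concrete, here is a minimal sketch in Python. It is not the paper's implementation: the token-shingle size `k`, the number of hash functions `num_perm`, the similarity `threshold`, and all function names are assumptions chosen for illustration. The idea is that a candidate program is kept only if its estimated Jaccard similarity to every previously kept program stays below the threshold.

```python
from __future__ import annotations

import hashlib


def shingles(code: str, k: int = 3) -> set[str]:
    """Split code on whitespace and return the set of k-token shingles."""
    tokens = code.split()
    if len(tokens) < k:
        return {" ".join(tokens)}
    return {" ".join(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def minhash_signature(items: set[str], num_perm: int = 64) -> list[int]:
    """One minimum per salted hash function (salted BLAKE2b stands in for permutations)."""
    signature = []
    for seed in range(num_perm):
        salt = seed.to_bytes(8, "little")
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(s.encode(), digest_size=8, salt=salt).digest(), "big"
            )
            for s in items
        ))
    return signature


def estimated_jaccard(sig_a: list[int], sig_b: list[int]) -> float:
    """Fraction of matching signature slots, an estimate of Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def novelty_filter(candidates: list[str], threshold: float = 0.9,
                   num_perm: int = 64) -> list[str]:
    """Keep a candidate only if it is below `threshold` similarity to all kept ones."""
    kept, kept_sigs = [], []
    for code in candidates:
        sig = minhash_signature(shingles(code), num_perm)
        if all(estimated_jaccard(sig, s) < threshold for s in kept_sigs):
            kept.append(code)
            kept_sigs.append(sig)
    return kept
```

A lower `threshold` rejects more near-duplicates and so pushes the generator toward more diverse architectures. The value the authors use is not stated in this summary.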

📝 Abstract

Large language models (LLMs) are increasingly used as generators in iterative neural architecture search (NAS), yet no formal convergence theory exists for this class of algorithms. We model iterative LLM-NAS as a parametric Cross-Entropy (CE) method over executable programs and prove six results: (1) iterative LLM fine-tuning on elite architectures is equivalent to the CE update restricted to the LLM parametric family; (2) expected architecture quality is monotonically non-decreasing across cycles; (3) elite-set probability converges to a fixed point at a geometric rate C_t >= 1-(1-rho_0)^t; (4) delta-based generation achieves a strictly higher valid-generation rate than full-code generation under a first-order Markov token-error model; (5) the MinHash-Jaccard novelty filter prevents mode collapse; (6) proxy reliability admits the closed-form rho_S = (6/pi) arcsin(rho_P(SNR)/2), yielding the practical diagnostic sigma^2_arch >> sigma^2_noise as a necessary condition for trustworthy proxy-based rankings. Testing against a 22-cycle, three-LLM, six-dataset experiment with 3,300 generated architectures confirms two predictions quantitatively, two at direction-of-effect level, and explains the proxy-reliability ceiling effect previously reported empirically but left unexplained.
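The two closed-form expressions in the abstract can be evaluated directly. The sketch below is illustrative and makes one assumption that the abstract does not state: it models the proxy score as true quality plus independent noise, so that rho_P = sqrt(SNR / (1 + SNR)) with SNR = sigma^2_arch / sigma^2_noise. The paper's actual rho_P(SNR) may differ. Function names are hypothetical.

```python
import math


def elite_probability_lower_bound(rho_0: float, t: int) -> float:
    """Lower bound from result (3): C_t >= 1 - (1 - rho_0)^t."""
    if not 0.0 <= rho_0 <= 1.0:
        raise ValueError("rho_0 must lie in [0, 1]")
    return 1.0 - (1.0 - rho_0) ** t


def pearson_from_snr(snr: float) -> float:
    """ASSUMPTION (not from the abstract): additive independent noise model,
    proxy = quality + noise, which gives rho_P = sqrt(SNR / (1 + SNR))."""
    if snr < 0.0:
        raise ValueError("SNR must be non-negative")
    return math.sqrt(snr / (1.0 + snr))


def spearman_from_pearson(rho_p: float) -> float:
    """Result (6): rho_S = (6 / pi) * arcsin(rho_P / 2)."""
    return (6.0 / math.pi) * math.asin(rho_p / 2.0)


if __name__ == "__main__":
    # Rank reliability as architecture variance grows relative to noise.
    for snr in (0.1, 1.0, 10.0, 100.0):
        rho_p = pearson_from_snr(snr)
        print(f"SNR={snr:>6}: rho_P={rho_p:.3f}, rho_S={spearman_from_pearson(rho_p):.3f}")
    # Bound on elite-set probability after 22 cycles for an assumed rho_0.
    print(elite_probability_lower_bound(0.1, 22))
```

Under this assumed noise model, rho_S approaches 1 only as SNR grows large, which matches the abstract's diagnostic that sigma^2_arch >> sigma^2_noise is necessary for trustworthy proxy-based rankings.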

Problem

Research questions and friction points this paper is trying to address.

Neural Architecture Search

Large Language Models

Convergence Theory

Cross-Entropy Method

Proxy Reliability

Innovation

Methods, ideas, or system contributions that make the work stand out.

Convergence Theory

LLM-based NAS

Parametric Cross-Entropy

Proxy Reliability

Elite-set Convergence

