Resource-Efficient Iterative LLM-Based NAS with Feedback Memory

📅 2026-03-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work proposes a closed-loop, large language model (LLM)-driven neural architecture search (NAS) framework that significantly reduces computational costs, enabling efficient execution on a single consumer-grade GPU. By leveraging frozen, instruction-tuned LLMs with ≤7B parameters—such as Qwen2.5, DeepSeek-Coder, and GLM-5—the method rapidly evaluates candidate architectures via a single training epoch of a proxy model. A Markov chain–based sliding-window memory mechanism encodes failed search trajectories into structured diagnostic triplets to guide subsequent exploration. The framework employs a dual-LLM division-of-labor strategy to mitigate cognitive load and implicitly favors compact, edge-device-friendly architectures. On CIFAR-10, the approach boosts the accuracy of models generated by Qwen2.5-7B from 50.0% to 71.5% within approximately 18 GPU hours (RTX 4090) over 2,000 iterations, demonstrating the feasibility of low-budget, reproducible, and efficient NAS.
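The sliding-window feedback memory can be sketched as a fixed-size buffer of diagnostic triplets. This is a minimal illustration under assumptions, not the paper's code: the class name `FeedbackMemory` and its methods are hypothetical, and only the window semantics described above (K most recent attempts, each a problem/modification/outcome triple) are taken from the source.

```python
from collections import deque


class FeedbackMemory:
    """Sliding window of the K most recent improvement attempts.

    Each entry is a structured diagnostic triplet:
    (identified problem, suggested modification, resulting outcome).
    Bounding the window at K entries keeps the LLM prompt size constant,
    in the spirit of the paper's Markov-chain-inspired memory: only
    recent history is assumed to carry the useful signal.
    """

    def __init__(self, k: int = 5):
        # deque with maxlen silently evicts the oldest entry when full
        self.window = deque(maxlen=k)

    def record(self, problem: str, modification: str, outcome: str) -> None:
        self.window.append((problem, modification, outcome))

    def render(self) -> str:
        """Format the window as plain-text context for the next LLM call."""
        return "\n".join(
            f"Attempt {i}: problem={p}; change={m}; outcome={o}"
            for i, (p, m, o) in enumerate(self.window, 1)
        )


memory = FeedbackMemory(k=5)
for step in range(7):  # more attempts than the window holds
    memory.record(f"problem-{step}", f"fix-{step}", f"result-{step}")
print(len(memory.window))  # prints 5: the two oldest entries were evicted
```

Note that failed attempts are recorded exactly like successful ones; per the paper, execution failures are first-class learning signals rather than discarded trajectories.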

📝 Abstract
Neural Architecture Search (NAS) automates network design, but conventional methods demand substantial computational resources. We propose a closed-loop pipeline leveraging large language models (LLMs) to iteratively generate, evaluate, and refine convolutional neural network architectures for image classification on a single consumer-grade GPU without LLM fine-tuning. Central to our approach is a historical feedback memory inspired by Markov chains: a sliding window of $K{=}5$ recent improvement attempts keeps context size constant while providing sufficient signal for iterative learning. Unlike prior LLM optimizers that discard failure trajectories, each history entry is a structured diagnostic triple -- recording the identified problem, suggested modification, and resulting outcome -- treating code execution failures as first-class learning signals. A dual-LLM specialization reduces per-call cognitive load: a Code Generator produces executable PyTorch architectures while a Prompt Improver handles diagnostic reasoning. Since both the LLM and architecture training share limited VRAM, the search implicitly favors compact, hardware-efficient models suited to edge deployment. We evaluate three frozen instruction-tuned LLMs (${\leq}7$B parameters) across up to 2000 iterations in an unconstrained open code space, using one-epoch proxy accuracy on CIFAR-10, CIFAR-100, and ImageNette as a fast ranking signal. On CIFAR-10, DeepSeek-Coder-6.7B improves from 28.2% to 69.2%, Qwen2.5-7B from 50.0% to 71.5%, and GLM-5 from 43.2% to 62.0%. A full 2000-iteration search completes in ${\approx}18$ GPU hours on a single RTX~4090, establishing a low-budget, reproducible, and hardware-aware paradigm for LLM-driven NAS without cloud infrastructure.
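The closed loop in the abstract (generate → one-epoch proxy evaluation → diagnose → refine) can be outlined as below. This is a hedged sketch only: `generate_architecture`, `one_epoch_proxy_accuracy`, and `diagnose` are illustrative stand-ins for the Code Generator LLM, the single-epoch proxy training run, and the Prompt Improver LLM respectively; none of these names or signatures come from the paper.

```python
import random


def generate_architecture(task: str, history: list[str]) -> str:
    # Stand-in for the Code Generator LLM, which would emit
    # executable PyTorch model code conditioned on recent history.
    return f"# candidate model for {task!r}, given {len(history)} past attempts"


def one_epoch_proxy_accuracy(code: str) -> float:
    # Stand-in for training the candidate for one epoch and returning
    # validation accuracy -- a fast ranking signal, not a final score.
    return random.random()


def diagnose(code: str, accuracy: float) -> str:
    # Stand-in for the Prompt Improver LLM, which handles diagnostic
    # reasoning separately to reduce per-call cognitive load.
    return f"accuracy {accuracy:.3f}; suggested modification pending"


def search(task: str = "CIFAR-10 CNN", iterations: int = 10,
           seed: int = 0) -> tuple[str, float]:
    """Run the closed loop, keeping only the K=5 most recent diagnostics."""
    random.seed(seed)
    history: list[str] = []
    best_code, best_acc = "", 0.0
    for _ in range(iterations):
        code = generate_architecture(task, history[-5:])  # K=5 window
        acc = one_epoch_proxy_accuracy(code)
        history.append(diagnose(code, acc))
        if acc > best_acc:
            best_code, best_acc = code, acc
    return best_code, best_acc
```

Splitting generation and diagnosis across two LLM roles, as the abstract describes, keeps each prompt focused on a single job; the proxy evaluator only needs to rank candidates, which is why one epoch suffices.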
Problem

Research questions and friction points this paper is trying to address.

Neural Architecture Search
Resource Efficiency
Large Language Models
Hardware-Aware Search
Feedback Memory
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-based NAS
Feedback Memory
Resource-Efficient Search
Dual-LLM Specialization
Hardware-Aware Architecture