Arch-LLM: Taming LLMs for Neural Architecture Generation via Unsupervised Discrete Representation Learning

📅 2025-03-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
In neural architecture search (NAS), variational autoencoders (VAEs) map discrete architectures to continuous latent spaces, leading to excessive invalid or duplicate samples. To address this, we propose a VQ-VAE–based discrete representation method that learns discrete latent code sequences for architectures and encodes them as numerical sequences to enable end-to-end generation of valid architectures by large language models (LLMs). Our approach abandons the continuous latent space assumption and is the first to integrate VQ-VAE with LLMs for NAS, enabling learnable, truly discrete architectural prior modeling compatible with any sequence-generation model. On NAS-Bench-101 and NAS-Bench-201, our method improves the rate of generating valid and unique architectures by over 80% and 8%, respectively, demonstrating its effectiveness and generalizability in sequence-driven NAS.

📝 Abstract
Unsupervised representation learning has been widely explored across various modalities, including neural architectures, where it plays a key role in downstream applications like Neural Architecture Search (NAS). These methods typically learn an unsupervised representation space before generating/sampling architectures for the downstream search. A common approach uses Variational Autoencoders (VAEs) to map discrete architectures onto a continuous representation space; however, sampling from these spaces often yields a high percentage of invalid or duplicate neural architectures. This may be due to the unnatural mapping of an inherently discrete architectural space onto a continuous one, which underscores the need for a robust discrete representation of these architectures. To address this, we introduce a Vector Quantized Variational Autoencoder (VQ-VAE) to learn a discrete latent space more naturally aligned with the discrete nature of neural architectures. In contrast to VAEs, VQ-VAEs (i) map each architecture into a discrete code sequence and (ii) allow the prior to be learned by any generative model rather than assuming a normal distribution. We then represent these architecture latent codes as numerical sequences and train a text-to-text model, leveraging a Large Language Model, to learn and generate sequences representing architectures. We evaluate our method on Inception/ResNet-like cell-based search spaces, namely NAS-Bench-101 and NAS-Bench-201. Compared to VAE-based methods, our approach improves the generation of valid and unique architectures by over 80% on NAS-Bench-101 and over 8% on NAS-Bench-201. Finally, we demonstrate the applicability of our method to NAS, employing a sequence-modeling-based NAS algorithm.
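The core VQ-VAE step the abstract describes, mapping each encoder output to a discrete code, can be sketched as a nearest-neighbor codebook lookup. This is a minimal illustrative sketch, not the paper's implementation; the array shapes, toy codebook, and noise level are assumptions:

```python
import numpy as np

def quantize(z, codebook):
    """Map each continuous encoder output vector to the index of its
    nearest codebook entry (Euclidean distance), as in a VQ-VAE.
    z: (n, d) encoder outputs; codebook: (K, d) learned embeddings."""
    dists = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)  # discrete code sequence of length n

# Toy example (assumed sizes): 4 codebook entries of dimension 8,
# and 3 encoder outputs that sit near entries 2, 0, and 2.
rng = np.random.default_rng(0)
codebook = rng.normal(size=(4, 8))
z = codebook[[2, 0, 2]] + 0.01 * rng.normal(size=(3, 8))
codes = quantize(z, codebook)
print(codes.tolist())  # each vector snaps to its nearest code: [2, 0, 2]
```

The resulting index sequence is the "discrete code sequence" that replaces a VAE's continuous latent vector, which is why sampling over it cannot produce off-manifold points the way Gaussian sampling can.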
Problem

Research questions and friction points this paper is trying to address.

Addresses invalid/duplicate architectures arising from continuous VAE latent spaces
Proposes a discrete VQ-VAE representation for neural architectures
Improves valid/unique architecture generation for NAS by over 80%
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses a Vector Quantized Variational Autoencoder for discrete representation
Leverages a Large Language Model for sequence generation
Improves valid architecture generation by over 80% on NAS-Bench-101
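Feeding the discrete latent codes to a text-to-text LLM requires serializing them as text. A minimal sketch of one plausible serialization; the paper's exact tokenization is not given here, so the space-separated format is an assumption:

```python
def codes_to_text(codes):
    """Serialize a discrete latent code sequence (VQ-VAE codebook
    indices) as a space-separated numerical string for an LLM."""
    return " ".join(str(c) for c in codes)

def text_to_codes(text):
    """Parse a generated numerical string back into codebook indices
    for the VQ-VAE decoder to map onto an architecture."""
    return [int(tok) for tok in text.split()]

seq = codes_to_text([2, 0, 2, 1])
print(seq)  # prints: 2 0 2 1
assert text_to_codes(seq) == [2, 0, 2, 1]  # round-trip recovers the codes
```

Because every well-formed numerical string decodes to some codebook sequence, generated samples are valid by construction, which is the mechanism behind the reported gains in valid/unique architectures.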