🤖 AI Summary
In neural architecture search (NAS), variational autoencoders (VAEs) map discrete architectures into continuous latent spaces, and sampling from these spaces yields a high proportion of invalid or duplicate architectures. To address this, we propose a VQ-VAE–based discrete representation method that learns discrete latent code sequences for architectures and encodes them as numerical sequences, enabling end-to-end generation of valid architectures by large language models (LLMs). Our approach abandons the continuous latent space assumption and is, to our knowledge, the first to integrate VQ-VAE with LLMs for NAS, enabling a learnable, truly discrete architectural prior compatible with any sequence-generation model. On NAS-Bench-101 and NAS-Bench-201, our method improves the rate of generating valid and unique architectures by over 80% and 8%, respectively, demonstrating its effectiveness and generalizability in sequence-driven NAS.
📝 Abstract
Unsupervised representation learning has been widely explored across various modalities, including neural architectures, where it plays a key role in downstream applications like Neural Architecture Search (NAS). These methods typically learn an unsupervised representation space before generating/sampling architectures for the downstream search. A common approach uses Variational Autoencoders (VAEs) to map discrete architectures onto a continuous representation space; however, sampling from these spaces often yields a high percentage of invalid or duplicate neural architectures. This may stem from the unnatural mapping of an inherently discrete architectural space onto a continuous one, which underscores the need for a robust discrete representation of these architectures. To address this, we introduce a Vector Quantized Variational Autoencoder (VQ-VAE) to learn a discrete latent space more naturally aligned with discrete neural architectures. In contrast to VAEs, VQ-VAEs (i) map each architecture into a discrete code sequence and (ii) allow the prior to be learned by any generative model rather than assuming a normal distribution. We then represent these architecture latent codes as numerical sequences and train a text-to-text model, leveraging a Large Language Model to learn and generate sequences representing architectures. We evaluate our method on Inception/ResNet-like cell-based search spaces, namely NAS-Bench-101 and NAS-Bench-201. Compared to VAE-based methods, our approach improves the generation of valid and unique architectures by over 80% on NAS-Bench-101 and over 8% on NAS-Bench-201. Finally, we demonstrate the applicability of our method to NAS by employing a sequence-modeling-based NAS algorithm.
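To make the core idea concrete, the following is a minimal, illustrative sketch of the vector-quantization step that distinguishes a VQ-VAE from a VAE: each continuous encoder output is snapped to the index of its nearest codebook entry, so an architecture becomes a sequence of integer codes that a sequence model can generate directly. All names, dimensions, and values here are assumptions for illustration, not the paper's implementation.

```python
# Illustrative VQ-VAE quantization step (assumed toy setup, not the paper's code):
# map each continuous encoder vector to its nearest codebook entry's index.

def quantize(encoder_outputs, codebook):
    """Return, for each encoder vector, the index of the closest
    codebook entry under squared Euclidean distance."""
    indices = []
    for z in encoder_outputs:
        dists = [sum((a - b) ** 2 for a, b in zip(z, c)) for c in codebook]
        indices.append(min(range(len(codebook)), key=dists.__getitem__))
    return indices

# Toy example: a 4-entry codebook of 2-D codes and three encoder vectors.
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
z_e = [[0.9, 0.1], [0.1, 0.8], [1.1, 0.9]]
codes = quantize(z_e, codebook)  # -> [1, 2, 3]
```

The resulting integer sequence (here `[1, 2, 3]`) is the discrete representation that, per the abstract, can be serialized as a numerical sequence and modeled by an LLM or any other sequence generator.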