TG-NAS: Leveraging Zero-Cost Proxies with Transformer and Graph Convolution Networks for Efficient Neural Architecture Search

๐Ÿ“… 2024-03-30
๐Ÿ›๏ธ arXiv.org
๐Ÿ“ˆ Citations: 0
โœจ Influential: 0
๐Ÿ“„ PDF
๐Ÿค– AI Summary
Existing zero-shot neural architecture search (NAS) proxy metrics suffer from poor generalization, reliance on simplistic statistics, or dependence on ground-truth labels, all of which hinder cross-search-space transferability. To address this, the authors propose TG-NAS, a training-free, label-free, and data-agnostic zero-cost performance prediction framework. The method jointly leverages a Transformer-based operator encoder and a graph convolutional network (GCN) to perform end-to-end zero-cost scoring of arbitrary neural architecture graphs, and it requires no retraining to adapt to novel operators or unseen search spaces. Evaluated on NAS-Bench-201, it identifies architectures achieving 93.75% CIFAR-10 test accuracy; on the DARTS search space, it discovers models attaining 74.5% ImageNet top-1 accuracy. Moreover, it achieves up to 300× higher search efficiency than prior state-of-the-art zero-cost methods, significantly advancing the cross-space generalization and practical deployability of zero-shot NAS.

๐Ÿ“ Abstract
Neural architecture search (NAS) is an effective method for discovering new convolutional neural network (CNN) architectures. However, existing approaches often require time-consuming training or intensive sampling and evaluation. Zero-shot NAS aims to create training-free proxies for architecture performance prediction. However, existing proxies have suboptimal performance and are often outperformed by simple metrics such as model parameter counts or the number of floating-point operations. Moreover, existing model-based proxies do not generalize to new search spaces containing unseen operator types without ground-truth accuracy data. A universally optimal proxy remains elusive. We introduce TG-NAS, a novel model-based universal proxy that leverages a transformer-based operator embedding generator and a graph convolutional network (GCN) to predict architecture performance. This approach guides neural architecture search across any given search space without the need for retraining. Distinct from other model-based predictor subroutines, TG-NAS itself acts as a zero-cost (ZC) proxy, guiding architecture search with advantages in data independence, cost-effectiveness, and consistency across diverse search spaces. Our experiments showcase its advantages over existing proxies across various NAS benchmarks, suggesting its potential as a foundational element for efficient architecture search. TG-NAS achieves up to 300× improvements in search efficiency compared to previous SOTA ZC proxy methods. Notably, it discovers competitive models with 93.75% CIFAR-10 accuracy on the NAS-Bench-201 space and 74.5% ImageNet top-1 accuracy on the DARTS space.
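To make the model-based proxy concrete, below is a minimal PyTorch sketch of a GCN that scores an architecture DAG from per-node operator embeddings. It illustrates the general technique, not the paper's exact design: the layer widths, row-normalized propagation, and mean-pool readout are assumptions, and the random embeddings stand in for the transformer-generated ones.

```python
import torch
import torch.nn as nn

class SimpleGCNPredictor(nn.Module):
    """Minimal GCN that maps (node embeddings, adjacency) to a scalar score.
    Illustrative sketch only -- not TG-NAS's exact architecture."""
    def __init__(self, emb_dim: int, hidden_dim: int = 64, num_layers: int = 2):
        super().__init__()
        dims = [emb_dim] + [hidden_dim] * num_layers
        self.layers = nn.ModuleList(
            nn.Linear(dims[i], dims[i + 1]) for i in range(num_layers)
        )
        self.readout = nn.Linear(hidden_dim, 1)  # scalar performance score

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # x:   (num_nodes, emb_dim)   operator embedding for each node
        # adj: (num_nodes, num_nodes) DAG adjacency matrix
        adj = adj + torch.eye(adj.size(0))        # add self-loops
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)
        norm_adj = adj / deg                      # row-normalized propagation
        for layer in self.layers:
            x = torch.relu(layer(norm_adj @ x))   # aggregate, then transform
        return self.readout(x.mean(dim=0))        # mean-pool nodes -> score

# Example: score a 4-node cell. Random vectors stand in for the
# transformer-generated operator embeddings.
op_emb = torch.randn(4, 32)                       # 4 nodes, 32-dim embeddings
adj = torch.tensor([[0., 1., 1., 0.],
                    [0., 0., 0., 1.],
                    [0., 0., 0., 1.],
                    [0., 0., 0., 0.]])            # simple DAG
score = SimpleGCNPredictor(emb_dim=32)(op_emb, adj)
print(score.item())
```

In TG-NAS the node features would come from the transformer-based operator embedding generator, which is what lets the same predictor score cells containing operators it has never seen.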
Problem

Research questions and friction points this paper is trying to address.

Existing zero-shot NAS proxies perform poorly and lack generalizability, often trailing even trivial baselines such as parameter counts (see the sketch after this list).
Current proxies fail to adapt to new operators without accuracy data.
TG-NAS aims to provide a universal, efficient, and robust NAS proxy.
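To ground the first point above: the simple baselines the abstract mentions (parameter counts, FLOPs) are nearly free to compute, which is part of why learned proxies that fail to beat them are hard to justify. A minimal sketch of the parameter-count proxy in PyTorch; the two candidate cells here are hypothetical stand-ins:

```python
import torch.nn as nn

def param_count(model: nn.Module) -> int:
    """Trainable-parameter count -- a surprisingly strong zero-cost baseline."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# Rank two hypothetical candidate cells by this trivial proxy.
cand_a = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                       nn.Conv2d(16, 16, 3, padding=1))
cand_b = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU())
print(param_count(cand_a), param_count(cand_b))
```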
Innovation

Methods, ideas, or system contributions that make the work stand out.

Transformer-based operator embedding generator (see the sketch after this list)
Graph Convolutional Network for performance prediction
No retraining needed across search spaces
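A sketch of the operator-embedding side, under stated assumptions: this summary does not name the paper's pretrained encoder, so bert-base-uncased from the Hugging Face transformers library serves as a stand-in to embed NAS-Bench-201 operator names. The resulting vectors would act as node features for a GCN predictor like the one sketched under the abstract.

```python
import torch
from transformers import AutoTokenizer, AutoModel  # pip install transformers

# NAS-Bench-201 operator vocabulary.
OPS = ["none", "skip_connect", "nor_conv_1x1", "nor_conv_3x3", "avg_pool_3x3"]

# Stand-in encoder; the paper's actual choice may differ.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")

@torch.no_grad()
def embed_ops(op_names):
    """Mean-pooled token embeddings, one vector per operator name."""
    batch = tokenizer(op_names, padding=True, return_tensors="pt")
    hidden = encoder(**batch).last_hidden_state     # (ops, tokens, 768)
    mask = batch["attention_mask"].unsqueeze(-1)    # ignore padding tokens
    return (hidden * mask).sum(1) / mask.sum(1)     # (ops, 768)

op_embeddings = embed_ops(OPS)  # node features for the GCN sketched above
print(op_embeddings.shape)      # torch.Size([5, 768])
```

Because the embeddings are derived from operator names rather than learned per search space, a new operator only needs a textual description to be scored, which is the mechanism behind the "no retraining" claim.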