L-SWAG: Layer-Sample Wise Activation with Gradients information for Zero-Shot NAS on Vision Transformers

📅 2025-05-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
Zero-cost (ZC) proxies, widely used in CNN-based neural architecture search (NAS), transfer poorly to Vision Transformers (ViTs) because of structural differences, and individual ZC proxies provide limited, noisy signals. Method: the authors propose a ViT-oriented ZC-NAS framework with two components: (1) L-SWAG, a layer- and sample-wise activation-with-gradients metric that characterizes both CNN and ViT architectures; (2) LIBRA-NAS, which combines multiple ZC proxies via low-information-gain detection and bias re-alignment. Contribution/Results: on ImageNet-1K, the method discovers a ViT with 17.0% top-1 error at a search cost of only 0.1 GPU days, generalizes across six Autoformer benchmark tasks, and outperforms evolutionary and gradient-based NAS baselines. The framework is efficient, architecture-agnostic, and interpretable, enabling reliable ZC-proxy use for ViT search without any training or validation.
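To give a rough feel for the kind of signal an activation-plus-gradients proxy measures, the sketch below scores an untrained toy MLP by combining, per layer, the diversity of sample-wise ReLU activation patterns with the magnitude of gradients at initialization. This is a generic ZC-proxy illustration, not the paper's L-SWAG formula; the toy network, the dummy loss, and the log-weighted combination are all illustrative assumptions.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def zc_proxy_score(X, weights):
    """Score an untrained ReLU MLP without training, combining per layer:
    (a) diversity of sample-wise activation patterns across the batch, and
    (b) gradient magnitude at initialization (under a dummy loss).
    Higher scores are intended to correlate with trainability."""
    a = X
    pre_acts, acts = [], [X]
    for W in weights:                          # forward pass
        z = a @ W
        a = relu(z)
        pre_acts.append(z)
        acts.append(a)
    score = 0.0
    grad = np.ones_like(a)                     # dummy loss L = sum(output)
    for i in reversed(range(len(weights))):    # backward pass
        grad_z = grad * (pre_acts[i] > 0)      # ReLU derivative
        grad_W = acts[i].T @ grad_z            # dL/dW for layer i
        # (a) count distinct binary ReLU patterns among the batch samples
        patterns = {tuple(row) for row in (pre_acts[i] > 0).astype(int)}
        # (b) weight the count by the gradient's Frobenius norm (illustrative)
        score += len(patterns) * np.log1p(np.linalg.norm(grad_W))
        grad = grad_z @ weights[i].T
    return score

rng = np.random.default_rng(0)
X = rng.standard_normal((16, 8))               # 16 samples, 8 features
weights = [rng.standard_normal((8, 32)),       # toy 2-layer MLP
           rng.standard_normal((32, 4))]
score = zc_proxy_score(X, weights)
```

In a NAS loop, such a score would be computed for each candidate architecture on a single minibatch and used to rank candidates with no training at all.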

📝 Abstract
Training-free Neural Architecture Search (NAS) efficiently identifies high-performing neural networks using zero-cost (ZC) proxies. Unlike multi-shot and one-shot NAS approaches, ZC-NAS is both (i) time-efficient, eliminating the need for model training, and (ii) interpretable, with proxy designs often theoretically grounded. Despite rapid developments in the field, current SOTA ZC proxies are typically constrained to well-established convolutional search spaces. With the rise of Large Language Models shaping the future of deep learning, this work extends ZC proxy applicability to Vision Transformers (ViTs). We present a new benchmark using the Autoformer search space evaluated on 6 distinct tasks and propose Layer-Sample Wise Activation with Gradients information (L-SWAG), a novel, generalizable metric that characterizes both convolutional and transformer architectures across 14 tasks. Additionally, previous works highlighted how different proxies contain complementary information, motivating the need for an ML model to identify useful combinations. To further enhance ZC-NAS, we therefore introduce LIBRA-NAS (Low Information gain and Bias Re-Alignment), a method that strategically combines proxies to best represent a specific benchmark. Integrated into the NAS search, LIBRA-NAS outperforms evolutionary and gradient-based NAS techniques by identifying an architecture with a 17.0% test error on ImageNet1k in just 0.1 GPU days.
Problem

Research questions and friction points this paper is trying to address.

Extends zero-cost NAS to Vision Transformers for efficient architecture search
Proposes L-SWAG metric to evaluate convolutional and transformer architectures
Introduces LIBRA-NAS to optimize proxy combinations for better NAS performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Training-free NAS with zero-cost proxies
L-SWAG metric for ViTs and CNNs
LIBRA-NAS combines proxies strategically
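To make the idea of strategically combining proxies concrete, the sketch below fuses several ZC-proxy score vectors over a pool of candidate architectures by weighted rank averaging. The uniform weights and the rank-aggregation rule are generic placeholders, not LIBRA-NAS's actual low-information-gain detection or bias re-alignment procedure.

```python
import numpy as np

def combine_proxies(proxy_scores, weights=None):
    """Fuse several ZC-proxy score vectors (one row per proxy, one column
    per candidate architecture) into a single ranking signal.
    Rank aggregation makes proxies with different scales comparable."""
    proxy_scores = np.asarray(proxy_scores, dtype=float)
    n_proxies, n_archs = proxy_scores.shape
    if weights is None:
        weights = np.ones(n_proxies) / n_proxies   # placeholder: uniform
    # Convert each proxy's raw scores to ranks (0 = worst, n_archs-1 = best)
    ranks = proxy_scores.argsort(axis=1).argsort(axis=1)
    return weights @ ranks                          # weighted mean rank

# Three toy proxies scoring four candidate architectures
scores = [[0.1, 0.9, 0.4, 0.3],   # proxy A
          [0.2, 0.8, 0.7, 0.1],   # proxy B
          [0.5, 0.6, 0.9, 0.2]]   # proxy C
fused = combine_proxies(scores)
best = int(np.argmax(fused))      # architecture chosen by the fused signal
```

The same fused score can then drive any search strategy (random, evolutionary, or gradient-based) as a drop-in replacement for a single proxy.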