Zero-Shot Neural Network Evaluation with Sample-Wise Activation Patterns

📅 2026-05-08
📈 Citations: 0
Influential: 0

🤖 AI Summary
Existing zero-shot evaluation metrics generalize poorly and correlate weakly with actual model performance. This work proposes SWAP-Score, a training-free zero-shot metric computed from the activation patterns a network produces on a small batch of unlabeled data. SWAP-Score is a single metric that applies to both CNN and Transformer architectures, and because it needs no labels it can estimate model potential on both vision and language tasks. It achieves a Spearman correlation of 0.93 with CIFAR-10 validation accuracy for CNNs in the DARTS search space and 0.71 for FlexiBERT Transformers on GLUE tasks. SWAP-NAS, a neural architecture search framework built on SWAP-Score, reaches competitive performance with approximately 6 minutes of GPU time on CIFAR-10 and 9 minutes on ImageNet.
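To make the reported correlations concrete, here is a minimal sketch of how Spearman's rank correlation between a zero-shot score and measured accuracy can be computed. It is an illustration only: the function names (`average_ranks`, `pearson`, `spearman`) and the five score/accuracy pairs are hypothetical and are not taken from the paper.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (non-constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical proxy scores and accuracies for five architectures.
proxy_scores = [1.2, 3.4, 2.2, 5.0, 4.1]
accuracies = [0.70, 0.80, 0.78, 0.90, 0.91]
rho = spearman(proxy_scores, accuracies)
```

A value near 1 means the proxy orders architectures almost exactly as their trained accuracy does, which is what matters when the proxy is used to pick candidates in a search.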
📝 Abstract
Zero-shot proxies, also known as training-free metrics, are widely adopted to reduce the computational overhead in neural network evaluation for scenarios such as Neural Architecture Search (NAS), as they do not require any training. Existing zero-shot metrics have several limitations, including weak correlation with the true performance and poor generalisation across different networks or downstream tasks. For example, most of these metrics apply only to either convolutional neural networks (CNNs) or Transformers, but not both. To address these limitations, we propose Sample-Wise Activation Patterns (SWAP), and its derivative, SWAP-Score, a novel and highly effective zero-shot metric. SWAP-Score is broadly applicable across both architecture families and task domains, demonstrating strong predictive performance in the majority of tasks. This metric measures the expressivity of neural networks over a mini-batch of samples, showing a high correlation with the neural networks' ground-truth performance. For both CNNs and Transformers, the SWAP-Score outperforms existing zero-shot metrics across computer vision and natural language processing tasks. For instance, Spearman's correlation coefficient between the SWAP-Score and CIFAR-10 validation accuracy is 0.93 for DARTS CNNs, and between the SWAP-Score and GLUE performance it is 0.71 for FlexiBERT Transformers. Moreover, SWAP-Score is label-independent and can therefore be applied at the pre-training stage of language models to estimate their performance on downstream tasks. When applied to NAS, the resulting method, SWAP-NAS, achieves competitive performance using only approximately 6 and 9 minutes of GPU time on CIFAR-10 and ImageNet, respectively. Our code is available at: https://github.com/pym1024/SWAP_Universal
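The abstract does not spell out how the score is computed, so the following is only a rough sketch of the idea of counting activation patterns sample-wise. It assumes the score is the number of distinct binary on/off patterns that individual ReLU units produce across the samples of a mini-batch, on an untrained, randomly initialised network. The toy bias-free MLP and all function names are hypothetical; this is not the authors' implementation, which lives in the repository linked above.

```python
import random


def preactivations(sample, weights):
    """Run one sample through a bias-free ReLU MLP.

    Returns the pre-activation of every unit in every layer, flattened.
    """
    collected = []
    hidden = sample
    for layer in weights:
        z = [sum(w * v for w, v in zip(row, hidden)) for row in layer]
        collected.extend(z)
        hidden = [max(0.0, v) for v in z]  # ReLU
    return collected


def swap_score_sketch(batch, weights):
    """Count distinct per-unit activation patterns across the batch.

    Assumption: each unit contributes one binary vector with one entry
    per sample (1 if the unit fired on that sample), and the score is
    the number of unique vectors. No labels are used.
    """
    # Rows are samples, columns are units.
    fired = [[1 if z > 0 else 0 for z in preactivations(x, weights)]
             for x in batch]
    # Transpose so each tuple is one unit's pattern over all samples.
    return len({tuple(column) for column in zip(*fired)})


def random_mlp(sizes, rng):
    """Gaussian-initialised weights for consecutive layer sizes (untrained)."""
    return [[[rng.gauss(0.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
            for n_in, n_out in zip(sizes, sizes[1:])]


rng = random.Random(0)
net = random_mlp([4, 8, 8], rng)  # 4 inputs, two layers of 8 units
batch = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(5)]
score = swap_score_sketch(batch, net)
```

Under this assumption the count is bounded by both the number of units and 2 raised to the batch size, so a larger score indicates that the units partition the batch in more distinct ways, which is one way to read "expressivity over a mini-batch".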
Problem

Research questions and friction points this paper is trying to address.

zero-shot evaluation
neural architecture search
generalization
cross-architecture
performance correlation
Innovation

Methods, ideas, or system contributions that make the work stand out.

zero-shot evaluation
neural architecture search
activation patterns
cross-architecture generalization
training-free metric
Yameng Peng
School of Computing Technologies, RMIT University, Australia
Andy Song
A/Prof of AI, School of Computing Technologies, CIAIRI, RMIT University
Artificial Intelligence, Evolutionary Computation, Pattern Recognition, Optimization, Computer Vision
Haytham M. Fayek
School of Computing Technologies, RMIT University, Australia
Vic Ciesielski
School of Computing Technologies, RMIT University, Australia
Xiaojun Chang
Department of Electronic Engineering and Information Science, University of Science and Technology of China