Zero-Shot Neural Network Evaluation with Sample-Wise Activation Patterns

📅 2026-05-08

📈 Citations: 0

✨ Influential: 0


🤖 AI Summary

Existing zero-shot evaluation metrics generalize poorly and correlate weakly with actual model performance. This work proposes SWAP-Score, a training-free zero-shot metric computed from the activation patterns of a mini-batch of unlabeled samples. Unlike most existing metrics, which apply to either CNNs or Transformers but not both, SWAP-Score applies to both architecture families and, because it needs no labels, can estimate model potential on both vision and language tasks. Its Spearman correlation with CIFAR-10 validation accuracy is 0.93 for DARTS CNNs, and it reaches 0.71 for FlexiBERT Transformers on GLUE tasks. SWAP-NAS, a neural architecture search framework built on SWAP-Score, achieves competitive performance in approximately 6 minutes of GPU time on CIFAR-10 and 9 minutes on ImageNet.
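The correlation figures above are Spearman rank correlations between a proxy score and measured accuracy. As a minimal sketch of what that statistic measures, the snippet below computes it with NumPy alone. The helper names `rank` and `spearman`, and the five score/accuracy pairs, are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
import numpy as np

def rank(values):
    """Return 1-based ranks, giving tied values their average rank."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="stable")
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(1, len(values) + 1)
    # Replace the ranks of tied values with their mean.
    for v in np.unique(values):
        tied = values == v
        ranks[tied] = ranks[tied].mean()
    return ranks

def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    return float(np.corrcoef(rank(x), rank(y))[0, 1])

# Hypothetical proxy scores and accuracies for five architectures.
proxy_scores = [12.0, 30.0, 18.0, 45.0, 27.0]
accuracies = [0.81, 0.90, 0.86, 0.94, 0.88]
rho = spearman(proxy_scores, accuracies)
```

Because the statistic depends only on rank order, a proxy does not need to predict accuracy values; it only needs to order architectures the same way training would. In practice `scipy.stats.spearmanr` computes the same quantity.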

📝 Abstract

Zero-shot proxies, also known as training-free metrics, are widely adopted to reduce the computational overhead in neural network evaluation for scenarios such as Neural Architecture Search (NAS), as they do not require any training. Existing zero-shot metrics have several limitations, including weak correlation with the true performance and poor generalisation across different networks or downstream tasks. For example, most of these metrics apply only to either convolutional neural networks (CNNs) or Transformers, but not both. To address these limitations, we propose Sample-Wise Activation Patterns (SWAP), and its derivative, SWAP-Score, a novel and highly effective zero-shot metric. SWAP-Score is broadly applicable across both architecture families and task domains, demonstrating strong predictive performance in the majority of tasks. This metric measures the expressivity of neural networks over a mini-batch of samples, showing a high correlation with the neural networks' ground-truth performance. For both CNNs and Transformers, the SWAP-Score outperforms existing zero-shot metrics across computer vision and natural language processing tasks. For instance, Spearman's correlation coefficient between the SWAP-Score and CIFAR-10 validation accuracy for DARTS CNNs is 0.93, and 0.71 for FlexiBERT Transformers on GLUE tasks. Moreover, SWAP-Score is label-independent and hence can be applied at the pre-training stage of language models to estimate their performance on downstream tasks. When applied to NAS, the resulting SWAP-empowered method, SWAP-NAS, achieves competitive performance using only approximately 6 and 9 minutes of GPU time on CIFAR-10 and ImageNet, respectively. Our code is available at: https://github.com/pym1024/SWAP_Universal
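The abstract describes the metric only at a high level: expressivity measured from activation patterns over a mini-batch. The sketch below shows one plausible reading of that idea for a small ReLU network, and it is an assumption rather than the authors' implementation. It binarises each unit's pre-activation across the batch and counts how many distinct on/off patterns appear. The bias-free MLP, the function names, and the batch and layer sizes are all hypothetical; the linked repository is the reference for the actual method.

```python
import numpy as np

def relu_mlp_pre_activations(x, weights):
    """Forward a batch through a bias-free ReLU MLP, collecting pre-activations."""
    collected = []
    h = x
    for w in weights:
        z = h @ w                  # pre-activation of this layer
        collected.append(z)
        h = np.maximum(z, 0.0)     # ReLU
    return np.concatenate(collected, axis=1)  # shape: (batch, total_units)

def count_sample_wise_patterns(pre_activations):
    """Count distinct on/off patterns, one pattern per unit across the batch."""
    binary = (pre_activations > 0).astype(np.uint8)  # (batch, units)
    per_unit = binary.T                              # (units, batch)
    return int(np.unique(per_unit, axis=0).shape[0])

# Assumed toy setup: 16 unlabeled samples, two hidden layers of 32 units.
rng = np.random.default_rng(0)
batch = rng.standard_normal((16, 8))
weights = [rng.standard_normal((8, 32)), rng.standard_normal((32, 32))]
score = count_sample_wise_patterns(relu_mlp_pre_activations(batch, weights))
```

No labels and no gradient steps are involved: the count depends only on the untrained weights and the input batch, which is what makes a score of this kind usable before training.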

Problem

Research questions and friction points this paper is trying to address.

zero-shot evaluation

neural architecture search

generalization

cross-architecture

performance correlation

Innovation

Methods, ideas, or system contributions that make the work stand out.

zero-shot evaluation

neural architecture search

activation patterns