🤖 AI Summary
This work addresses the evaluation of model generalization in high-stakes settings with scarce labels, where existing methods lack reliable, label-free metrics for pre-deployment model selection and post-deployment performance monitoring. To bridge this gap, the study is the first to bring the internal causal circuits of Vision Transformers into generalization assessment, proposing two unsupervised metrics: Dependency Depth Bias and Circuit Shift Score. The former quantifies depth-wise biases in representational dependency structures, while the latter measures changes in circuit stability under distribution shifts. Experiments across diverse tasks show that these metrics correlate more strongly with true generalization performance than existing proxies, improving by 13.4% and 34.1% on average, respectively, without requiring ground-truth labels.
📝 Abstract
Reliable generalization metrics are fundamental to the evaluation of machine learning models. Especially in high-stakes applications where labeled target data are scarce, evaluating models' generalization performance under distribution shift is a pressing need. We focus on two practical scenarios: (1) Before deployment, how to select the best model for unlabeled target data? (2) After deployment, how to monitor model performance under distribution shift? The central need in both cases is a reliable and label-free proxy metric. Yet existing proxy metrics, such as model confidence or accuracy-on-the-line, are often unreliable because they assess only model outputs while ignoring the internal mechanisms that produce them. We address this limitation by introducing a new perspective: using the inner workings of a model, i.e., circuits, as a predictive metric of generalization performance. Leveraging circuit discovery, we extract the causal interactions between internal representations as a circuit, from which we derive two metrics tailored to the two practical scenarios. (1) Before deployment, we introduce Dependency Depth Bias, which measures different models' generalization capability on target data. (2) After deployment, we propose Circuit Shift Score, which predicts a model's generalization under different distribution shifts. Across various tasks, both metrics demonstrate significantly improved correlation with generalization performance, outperforming existing proxies by an average of 13.4% and 34.1%, respectively. Our code is available at https://github.com/deep-real/GenCircuit.
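To make the evaluation setup in the abstract concrete, the following is a minimal sketch of how a label-free proxy can be scored against true generalization performance. It is not the paper's code and does not implement Dependency Depth Bias or Circuit Shift Score. It uses the model-confidence baseline named in the abstract (mean max-softmax probability) as the proxy, and it assumes Spearman rank correlation as the agreement measure, which the abstract does not specify. All function names and all numbers are hypothetical and purely illustrative.

```python
import math

def softmax(logits):
    # Numerically stable softmax over one row of logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def average_confidence(batch_logits):
    # Label-free proxy: mean max-softmax probability over unlabeled inputs.
    return sum(max(softmax(row)) for row in batch_logits) / len(batch_logits)

def ranks(values):
    # 1-based ranks; tied values share the mean of their positions.
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def pearson(xs, ys):
    # Pearson correlation; assumes neither input is constant.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def spearman(xs, ys):
    # Spearman correlation is Pearson correlation computed on ranks.
    return pearson(ranks(xs), ranks(ys))

# Illustrative numbers only (not results from the paper): one proxy score and
# one held-out target accuracy per candidate model.
proxy_scores = [0.91, 0.85, 0.78, 0.88]
target_accuracy = [0.72, 0.70, 0.55, 0.61]
print(spearman(proxy_scores, target_accuracy))
```

A proxy is useful for model selection to the extent that this correlation is high: ranking candidate models by the proxy then approximates ranking them by target accuracy, which is unavailable when target labels are missing. The labeled accuracies appear here only to validate the proxy, not to compute it.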