🤖 AI Summary
This work addresses transfer learning under few-shot, unlabeled, and out-of-distribution (OOD) settings. We propose a pre-trained model checkpoint selection method grounded in **Neural Coherence**—a measure of statistical alignment between source-domain and target-domain activations across network layers. Using only a small number of unlabeled target samples, our method automatically identifies the checkpoint with the best generalization to the target domain. Unlike existing approaches that rely on fine-tuning or pseudo-labeling, ours requires no target-domain annotations, incurs no additional training cost, and is inherently robust to distribution shift. Evaluated on ImageNet1K-pretrained models, it significantly outperforms state-of-the-art checkpoint selection baselines on OOD benchmarks including Food-101, PlantNet-300K, and iNaturalist. We further validate its broad utility in meta-learning and training data selection, demonstrating consistent generalization gains across diverse downstream tasks.
📝 Abstract
To create state-of-the-art models for many downstream tasks, it has become common practice to fine-tune a pre-trained large vision model. However, it remains an open question how best to determine which of the many model checkpoints produced during a large training run to use as the starting point. This question becomes especially important when data for the target task of interest is scarce, unlabeled, and out-of-distribution. In such scenarios, common methods that rely on in-distribution validation data become unreliable or inapplicable. This work proposes a novel approach for model selection that operates reliably on just a few unlabeled examples from the target task. Our approach is based on a novel concept, Neural Coherence, which characterizes a model's activation statistics on the source and target domains, allowing one to define model selection methods with high data efficiency. We provide experiments where models are pre-trained on ImageNet1K and examine target domains consisting of Food-101, PlantNet-300K, and iNaturalist. We also evaluate our approach in many meta-learning settings. It significantly improves generalization across these different target domains compared to established baselines. We further demonstrate the versatility of Neural Coherence as a powerful principle by showing its effectiveness in training data selection.
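To make the idea concrete, here is a minimal sketch of checkpoint selection by activation-statistic alignment. This is an illustration only, not the paper's actual Neural Coherence metric: the alignment score (Euclidean distance between per-layer mean/std), the checkpoint representation (a callable returning per-layer activations), and all function names are assumptions introduced for this example.

```python
import numpy as np

def layer_stats(activations):
    # Summary statistics (mean, std) of one layer's activations,
    # computed from a small batch of shape (n_samples, n_units).
    return np.array([activations.mean(), activations.std()])

def coherence_score(source_acts, target_acts):
    # Hypothetical alignment score: negative distance between the
    # source and target activation statistics, averaged over layers.
    # Higher means the two domains produce more similar activations.
    dists = [np.linalg.norm(layer_stats(s) - layer_stats(t))
             for s, t in zip(source_acts, target_acts)]
    return -float(np.mean(dists))

def select_checkpoint(checkpoints, source_batch, target_batch):
    # Each "checkpoint" is a callable mapping a batch to a list of
    # per-layer activation arrays. We score every checkpoint on a few
    # unlabeled target samples and keep the best-aligned one.
    scores = [coherence_score(ckpt(source_batch), ckpt(target_batch))
              for ckpt in checkpoints]
    return int(np.argmax(scores))

# Toy demonstration: the target distribution is shifted relative to
# the source. A saturating feature extractor (tanh) dampens the shift,
# so its activation statistics align better than the identity's.
rng = np.random.default_rng(0)
source_batch = rng.normal(0.0, 1.0, size=(8, 4))
target_batch = rng.normal(2.0, 1.0, size=(8, 4))

ckpt_identity = lambda x: [x]            # passes the shift through
ckpt_saturating = lambda x: [np.tanh(x)]  # compresses the shift

best = select_checkpoint([ckpt_identity, ckpt_saturating],
                         source_batch, target_batch)
```

In a real setting the activations would come from forward hooks on each candidate checkpoint of the pre-trained network, and the alignment measure would be the paper's Neural Coherence rather than this mean/std distance.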