🤖 AI Summary
Fine-tuning vision foundation models often improves in-distribution accuracy but degrades robustness under distribution shift. To address this, we propose GMixout, a fine-tuning strategy that dynamically preserves pretrained knowledge to enhance generalization. Our key contributions are: (1) a dynamic exponential moving-average anchor that turns Mixout's masked updates into an implicit weight-sharing ensemble; (2) explicit control over resampling frequency, revealing how the masking anchor, mask sparsity, and update cadence jointly govern robustness; and (3) a lightweight sparse-kernel implementation that updates only a small fraction of parameters with zero inference overhead. On benchmarks including ImageNet, DomainNet, iWildCam, and CIFAR100-C, GMixout significantly improves out-of-distribution robustness while maintaining or even exceeding in-distribution accuracy, outperforming Model Soups and strong parameter-efficient fine-tuning baselines.
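As a rough formalization (our notation, not the paper's): with a binary mask m resampled every K steps at sparsity p, and an EMA anchor a_t, the effective weights during training mix the finetuned weights with the anchor:

```latex
% Sketch of the masked update, in our own notation (not from the paper).
% m: binary mask resampled every K steps; p: mask sparsity; a_t: EMA anchor.
w^{\text{eff}}_t = m \odot w_t + (1 - m) \odot a_t,
\qquad m_i \sim \mathrm{Bernoulli}(1 - p),
\qquad a_{t+1} = \beta\, a_t + (1 - \beta)\, w_t .
```

Standard Mixout corresponds to freezing the anchor at the pretrained weights, a_t = w_0; GMixout instead lets the anchor drift via the EMA.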
📝 Abstract
Finetuning vision foundation models often improves in-domain accuracy but comes at the cost of robustness under distribution shift. We revisit Mixout, a stochastic regularizer that intermittently replaces finetuned weights with their pretrained reference, through the lens of a single-run, weight-sharing implicit ensemble. This perspective reveals three key levers that govern robustness: the *masking anchor*, the *resampling frequency*, and the *mask sparsity*. Guided by this analysis, we introduce GMixout, which (i) replaces the fixed anchor with an exponential moving-average snapshot that adapts during training, and (ii) regulates the masking period via an explicit resampling-frequency hyperparameter. Our sparse-kernel implementation updates only a small fraction of parameters with no inference-time overhead, enabling training on consumer-grade GPUs. In experiments on benchmarks spanning covariate shift, corruption, and class imbalance (ImageNet / ImageNet-LT, DomainNet, iWildCam, and CIFAR100-C), GMixout consistently improves in-domain accuracy beyond zero-shot performance while surpassing both Model Soups and strong parameter-efficient finetuning baselines under distribution shift.
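To make the mechanism concrete, here is a minimal PyTorch sketch of a GMixout-style linear layer. It is an illustration under our own assumptions, not the paper's implementation: the hyperparameter names (`mask_prob`, `resample_period`, `ema_decay`) are ours, we omit Mixout's output rescaling, and the paper's sparse-kernel version avoids materializing the dense mixed weight.

```python
# Hedged sketch of a GMixout-style linear layer (illustrative, not the
# paper's code). Hyperparameter names are our own choices.
import torch
import torch.nn as nn


class GMixoutLinear(nn.Module):
    def __init__(self, in_features, out_features,
                 mask_prob=0.9, resample_period=10, ema_decay=0.999):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        self.mask_prob = mask_prob              # fraction of weights tied to the anchor
        self.resample_period = resample_period  # steps between mask resamples
        self.ema_decay = ema_decay
        self.step = 0
        # Anchor starts as a copy of the weights; in practice, load pretrained
        # weights before copying. It is then updated as an EMA snapshot.
        self.register_buffer("anchor", self.linear.weight.detach().clone())
        self.register_buffer("mask", torch.ones_like(self.linear.weight))

    @torch.no_grad()
    def _maybe_resample(self):
        if self.step % self.resample_period == 0:
            # 1 = keep the finetuned weight, 0 = substitute the anchor weight.
            self.mask = (torch.rand_like(self.mask) > self.mask_prob).float()
        # EMA snapshot of the current weights serves as the moving anchor.
        self.anchor.mul_(self.ema_decay).add_(
            self.linear.weight.detach(), alpha=1 - self.ema_decay)
        self.step += 1

    def forward(self, x):
        if self.training:
            self._maybe_resample()
            # Mixed weights: gradients flow only through the unmasked entries.
            w = self.mask * self.linear.weight + (1 - self.mask) * self.anchor
            return nn.functional.linear(x, w, self.linear.bias)
        # At eval time this is a plain linear layer: no inference overhead.
        return self.linear(x)
```

At evaluation the layer reduces to a standard `nn.Linear`, matching the abstract's claim of zero inference-time overhead; with `mask_prob=0.9`, only about 10% of the weights receive gradient updates between resamples, which is the sparsity the abstract exploits for training on consumer-grade GPUs.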