🤖 AI Summary
Fine-tuning vision foundation models often improves in-distribution accuracy but degrades robustness under distribution shift. To address this, we propose GMixout, a fine-tuning strategy that dynamically preserves pretrained knowledge to enhance generalization. Our key contributions are: (1) a dynamic exponential moving-average anchor that turns Mixout's masked updates into an implicit weight-sharing ensemble; (2) explicit control over resampling frequency, revealing how the masking anchor, mask sparsity, and update cadence jointly govern robustness; and (3) a lightweight sparse-kernel implementation that updates only a small fraction of parameters with zero inference overhead. On benchmarks including ImageNet, DomainNet, iWildCam, and CIFAR100-C, GMixout significantly improves out-of-distribution robustness while maintaining or even exceeding in-distribution accuracy, outperforming Model Soups and strong parameter-efficient fine-tuning baselines.
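As a rough formalization (our notation, not the paper's): with a binary mask m resampled every K steps at sparsity p, and an EMA anchor a_t, the effective weights during training mix the finetuned weights with the anchor:

```latex
% Sketch of the masked update, in our own notation (not from the paper).
% m: binary mask resampled every K steps; p: mask sparsity; a_t: EMA anchor.
w^{\text{eff}}_t = m \odot w_t + (1 - m) \odot a_t,
\qquad m_i \sim \mathrm{Bernoulli}(1 - p),
\qquad a_{t+1} = \beta\, a_t + (1 - \beta)\, w_t .
```

Standard Mixout corresponds to freezing the anchor at the pretrained weights, a_t = w_0; GMixout instead lets the anchor drift via the EMA.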
📝 Abstract
Finetuning vision foundation models often improves in-domain accuracy but comes at the cost of robustness under distribution shift. We revisit Mixout, a stochastic regularizer that intermittently replaces finetuned weights with their pretrained reference, through the lens of a single-run, weight-sharing implicit ensemble. This perspective reveals three key levers that govern robustness: the *masking anchor*, the *resampling frequency*, and the *mask sparsity*. Guided by this analysis, we introduce GMixout, which (i) replaces the fixed anchor with an exponential moving-average snapshot that adapts during training, and (ii) regulates the masking period via an explicit resampling-frequency hyperparameter. Our sparse-kernel implementation updates only a small fraction of parameters with no inference-time overhead, enabling training on consumer-grade GPUs. In experiments on benchmarks spanning covariate shift, corruption, and class imbalance (ImageNet / ImageNet-LT, DomainNet, iWildCam, and CIFAR100-C), GMixout consistently improves in-domain accuracy beyond zero-shot performance while surpassing both Model Soups and strong parameter-efficient finetuning baselines under distribution shift.
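To make the mechanism concrete, here is a minimal PyTorch sketch of a GMixout-style linear layer. It is an illustration under our own assumptions, not the paper's implementation: the hyperparameter names (`mask_prob`, `resample_period`, `ema_decay`) are ours, we omit Mixout's output rescaling, and the paper's sparse-kernel version avoids materializing the dense mixed weight.

```python
# Hedged sketch of a GMixout-style linear layer (illustrative, not the
# paper's code). Hyperparameter names are our own choices.
import torch
import torch.nn as nn


class GMixoutLinear(nn.Module):
    def __init__(self, in_features, out_features,
                 mask_prob=0.9, resample_period=10, ema_decay=0.999):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        self.mask_prob = mask_prob              # fraction of weights tied to the anchor
        self.resample_period = resample_period  # steps between mask resamples
        self.ema_decay = ema_decay
        self.step = 0
        # Anchor starts as a copy of the weights; in practice, load pretrained
        # weights before copying. It is then updated as an EMA snapshot.
        self.register_buffer("anchor", self.linear.weight.detach().clone())
        self.register_buffer("mask", torch.ones_like(self.linear.weight))

    @torch.no_grad()
    def _maybe_resample(self):
        if self.step % self.resample_period == 0:
            # 1 = keep the finetuned weight, 0 = substitute the anchor weight.
            self.mask = (torch.rand_like(self.mask) > self.mask_prob).float()
        # EMA snapshot of the current weights serves as the moving anchor.
        self.anchor.mul_(self.ema_decay).add_(
            self.linear.weight.detach(), alpha=1 - self.ema_decay)
        self.step += 1

    def forward(self, x):
        if self.training:
            self._maybe_resample()
            # Mixed weights: gradients flow only through the unmasked entries.
            w = self.mask * self.linear.weight + (1 - self.mask) * self.anchor
            return nn.functional.linear(x, w, self.linear.bias)
        # At eval time this is a plain linear layer: no inference overhead.
        return self.linear(x)
```

At evaluation the layer reduces to a standard `nn.Linear`, matching the abstract's claim of zero inference-time overhead; with `mask_prob=0.9`, only about 10% of the weights receive gradient updates between resamples, which is the sparsity the abstract exploits for training on consumer-grade GPUs.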