Large Learning Rates Simultaneously Achieve Robustness to Spurious Correlations and Compressibility

📅 2025-07-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the joint enhancement of model robustness against spurious correlations and network compressibility. We propose and empirically validate the large learning rate (LLR) as a simple yet effective implicit regularization mechanism: during early training, LLR promotes feature invariance, enhances inter-class separation, and induces activation sparsity, thereby simultaneously improving both robustness and compressibility. Across diverse DNN architectures, optimizers, and spurious-correlation benchmarks (e.g., Waterbirds, CelebA), LLR consistently outperforms mainstream explicit regularizers (e.g., DropBlock, GroupNorm) and alternative hyperparameter tuning strategies. Theoretical analysis, complemented by representation-dynamics experiments, reveals that LLR accelerates escape from sharp minima and steers optimization toward flat, structurally parsimonious regions of the loss landscape, yielding dual gains in robustness and compressibility.

📝 Abstract
Robustness and resource-efficiency are two highly desirable properties for modern machine learning models. However, achieving them jointly remains a challenge. In this paper, we position high learning rates as a facilitator for simultaneously achieving robustness to spurious correlations and network compressibility. We demonstrate that large learning rates also produce desirable representation properties such as invariant feature utilization, class separation, and activation sparsity. Importantly, our findings indicate that large learning rates compare favorably to other hyperparameters and regularization methods in consistently satisfying these properties in tandem. In addition to demonstrating the positive effect of large learning rates across diverse spurious correlation datasets, models, and optimizers, we also present strong evidence that the previously documented success of large learning rates in standard classification tasks is likely due to their effect on addressing hidden/rare spurious correlations in the training dataset.
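Activation sparsity, one of the representation properties the abstract credits to large learning rates, is commonly measured as the fraction of (near-)zero entries in a layer's post-ReLU activations; sparser activations are what make the network more compressible. A minimal numpy sketch of that metric (illustrative names and random weights for demonstration, not the paper's code):

```python
import numpy as np

def activation_sparsity(h, tol=1e-6):
    """Fraction of (near-)zero entries in an activation matrix."""
    return float(np.mean(np.abs(h) < tol))

# Toy forward pass through one ReLU layer with random weights,
# purely to exercise the metric.
rng = np.random.default_rng(0)
X = rng.normal(size=(128, 20))   # a batch of inputs
W = rng.normal(size=(20, 50))    # hidden-layer weights
h = np.maximum(X @ W, 0.0)       # ReLU activations
print(activation_sparsity(h))    # roughly 0.5 for symmetric random weights
```

In practice one would compute this on a trained model's hidden layers and compare runs trained with small versus large learning rates; the paper's claim is that the large-learning-rate runs show markedly higher sparsity at comparable accuracy.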
Problem

Research questions and friction points this paper is trying to address.

Achieving robustness to spurious correlations with large learning rates
Enhancing network compressibility through high learning rates
Improving feature utilization and class separation via large learning rates
Innovation

Methods, ideas, or system contributions that make the work stand out.

Large learning rates enhance robustness and compressibility
High learning rates improve invariant feature utilization
Large learning rates outperform alternative regularizers and hyperparameters in consistently satisfying these properties together