🤖 AI Summary
This study addresses the joint enhancement of model robustness against spurious correlations and network compressibility. We propose and empirically validate large learning rate (LLR) as a simple yet effective implicit regularization mechanism: during early training, LLR promotes feature invariance, enhances inter-class separation, and induces activation sparsity—thereby simultaneously improving both robustness and compressibility. Across diverse DNN architectures, optimizers, and spurious-correlation benchmarks (e.g., Waterbirds, CelebA), LLR consistently outperforms mainstream explicit regularizers (e.g., DropBlock, GroupNorm) and alternative hyperparameter tuning strategies. Theoretical analysis—complemented by representation dynamics experiments—reveals that LLR accelerates escape from sharp minima and steers optimization toward flat, structurally parsimonious regions of the loss landscape, thereby achieving dual gains in robustness and compressibility.
📝 Abstract
Robustness and resource-efficiency are two highly desirable properties for modern machine learning models. However, achieving them jointly remains a challenge. In this paper, we position high learning rates as a facilitator for simultaneously achieving robustness to spurious correlations and network compressibility. We demonstrate that large learning rates also produce desirable representation properties such as invariant feature utilization, class separation, and activation sparsity. Importantly, our findings indicate that large learning rates compare favorably to other hyperparameters and regularization methods in consistently satisfying these properties in tandem. In addition to demonstrating the positive effect of large learning rates across diverse spurious correlation datasets, models, and optimizers, we also present strong evidence that the previously documented success of large learning rates in standard classification tasks is likely due to their effect on addressing hidden/rare spurious correlations in the training dataset.
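Both the summary and the abstract point to activation sparsity as one of the representation properties induced by large learning rates. As a minimal sketch (not the paper's code), one common way to quantify this is the fraction of near-zero post-ReLU activations in a layer; the function name and tolerance below are illustrative assumptions.

```python
import numpy as np

def activation_sparsity(activations: np.ndarray, tol: float = 1e-6) -> float:
    """Fraction of activations whose magnitude is at most `tol`.

    Higher values mean a sparser (more compressible) representation.
    """
    return float(np.mean(np.abs(activations) <= tol))

# Toy illustration: ReLU zeroes out negative pre-activations, so a
# pre-activation tensor with many negatives yields high sparsity.
pre_act = np.array([[-1.0, 0.5, -0.2],
                    [0.3, -0.7, -0.4]])
post_act = np.maximum(pre_act, 0.0)  # ReLU
print(activation_sparsity(post_act))  # 4 of 6 entries are zero
```

A metric like this could be tracked over training to compare large- and small-learning-rate runs, along the lines of the representation analyses the summary describes.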