🤖 AI Summary
This study addresses the joint enhancement of model robustness against spurious correlations and network compressibility. We propose and empirically validate large learning rate (LLR) as a simple yet effective implicit regularization mechanism: during early training, LLR promotes feature invariance, enhances inter-class separation, and induces activation sparsity—thereby simultaneously improving both robustness and compressibility. Across diverse DNN architectures, optimizers, and spurious-correlation benchmarks (e.g., Waterbirds, CelebA), LLR consistently outperforms mainstream explicit regularizers (e.g., DropBlock, GroupNorm) and alternative hyperparameter tuning strategies. Theoretical analysis—complemented by representation dynamics experiments—reveals that LLR accelerates escape from sharp minima and steers optimization toward flat, structurally parsimonious regions of the loss landscape, thereby achieving dual gains in robustness and compressibility.
📝 Abstract
Robustness and resource-efficiency are two highly desirable properties for modern machine learning models. However, achieving them jointly remains a challenge. In this paper, we position high learning rates as a facilitator for simultaneously achieving robustness to spurious correlations and network compressibility. We demonstrate that large learning rates also produce desirable representation properties such as invariant feature utilization, class separation, and activation sparsity. Importantly, our findings indicate that large learning rates compare favorably to other hyperparameters and regularization methods in consistently satisfying these properties in tandem. In addition to demonstrating the positive effect of large learning rates across diverse spurious correlation datasets, models, and optimizers, we also present strong evidence that the previously documented success of large learning rates in standard classification tasks is likely due to their effect on addressing hidden/rare spurious correlations in the training dataset.
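Both the summary and the abstract point to activation sparsity as one of the representation properties induced by large learning rates. As a minimal sketch (not the paper's code), one common way to quantify this is the fraction of near-zero post-ReLU activations in a layer; the function name and tolerance below are illustrative assumptions.

```python
import numpy as np

def activation_sparsity(activations: np.ndarray, tol: float = 1e-6) -> float:
    """Fraction of activations whose magnitude is at most `tol`.

    Higher values mean a sparser (more compressible) representation.
    """
    return float(np.mean(np.abs(activations) <= tol))

# Toy illustration: ReLU zeroes out negative pre-activations, so a
# pre-activation tensor with many negatives yields high sparsity.
pre_act = np.array([[-1.0, 0.5, -0.2],
                    [0.3, -0.7, -0.4]])
post_act = np.maximum(pre_act, 0.0)  # ReLU
print(activation_sparsity(post_act))  # 4 of 6 entries are zero
```

A metric like this could be tracked over training to compare large- and small-learning-rate runs, along the lines of the representation analyses the summary describes.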