🤖 AI Summary
Conventional deep learning optimizers treat all training samples uniformly, rendering them ill-suited for scenarios involving data distribution shifts and group-level imbalances (e.g., class-wise disparities).
Method: This paper proposes a novel distributionally robust optimization (DRO) framework tailored to modern deep learning practice. It is the first to tightly integrate DRO with adaptive stochastic optimizers, enabling sample reweighting at the level of semantic groups (e.g., by class) and introducing ALSO (Adaptive Loss Scaling Optimizer), a mechanism that efficiently optimizes the group-weighted DRO objective.
Contribution/Results: The paper provides a rigorous convergence analysis for non-convex objectives. Extensive experiments across diverse tasks, from tabular learning to Split Learning, demonstrate that the framework consistently outperforms standard optimizers (e.g., Adam) and existing DRO methods, achieving better generalization and more stable training.
📝 Abstract
While traditional Deep Learning (DL) optimization methods treat all training samples equally, Distributionally Robust Optimization (DRO) adaptively assigns importance weights to different samples. However, a significant gap exists between DRO and current DL practices. Modern DL optimizers require adaptivity and the ability to handle stochastic gradients, as these methods demonstrate superior performance. Additionally, for practical applications, a method should allow weight assignment not only to individual samples, but also to groups of objects (for example, all samples of the same class). This paper aims to bridge this gap by introducing ALSO (Adaptive Loss Scaling Optimizer), an adaptive algorithm for a modified DRO objective that can handle weight assignment to sample groups. We prove the convergence of our proposed algorithm for non-convex objectives, which is the typical case for DL models. Empirical evaluation across diverse Deep Learning tasks, from Tabular DL to Split Learning tasks, demonstrates that ALSO outperforms both traditional optimizers and existing DRO methods.
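To make the group-weighting idea concrete, here is a minimal sketch of one step of a generic group-weighted DRO update. This is not ALSO's actual algorithm (the paper's update rule and adaptivity mechanism are not reproduced here); it illustrates only the common DRO pattern the abstract refers to: group weights are raised multiplicatively on high-loss groups (exponentiated-gradient ascent), and the model then descends on the weight-averaged gradient. All names (`group_dro_step`, `eta`, `lr`) are illustrative assumptions.

```python
import math

def group_dro_step(params, grads_per_group, losses_per_group, q, lr=0.1, eta=0.01):
    """One illustrative group-weighted DRO step (not the paper's ALSO update).

    params           -- list of model parameters
    grads_per_group  -- per-group gradients, one list of floats per group
    losses_per_group -- per-group average losses
    q                -- current group weights (non-negative, summing to 1)
    """
    # Ascent on group weights: groups with higher loss get exponentially
    # larger weight, then weights are renormalized onto the simplex.
    q = [qi * math.exp(eta * li) for qi, li in zip(q, losses_per_group)]
    total = sum(q)
    q = [qi / total for qi in q]
    # Descent on parameters along the q-weighted gradient: sum_g q_g * grad_g.
    weighted = [sum(qg * g[i] for qg, g in zip(q, grads_per_group))
                for i in range(len(params))]
    params = [p - lr * w for p, w in zip(params, weighted)]
    return params, q
```

In this pattern the groups could be classes, clients, or any semantic partition of the data; the higher-loss group's weight grows, so subsequent parameter updates focus on the worst-off group.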