Aligning Distributionally Robust Optimization with Practical Deep Learning Needs

📅 2025-08-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
Conventional deep learning optimizers treat all training samples uniformly, rendering them ill-suited for scenarios involving data distribution shifts and group-level imbalances (e.g., class-wise disparities). Method: This paper proposes a distributionally robust optimization (DRO) framework tailored to modern deep learning practice. It tightly integrates DRO with adaptive stochastic optimizers, enabling semantic-group-aware sample reweighting (e.g., by class), and introduces the Adaptive Loss Scaling Optimizer (ALSO) to efficiently optimize the group-weighted DRO objective. Contribution/Results: The authors provide a rigorous convergence analysis in the non-convex setting. Extensive experiments across diverse tasks, including tabular learning and semantic segmentation, demonstrate that the framework consistently outperforms standard optimizers (e.g., Adam) and existing DRO methods, achieving superior generalization and enhanced training stability.

📝 Abstract
While traditional Deep Learning (DL) optimization methods treat all training samples equally, Distributionally Robust Optimization (DRO) adaptively assigns importance weights to different samples. However, a significant gap exists between DRO and current DL practices. Modern DL optimizers require adaptivity and the ability to handle stochastic gradients, as these methods demonstrate superior performance. Additionally, for practical applications, a method should allow weight assignment not only to individual samples, but also to groups of objects (for example, all samples of the same class). This paper aims to bridge this gap by introducing ALSO – Adaptive Loss Scaling Optimizer – an adaptive algorithm for a modified DRO objective that can handle weight assignment to sample groups. We prove the convergence of our proposed algorithm for non-convex objectives, which is the typical case for DL models. Empirical evaluation across diverse Deep Learning tasks, from Tabular DL to Split Learning tasks, demonstrates that ALSO outperforms both traditional optimizers and existing DRO methods.
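The abstract describes a DRO objective in which importance weights are assigned to groups of samples rather than individuals. As a rough illustration only (not the paper's actual ALSO algorithm; the function names and the multiplicative-weights update below are assumptions), a group-weighted objective with a mirror-ascent step on the group weights might look like this in NumPy:

```python
import numpy as np

def group_weighted_loss(sample_losses, group_ids, group_weights):
    """Aggregate per-sample losses into a weighted sum over groups.

    Each group's mean loss is scaled by a weight from the probability
    simplex; a generic group-DRO objective, not the paper's exact one.
    """
    total = 0.0
    for g, w in enumerate(group_weights):
        mask = group_ids == g
        if mask.any():
            total += w * sample_losses[mask].mean()
    return total

def update_group_weights(group_weights, group_losses, eta=0.1):
    """Multiplicative-weights step on the simplex: groups with higher
    loss receive more weight, then weights are renormalized."""
    w = group_weights * np.exp(eta * group_losses)
    return w / w.sum()
```

With two groups whose mean losses are 2.0 and 3.0 and uniform weights, the objective evaluates to 2.5, and one weight update shifts mass toward the higher-loss group while keeping the weights on the simplex.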
Problem

Research questions and friction points this paper is trying to address.

Bridging the gap between Distributionally Robust Optimization and current deep learning practice
Enabling adaptive weight assignment to groups of samples during optimization
Developing an optimizer that handles stochastic gradients for non-convex objectives
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adaptive algorithm for modified DRO objective
Handles weight assignment to sample groups
Proven convergence for non-convex DL objectives
👥 Authors

Dmitrii Feoktistov (Lomonosov Moscow State University; Yandex Research; Institute for System Programming, RAS)
Igor Ignashin (Institute for System Programming, RAS; Moscow Institute of Physics and Technology)
Andrey Veprikov (unknown affiliation)
Nikita Borovko (Lomonosov Moscow State University)
Alexander Bogdanov (Moscow Institute of Physics and Technology)
Savelii Chezhegov (Researcher)
Aleksandr Beznosikov (PhD, Basic Research of Artificial Intelligence Lab)