🤖 AI Summary
Conventional deep learning optimizers treat all training samples uniformly, rendering them ill-suited for scenarios involving data distribution shifts and group-level imbalances (e.g., class-wise disparities).
Method: This paper proposes a novel distributionally robust optimization (DRO) framework tailored to modern deep learning practice. It is the first to tightly integrate DRO with adaptive stochastic optimizers, enabling sample reweighting at the level of semantic groups (e.g., by class) and introducing ALSO (Adaptive Loss Scaling Optimizer), a mechanism that efficiently optimizes the group-weighted DRO objective.
Contribution/Results: The paper provides a rigorous convergence analysis for non-convex objectives. Extensive experiments across diverse tasks, from tabular learning to Split Learning, demonstrate that the framework consistently outperforms standard optimizers (e.g., Adam) and existing DRO methods, achieving better generalization and more stable training.
📝 Abstract
While traditional Deep Learning (DL) optimization methods treat all training samples equally, Distributionally Robust Optimization (DRO) adaptively assigns importance weights to different samples. However, a significant gap exists between DRO and current DL practices. Modern DL optimizers require adaptivity and the ability to handle stochastic gradients, as these methods demonstrate superior performance. Additionally, for practical applications, a method should allow weight assignment not only to individual samples, but also to groups of objects (for example, all samples of the same class). This paper aims to bridge this gap by introducing ALSO (Adaptive Loss Scaling Optimizer), an adaptive algorithm for a modified DRO objective that can handle weight assignment to sample groups. We prove the convergence of our proposed algorithm for non-convex objectives, which is the typical case for DL models. Empirical evaluation across diverse Deep Learning tasks, from Tabular DL to Split Learning tasks, demonstrates that ALSO outperforms both traditional optimizers and existing DRO methods.
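To make the group-weighting idea concrete, here is a minimal sketch of one step of a generic group-weighted DRO update. This is not ALSO's actual algorithm (the paper's update rule and adaptivity mechanism are not reproduced here); it illustrates only the common DRO pattern the abstract refers to: group weights are raised multiplicatively on high-loss groups (exponentiated-gradient ascent), and the model then descends on the weight-averaged gradient. All names (`group_dro_step`, `eta`, `lr`) are illustrative assumptions.

```python
import math

def group_dro_step(params, grads_per_group, losses_per_group, q, lr=0.1, eta=0.01):
    """One illustrative group-weighted DRO step (not the paper's ALSO update).

    params           -- list of model parameters
    grads_per_group  -- per-group gradients, one list of floats per group
    losses_per_group -- per-group average losses
    q                -- current group weights (non-negative, summing to 1)
    """
    # Ascent on group weights: groups with higher loss get exponentially
    # larger weight, then weights are renormalized onto the simplex.
    q = [qi * math.exp(eta * li) for qi, li in zip(q, losses_per_group)]
    total = sum(q)
    q = [qi / total for qi in q]
    # Descent on parameters along the q-weighted gradient: sum_g q_g * grad_g.
    weighted = [sum(qg * g[i] for qg, g in zip(q, grads_per_group))
                for i in range(len(params))]
    params = [p - lr * w for p, w in zip(params, weighted)]
    return params, q
```

In this pattern the groups could be classes, clients, or any semantic partition of the data; the higher-loss group's weight grows, so subsequent parameter updates focus on the worst-off group.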