🤖 AI Summary
Group Distributionally Robust Optimization (GDRO) suffers from linear sample complexity in the number of groups $K$, hindering scalability. Method: We introduce the $(\lambda, \eta)$-sparsity assumption to characterize the structure where only $\eta \ll K$ risk-dominant groups govern the overall robust risk. Leveraging a novel theoretical connection between GDRO and sleeping bandits, we design adaptive and semi-adaptive algorithms grounded in a two-player zero-sum game framework, per-action regret analysis, risk-aware group pruning, and adaptive weight updates. Contribution/Results: Our approach reduces sample complexity from $O(K)$ to $O(\eta)$, achieving dimension-independent, computationally efficient optimization with significantly improved sample efficiency. Empirical validation on synthetic and real-world datasets confirms the plausibility of the sparsity assumption and the efficacy of our algorithms.
📝 Abstract
The minimax sample complexity of group distributionally robust optimization (GDRO) has been determined up to a $\log(K)$ factor, where $K$ is the number of groups. In this work, we venture beyond the minimax perspective via a novel notion of sparsity that we dub $(\lambda, \eta)$-sparsity. In short, this condition means that at any parameter $\theta$, there is a set of at most $\eta$ groups whose risks at $\theta$ are all at least $\lambda$ larger than the risks of the other groups. To find an $\epsilon$-optimal $\theta$, we show via a novel algorithm and analysis that the $\epsilon$-dependent term in the sample complexity can swap a linear dependence on $K$ for a linear dependence on the potentially much smaller $\eta$. This improvement leverages recent progress in sleeping bandits, showing a fundamental connection between the two-player zero-sum game optimization framework for GDRO and per-action regret bounds in sleeping bandits. We next show an adaptive algorithm which, up to log factors, achieves a sample complexity bound that adapts to the best $(\lambda, \eta)$-sparsity condition that holds. We also show how to obtain a dimension-free semi-adaptive sample complexity bound with a computationally efficient method. Finally, we demonstrate the practicality of the $(\lambda, \eta)$-sparsity condition and the improved sample efficiency of our algorithms on both synthetic and real-life datasets.
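To make the $(\lambda, \eta)$-sparsity condition concrete, the following sketch checks it at a single parameter $\theta$, given a vector of per-group risks. The function name and the exact boundary handling (e.g. requiring a nonempty dominant set) are illustrative assumptions, not the paper's code.

```python
import numpy as np

def is_lambda_eta_sparse(risks, lam, eta):
    """Illustrative check of (lambda, eta)-sparsity at one parameter theta.

    Returns True if some set of at most `eta` groups has risks that are
    each at least `lam` larger than every remaining group's risk.
    """
    r = np.sort(np.asarray(risks, dtype=float))[::-1]  # risks in descending order
    # Any qualifying dominant set must be a top-m prefix of the sorted risks,
    # so it suffices to test each candidate size m = 1, ..., eta.
    for m in range(1, eta + 1):
        if m >= len(r):
            return True  # all groups fit inside the dominant set
        if r[m - 1] >= r[m] + lam:  # smallest dominant risk beats the rest by lam
            return True
    return False
```

For example, with group risks `[1.0, 0.9, 0.2, 0.1]` and $\lambda = 0.5$, the top two groups dominate the rest by at least $0.5$, so the condition holds with $\eta = 2$ but fails with $\eta = 1$.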