🤖 AI Summary
Group Distributionally Robust Optimization (GDRO) suffers from linear sample complexity in the number of groups $K$, hindering scalability. Method: We introduce the $(\lambda, \eta)$-sparsity assumption to characterize the structure where only $\eta \ll K$ risk-dominant groups govern the overall robust risk. Leveraging a novel theoretical connection between GDRO and sleeping bandits, we design adaptive and semi-adaptive algorithms grounded in a two-player zero-sum game framework, per-action regret analysis, risk-aware group pruning, and adaptive weight updates. Contribution/Results: Our approach reduces sample complexity from $O(K)$ to $O(\eta)$, achieving dimension-independent, computationally efficient optimization with significantly improved sample efficiency. Empirical validation on synthetic and real-world datasets confirms the plausibility of the sparsity assumption and the efficacy of our algorithms.
📝 Abstract
The minimax sample complexity of group distributionally robust optimization (GDRO) has been determined up to a $\log(K)$ factor, where $K$ is the number of groups. In this work, we venture beyond the minimax perspective via a novel notion of sparsity that we dub $(\lambda, \eta)$-sparsity. In short, this condition means that at any parameter $\theta$, there is a set of at most $\eta$ groups whose risks at $\theta$ are all at least $\lambda$ larger than the risks of the other groups. To find an $\epsilon$-optimal $\theta$, we show via a novel algorithm and analysis that the $\epsilon$-dependent term in the sample complexity can swap a linear dependence on $K$ for a linear dependence on the potentially much smaller $\eta$. This improvement leverages recent progress in sleeping bandits, showing a fundamental connection between the two-player zero-sum game optimization framework for GDRO and per-action regret bounds in sleeping bandits. We next show an adaptive algorithm which, up to log factors, achieves a sample complexity bound that adapts to the best $(\lambda, \eta)$-sparsity condition that holds. We also show how to obtain a dimension-free semi-adaptive sample complexity bound with a computationally efficient method. Finally, we demonstrate the practicality of the $(\lambda, \eta)$-sparsity condition and the improved sample efficiency of our algorithms on both synthetic and real-life datasets.
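To make the $(\lambda, \eta)$-sparsity condition concrete, the following sketch checks it at a single parameter $\theta$, given a vector of per-group risks. The function name and the exact boundary handling (e.g. requiring a nonempty dominant set) are illustrative assumptions, not the paper's code.

```python
import numpy as np

def is_lambda_eta_sparse(risks, lam, eta):
    """Illustrative check of (lambda, eta)-sparsity at one parameter theta.

    Returns True if some set of at most `eta` groups has risks that are
    each at least `lam` larger than every remaining group's risk.
    """
    r = np.sort(np.asarray(risks, dtype=float))[::-1]  # risks in descending order
    # Any qualifying dominant set must be a top-m prefix of the sorted risks,
    # so it suffices to test each candidate size m = 1, ..., eta.
    for m in range(1, eta + 1):
        if m >= len(r):
            return True  # all groups fit inside the dominant set
        if r[m - 1] >= r[m] + lam:  # smallest dominant risk beats the rest by lam
            return True
    return False
```

For example, with group risks `[1.0, 0.9, 0.2, 0.1]` and $\lambda = 0.5$, the top two groups dominate the rest by at least $0.5$, so the condition holds with $\eta = 2$ but fails with $\eta = 1$.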