Improved Algorithms for Nash Welfare in Linear Bandits

📅 2026-01-30
📈 Citations: 0
Influential citations: 0
🤖 AI Summary
This work addresses the limitations of existing linear bandit methods, whose Nash regret bounds scale suboptimally with the ambient dimension \(d\) and which lack a unified framework for balancing fairness and utility. The authors propose FairLinBandit, a general meta-algorithm compatible with any linear bandit strategy, designed to optimize Nash regret and extend naturally to the broader class of \(p\)-means regret objectives. By introducing novel analytical tools, they establish an order-optimal Nash regret bound for linear bandits, resolving the suboptimal dimension dependence of prior results, and formally define and analyze \(p\)-means regret in this setting. Instantiating the framework with Phased Elimination and with UCB (augmented with a new concentration inequality), they prove sublinear \(p\)-means regret across the full range of \(p\), and experiments on linear bandit instances from real-world datasets show that the approach significantly outperforms existing baselines.
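For reference, the two regret notions can be sketched as follows, assuming the standard definitions from the bandit literature; the notation below is ours and may differ from the paper's. Here \(\theta^*\) is the unknown parameter, \(\mathcal{X} \subset \mathbb{R}^d\) the arm set, and \(x_t\) the arm played at round \(t\).

```latex
% Minimal sketch, assuming the standard definitions; notation is ours, not the paper's.
% \mu^* is the best achievable expected reward over the arm set.
\mu^* = \max_{x \in \mathcal{X}} \langle x, \theta^* \rangle,
\qquad
\mathrm{NR}_T = \mu^* - \mathbb{E}\Bigl[\Bigl(\prod_{t=1}^{T} \langle x_t, \theta^* \rangle\Bigr)^{1/T}\Bigr]

% p-means regret replaces the geometric mean by a power mean:
% p = 1 recovers the usual average (utilitarian) regret, and p -> 0 recovers Nash regret.
\mathrm{R}^{(p)}_T = \mu^* - \mathbb{E}\Bigl[\Bigl(\tfrac{1}{T}\sum_{t=1}^{T} \langle x_t, \theta^* \rangle^{p}\Bigr)^{1/p}\Bigr],
\qquad p \in (-\infty, 1]
```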

📝 Abstract
Nash regret has recently emerged as a principled fairness-aware performance metric for stochastic multi-armed bandits, motivated by the Nash Social Welfare objective. Although this notion has been extended to linear bandits, existing results suffer from suboptimality in ambient dimension $d$, stemming from proof techniques that rely on restrictive concentration inequalities. In this work, we resolve this open problem by introducing new analytical tools that yield an order-optimal Nash regret bound in linear bandits. Beyond Nash regret, we initiate the study of $p$-means regret in linear bandits, a unifying framework that interpolates between fairness and utility objectives and strictly generalizes Nash regret. We propose a generic algorithmic framework, FairLinBandit, that works as a meta-algorithm on top of any linear bandit strategy. We instantiate this framework using two bandit algorithms: Phased Elimination and Upper Confidence Bound, and prove that both achieve sublinear $p$-means regret for the entire range of $p$. Extensive experiments on linear bandit instances generated from real-world datasets demonstrate that our methods consistently outperform the existing state-of-the-art baseline.
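Since the abstract describes FairLinBandit only at a high level, the following is a hypothetical Python sketch of the meta-algorithm interface it implies: a wrapper that adds an initial forced-exploration phase (a common device in Nash regret analyses, because a single near-zero early reward collapses the geometric mean) and then delegates to any base linear bandit strategy. The class and method names are ours, not the paper's.

```python
# Hypothetical sketch (names and structure are ours, not the paper's):
# a meta-algorithm wrapping ANY base linear bandit strategy, in the spirit of
# "FairLinBandit works as a meta-algorithm on top of any linear bandit strategy".
# The forced-exploration phase is an assumption borrowed from standard Nash
# regret analyses, not a confirmed detail of the paper's algorithm.

class FairLinBanditSketch:
    def __init__(self, base_strategy, arms, explore_rounds):
        self.base = base_strategy        # e.g. Phased Elimination or UCB; must expose select_arm()/update()
        self.arms = arms                 # list of d-dimensional feature vectors
        self.explore_rounds = explore_rounds
        self.t = 0

    def select_arm(self):
        if self.t < self.explore_rounds:
            # Phase 1: cycle uniformly over the arms so that early expected
            # rewards are bounded away from zero.
            return self.arms[self.t % len(self.arms)]
        # Phase 2: delegate every decision to the base strategy.
        return self.base.select_arm()

    def update(self, arm, reward):
        self.t += 1
        # The base strategy observes all data, including exploration rounds.
        self.base.update(arm, reward)
```

Under this reading, swapping Phased Elimination for UCB changes only the base_strategy argument, which is presumably what lets the paper instantiate both algorithms from a single framework.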
Problem

Research questions and friction points this paper is trying to address.

Nash regret
linear bandits
fairness
p-means regret
stochastic multi-armed bandits
Innovation

Methods, ideas, or system contributions that make the work stand out.

Nash regret
linear bandits
p-means regret
FairLinBandit
fairness-aware learning