Improved Algorithms for Nash Welfare in Linear Bandits

📅 2026-01-30
📈 Citations: 0
Influential citations: 0
🤖 AI Summary
This work addresses the limitations of existing linear bandit methods, whose Nash regret bounds scale suboptimally with the ambient dimension \(d\) and which lack a unified framework for balancing fairness and utility. The authors propose FairLinBandit, a general meta-algorithm compatible with any linear bandit strategy, designed to optimize Nash regret and extend naturally to the broader class of \(p\)-means regret objectives. By introducing novel analytical tools, they establish an order-optimal Nash regret bound for linear bandits, resolving the suboptimal dimension dependence of prior results, and formally define and analyze \(p\)-means regret in this setting. Instantiating the framework with Phased Elimination and with UCB (augmented with a new concentration inequality), they prove sublinear \(p\)-means regret across the full range of \(p\), and experiments on linear bandit instances from real-world datasets show that the approach significantly outperforms existing baselines.
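For reference, the two regret notions can be sketched as follows, assuming the standard definitions from the bandit literature; the notation below is ours and may differ from the paper's. Here \(\theta^*\) is the unknown parameter, \(\mathcal{X} \subset \mathbb{R}^d\) the arm set, and \(x_t\) the arm played at round \(t\).

```latex
% Minimal sketch, assuming the standard definitions; notation is ours, not the paper's.
% \mu^* is the best achievable expected reward over the arm set.
\mu^* = \max_{x \in \mathcal{X}} \langle x, \theta^* \rangle,
\qquad
\mathrm{NR}_T = \mu^* - \mathbb{E}\Bigl[\Bigl(\prod_{t=1}^{T} \langle x_t, \theta^* \rangle\Bigr)^{1/T}\Bigr]

% p-means regret replaces the geometric mean by a power mean:
% p = 1 recovers the usual average (utilitarian) regret, and p -> 0 recovers Nash regret.
\mathrm{R}^{(p)}_T = \mu^* - \mathbb{E}\Bigl[\Bigl(\tfrac{1}{T}\sum_{t=1}^{T} \langle x_t, \theta^* \rangle^{p}\Bigr)^{1/p}\Bigr],
\qquad p \in (-\infty, 1]
```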

📝 Abstract
Nash regret has recently emerged as a principled fairness-aware performance metric for stochastic multi-armed bandits, motivated by the Nash Social Welfare objective. Although this notion has been extended to linear bandits, existing results suffer from suboptimality in ambient dimension $d$, stemming from proof techniques that rely on restrictive concentration inequalities. In this work, we resolve this open problem by introducing new analytical tools that yield an order-optimal Nash regret bound in linear bandits. Beyond Nash regret, we initiate the study of $p$-means regret in linear bandits, a unifying framework that interpolates between fairness and utility objectives and strictly generalizes Nash regret. We propose a generic algorithmic framework, FairLinBandit, that works as a meta-algorithm on top of any linear bandit strategy. We instantiate this framework using two bandit algorithms: Phased Elimination and Upper Confidence Bound, and prove that both achieve sublinear $p$-means regret for the entire range of $p$. Extensive experiments on linear bandit instances generated from real-world datasets demonstrate that our methods consistently outperform the existing state-of-the-art baseline.
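Since the abstract describes FairLinBandit only at a high level, the following is a hypothetical Python sketch of the meta-algorithm interface it implies: a wrapper that adds an initial forced-exploration phase (a common device in Nash regret analyses, because a single near-zero early reward collapses the geometric mean) and then delegates to any base linear bandit strategy. The class and method names are ours, not the paper's.

```python
# Hypothetical sketch (names and structure are ours, not the paper's):
# a meta-algorithm wrapping ANY base linear bandit strategy, in the spirit of
# "FairLinBandit works as a meta-algorithm on top of any linear bandit strategy".
# The forced-exploration phase is an assumption borrowed from standard Nash
# regret analyses, not a confirmed detail of the paper's algorithm.

class FairLinBanditSketch:
    def __init__(self, base_strategy, arms, explore_rounds):
        self.base = base_strategy        # e.g. Phased Elimination or UCB; must expose select_arm()/update()
        self.arms = arms                 # list of d-dimensional feature vectors
        self.explore_rounds = explore_rounds
        self.t = 0

    def select_arm(self):
        if self.t < self.explore_rounds:
            # Phase 1: cycle uniformly over the arms so that early expected
            # rewards are bounded away from zero.
            return self.arms[self.t % len(self.arms)]
        # Phase 2: delegate every decision to the base strategy.
        return self.base.select_arm()

    def update(self, arm, reward):
        self.t += 1
        # The base strategy observes all data, including exploration rounds.
        self.base.update(arm, reward)
```

Under this reading, swapping Phased Elimination for UCB changes only the base_strategy argument, which is presumably what lets the paper instantiate both algorithms from a single framework.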
Problem

Research questions and friction points this paper is trying to address.

Nash regret
linear bandits
fairness
p-means regret
stochastic multi-armed bandits
Innovation

Methods, ideas, or system contributions that make the work stand out.

Nash regret
linear bandits
p-means regret
FairLinBandit
fairness-aware learning