🤖 AI Summary
This paper studies a fairness-constrained contextual multi-armed bandit problem: maximize cumulative reward while ensuring that each arm receives a minimum reward aggregated across all contexts. Aggregating the constraint across contexts improves performance and makes feasibility easier, but it removes the closed-form optimal allocations typically available in standard multi-armed bandit settings. We design and analyze two kinds of algorithms: optimistic ones that prioritize performance and pessimistic ones that enforce constraint satisfaction. For each algorithm, we derive problem-dependent upper bounds on both regret and constraint violation. We also establish a lower bound showing that the time-horizon dependence of these bounds is optimal in general, and revealing fundamental limitations of the free exploration principle leveraged in prior work. The setting covers applications that require context-aware decisions and fair revenue allocation, such as resource allocation and online advertising.
📝 Abstract
We examine a multi-armed bandit (MAB) problem with contextual information, where the objective is to ensure that each arm receives a minimum aggregated reward across contexts while simultaneously maximizing the total cumulative reward. This framework captures a broad class of real-world applications where fair revenue allocation is critical and contextual variation is inherent. The cross-context aggregation of minimum reward constraints, while enabling better performance and easier feasibility, introduces significant technical challenges, particularly the absence of the closed-form optimal allocations typically available in standard MAB settings. We design and analyze algorithms that either optimistically prioritize performance or pessimistically enforce constraint satisfaction. For each algorithm, we derive problem-dependent upper bounds on both regret and constraint violations. Furthermore, we establish a lower bound demonstrating that the dependence on the time horizon in our results is optimal in general and revealing fundamental limitations of the free exploration principle leveraged in prior work.
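One way to make the constrained objective concrete is as a linear program over context-dependent arm-selection probabilities. The notation below is an assumption introduced for illustration and is not taken from the paper: contexts $c \in \mathcal{C}$ arrive with probability $p_c$, arm $k \in \{1,\dots,K\}$ has mean reward $\mu_{c,k}$ in context $c$, $\pi_{c,k}$ is the probability of pulling arm $k$ in context $c$, and $\lambda_k$ is the minimum aggregated reward required for arm $k$. Under these assumptions, a benchmark allocation with known means could be sketched as:

```latex
\begin{align*}
\max_{\pi}\quad & \sum_{c \in \mathcal{C}} p_c \sum_{k=1}^{K} \pi_{c,k}\,\mu_{c,k} \\
\text{s.t.}\quad & \sum_{c \in \mathcal{C}} p_c\,\pi_{c,k}\,\mu_{c,k} \;\ge\; \lambda_k
    && \text{for all } k \in \{1,\dots,K\}, \\
& \sum_{k=1}^{K} \pi_{c,k} = 1
    && \text{for all } c \in \mathcal{C}, \\
& \pi_{c,k} \ge 0
    && \text{for all } c \in \mathcal{C},\ k \in \{1,\dots,K\}.
\end{align*}
```

In this sketch each arm's constraint sums over contexts, so the constraints couple the allocations $\pi_{c,\cdot}$ of different contexts. This is consistent with the abstract's remark that aggregation eases feasibility (an arm can meet its threshold in the contexts where it performs well) while removing the closed-form allocation available in the standard setting.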