Bandits with Single-Peaked Preferences and Limited Resources

📅 2025-10-10
🤖 AI Summary
This paper studies an online stochastic matching problem under budget constraints: over $T$ rounds, an algorithm sequentially matches $U$ users to $K$ arms to maximize cumulative reward. Without structural assumptions, computing the optimal matching is NP-hard, which makes online learning computationally infeasible. We introduce, for the first time, the single-peaked preference structure from social choice theory into a budget-constrained bandit framework. We design an efficient algorithm for the offline budgeted matching problem and, using a novel PQ-tree-based order approximation method, turn it into an efficient online algorithm. Its regret bound is $\tilde{O}(UKT^{2/3})$ when the single-peaked structure is unknown. When the structure is known, a UCB-like algorithm improves the bound to $\tilde{O}(U\sqrt{TK})$.
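As a back-of-the-envelope comparison (not a statement from the paper), ignoring constants and the logarithmic factors hidden by $\tilde{O}$, the two bounds differ by a factor that grows with both $K$ and $T$:

$$\frac{UKT^{2/3}}{U\sqrt{TK}} = \frac{K\,T^{2/3}}{K^{1/2}\,T^{1/2}} = \sqrt{K}\,T^{1/6}.$$

For example, with $K = 100$ arms and $T = 10^6$ rounds, $\sqrt{K} = 10$ and $T^{1/6} = 10$, so the known-structure bound is smaller by a factor of about 100.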

📝 Abstract
We study an online stochastic matching problem in which an algorithm sequentially matches $U$ users to $K$ arms, aiming to maximize cumulative reward over $T$ rounds under budget constraints. Without structural assumptions, computing the optimal matching is NP-hard, making online learning computationally infeasible. To overcome this barrier, we focus on single-peaked preferences, a well-established structure in social choice theory in which users' preferences are unimodal with respect to a common order over arms. We devise an efficient algorithm for the offline budgeted matching problem and leverage it into an efficient online algorithm with a regret of $\tilde O(UKT^{2/3})$. Our approach relies on a novel PQ tree-based order approximation method. If the single-peaked structure is known, we develop an efficient UCB-like algorithm that achieves a regret bound of $\tilde O(U\sqrt{TK})$.
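To make "unimodal with respect to a common order over arms" concrete, here is a minimal Python sketch. It assumes each user's preferences are given as a vector of expected rewards, one per arm, with no ties (strict single-peakedness). The helpers `is_single_peaked` and `find_common_axis` are hypothetical illustrations, not the paper's code. The axis search is brute force over all $K!$ orders and is usable only for tiny $K$; the paper instead relies on a PQ tree-based order approximation, which this sketch does not implement.

```python
from itertools import permutations
from typing import Optional, Sequence


def is_single_peaked(values: Sequence[float], axis: Sequence[int]) -> bool:
    """Return True if `values` (one expected reward per arm), read in the
    order given by `axis`, rise strictly to a single peak and then fall
    strictly. Ties count as violations (strict single-peakedness assumed)."""
    seq = [values[k] for k in axis]
    i = 0
    # Walk up the strictly increasing prefix.
    while i + 1 < len(seq) and seq[i] < seq[i + 1]:
        i += 1
    # Walk down the strictly decreasing suffix.
    while i + 1 < len(seq) and seq[i] > seq[i + 1]:
        i += 1
    # Single-peaked iff the two walks consumed the whole sequence.
    return i == len(seq) - 1


def find_common_axis(prefs: Sequence[Sequence[float]]) -> Optional[tuple[int, ...]]:
    """Hypothetical brute-force helper: return the first arm order (in
    lexicographic order) under which every user is single-peaked, or None.
    Tries all K! orders, so it is only usable for very small K."""
    n_arms = len(prefs[0])
    for axis in permutations(range(n_arms)):
        if all(is_single_peaked(p, axis) for p in prefs):
            return axis
    return None


if __name__ == "__main__":
    # Three users over three arms; each user's rewards peak at a different arm.
    mu = [[0.9, 0.5, 0.1], [0.2, 0.8, 0.3], [0.1, 0.4, 0.7]]
    print(find_common_axis(mu))  # (0, 1, 2)
```

With three arms, an axis fails exactly when some user's least preferred arm sits in the middle. So if three users each dislike a different arm most, `find_common_axis` returns `None`.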
Problem

Research questions and friction points this paper is trying to address.

Maximizing cumulative reward in online stochastic matching with budget constraints
Overcoming the NP-hardness of optimal matching by assuming single-peaked preferences
Developing efficient algorithms for offline and online budgeted matching problems
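The sketch below shows what an offline budgeted matching instance might look like, and why exhaustive search does not scale. It assumes one plausible reading of "budget constraints" that this summary does not spell out: the algorithm may open at most `budget` arms, and each user is matched to the open arm with the highest expected reward. Both this model and the function name `best_budgeted_matching` are hypothetical. The paper's efficient offline algorithm for single-peaked preferences is not reproduced here. The search enumerates every arm subset of size up to the budget, so its running time grows combinatorially in $K$.

```python
from itertools import combinations
from typing import Sequence


def best_budgeted_matching(
    mu: Sequence[Sequence[float]], budget: int
) -> tuple[float, tuple[int, ...], list[int]]:
    """Exhaustive search under a HYPOTHETICAL model of the budget constraint:
    open at most `budget` arms and match each user u to the open arm k with
    the largest expected reward mu[u][k]. Returns (total reward, open arms,
    arm assigned to each user)."""
    n_users, n_arms = len(mu), len(mu[0])
    best_value = float("-inf")
    best_arms: tuple[int, ...] = ()
    best_assignment: list[int] = []
    # Enumerate every non-empty arm subset of size <= budget.
    for size in range(1, min(budget, n_arms) + 1):
        for arms in combinations(range(n_arms), size):
            # Each user picks its favourite arm among the open ones.
            assignment = [max(arms, key=lambda k: mu[u][k]) for u in range(n_users)]
            value = sum(mu[u][assignment[u]] for u in range(n_users))
            if value > best_value:
                best_value, best_arms, best_assignment = value, arms, assignment
    return best_value, best_arms, best_assignment


if __name__ == "__main__":
    mu = [[0.9, 0.5, 0.1], [0.2, 0.8, 0.3], [0.1, 0.4, 0.7]]
    value, arms, assignment = best_budgeted_matching(mu, budget=2)
    print(arms, assignment, round(value, 3))  # (0, 1) [0, 1, 1] 2.1
```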
Innovation

Methods, ideas, or system contributions that make the work stand out.

PQ tree-based order approximation method
Efficient offline budgeted matching algorithm
UCB-like algorithm for a known single-peaked structure
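The abstract describes the known-structure algorithm only as "UCB-like". For readers unfamiliar with the term, the following Python sketch shows the standard single-learner UCB1 index of Auer, Cesa-Bianchi and Fischer (2002): the empirical mean plus an exploration bonus $\sqrt{2\ln t / n_k}$. It is the textbook building block, not the paper's multi-user, budget-constrained algorithm. Rewards are assumed to lie in $[0, 1]$, and the `pull` callback is a hypothetical stand-in for the environment.

```python
import math
import random
from typing import Callable


def ucb1(pull: Callable[[int], float], n_arms: int, horizon: int) -> list[int]:
    """Textbook UCB1 for a single learner with rewards in [0, 1].
    `pull(k)` is a hypothetical environment callback returning a reward for
    arm k. Returns how many times each arm was played."""
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1  # play every arm once so each mean is defined
        else:
            # Optimism: empirical mean plus exploration bonus sqrt(2 ln t / n_k).
            arm = max(
                range(n_arms),
                key=lambda k: sums[k] / counts[k] + math.sqrt(2 * math.log(t) / counts[k]),
            )
        reward = pull(arm)
        counts[arm] += 1
        sums[arm] += reward
    return counts


if __name__ == "__main__":
    rng = random.Random(0)
    means = [0.2, 0.5, 0.8]  # Bernoulli arms; arm 2 is best
    counts = ucb1(lambda k: float(rng.random() < means[k]), len(means), horizon=5000)
    print(counts)  # most pulls should go to arm 2
```

The exploration bonus shrinks as an arm is played more often, so suboptimal arms are tried only about logarithmically often in $t$.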
Gur Keinan
Technion—Israel Institute of Technology
Rotem Torkan
Technion—Israel Institute of Technology
Omer Ben-Porat
Assistant Professor, Technion—Israel Institute of Technology
Economics and Computation, Multi-Agent Systems, Machine Learning, Data Science