Adaptive Bandit Algorithms for Contextual Matching Markets

📅 2026-05-27
🤖 AI Summary
This study addresses two-sided matching markets in which contextual information evolves over time. In each round, new arms arrive with observable contexts, and the goal is to match them to players so as to minimize each player's regret against a stable matching benchmark. The work proposes adaptive algorithms for both stochastic and adversarial context models and introduces a minimum preference gap to characterize learning difficulty. In the stochastic setting, it establishes an instance-dependent poly-logarithmic regret upper bound, together with matching instance-independent upper and lower bounds under a mild distributional assumption. In the adversarial setting, it defines a tractable regret notion that remains valid under arbitrary contexts and guarantees instance-independent sublinear regret. The work combines contextual multi-armed bandits, stable matching theory, and online learning, with guarantees that hold both when contexts are random and when they are arbitrary.
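The summary does not define the minimal preference gap. As a rough illustration only, the sketch below assumes one plausible per-round reading: the smallest difference, over all players and all pairs of distinct arms, between a player's linear utilities θ_i·x_a and θ_i·x_b. The paper's actual definition may differ (for example, it may be taken over the context distribution or restricted to arm pairs that matter for the stable matching). The functions `utility` and `min_preference_gap` and all numbers are hypothetical.

```python
from itertools import combinations


def utility(theta, x):
    """Linear utility theta . x of a player for an arm with context x."""
    return sum(t * xi for t, xi in zip(theta, x))


def min_preference_gap(thetas, contexts):
    """Smallest utility gap any player sees between two distinct arms in one round.

    ILLUSTRATIVE ASSUMPTION: gap = min over players i and arm pairs (a, b)
    of |theta_i . x_a - theta_i . x_b|. Requires at least two arms.
    """
    return min(
        abs(utility(th, xa) - utility(th, xb))
        for th in thetas
        for xa, xb in combinations(contexts, 2)
    )


thetas = [(1.0, 0.0), (0.0, 1.0)]                 # hypothetical player parameters
contexts = [(0.9, 0.2), (0.8, 0.5), (0.1, 0.9)]   # hypothetical arm contexts
gap = min_preference_gap(thetas, contexts)
print(round(gap, 3))      # 0.1 (player 0, arms 0 and 1)
print(round(1 / gap**2))  # 100, the rough 1/gap^2 sample-count scale
```

The second print reflects standard bandit intuition rather than the paper's bound: telling apart two arms whose mean utilities differ by Δ under sub-Gaussian noise takes on the order of 1/Δ² observations, up to constants and log factors, so a smaller gap signals a harder instance.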
📝 Abstract
We study bandit learning in matching markets, where players and arms constitute the two market sides, and the players' utilities are linear in the arm contexts. In each round, new arms arrive with observable contexts. Then, the algorithm matches them to players, aiming to minimize each player's regret against a stable matching benchmark. This contextual structure creates significant complexity: subtle context shifts can slightly alter one player's utility while completely reconfiguring the underlying benchmark, causing large regret spikes for others. We address this in two settings: stochastic contexts, drawn from a latent distribution, and adversarial contexts, which may be arbitrary. For the stochastic case, we introduce a novel minimum preference gap to capture learning difficulty and provide a fully adaptive algorithm with an instance-dependent poly-logarithmic regret upper bound. We also establish matching instance-independent regret upper and lower bounds under a mild distributional assumption. For the adversarial setting, we propose a tractable regret notion that remains valid under arbitrary contexts, and we achieve an instance-independent sublinear regret bound under this notion via an adaptive algorithm.
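The abstract's claim that a subtle context shift can reconfigure the stable benchmark can be made concrete with a toy market. The sketch below is an illustration under stated assumptions, not the paper's algorithm: players rank arms by linear utility θ_i·x_a; arms rank players by fixed, known lists (an assumption, since the abstract does not say how arms rank players); and the benchmark is taken to be the player-optimal stable matching computed by player-proposing deferred acceptance (Gale-Shapley). Regret is measured for a single round only. The function names and all numbers are hypothetical.

```python
def utility(theta, x):
    """Linear utility theta . x of a player for an arm with context x."""
    return sum(t * xi for t, xi in zip(theta, x))


def player_optimal_stable_matching(thetas, contexts, arm_prefs):
    """Player-proposing deferred acceptance (Gale-Shapley).

    thetas:    list of player parameter vectors; players rank arms by utility.
    contexts:  dict arm -> context vector for the current round.
    arm_prefs: dict arm -> list of all players, most preferred first
               (ASSUMPTION: arms' rankings are fixed and known).
    Returns a dict player -> arm; a player may stay unmatched.
    """
    # Each player's ranking of the arms, best first.
    rankings = [
        sorted(contexts, key=lambda a, th=th: utility(th, contexts[a]), reverse=True)
        for th in thetas
    ]
    rank_of = {a: {p: r for r, p in enumerate(prefs)} for a, prefs in arm_prefs.items()}
    next_idx = [0] * len(thetas)   # next arm each player will propose to
    holder = {}                    # arm -> player it currently holds
    free = list(range(len(thetas)))
    while free:
        i = free.pop()
        if next_idx[i] == len(rankings[i]):
            continue               # player i has been rejected by every arm
        a = rankings[i][next_idx[i]]
        next_idx[i] += 1
        j = holder.get(a)
        if j is None:
            holder[a] = i          # arm was free: tentatively accept
        elif rank_of[a][i] < rank_of[a][j]:
            holder[a] = i          # arm prefers the new proposer
            free.append(j)
        else:
            free.append(i)         # arm rejects the proposer
    return {p: a for a, p in holder.items()}


def one_round_regret(thetas, contexts, played, benchmark):
    """Per-player regret for one round: benchmark utility minus played utility (0 if unmatched)."""
    def u(i, matching):
        return utility(thetas[i], contexts[matching[i]]) if i in matching else 0.0
    return [u(i, benchmark) - u(i, played) for i in range(len(thetas))]


# Hypothetical toy market: two players, two arms, 2-dimensional contexts.
thetas = [(1.0, 0.0), (0.0, 1.0)]
arm_prefs = {"A": [0, 1], "B": [0, 1]}              # both arms prefer player 0
ctx = {"A": (0.50, 0.9), "B": (0.51, 0.1)}
ctx_shifted = {"A": (0.50, 0.9), "B": (0.49, 0.1)}  # arm B's first coordinate moves by 0.02

before = player_optimal_stable_matching(thetas, ctx, arm_prefs)
after = player_optimal_stable_matching(thetas, ctx_shifted, arm_prefs)
print(before == {0: "B", 1: "A"}, after == {0: "A", 1: "B"})  # True True

# A learner that acts on the shifted contexts while the true ones are `ctx`:
print([round(r, 2) for r in one_round_regret(thetas, ctx, after, before)])  # [0.01, 0.8]
```

Moving one coordinate of arm B's context by 0.02 changes only player 0's utility for B (player 1's parameter ignores that coordinate), yet it flips player 0's ranking and swaps the stable matching. If a learner acts on the shifted contexts while the true ones are the originals, player 0 loses only 0.01 in that round while player 1 loses 0.8, which is the kind of regret spike for other players that the abstract describes.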
Problem

Research questions and friction points this paper is trying to address.

contextual bandits
matching markets
regret minimization
stable matching
adaptive algorithms
Innovation

Methods, ideas, or system contributions that make the work stand out.

contextual bandits
matching markets
adaptive algorithms
regret analysis
preference gap