Optimal Algorithms for Bandit Learning in Matching Markets

📅 2025-09-17

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This paper studies pure exploration in matching markets under preference uncertainty: identifying a stable matching at confidence level δ with minimal sample complexity, given no prior knowledge and only noisy stochastic-reward observations whose expected values encode agents' preferences. The setting covers both one-sided learning (one side of the market knows its true preferences) and two-sided learning (both sides are uncertain), restricted to markets with a unique stable matching. The paper establishes information-theoretic lower bounds on sample complexity for both settings. It proposes a computationally efficient algorithm that, as δ → 0, matches the one-sided lower bound up to a constant factor, and extends it to the two-sided setting, where experiments show sample complexity close to the lower bound. A system of ODEs characterizes the idealized fluid path that the algorithm tracks. Labor-market platforms such as Upwork are the motivating application.

📝 Abstract

We study the problem of pure exploration in matching markets under uncertain preferences, where the goal is to identify a stable matching with confidence parameter $δ$ and minimal sample complexity. Agents learn preferences via stochastic rewards, with expected values indicating preferences. This finds use in labor market platforms like Upwork, where firms and freelancers must be matched quickly despite noisy observations and no prior knowledge, in a stable manner that prevents dissatisfaction. We consider markets with a unique stable matching and establish information-theoretic lower bounds on sample complexity for (1) one-sided learning, where one side of the market knows its true preferences, and (2) two-sided learning, where both sides are uncertain. We propose a computationally efficient algorithm and prove that it asymptotically ($δ \to 0$) matches the lower bound up to a constant factor for one-sided learning. Using the insights from the lower bound, we extend our algorithm to the two-sided learning setting and provide experimental results showing that it closely matches the lower bound on sample complexity. Finally, using a system of ODEs, we characterize the idealized fluid path that our algorithm chases.

Problem

Research questions and friction points this paper is trying to address.

Identifying stable matching with minimal sample complexity

Learning agent preferences via stochastic reward observations

Addressing one-sided and two-sided uncertain preference scenarios
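
To make the notion of stability concrete, the following is a minimal sketch of a blocking-pair check. It is not taken from the paper. It assumes preferences are encoded as matrices of expected rewards, where a higher mean indicates a stronger preference; the names `blocking_pairs`, `mu_firm`, and `mu_worker` are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple


def blocking_pairs(
    matching: List[int],
    mu_firm: List[List[float]],
    mu_worker: List[List[float]],
) -> List[Tuple[int, int]]:
    """Return all (firm, worker) pairs that block the given matching.

    matching[f] is the worker assigned to firm f (assumed one-to-one).
    mu_firm[f][w] is the (assumed) expected reward firm f gets from worker w.
    mu_worker[w][f] is the (assumed) expected reward worker w gets from firm f.
    A pair blocks the matching if both strictly prefer each other to
    their current partners. A matching is stable if no pair blocks it.
    """
    n = len(matching)
    # Invert the matching: firm_of[w] is the firm currently matched to worker w.
    firm_of = {w: f for f, w in enumerate(matching)}
    pairs = []
    for f in range(n):
        for w in range(n):
            if matching[f] == w:
                continue
            firm_prefers = mu_firm[f][w] > mu_firm[f][matching[f]]
            worker_prefers = mu_worker[w][f] > mu_worker[w][firm_of[w]]
            if firm_prefers and worker_prefers:
                pairs.append((f, w))
    return pairs
```

In a two-firm, two-worker market where both firms prefer worker 0 and both workers prefer firm 0, matching firm 0 with worker 1 leaves firm 0 and worker 0 as a blocking pair, while matching firm 0 with worker 0 is stable.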

Innovation

Methods, ideas, or system contributions that make the work stand out.

Bandit learning algorithms for matching markets

Asymptotically optimal sample complexity bounds

Computationally efficient one-sided and two-sided learning
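
For orientation, here is a naive plug-in baseline, sketched under stated assumptions: sample every firm-worker pair a fixed number of times, average the noisy rewards, and run firm-proposing deferred acceptance (Gale-Shapley) on the empirical means. This is not the paper's algorithm, which allocates samples adaptively to approach the lower bound. The Gaussian noise model and the names `empirical_means` and `deferred_acceptance` are illustrative assumptions.

```python
import random
from typing import List, Optional


def empirical_means(
    mu: List[List[float]], pulls: int, noise_sd: float, rng: random.Random
) -> List[List[float]]:
    """Average `pulls` noisy rewards per pair (Gaussian noise is an assumption)."""
    return [
        [sum(rng.gauss(m, noise_sd) for _ in range(pulls)) / pulls for m in row]
        for row in mu
    ]


def deferred_acceptance(
    mu_firm: List[List[float]], mu_worker: List[List[float]]
) -> List[int]:
    """Firm-proposing Gale-Shapley on mean-reward matrices.

    Returns matching[f] = worker assigned to firm f.
    Higher mean reward is treated as a stronger preference.
    """
    n = len(mu_firm)
    # Each firm's ranking of workers, most preferred first.
    ranking = [sorted(range(n), key=lambda w: -mu_firm[f][w]) for f in range(n)]
    next_choice = [0] * n                      # index of next worker to propose to
    holder: List[Optional[int]] = [None] * n   # holder[w] = firm tentatively held by w
    free = list(range(n))
    while free:
        f = free.pop()
        w = ranking[f][next_choice[f]]
        next_choice[f] += 1
        current = holder[w]
        if current is None:
            holder[w] = f
        elif mu_worker[w][f] > mu_worker[w][current]:
            holder[w] = f          # worker trades up; previous firm is free again
            free.append(current)
        else:
            free.append(f)         # proposal rejected; firm tries its next choice
    matching = [0] * n
    for w, f in enumerate(holder):
        matching[f] = w
    return matching
```

Uniform sampling of this kind spends as many samples on easily separated pairs as on hard ones, which is the inefficiency that sample-complexity-optimal strategies aim to avoid.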

🔎 Similar Papers

Learning Optimal Stable Matches in Decentralized Markets with Unknown Preferences