Bandit Learning in Matching Markets with Interviews

📅 2026-02-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses inefficiencies in two-sided matching markets that arise when preferences are unknown, particularly on the firm side. It models interviews as low-cost probes that elicit partial preference information and introduces a mechanism that lets firms strategically defer hiring decisions. The proposed algorithms handle preference uncertainty on the firm side as well as the agent side by combining multi-armed bandit learning, two-sided matching theory, and decentralized feedback. They achieve time-independent (constant in $T$) regret bounds in both the centralized and decentralized settings, a substantial improvement over the $O(\log T)$ regret guarantees known for learning stable matchings without interviews. Moreover, in structured markets, decentralized performance matches the centralized benchmark up to polynomial factors in the number of agents and firms.

📝 Abstract
Two-sided matching markets rely on preferences from both sides, yet it is often impractical to evaluate preferences. Participants, therefore, conduct a limited number of interviews, which provide early, noisy impressions and shape final decisions. We study bandit learning in matching markets with interviews, modeling interviews as low-cost hints that reveal partial preference information to both sides. Our framework departs from existing work by allowing firm-side uncertainty: firms, like agents, may be unsure of their own preferences and can make early hiring mistakes by hiring less preferred agents. To handle this, we extend the firm's action space to allow strategic deferral (choosing not to hire in a round), enabling recovery from suboptimal hires and supporting decentralized learning without coordination. We design novel algorithms for (i) a centralized setting with an omniscient interview allocator and (ii) decentralized settings with two types of firm-side feedback. Across all settings, our algorithms achieve time-independent regret, a substantial improvement over the $O(\log T)$ regret bounds known for learning stable matchings without interviews. Also, under mild structured markets, decentralized performance matches the centralized counterpart up to polynomial factors in the number of agents and firms.
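
The strategic deferral idea described in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: the confidence-interval rule, the Hoeffding-style radius, and the names `confidence_radius` and `firm_decision` are assumptions chosen only to show how a firm that is unsure of its own preferences might decline to hire until its noisy interview impressions separate the candidates.

```python
import math
from typing import Dict, Optional, Sequence


def confidence_radius(n_obs: int, t: int) -> float:
    """Hoeffding-style radius (an illustrative assumption, not from the paper)."""
    if n_obs == 0:
        return math.inf  # never interviewed: nothing is known yet
    return math.sqrt(2.0 * math.log(max(t, 2)) / n_obs)


def firm_decision(
    mean_score: Dict[int, float],
    n_obs: Dict[int, int],
    applicants: Sequence[int],
    t: int,
) -> Optional[int]:
    """Return the applicant to hire, or None to defer hiring this round.

    The firm hires only when the lower confidence bound of its best-looking
    applicant is at least the upper confidence bound of every other applicant.
    """
    if not applicants:
        return None
    radius = {a: confidence_radius(n_obs[a], t) for a in applicants}
    lcb = {a: mean_score[a] - radius[a] for a in applicants}
    ucb = {a: mean_score[a] + radius[a] for a in applicants}
    best = max(applicants, key=lambda a: lcb[a])
    if all(lcb[best] >= ucb[a] for a in applicants if a != best):
        return best  # confident enough to hire
    return None  # strategic deferral: keep learning instead of risking a bad hire


# Few noisy impressions: intervals overlap, so the firm defers.
print(firm_decision({0: 0.9, 1: 0.2}, {0: 5, 1: 5}, [0, 1], t=10))
# Many impressions: intervals separate, so the firm hires applicant 0.
print(firm_decision({0: 0.9, 1: 0.2}, {0: 500, 1: 500}, [0, 1], t=10))
```

In this toy rule, deferring costs the firm a round without a hire but avoids committing to a less preferred agent, which is the trade-off the abstract attributes to the extended action space.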
Problem

Research questions and friction points this paper is trying to address.

matching markets
bandit learning
interviews
preference uncertainty
decentralized learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

bandit learning
matching markets
interviews
strategic deferral
decentralized learning