🤖 AI Summary
This work addresses the issue that maximizing match counts alone in matching platforms often leads to over-concentration on popular participants, undermining fairness and long-term utility. To mitigate this, the authors propose a Combinatorial Allocation Bandit (CAB) framework that, for the first time, optimizes for "arm satisfaction" as the primary objective. In each round, the framework allocates N users among K options and models nonlinear utility feedback via a generalized linear model. By integrating Upper Confidence Bound (UCB) and Thompson Sampling (TS) strategies, the approach enables online learning while promoting fair allocation. Theoretical analysis establishes a near-optimal regret upper bound, matching known lower bounds in specific settings. Synthetic experiments demonstrate that the proposed method consistently outperforms existing baselines in both fairness and overall performance.
📝 Abstract
A matching platform is a system that matches different types of participants, such as companies and job-seekers. In such a platform, merely maximizing the number of matches can concentrate matches on highly popular participants, which may increase dissatisfaction among the other participants, such as companies, and ultimately lead to their churn, reducing the platform's profit opportunities. To address this issue, we propose a novel online learning problem, Combinatorial Allocation Bandits (CAB), which incorporates the notion of *arm satisfaction*. In CAB, at each round $t=1,\dots,T$, the learner observes $K$ feature vectors corresponding to $K$ arms for each of $N$ users, assigns each user to an arm, and then observes feedback following a generalized linear model (GLM). Unlike prior work, the learner's objective is not to maximize the amount of positive feedback, but rather to maximize arm satisfaction. For CAB, we provide an upper confidence bound algorithm that achieves an approximate regret upper bound, which matches the existing lower bound in a special case. Furthermore, we propose a Thompson Sampling (TS) algorithm and provide an approximate regret upper bound for it. Finally, we conduct experiments on synthetic data to demonstrate the effectiveness of the proposed algorithms compared to other methods.
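To make the round structure concrete, here is a minimal sketch of a single CAB-style round: $N$ users, $K$ arms with per-user feature vectors, logistic (GLM) feedback, and a simple UCB-style score per (user, arm). This is not the paper's algorithm; the greedy per-user allocation rule, the stand-in parameter estimate, and all variable names are illustrative assumptions (the paper's objective instead targets arm satisfaction, which changes the allocation step).

```python
# Hypothetical sketch of one CAB-style round (NOT the paper's algorithm).
import numpy as np

rng = np.random.default_rng(0)
N, K, d = 5, 3, 4            # users, arms, feature dimension (assumed sizes)
theta = rng.normal(size=d)   # unknown GLM parameter

def mu(z):
    """Logistic link function of the GLM."""
    return 1.0 / (1.0 + np.exp(-z))

# One round: observe K feature vectors for each of the N users.
X = rng.normal(size=(N, K, d))

# UCB-style score: estimated mean plus an exploration bonus.
theta_hat = rng.normal(size=d)   # stand-in for the current parameter estimate
A_inv = np.eye(d)                # stand-in for the inverse design matrix
alpha = 1.0                      # exploration weight (assumption)
bonus = np.sqrt(np.einsum('nkd,de,nke->nk', X, A_inv, X))
ucb = mu(X @ theta_hat) + alpha * bonus   # shape (N, K)

# Assign each user to the arm with the highest score (greedy; a
# satisfaction-aware rule would spread users across arms instead).
assignment = ucb.argmax(axis=1)           # one arm index per user

# Observe Bernoulli feedback drawn from the true GLM.
p = mu(np.einsum('nd,d->n', X[np.arange(N), assignment], theta))
feedback = rng.binomial(1, p)
print(assignment, feedback)
```

Note that `argmax` over per-user scores can send every user to the same popular arm, which is exactly the over-concentration the arm-satisfaction objective is designed to avoid.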