🤖 AI Summary
In online fair allocation, real-world settings often involve a large number of heterogeneous items, each available in only a few copies (e.g., users interact with service providers only a few times), which makes it difficult to estimate utilities for every item-agent pair.
Method: This paper introduces the first contextual bandit formulation for low-frequency, sparse-item allocation. It proposes an online algorithm that models context-dependent utilities through item-agent features and enforces fairness constraints, specifically approximate EF1 (envy-freeness up to one item), under irreversible, real-time decisions.
Contribution/Results: The algorithm achieves sublinear cumulative regret with a theoretical upper bound of $O(\sqrt{T \log T})$. Empirically, it significantly outperforms existing baselines in sparse-interaction regimes, attaining a superior trade-off between overall utility and fairness.
📝 Abstract
This paper considers a novel variant of the online fair division problem involving multiple agents in which a learner sequentially observes an indivisible item that has to be irrevocably allocated to one of the agents while satisfying a fairness and efficiency constraint. Existing algorithms assume a small number of items with a sufficiently large number of copies, which ensures a good utility estimation for all item-agent pairs from noisy bandit feedback. However, this assumption may not hold in many real-life applications, for example, an online platform that has a large number of users (items) who use the platform's service providers (agents) only a few times (a few copies of items), which makes it difficult to accurately estimate utilities for all item-agent pairs. To address this, we assume utility is an unknown function of item-agent features. We then propose algorithms that model online fair division as a contextual bandit problem, with sub-linear regret guarantees. Our experimental results further validate the effectiveness of the proposed algorithms.
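To make the contextual-bandit framing concrete, the following is a minimal sketch and not the paper's algorithm. It assumes that utility is linear in the item-agent feature vector, that exploration uses a LinUCB-style optimistic estimate, and that fairness is encouraged by a simple penalty on agents who have already accumulated more utility. The class `LinUCBAllocator`, the function `simulate`, and the parameters `alpha`, `lam`, and `fairness_weight` are all hypothetical names introduced for illustration. The penalty stands in for whatever fairness and efficiency constraint the paper actually uses.

```python
import math
import random


class LinUCBAllocator:
    """Illustrative sketch: one shared ridge model over item-agent features.

    Assumptions (not taken from the paper): linear utility, UCB exploration,
    and a heuristic fairness penalty on agents that are already ahead.
    """

    def __init__(self, dim, n_agents, alpha=1.0, lam=1.0, fairness_weight=0.5):
        self.dim = dim
        self.n_agents = n_agents
        self.alpha = alpha                      # exploration strength
        self.fairness_weight = fairness_weight  # hypothetical trade-off knob
        # Inverse of (lam * I + sum of x x^T), maintained incrementally.
        self.A_inv = [[(1.0 / lam if i == j else 0.0) for j in range(dim)]
                      for i in range(dim)]
        self.b = [0.0] * dim                    # sum of reward * x
        self.cum_utility = [0.0] * n_agents     # observed utility per agent

    @staticmethod
    def _matvec(M, v):
        return [sum(m * x for m, x in zip(row, v)) for row in M]

    def ucb(self, x):
        """Optimistic utility estimate for feature vector x."""
        theta = self._matvec(self.A_inv, self.b)   # ridge estimate
        Ax = self._matvec(self.A_inv, x)
        mean = sum(t * xi for t, xi in zip(theta, x))
        width = math.sqrt(max(sum(xi * a for xi, a in zip(x, Ax)), 0.0))
        return mean + self.alpha * width

    def choose(self, features):
        """features[i] is the feature vector of the (item, agent i) pair."""
        poorest = min(self.cum_utility)

        def score(i):
            gap = self.cum_utility[i] - poorest
            return self.ucb(features[i]) - self.fairness_weight * gap

        return max(range(self.n_agents), key=score)

    def update(self, agent, x, reward):
        """Sherman-Morrison rank-one update after observing noisy feedback."""
        Ax = self._matvec(self.A_inv, x)
        denom = 1.0 + sum(xi * a for xi, a in zip(x, Ax))
        for i in range(self.dim):
            for j in range(self.dim):
                self.A_inv[i][j] -= Ax[i] * Ax[j] / denom
        for i in range(self.dim):
            self.b[i] += reward * x[i]
        self.cum_utility[agent] += reward


def simulate(rounds=200, n_agents=2, seed=0):
    """Toy environment with a made-up true parameter, for demonstration only."""
    rng = random.Random(seed)
    true_theta = [0.5, 0.3, 0.2]
    alloc = LinUCBAllocator(dim=len(true_theta), n_agents=n_agents)
    for _ in range(rounds):
        # Each arriving item has a fresh feature vector per agent.
        features = [[rng.random() for _ in true_theta] for _ in range(n_agents)]
        agent = alloc.choose(features)          # irrevocable allocation
        x = features[agent]
        reward = sum(t * xi for t, xi in zip(true_theta, x)) + rng.gauss(0, 0.05)
        alloc.update(agent, x, reward)
    return alloc


if __name__ == "__main__":
    print(simulate().cum_utility)
```

Because the model is shared across all item-agent pairs, each observation informs the estimate for items that have never been seen before, which is the point of replacing per-pair estimates with a feature-based utility function when items have only a few copies.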