Efficient Individually Rational Recommender System under Stochastic Order

📅 2025-02-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper studies the exploration-exploitation trade-off in recommender systems under an individual rationality constraint: each agent's expected reward from following the system, conditioned on the knowledge available to the system, must be at least that of the agent's default arm, while the system seeks to maximize overall welfare. For reward distributions that satisfy a stochastic order (e.g., Bernoulli or unit-variance Gaussian), the authors derive an approximately optimal algorithm based on an auxiliary Goal Markov Decision Process (Goal MDP) problem, which they describe as being of independent interest. They also present an incentive-compatible version of the algorithm.

📝 Abstract
With the rise of online applications, recommender systems (RSs) often encounter constraints in balancing exploration and exploitation. Such constraints arise when exploration is carried out by agents whose individual utility should be balanced with overall welfare. Recent work suggests that recommendations should be individually rational. Specifically, if agents have a default arm they would use, relying on the RS should yield each agent at least the reward of the default arm, conditioned on the knowledge available to the RS. Under this individual rationality constraint, striking a balance between exploration and exploitation becomes a complex planning problem. We assume a stochastic order of the rewards (e.g., Bernoulli or unit-variance Gaussian) and derive an approximately optimal algorithm. Our technique is based on an auxiliary Goal Markov Decision Process problem that is of independent interest. Additionally, we present an incentive-compatible version of our algorithm.
Problem

Research questions and friction points this paper is trying to address.

Balancing exploration and exploitation in recommender systems
Ensuring individual rationality in agent recommendations
Developing optimal algorithms under stochastic reward orders
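
The individual rationality requirement described in the abstract can be illustrated with a small check. This is only a sketch under stated assumptions, not the paper's algorithm: it assumes the RS summarizes its current knowledge as one expected reward per arm and may recommend arms at random according to a probability vector. The names `is_individually_rational`, `recommendation`, `expected_rewards`, and `default_arm` are hypothetical.

```python
from typing import Sequence


def is_individually_rational(
    recommendation: Sequence[float],
    expected_rewards: Sequence[float],
    default_arm: int = 0,
    tol: float = 1e-12,
) -> bool:
    """Check a (possibly randomized) recommendation against the default arm.

    recommendation[i] is the probability of recommending arm i, and
    expected_rewards[i] is the RS's current estimate of arm i's expected
    reward (both are assumptions of this sketch). The recommendation passes
    if its expected reward is at least that of the default arm.
    """
    if len(recommendation) != len(expected_rewards):
        raise ValueError("recommendation and expected_rewards must align")
    if min(recommendation) < 0 or abs(sum(recommendation) - 1.0) > 1e-9:
        raise ValueError("recommendation must be a probability vector")
    expected = sum(p * r for p, r in zip(recommendation, expected_rewards))
    return expected >= expected_rewards[default_arm] - tol


# Arm 0 is the default; arm 1 currently looks worse, arm 2 looks better.
estimates = [0.5, 0.3, 0.8]
explore_only = is_individually_rational([0.0, 1.0, 0.0], estimates)  # False
mixed = is_individually_rational([0.0, 0.5, 0.5], estimates)         # True
```

In this toy example, always recommending the apparently worse arm violates the constraint, while mixing it with an apparently better arm does not. That is one intuition for why exploration under the constraint turns into a planning problem.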
Innovation

Methods, ideas, or system contributions that make the work stand out.

Individually rational recommender system
Stochastic order reward assumption
Goal Markov Decision Process technique
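
To make the stochastic order assumption concrete, here is a minimal sketch that assumes "stochastic order" means the usual first-order stochastic dominance, i.e. X dominates Y when the CDF of X is at most the CDF of Y everywhere. The summary does not state the paper's exact definition, so treat this reading as an assumption. The helper names (`gaussian_cdf`, `bernoulli_cdf`, `dominates`) are hypothetical, and dominance is only checked on a finite grid of points.

```python
import math
from typing import Callable, Iterable


def gaussian_cdf(mean: float, std: float = 1.0) -> Callable[[float], float]:
    """CDF of a Gaussian with the given mean and standard deviation."""
    return lambda t: 0.5 * (1.0 + math.erf((t - mean) / (std * math.sqrt(2.0))))


def bernoulli_cdf(p: float) -> Callable[[float], float]:
    """CDF of a Bernoulli(p) reward taking values in {0, 1}."""
    def cdf(t: float) -> float:
        if t < 0:
            return 0.0
        if t < 1:
            return 1.0 - p
        return 1.0
    return cdf


def dominates(
    cdf_x: Callable[[float], float],
    cdf_y: Callable[[float], float],
    grid: Iterable[float],
    tol: float = 1e-12,
) -> bool:
    """True if cdf_x(t) <= cdf_y(t) at every grid point (grid check only)."""
    return all(cdf_x(t) <= cdf_y(t) + tol for t in grid)


grid = [i / 10 for i in range(-50, 51)]

# Within each family, a larger parameter yields a dominating distribution.
bern_ordered = dominates(bernoulli_cdf(0.7), bernoulli_cdf(0.4), grid)
gauss_ordered = dominates(gaussian_cdf(1.0), gaussian_cdf(0.0), grid)

# With unequal variances the CDFs cross, so neither Gaussian dominates.
narrow, wide = gaussian_cdf(0.0, 1.0), gaussian_cdf(0.0, 2.0)
unordered = not dominates(narrow, wide, grid) and not dominates(wide, narrow, grid)
```

Under this reading, Bernoulli rewards are ordered by their success probability and unit-variance Gaussians by their mean. The last check shows that two Gaussians with the same mean but different variances are not ordered, which is consistent with the abstract fixing the variance in its Gaussian example.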