COBRA: Contextual Bandit Algorithm for Ensuring Truthful Strategic Agents

📅 2025-05-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper studies the contextual bandit problem in multi-agent settings with strategic agents, where agents—motivated by self-interest—may misreport preferences or actions (e.g., sellers overstating product quality), violating the standard assumption of truthful reporting. To address this, the authors propose a monetary-free algorithm that integrates counterfactual reasoning, upper confidence bound (UCB) estimation, and dynamic utility-constrained modeling. The approach is the first to simultaneously achieve incentive compatibility and a sublinear regret bound of $O(\sqrt{T})$ without monetary transfers. The paper provides rigorous theoretical guarantees establishing both individual rationality and truthfulness. Empirical evaluation on a recommendation-system simulation demonstrates substantial mitigation of strategic manipulation, yielding a 12.7% improvement in platform-wide revenue.

📝 Abstract
This paper considers a contextual bandit problem involving multiple agents, where a learner sequentially observes the contexts and the agents' reported arms, and then selects the arm that maximizes the system's overall reward. Existing work in contextual bandits assumes that agents truthfully report their arms, which is unrealistic in many real-life applications. For instance, consider an online platform with multiple sellers; some sellers may misrepresent product quality to gain an advantage, such as having the platform preferentially recommend their products to online users. To address this challenge, we propose COBRA, an algorithm for contextual bandit problems involving strategic agents that disincentivizes their strategic behavior without using any monetary incentives, while providing incentive compatibility and a sub-linear regret guarantee. Our experimental results also validate the different performance aspects of our proposed algorithm.
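The summary mentions UCB estimation as one of COBRA's building blocks. The sketch below is not COBRA itself (whose details are not given on this page); it is a minimal, self-contained LinUCB loop illustrating the underlying idea that the summary assumes: the learner selects arms from its own optimistic estimates rather than trusting agents' reports, so its average regret shrinks as the confidence sets tighten. All names and parameters (`select_arm`, `alpha`, the reward model) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_arms, T = 5, 3, 2000
theta_true = rng.normal(size=(n_arms, d))  # hidden per-arm reward weights

# Per-arm ridge-regression statistics: A = I + sum of x x^T, b = sum of r x
A = np.stack([np.eye(d) for _ in range(n_arms)])
b = np.zeros((n_arms, d))

def select_arm(x, alpha=1.0):
    """LinUCB rule: pick the arm maximizing mean estimate + confidence width."""
    best, best_score = 0, -np.inf
    for a in range(n_arms):
        A_inv = np.linalg.inv(A[a])
        mean = x @ (A_inv @ b[a])          # ridge estimate of expected reward
        width = alpha * np.sqrt(x @ A_inv @ x)  # optimism bonus
        if mean + width > best_score:
            best, best_score = a, mean + width
    return best

regret = 0.0
for t in range(T):
    x = rng.normal(size=d)
    x /= np.linalg.norm(x)                 # unit-norm context
    a = select_arm(x)
    r = x @ theta_true[a] + 0.1 * rng.normal()  # noisy linear reward
    A[a] += np.outer(x, x)                 # update chosen arm's statistics
    b[a] += r * x
    regret += (x @ theta_true.T).max() - x @ theta_true[a]
```

Because the selection rule depends only on the learner's own estimates, an agent gains nothing by inflating its report here; COBRA's actual mechanism additionally uses counterfactual reasoning and utility constraints to make truthful reporting a best response.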
Problem

Research questions and friction points this paper is trying to address.

Ensuring truthful arm reports from strategic agents in contextual bandits
Addressing seller misrepresentation in online platform recommendations
Achieving incentive compatibility without monetary incentives in bandit algorithms
Innovation

Methods, ideas, or system contributions that make the work stand out.

Contextual bandit algorithm for strategic agents
Ensures truthful reporting without monetary incentives
Guarantees incentive compatibility and sub-linear regret
Arun Verma
Singapore-MIT Alliance for Research and Technology
Sequential Decision Making, Reinforcement Learning, Large Language Models
Indrajit Saha
Faculty of ISEE, Kyushu University, Japan
Makoto Yokoo
Kyushu University
Multiagent systems, algorithmic game theory, market design, auction theory, matching
B. Low
Singapore-MIT Alliance for Research and Technology, Republic of Singapore; Department of Computer Science, National University of Singapore, Republic of Singapore