🤖 AI Summary
This paper studies a multi-armed bandit (MAB) problem where strategic agents serve as "arms": each agent can manipulate its reported rewards and incurred costs, necessitating an incentive-compatible mechanism that elicits truthful, high-performance behavior while ensuring robust performance under non-equilibrium behavior (e.g., irrationality or deviations). To this end, we propose the first MAB framework jointly guaranteeing incentive compatibility and non-equilibrium robustness. We identify a key structural property enabling both objectives to be achieved synergistically, and integrate insights from second-price auctions to handle settings with no prior knowledge of arm qualities. Theoretically, our algorithm yields a non-vacuous lower bound on cumulative reward under arbitrary agent behavior; moreover, even without knowledge of the true arm performances, it achieves an $O(\sqrt{T})$ regret upper bound, substantially improving upon conventional approaches that either ignore incentives or lack robustness guarantees.
📝 Abstract
Motivated by applications such as online labor markets, we consider a variant of the stochastic multi-armed bandit problem where we have a collection of arms representing strategic agents with different performance characteristics. The platform (principal) chooses an agent in each round to complete a task. Unlike the standard setting, when an arm is pulled it can modify its reward, either absorbing it or improving it at the expense of a higher cost. The principal has to solve a mechanism design problem to incentivize the arms to give their best performance. However, since even under an effective mechanism agents may still deviate from rational behavior, the principal wants a robust algorithm that also gives a non-vacuous guarantee on the total accumulated reward under non-equilibrium behavior. In this paper, we introduce a class of bandit algorithms that meet the two objectives of performance incentivization and robustness simultaneously. We do this by identifying a collection of intuitive properties that a bandit algorithm has to satisfy to achieve these objectives. Finally, we show that settings where the principal has no information about the arms' performance characteristics can be handled by combining ideas from second-price auctions with our algorithms.
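To make the interaction model concrete, the following is a minimal sketch of one round of the strategic-arm setting described above: the principal pulls one arm, and that arm alone chooses how much of its raw reward to pass on, trading the delivered reward against its own cost. All class and parameter names here (`StrategicArm`, `effort`, `cost_per_unit`) are illustrative assumptions, not notation from the paper.

```python
import random

class StrategicArm:
    """Hypothetical strategic arm: holds a true mean reward and a cost rate.
    These are assumed, illustrative performance characteristics."""

    def __init__(self, mean_reward, cost_per_unit):
        self.mean_reward = mean_reward      # true (private) performance level
        self.cost_per_unit = cost_per_unit  # cost of exerting effort

    def respond(self, effort):
        """Deliver a fraction `effort` in [0, 1] of the raw reward.
        The arm keeps the absorbed remainder but pays an effort cost,
        so its utility is: absorbed reward minus effort cost."""
        raw = random.gauss(self.mean_reward, 0.1)
        delivered = effort * raw
        utility = (raw - delivered) - self.cost_per_unit * effort
        return delivered, utility

def run_round(arms, chosen_index, efforts):
    """One round of the bandit interaction: the principal pulls
    `chosen_index`; only that arm acts, and the principal observes
    the (possibly manipulated) delivered reward."""
    reward, _utility = arms[chosen_index].respond(efforts[chosen_index])
    return reward
```

The key point the sketch captures is that the principal only observes `delivered`, not `raw`, which is why incentives are needed to make high `effort` the arms' preferred strategy.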