Truthful Reverse Auctions for Adaptive Selection via Contextual Multi-Armed Bandits

📅 2026-02-16

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses how a user should select among multiple large language model (LLM) providers that bid to fulfill queries, with the goal of incentivizing truthful cost reports while keeping selection cost-efficient and reliable. The authors formulate LLM selection as a contextual multi-armed bandit problem within a reverse auction and propose a resampling-based procedure that extends truthful forward-auction bandit mechanisms to reverse auctions. Any monotone allocation rule combined with this procedure is truthful, which makes query-aware adaptive provider selection possible. The resulting contextual bandit algorithm learns query-dependent model quality with sublinear regret, giving a single framework that combines mechanism design and online learning for LLM selection.
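To make the "contextual bandit inside a reverse auction" idea concrete, here is a minimal sketch, not the authors' algorithm. It assumes queries fall into a small set of discrete categories (the context), keeps a standard UCB1 estimate of quality per (category, provider) pair, and picks the provider with the highest optimistic quality minus bid. The class name `ContextualUCBSelector`, the categories, the quality values, and the fixed bids are all hypothetical, and the sketch includes no payment rule or resampling step, so it carries no truthfulness guarantee.

```python
import math
import random


class ContextualUCBSelector:
    """UCB1 statistics kept separately for each (query category, provider) pair."""

    def __init__(self, n_providers, categories):
        self.n = n_providers
        self.counts = {c: [0] * n_providers for c in categories}
        self.means = {c: [0.0] * n_providers for c in categories}
        self.rounds = {c: 0 for c in categories}

    def select(self, category, bids):
        # Try every provider once per category before trusting the estimates.
        for i in range(self.n):
            if self.counts[category][i] == 0:
                return i
        t = self.rounds[category]

        def score(i):
            # Optimistic quality estimate minus the provider's submitted cost.
            bonus = math.sqrt(2.0 * math.log(t) / self.counts[category][i])
            return self.means[category][i] + bonus - bids[i]

        return max(range(self.n), key=score)

    def update(self, category, provider, quality):
        # Incremental mean of the observed quality feedback in [0, 1].
        self.rounds[category] += 1
        k = self.counts[category][provider] + 1
        self.counts[category][provider] = k
        self.means[category][provider] += (quality - self.means[category][provider]) / k


def simulate(rounds=4000, seed=0):
    """Toy environment (assumed values): provider 0 is costly and strong on code,
    provider 1 is cheap and strong on chat."""
    rng = random.Random(seed)
    true_quality = {"code": [0.9, 0.3], "chat": [0.5, 0.8]}
    bids = [0.3, 0.1]  # held fixed here; real providers would bid each round
    selector = ContextualUCBSelector(2, list(true_quality))
    pulls = {c: [0, 0] for c in true_quality}
    for _ in range(rounds):
        category = rng.choice(["code", "chat"])
        provider = selector.select(category, bids)
        # Bernoulli feedback: 1.0 if the answer was acceptable, else 0.0.
        quality = 1.0 if rng.random() < true_quality[category][provider] else 0.0
        selector.update(category, provider, quality)
        pulls[category][provider] += 1
    return pulls


pulls = simulate()
print(pulls)
```

In this toy setup the selector should concentrate on provider 0 for code queries and provider 1 for chat queries, because those maximize true quality minus bid in each category. The number of rounds spent on the other provider grows only logarithmically under UCB1, which is the sense in which regret is sublinear here.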

📝 Abstract

We study the problem of selecting large language models (LLMs) for user queries in settings where multiple LLM providers submit the cost of solving a query. From the user's perspective, choosing an optimal model is a sequential, query-dependent decision problem: high-capacity models offer more reliable outputs but are costlier, while lightweight models are faster and cheaper. We formalize this interaction as a reverse auction design problem with contextual online learning, where the user adaptively discovers which model performs best while eliciting costs from competing LLM providers. Existing multi-armed bandit (MAB) mechanisms focus on forward auctions and social welfare, leaving open the challenges of reverse auctions, provider-optimal outcomes, and contextual adaptation. We address these gaps by designing a resampling-based procedure that generalizes truthful forward MAB mechanisms to reverse auctions, and we prove that any monotone allocation rule combined with this procedure is truthful. Using this result, we propose a contextual MAB algorithm that learns query-dependent model quality with sublinear regret. Our framework unifies mechanism design and adaptive learning, enabling efficient, truthful, and query-aware LLM selection.
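The link between monotone allocation and truthfulness can be illustrated in the simplest case, where model qualities are already known. The sketch below is a textbook single-winner reverse auction with threshold (critical-bid) payments, not the paper's resampling-based procedure, which handles the harder case where quality must be learned. The function names and the numeric qualities and costs are hypothetical. The allocation is monotone because lowering a bid can only raise a provider's score, and the winner is paid the highest bid at which it would still have won.

```python
def run_reverse_auction(qualities, bids):
    """Select the provider with the highest quality minus bid and pay it
    its critical bid. Requires at least two providers."""
    scores = [q - b for q, b in zip(qualities, bids)]
    winner = max(range(len(bids)), key=lambda i: scores[i])
    best_other = max(s for i, s in enumerate(scores) if i != winner)
    # Largest bid at which the winner's score still matches the runner-up.
    payment = qualities[winner] - best_other
    return winner, payment


def utility(provider, true_cost, qualities, bids):
    """Provider's profit: payment minus true cost if selected, otherwise zero."""
    winner, payment = run_reverse_auction(qualities, bids)
    return payment - true_cost if winner == provider else 0.0


# Assumed toy instance: three providers with known quality and private cost.
qualities = [0.9, 0.6, 0.5]
costs = [0.5, 0.1, 0.3]

winner, payment = run_reverse_auction(qualities, costs)
print(winner, payment)  # provider 1 wins and is paid roughly 0.2


def truthful_is_best(provider, grid_steps=20):
    """Check on a grid of misreports that bidding the true cost is never worse."""
    honest = utility(provider, costs[provider], qualities, costs)
    for step in range(grid_steps + 1):
        misreport = step / grid_steps
        bids = list(costs)
        bids[provider] = misreport
        if utility(provider, costs[provider], qualities, bids) > honest + 1e-9:
            return False
    return True


all_truthful = all(truthful_is_best(p) for p in range(len(costs)))
print(all_truthful)
```

The payment depends only on the other providers' bids, so a provider cannot raise its payment by overstating its cost, and underbidding to win only produces a payment below its true cost. A grid check like `truthful_is_best` is a sanity test on one instance, not a proof.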

Problem

Research questions and friction points this paper is trying to address.

reverse auction

large language models

contextual multi-armed bandits

truthful mechanism

adaptive selection

Innovation

Methods, ideas, or system contributions that make the work stand out.

truthful reverse auction

contextual multi-armed bandits

mechanism design