🤖 AI Summary
Large language model (LLM) API providers may strategically manipulate service quality—e.g., through model degradation or response inflation—undermining service reliability and billing fairness for users.
Method: We propose the first incentive-compatible mechanism for multi-provider dynamic query delegation, operating in a continuous strategy space. Our approach integrates algorithmic game theory, mechanism design, and dynamic delegation modeling, yielding an approximately incentive-compatible algorithm with an additive approximation ratio of $O(T^{1-\varepsilon}\log T)$. We also establish a matching impossibility result: no mechanism can guarantee an asymptotically better expected quasilinear user utility.
Results: End-to-end simulations on real-world API environments demonstrate that our mechanism significantly curbs provider manipulation, ensuring stable second-best service quality and robust utility guarantees for users.
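To make the delegation setting concrete, here is a minimal toy simulation of the user-provider loop described above. It is purely illustrative and is **not** the paper's mechanism: the explore-then-commit rule, the quality values, the degradation factor, and the noise level are all assumptions chosen for the sketch. It only shows the qualitative effect the paper targets: a user who tracks empirical service quality over $T$ rounds can route queries away from a provider that secretly degrades its model.

```python
import random

def delegate(T, true_quality, manipulate, seed=0):
    """Toy repeated-delegation loop (illustrative sketch, not the paper's mechanism).

    true_quality[i]: advertised quality of provider i.
    manipulate[i]:   if True, provider i secretly serves a degraded model,
                     delivering only half its advertised quality.
    Returns the user's average realized quality over T rounds.
    """
    rng = random.Random(seed)
    n = len(true_quality)
    counts = [0] * n        # queries sent to each provider
    sums = [0.0] * n        # observed quality signals per provider
    total = 0.0
    for t in range(T):
        if t < n * 10:
            i = t % n       # short round-robin exploration phase
        else:
            # commit to the provider with the best empirical quality estimate
            i = max(range(n), key=lambda j: sums[j] / counts[j])
        # realized quality: a manipulating provider delivers degraded output
        q = true_quality[i] * (0.5 if manipulate[i] else 1.0)
        obs = q + rng.gauss(0, 0.05)  # user observes a noisy quality signal
        counts[i] += 1
        sums[i] += obs
        total += q
    return total / T
```

With provider 0 advertising quality 0.9 but manipulating (delivering 0.45) and provider 1 honestly delivering 0.8, the user's average realized quality stays close to the honest second-best level of 0.8 rather than collapsing to the manipulated 0.45.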
📝 Abstract
The widespread adoption of Large Language Models (LLMs) through Application Programming Interfaces (APIs) introduces a critical vulnerability: the potential for dishonest manipulation by service providers. This manipulation can take various forms, such as secretly substituting a proclaimed high-performance model with a low-cost alternative, or inflating responses with meaningless tokens to increase billing. This work tackles the issue through the lens of algorithmic game theory and mechanism design. We are the first to propose a formal economic model for a realistic user-provider ecosystem, where a user can iteratively delegate $T$ queries to multiple model providers, and providers can engage in a range of strategic behaviors. As our central contribution, we prove that for a continuous strategy space and any $\varepsilon \in (0, \frac{1}{2})$, there exists an approximately incentive-compatible mechanism with an additive approximation ratio of $O(T^{1-\varepsilon}\log T)$ and a guaranteed quasilinear second-best user utility. We also prove an impossibility result: no mechanism can guarantee an expected user utility asymptotically better than that of our mechanism. Furthermore, we demonstrate the effectiveness of our mechanism in simulation experiments with real-world API settings.