Adaptive LLM Routing under Budget Constraints

📅 2025-08-28

🤖 AI Summary

This paper addresses LLM routing under budget constraints. The task is to adaptively select the most suitable large language model for each incoming query when the best query-model pairings are not known in advance and user queries keep evolving. The authors formulate routing as a contextual bandit problem and propose PILOT, an extension of the LinUCB algorithm. Queries and LLMs share an embedding space whose alignment reflects query-model affinity. This space is first learned from offline human preference data and then refined with online bandit feedback, so each query only needs to be run on the selected model rather than on every candidate, as supervised routing requires. To accommodate users' differing budgets, an online cost policy modeled as a multi-choice knapsack problem keeps routing resource-efficient. PILOT aims to cut inference overhead while keeping routing accurate and robust across budget regimes.
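The summary mentions aligning queries and LLMs in a shared embedding space using offline human preference data. The Python below is a minimal sketch of that idea. It assumes that query embeddings come from some frozen text encoder and that affinity is a plain dot product. Under those assumptions, it fits one vector per LLM with a Bradley-Terry (logistic) loss on pairwise preferences of the form (query, preferred model, other model). The function name train_llm_embeddings, the loss, and the toy data are illustrative assumptions, not the paper's published objective.

```python
import numpy as np


def train_llm_embeddings(query_embs, prefs, num_models, lr=0.1, epochs=200, seed=0):
    """Fit one vector per LLM so that dot(q, theta[winner]) > dot(q, theta[loser]).

    query_embs: (n, d) array of query embeddings, assumed to come from a frozen encoder.
    prefs: list of (query_index, winner_model, loser_model) human-preference triples.
    Returns a (num_models, d) array of LLM embeddings in the same space as the queries.
    """
    rng = np.random.default_rng(seed)
    theta = rng.normal(scale=0.01, size=(num_models, query_embs.shape[1]))
    for _ in range(epochs):
        grad = np.zeros_like(theta)
        for qi, win, lose in prefs:
            q = query_embs[qi]
            margin = q @ (theta[win] - theta[lose])
            # Bradley-Terry loss -log(sigmoid(margin)); its derivative w.r.t. margin is
            # -sigmoid(-margin), written with tanh to avoid overflow in exp.
            g = -0.5 * (1.0 - np.tanh(margin / 2.0))
            grad[win] += g * q
            grad[lose] -= g * q
        theta -= lr * grad / len(prefs)  # full-batch gradient descent step
    return theta


# Toy data: query 0 prefers model 0 over model 1; query 1 prefers model 1 over model 0.
queries = np.array([[1.0, 0.0], [0.0, 1.0]])
prefs = [(0, 0, 1), (1, 1, 0)]
theta = train_llm_embeddings(queries, prefs, num_models=2)
print(np.argmax(queries @ theta.T, axis=1))  # expected: [0 1]
```

After training, each query's highest-affinity LLM is the one humans preferred for it. Vectors of this kind could then serve as the prior for an online bandit stage.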

📝 Abstract

Large Language Models (LLMs) have revolutionized natural language processing, but their varying capabilities and costs pose challenges in practical applications. LLM routing addresses this by dynamically selecting the most suitable LLM for each query/task. Previous approaches treat this as a supervised learning problem, assuming complete knowledge of optimal query-LLM pairings. However, real-world scenarios lack such comprehensive mappings and face evolving user queries. We thus propose to study LLM routing as a contextual bandit problem, enabling adaptive decision-making using bandit feedback without requiring exhaustive inference across all LLMs for all queries (in contrast to supervised routing). To address this problem, we develop a shared embedding space for queries and LLMs, where query and LLM embeddings are aligned to reflect their affinity. This space is initially learned from offline human preference data and refined through online bandit feedback. We instantiate this idea through Preference-prior Informed Linucb fOr adaptive rouTing (PILOT), a novel extension of LinUCB. To handle diverse user budgets for model routing, we introduce an online cost policy modeled as a multi-choice knapsack problem, ensuring resource-efficient routing.
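The bandit framing can be illustrated with a hedged sketch of a disjoint-arm LinUCB router in Python. In this sketch, each LLM's parameter vector is warm-started from an offline prior, for instance embeddings learned from preference data. Setting A = lam * I and b = lam * prior makes the initial ridge estimate inv(A) @ b equal to the prior. Each update touches only the model that was actually run, which is how bandit feedback avoids running every LLM on every query. The class PriorLinUCB, the parameters alpha and lam, and the reward values are hypothetical. PILOT's exact update rule may differ.

```python
import numpy as np


class PriorLinUCB:
    """Disjoint-arm LinUCB router whose per-LLM estimates start at offline prior vectors.

    With A = lam * I and b = lam * prior, the initial ridge estimate inv(A) @ b equals the prior.
    """

    def __init__(self, priors, alpha=1.0, lam=1.0):
        priors = np.asarray(priors, dtype=float)
        self.k, self.d = priors.shape          # k candidate LLMs, d-dimensional contexts
        self.alpha = alpha                     # width of the confidence bonus
        self.A = np.stack([lam * np.eye(self.d) for _ in range(self.k)])
        self.b = lam * priors

    def ucb(self, x):
        """Upper confidence bound of each LLM's reward for query embedding x."""
        out = np.empty(self.k)
        for a in range(self.k):
            A_inv = np.linalg.inv(self.A[a])
            theta = A_inv @ self.b[a]
            out[a] = x @ theta + self.alpha * np.sqrt(x @ A_inv @ x)
        return out

    def select(self, x):
        return int(np.argmax(self.ucb(x)))

    def update(self, arm, x, reward):
        """Bandit feedback: only the LLM that was actually run gets updated."""
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x


# Hypothetical setup: the prior favours model 0 for this query type, but model 1 is better.
bandit = PriorLinUCB(priors=[[1.0, 0.0], [0.0, 1.0]], alpha=1.0)
x = np.array([1.0, 0.0])
true_reward = {0: 0.0, 1: 1.0}
choices = []
for _ in range(5):
    arm = bandit.select(x)
    choices.append(arm)
    bandit.update(arm, x, true_reward[arm])
print(choices)  # expected: [0, 0, 1, 1, 1]
```

In this toy run, two zero-reward observations shrink model 0's upper confidence bound below model 1's, and the router then switches. This is the "refined through online bandit feedback" behaviour in miniature.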

Problem

Research questions and friction points this paper is trying to address.

Adaptive LLM routing under budget constraints

Dynamic LLM selection without exhaustive inference

Resource-efficient routing with online cost policy

Innovation

Methods, ideas, or system contributions that make the work stand out.

Contextual bandit problem for adaptive routing

Shared embedding space aligning queries and LLMs

Online cost policy via multi-choice knapsack
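The online cost policy can be viewed as a multi-choice knapsack in which each incoming query is one class. Exactly one LLM must be picked per query, and total spend must stay within the budget. The sketch below uses one standard online heuristic for this setting, chosen purely as an assumption. It admits a model only if its reward-to-cost ratio clears a threshold psi(z) that rises with the fraction z of budget already spent, from L/e at z = 0 to U at z = 1. Here L and U are assumed bounds on that ratio. The helper names psi and route_with_budget, the cheapest-model fallback, and the numbers are illustrative. The expected_rewards values could, for example, come from a bandit's estimates.

```python
import math
from typing import Optional, Sequence


def psi(z: float, lower: float, upper: float) -> float:
    """Threshold on reward/cost as a function of the budget fraction z already spent.

    Grows exponentially from lower/e at z = 0 to upper at z = 1.
    lower and upper are assumed bounds on any model's reward-to-cost ratio.
    """
    return (upper * math.e / lower) ** z * (lower / math.e)


def route_with_budget(expected_rewards: Sequence[float], costs: Sequence[float],
                      spent: float, budget: float,
                      lower: float, upper: float) -> Optional[int]:
    """Choose exactly one LLM for the current query (one class of the multi-choice knapsack)."""
    remaining = budget - spent
    threshold = psi(min(spent / budget, 1.0), lower, upper)
    # Models that fit in the remaining budget and are efficient enough at this spend level.
    eligible = [i for i, (r, c) in enumerate(zip(expected_rewards, costs))
                if c <= remaining and r / c >= threshold]
    if eligible:
        return max(eligible, key=lambda i: expected_rewards[i])
    # Fallback (assumption): still answer the query with the cheapest affordable model.
    cheapest = min(range(len(costs)), key=lambda i: costs[i])
    return cheapest if costs[cheapest] <= remaining else None


# Toy example: a strong, expensive model (index 0) and a weaker, cheap one (index 1).
rewards, costs = [0.9, 0.6], [10.0, 1.0]
print(route_with_budget(rewards, costs, spent=0.0, budget=20.0, lower=0.05, upper=1.0))   # 0
print(route_with_budget(rewards, costs, spent=10.0, budget=20.0, lower=0.05, upper=1.0))  # 1
```

Early on, the low threshold lets the strong model through. Once half the budget is gone, its ratio of 0.09 falls below the threshold of about 0.136, and the cheaper model is chosen instead.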
