Adaptive LLM Routing under Budget Constraints

📅 2025-08-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper studies LLM routing under budget constraints: how to adaptively select the most suitable large language model for each incoming query when the optimal query-model pairings are not known in advance and user queries keep evolving. It formulates routing as a contextual bandit problem and proposes PILOT, an extension of LinUCB that learns from bandit feedback instead of requiring exhaustive inference across all LLMs for all queries. Queries and LLMs are embedded in a shared space that is first learned from offline human preference data and then refined through online bandit feedback. To handle diverse user budgets, the paper introduces an online cost policy modeled as a multi-choice knapsack problem.

📝 Abstract
Large Language Models (LLMs) have revolutionized natural language processing, but their varying capabilities and costs pose challenges in practical applications. LLM routing addresses this by dynamically selecting the most suitable LLM for each query/task. Previous approaches treat this as a supervised learning problem, assuming complete knowledge of optimal query-LLM pairings. However, real-world scenarios lack such comprehensive mappings and face evolving user queries. We thus propose to study LLM routing as a contextual bandit problem, enabling adaptive decision-making using bandit feedback without requiring exhaustive inference across all LLMs for all queries (in contrast to supervised routing). To address this problem, we develop a shared embedding space for queries and LLMs, where query and LLM embeddings are aligned to reflect their affinity. This space is initially learned from offline human preference data and refined through online bandit feedback. We instantiate this idea through Preference-prior Informed Linucb fOr adaptive rouTing (PILOT), a novel extension of LinUCB. To handle diverse user budgets for model routing, we introduce an online cost policy modeled as a multi-choice knapsack problem, ensuring resource-efficient routing.
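The bandit formulation described in the abstract can be illustrated with a minimal sketch of LinUCB in which each LLM is an arm whose parameter vector starts from a prior rather than from zero. This is not the authors' implementation: the class name `PriorLinUCBArm`, the `route` helper, the regularizer `lam`, the exploration weight `alpha`, and the choice to inject the prior through `b = lam * prior_theta` are all assumptions made for illustration. The prior vector stands in for an LLM embedding learned from offline preference data, and `x` stands in for a query embedding in the same space.

```python
import math


class PriorLinUCBArm:
    """One arm (one LLM) of a LinUCB router with a prior-initialized estimate.

    Hypothetical sketch: A = lam * I and b = lam * prior_theta, so the
    ridge estimate A^{-1} b equals prior_theta before any feedback arrives.
    """

    def __init__(self, prior_theta, lam=1.0):
        d = len(prior_theta)
        # Store A^{-1} directly; for A = lam * I it is (1 / lam) * I.
        self.A_inv = [[(1.0 / lam if i == j else 0.0) for j in range(d)]
                      for i in range(d)]
        self.b = [lam * t for t in prior_theta]

    def _A_inv_times(self, x):
        return [sum(row[j] * x[j] for j in range(len(x))) for row in self.A_inv]

    def theta(self):
        """Current ridge-regression estimate of the arm's parameter vector."""
        return self._A_inv_times(self.b)

    def ucb(self, x, alpha):
        """Estimated reward for query embedding x plus an exploration bonus."""
        mean = sum(t * xi for t, xi in zip(self.theta(), x))
        var = sum(xi * ai for xi, ai in zip(x, self._A_inv_times(x)))
        return mean + alpha * math.sqrt(max(var, 0.0))

    def update(self, x, reward):
        """Fold in one observation (bandit feedback for the chosen arm only)."""
        ax = self._A_inv_times(x)
        denom = 1.0 + sum(xi * ai for xi, ai in zip(x, ax))
        d = len(x)
        # Sherman-Morrison update of A^{-1} after A <- A + x x^T.
        for i in range(d):
            for j in range(d):
                self.A_inv[i][j] -= ax[i] * ax[j] / denom
        for i in range(d):
            self.b[i] += reward * x[i]


def route(arms, x, alpha=1.0):
    """Return the name of the arm with the highest upper confidence bound."""
    return max(arms, key=lambda name: arms[name].ucb(x, alpha))
```

In use, a router would call `route(arms, x)` for each query, run only the selected LLM, score its response, and pass that score to `update` on the chosen arm. Only the selected model is ever queried, which is the contrast with supervised routing drawn in the abstract.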
Problem

Research questions and friction points this paper is trying to address.

Adaptive LLM routing under budget constraints
Dynamic LLM selection without exhaustive inference
Resource-efficient routing with online cost policy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Contextual bandit problem for adaptive routing
Shared embedding space aligning queries and LLMs
Online cost policy via multi-choice knapsack
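To make the multi-choice knapsack structure concrete, here is a small sketch of the offline version of the problem: every query must be assigned exactly one model, each assignment has a cost and an estimated reward, and the total cost must stay within a budget. The function name `solve_mckp`, the integer cost units, and the tuple layout are assumptions for illustration. The paper's cost policy is online, meaning it must decide per query without seeing future queries, so this dynamic program only shows the constraint being modeled, not the authors' policy.

```python
def solve_mckp(options_per_query, budget):
    """Offline multi-choice knapsack by dynamic programming.

    options_per_query: list (one entry per query) of lists of
        (model_name, integer_cost, estimated_reward) tuples.
    budget: non-negative integer total cost allowed.
    Returns (total_reward, [chosen model per query]), or None when no
    assignment of one model per query fits within the budget.
    """
    neg_inf = float("-inf")
    # best[c] = highest reward reachable with total cost exactly c so far.
    best = [neg_inf] * (budget + 1)
    best[0] = 0.0
    # picks[q][c] = (option index, previous cost) used to reach cost c at query q.
    picks = [[None] * (budget + 1) for _ in options_per_query]

    for q, options in enumerate(options_per_query):
        new_best = [neg_inf] * (budget + 1)
        for c in range(budget + 1):
            if best[c] == neg_inf:
                continue
            for idx, (_name, cost, reward) in enumerate(options):
                nc = c + cost
                if nc <= budget and best[c] + reward > new_best[nc]:
                    new_best[nc] = best[c] + reward
                    picks[q][nc] = (idx, c)
        best = new_best

    end = max(range(budget + 1), key=lambda c: best[c])
    if best[end] == neg_inf:
        return None

    # Walk the recorded picks backwards to recover one model per query.
    choices = []
    c = end
    for q in reversed(range(len(options_per_query))):
        idx, prev = picks[q][c]
        choices.append(options_per_query[q][idx][0])
        c = prev
    choices.reverse()
    return best[end], choices
```

With two queries and a budget that covers only one expensive call, the solver spends the expensive call on the query where the larger model's estimated gain is biggest. An online policy has to approximate the same trade-off using only the queries seen so far.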
Authors
Pranoy Panda
ML Researcher, Fujitsu Research
Machine Learning
Raghav Magazine
Microsoft Research
Chaitanya Devaguptapu
IIT Hyderabad
Deep Learning, Computer Vision
Sho Takemori
Fujitsu Research
Vishal Sharma
Microsoft