Latency-Quality Routing for Functionally Equivalent Tools in LLM Agents

📅 2026-05-13
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses dynamic routing of queries among functionally equivalent tool providers that differ in latency, reliability, and answer quality, in deployments where gold labels are unavailable at runtime. The authors propose LQM-ContextRoute, a contextual bandit router that treats latency as a measure of service capacity rather than as an additive penalty term. Its latency-quality matching mechanism ranks providers by expected answer quality per service cycle, which avoids the quality collapse that additive reward formulations can induce. By combining query-specific quality estimation, LLM-as-judge feedback, and capacity-aware scoring, LQM-ContextRoute stays on the latency-quality frontier. Reported gains over SW-UCB are +2.18 pp F1 on the main web-search load benchmark, up to +18 pp accuracy in a high-heterogeneity StrategyQA setting, and +2.91 to +3.22 pp NDCG on heterogeneous retriever pools.
📝 Abstract
Tool-augmented LLM agents increasingly access the same tool type through multiple functionally equivalent providers, such as web-search APIs, retrievers, or LLM backends exposed behind a shared interface. This creates a provider-routing problem under runtime load: the router must choose among providers that differ in latency, reliability, and answer quality, often without gold labels at deployment time. We introduce LQM-ContextRoute, a contextual bandit router for same-function tool providers. Its key design is latency-quality matching: instead of letting low latency offset poor answers in an additive reward, the router ranks providers by expected answer quality per service cycle. It combines this capacity-aware score with query-specific quality estimation and LLM-as-judge feedback, allowing it to adapt online to both load changes and provider-quality differences. On the main web-search load benchmark, LQM-ContextRoute improves F1 by +2.18 pp over SW-UCB while staying on the latency-quality frontier. In a high-heterogeneity StrategyQA setting, LQM-ContextRoute avoids additive-reward collapse and improves accuracy by up to +18 pp over SW-UCB; on heterogeneous retriever pools, it improves NDCG by +2.91 to +3.22 pp over SW-UCB. These results show that same-function tool routing benefits from treating latency as service capacity, especially when runtime pressure and provider-quality heterogeneity coexist.
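To make the contrast between an additive reward and latency-quality matching concrete, here is a minimal Python sketch. It is not the paper's implementation: the abstract does not give the scoring formula, so the sketch assumes that "quality per service cycle" can be approximated as estimated quality divided by estimated latency, and that the additive baseline is quality minus a weighted latency penalty. The `Provider` class, the function names, the penalty weight, and all numbers are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Provider:
    """Hypothetical provider summary; fields are illustrative assumptions."""
    name: str
    est_quality: float    # estimated answer quality in [0, 1]
    est_latency_s: float  # estimated latency in seconds, must be > 0


def additive_score(p: Provider, latency_penalty: float = 1.0) -> float:
    # Additive reward: low latency can offset poor quality.
    return p.est_quality - latency_penalty * p.est_latency_s


def matched_score(p: Provider) -> float:
    # Assumed stand-in for "expected quality per service cycle":
    # quality delivered per second of provider time.
    return p.est_quality / p.est_latency_s


def pick(providers: List[Provider], score: Callable[[Provider], float]) -> str:
    # Greedy choice of the highest-scoring provider (no exploration here).
    return max(providers, key=score).name


pool = [
    Provider("fast_but_weak", est_quality=0.05, est_latency_s=0.1),
    Provider("slow_but_accurate", est_quality=0.80, est_latency_s=1.0),
]

additive_choice = pick(pool, additive_score)
matched_choice = pick(pool, matched_score)
print(additive_choice, matched_choice)
```

With these made-up numbers the additive score prefers the fast, low-quality provider because its latency saving outweighs its quality deficit, while the ratio score prefers the accurate one because a near-zero-quality answer earns almost nothing however quickly it arrives. A real router would replace the fixed estimates with online, query-specific estimates and add exploration, as the abstract describes.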
Problem

Research questions and friction points this paper is trying to address.

tool routing
latency-quality tradeoff
functionally equivalent tools
LLM agents
contextual bandit
Innovation

Methods, ideas, or system contributions that make the work stand out.

latency-quality matching
contextual bandit routing
functionally equivalent tools
LLM-as-judge feedback
service capacity