Latency-Quality Routing for Functionally Equivalent Tools in LLM Agents

📅 2026-05-13

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the problem of routing queries among functionally equivalent tool providers that differ in latency, reliability, and answer quality, in runtime settings where ground-truth labels are unavailable. The authors propose LQM-ContextRoute, a contextual bandit router that treats latency as a measure of service capacity rather than as an additive penalty term. Its latency-quality matching mechanism ranks providers by expected answer quality per service cycle, which avoids the quality collapse that additive rewards can cause when low latency is allowed to offset poor answers. The router combines this capacity-aware score with query-level quality estimation and LLM-as-judge feedback, and it stays on the latency-quality frontier on the main web-search load benchmark. Reported gains over SW-UCB are +2.18 pp F1 on that benchmark, up to +18 pp accuracy in a high-heterogeneity StrategyQA setting, and +2.91 to +3.22 pp NDCG on heterogeneous retriever pools.

📝 Abstract

Tool-augmented LLM agents increasingly access the same tool type through multiple functionally equivalent providers, such as web-search APIs, retrievers, or LLM backends exposed behind a shared interface. This creates a provider-routing problem under runtime load: the router must choose among providers that differ in latency, reliability, and answer quality, often without gold labels at deployment time. We introduce LQM-ContextRoute, a contextual bandit router for same-function tool providers. Its key design is latency-quality matching: instead of letting low latency offset poor answers in an additive reward, the router ranks providers by expected answer quality per service cycle. It combines this capacity-aware score with query-specific quality estimation and LLM-as-judge feedback, allowing it to adapt online to both load changes and provider-quality differences. On the main web-search load benchmark, LQM-ContextRoute improves F1 by +2.18 pp over SW-UCB while staying on the latency-quality frontier. In a high-heterogeneity StrategyQA setting, LQM-ContextRoute avoids additive-reward collapse and improves accuracy by up to +18 pp over SW-UCB; on heterogeneous retriever pools, it improves NDCG by +2.91 to +3.22 pp over SW-UCB. These results show that same-function tool routing benefits from treating latency as service capacity, especially when runtime pressure and provider-quality heterogeneity coexist.
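
The contrast between an additive reward and a capacity-aware score can be illustrated with a toy example. The sketch below is not the paper's formulation: the abstract does not define "service cycle", so the code assumes a hypothetical fixed cycle length `cycle_s` and counts how many cycles a call occupies. The `Provider` class, the function names, and all numbers are likewise illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Provider:
    name: str
    quality: float    # estimated answer quality in [0, 1] (illustrative)
    latency_s: float  # estimated latency in seconds (illustrative)


def additive_score(p: Provider, latency_weight: float) -> float:
    # Additive reward: low latency can offset poor answer quality.
    return p.quality - latency_weight * p.latency_s


def capacity_score(p: Provider, cycle_s: float) -> float:
    # ASSUMPTION: a call occupies at least one service cycle of length
    # cycle_s; finishing faster than one cycle earns no extra credit.
    cycles = max(1.0, p.latency_s / cycle_s)
    return p.quality / cycles  # expected quality per service cycle


def route(providers, score) -> str:
    # Pick the provider with the highest score.
    return max(providers, key=score).name


pool = [
    Provider("fast_weak", quality=0.40, latency_s=0.2),
    Provider("slow_strong", quality=0.90, latency_s=1.0),
]

# Additive reward favors the fast provider despite its weaker answers.
additive_choice = route(pool, lambda p: additive_score(p, latency_weight=1.0))

# Capacity-aware score keeps the strong provider while it fits in one cycle.
capacity_choice = route(pool, lambda p: capacity_score(p, cycle_s=1.0))

# Under load the strong provider slows down and occupies several cycles,
# so the capacity-aware score shifts traffic to the fast provider.
loaded_pool = [pool[0], Provider("slow_strong", quality=0.90, latency_s=4.0)]
loaded_choice = route(loaded_pool, lambda p: capacity_score(p, cycle_s=1.0))
```

With these made-up numbers, the additive reward picks `fast_weak`, while the capacity-aware score picks `slow_strong` until its latency grows well past one cycle. In an online router, `quality` and `latency_s` would be running estimates rather than constants; the abstract describes query-specific quality estimation and LLM-as-judge feedback for that purpose, which this sketch leaves out.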

Problem

Research questions and friction points this paper is trying to address.

tool routing

latency-quality tradeoff

functionally equivalent tools

LLM agents

contextual bandit

Innovation

Methods, ideas, or system contributions that make the work stand out.

latency-quality matching

contextual bandit routing

functionally equivalent tools