🤖 AI Summary
To jointly optimize deployment cost and human-preference-aligned response quality for large language models (LLMs) in edge-cloud collaborative settings, this paper proposes an uncertainty-aware dynamic routing framework. Methodologically, it introduces LLM-as-a-Judge for automated, scalable human preference modeling, a first in the routing literature, and pioneers the use of uncertainty quantification (specifically Monte Carlo Dropout and ensemble variance) as the core routing signal, replacing static accuracy metrics and labor-intensive human annotation. A multi-objective optimization mechanism is further designed to balance cost efficiency and response quality. Experiments on MT-Bench, GSM8K, and MMLU demonstrate substantial improvements over state-of-the-art routing methods: a 12.3% gain in human preference scores, a 19.7% increase in response-quality consistency, and sustained high cost efficiency.
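The uncertainty-as-routing-signal idea can be sketched roughly as follows: run the edge model several times with stochasticity enabled (as in Monte Carlo Dropout, or across ensemble members), measure the variance of its confidence, and escalate to the cloud model when that variance is high. This is a minimal illustration, not the paper's implementation; the function name, inputs, and threshold are hypothetical.

```python
import statistics

def route_query(edge_confidences, threshold=0.02):
    """Hypothetical confidence-driven routing rule.

    edge_confidences: confidence scores from several stochastic
    forward passes of the edge model (e.g., MC Dropout samples
    or ensemble members). High variance across passes is treated
    as high uncertainty, so the query is escalated to the cloud.
    The 0.02 threshold is illustrative, not from the paper.
    """
    uncertainty = statistics.variance(edge_confidences)
    target = "cloud" if uncertainty > threshold else "edge"
    return target, uncertainty
```

For example, tightly clustered confidences like `[0.91, 0.89, 0.90]` keep the query on the edge model, while widely spread ones like `[0.9, 0.4, 0.7]` trigger escalation to the cloud.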
📝 Abstract
Deploying large language models (LLMs) in edge-cloud environments requires an efficient routing strategy to balance cost and response quality. Traditional approaches rely on either human-preference data or accuracy metrics from benchmark datasets as routing criteria, but the former is subjective and costly to collect, while the latter is rigid. Moreover, existing routing frameworks focus primarily on accuracy and cost, neglecting response quality from a human preference perspective. In this work, we propose the Confidence-Driven LLM Router, a novel framework that leverages uncertainty estimation to optimize routing decisions. To comprehensively assess routing performance, we evaluate both system cost efficiency and response quality. In particular, we introduce the novel use of LLM-as-a-Judge to simulate human rating preferences, providing the first systematic assessment of response quality across different routing strategies. Extensive experiments on MT-Bench, GSM8K, and MMLU demonstrate that our approach outperforms state-of-the-art routing methods, achieving superior response quality while maintaining cost efficiency.
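One common way to balance judge-rated quality against deployment cost is a scalarized utility that picks the model with the best trade-off. The sketch below is an assumption for illustration, not the paper's actual objective; model names, scores, and the trade-off weight `lam` are all hypothetical.

```python
def select_model(candidates, lam=0.5):
    """Pick the candidate maximizing a quality-cost trade-off.

    candidates: dict mapping model name -> (quality, cost), where
    quality could be an LLM-as-a-Judge preference score and cost a
    normalized per-query deployment cost. lam weights cost against
    quality; this scalarization is illustrative only.
    """
    return max(candidates, key=lambda m: candidates[m][0] - lam * candidates[m][1])
```

With a hypothetical cheap edge model scored (6.5, 0.1) and a cloud model scored (8.8, 4.0), a low `lam` favors the higher-quality cloud model, while a high `lam` (cost-sensitive) flips the choice to the edge model, which is the trade-off the routing objective must navigate.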