🤖 AI Summary
To jointly optimize deployment cost and human-preference-aligned response quality for large language models (LLMs) in edge-cloud collaborative settings, this paper proposes an uncertainty-aware dynamic routing framework. Methodologically, it introduces LLM-as-a-Judge for automated, scalable human preference modeling, a first in the routing literature, and pioneers the use of uncertainty quantification (specifically Monte Carlo Dropout and ensemble variance) as the core routing signal, replacing static accuracy metrics and labor-intensive human annotation. A multi-objective optimization mechanism is further designed to balance cost efficiency and response quality. Experiments on MT-Bench, GSM8K, and MMLU demonstrate substantial improvements over state-of-the-art routing methods: a 12.3% gain in human preference scores, a 19.7% increase in response-quality consistency, and sustained high cost efficiency.
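The uncertainty-as-routing-signal idea can be sketched roughly as follows: run the edge model several times with stochasticity enabled (as in Monte Carlo Dropout, or across ensemble members), measure the variance of its confidence, and escalate to the cloud model when that variance is high. This is a minimal illustration, not the paper's implementation; the function name, inputs, and threshold are hypothetical.

```python
import statistics

def route_query(edge_confidences, threshold=0.02):
    """Hypothetical confidence-driven routing rule.

    edge_confidences: confidence scores from several stochastic
    forward passes of the edge model (e.g., MC Dropout samples
    or ensemble members). High variance across passes is treated
    as high uncertainty, so the query is escalated to the cloud.
    The 0.02 threshold is illustrative, not from the paper.
    """
    uncertainty = statistics.variance(edge_confidences)
    target = "cloud" if uncertainty > threshold else "edge"
    return target, uncertainty
```

For example, tightly clustered confidences like `[0.91, 0.89, 0.90]` keep the query on the edge model, while widely spread ones like `[0.9, 0.4, 0.7]` trigger escalation to the cloud.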
📝 Abstract
Deploying large language models (LLMs) in edge-cloud environments requires an efficient routing strategy to balance cost and response quality. Traditional approaches rely on either human-preference data or accuracy metrics from benchmark datasets as routing criteria, but the former is subjective and costly to collect, while the latter is rigid. Moreover, existing routing frameworks focus primarily on accuracy and cost, neglecting response quality from a human preference perspective. In this work, we propose the Confidence-Driven LLM Router, a novel framework that leverages uncertainty estimation to optimize routing decisions. To comprehensively assess routing performance, we evaluate both system cost efficiency and response quality. In particular, we introduce the novel use of LLM-as-a-Judge to simulate human rating preferences, providing the first systematic assessment of response quality across different routing strategies. Extensive experiments on MT-Bench, GSM8K, and MMLU demonstrate that our approach outperforms state-of-the-art routing methods, achieving superior response quality while maintaining cost efficiency.
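One common way to balance judge-rated quality against deployment cost is a scalarized utility that picks the model with the best trade-off. The sketch below is an assumption for illustration, not the paper's actual objective; model names, scores, and the trade-off weight `lam` are all hypothetical.

```python
def select_model(candidates, lam=0.5):
    """Pick the candidate maximizing a quality-cost trade-off.

    candidates: dict mapping model name -> (quality, cost), where
    quality could be an LLM-as-a-Judge preference score and cost a
    normalized per-query deployment cost. lam weights cost against
    quality; this scalarization is illustrative only.
    """
    return max(candidates, key=lambda m: candidates[m][0] - lam * candidates[m][1])
```

With a hypothetical cheap edge model scored (6.5, 0.1) and a cloud model scored (8.8, 4.0), a low `lam` favors the higher-quality cloud model, while a high `lam` (cost-sensitive) flips the choice to the edge model, which is the trade-off the routing objective must navigate.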