LightRouter: Towards Efficient LLM Collaboration with Minimal Overhead

📅 2025-05-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the challenge of jointly optimizing performance, cost, and computational overhead in multi-LLM collaboration, this paper proposes a lightweight, prior-free dynamic routing framework. Without requiring prior knowledge of model characteristics, the method employs an adaptive boot-token-aware selection mechanism to dynamically identify a small subset of low-cost black-box LLMs and integrates their outputs via a lightweight fusion strategy, entirely avoiding fine-tuning or access to internal model parameters. Its core contribution is a zero-prior, low-boot-cost, black-box collaboration paradigm that eliminates reliance on expensive large models. Experiments across multiple benchmarks demonstrate up to a 25% accuracy improvement over widely used ensemble baselines and, relative to leading high-performing models, comparable accuracy at up to 27% lower inference cost, significantly enhancing cost-effectiveness and deployment flexibility.

📝 Abstract
The rapid advancement of large language models has unlocked remarkable capabilities across a diverse array of natural language processing tasks. However, the considerable differences among available LLMs, in terms of cost, performance, and computational demands, pose significant challenges for users aiming to identify the most suitable model for specific tasks. In this work, we present LightRouter, a novel framework designed to systematically select and integrate a small subset of LLMs from a larger pool, with the objective of jointly optimizing both task performance and cost efficiency. LightRouter leverages an adaptive selection mechanism to identify models that require only a minimal number of boot tokens, thereby reducing costs, and further employs an effective integration strategy to combine their outputs. Extensive experiments across multiple benchmarks demonstrate that LightRouter matches or outperforms widely used ensemble baselines, achieving up to a 25% improvement in accuracy. Compared with leading high-performing models, LightRouter achieves comparable performance while reducing inference costs by up to 27%. Importantly, our framework operates without any prior knowledge of individual models and relies exclusively on inexpensive, lightweight models. This work introduces a practical approach for efficient LLM selection and provides valuable insights into optimal strategies for model combination.
Problem

Research questions and friction points this paper is trying to address.

Selecting optimal LLMs for tasks balancing cost and performance
Reducing inference costs while maintaining high accuracy in LLMs
Integrating multiple LLMs efficiently without prior model knowledge
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adaptive selection mechanism for minimal boot tokens
Effective integration strategy for combined outputs
Lightweight models without prior knowledge requirement
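The probe-select-fuse pipeline behind these three ideas can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the model names, per-token costs, `query_model` helper, and length-based confidence heuristic are all hypothetical stand-ins, and the fusion step is a plain majority vote.

```python
import random
from collections import Counter

# Hypothetical per-model cost (USD per 1K tokens); names are illustrative only.
MODEL_COSTS = {"small-a": 0.1, "small-b": 0.2, "small-c": 0.3}

def query_model(model: str, prompt: str, max_tokens: int) -> str:
    """Stand-in for a black-box LLM API call (no access to internals)."""
    rng = random.Random(hash((model, prompt, max_tokens)))  # deterministic mock
    return rng.choice(["answer: 42", "answer: 41", "answer: 42"])

def boot_score(partial_output: str) -> float:
    """Placeholder confidence heuristic over the first few 'boot' tokens;
    the paper's actual scoring signal is not reproduced here."""
    return float(len(partial_output))  # toy proxy for output quality

def light_route(prompt: str, k: int = 2, boot_tokens: int = 8) -> str:
    # 1) Cheap probe: ask every candidate for only a few boot tokens.
    probes = {m: query_model(m, prompt, boot_tokens) for m in MODEL_COSTS}
    # 2) Select the k models with the best score-per-cost ratio.
    ranked = sorted(probes,
                    key=lambda m: boot_score(probes[m]) / MODEL_COSTS[m],
                    reverse=True)
    chosen = ranked[:k]
    # 3) Run full generation only on the selected subset, then fuse outputs
    #    by majority vote over the returned answers.
    answers = [query_model(m, prompt, 256) for m in chosen]
    return Counter(answers).most_common(1)[0][0]

print(light_route("What is 6 * 7?"))
```

The key cost lever is step 1: every candidate model is charged only for `boot_tokens` tokens, and full-length generation is paid for just the `k` selected models.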
Yifan Zhang
School of Software Technology, Zhejiang University, Hangzhou, China
Xinkui Zhao
School of Software Technology, Zhejiang University, Hangzhou, China
Zuxin Wang
School of Software Technology, Zhejiang University, Hangzhou, China
Guanjie Cheng
Assistant Professor, School of Software Technology, Zhejiang University
AIoT, Multi-Agent Collaboration, Edge Computing, Data Security and Blockchain, Privacy Protection
Yueshen Xu
Xidian University; Zhejiang University; UIC
Service Computing, Software Engineering, Software Service Engineering, Edge Computing
Shuiguang Deng
School of Computer Science, Zhejiang University, Hangzhou, China
Jianwei Yin
Professor of Computer Science and Technology, Zhejiang University
Service Computing, Computer Architecture, Distributed Computing, AI