LightRouter: Towards Efficient LLM Collaboration with Minimal Overhead

📅 2025-05-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the challenge of jointly optimizing performance, cost, and computational overhead in multi-LLM collaboration, this paper proposes a lightweight, prior-free dynamic routing framework. Without requiring prior knowledge of model characteristics, the method employs an adaptive boot-token-aware selection mechanism to dynamically identify a small subset of low-cost black-box LLMs and integrates their outputs via a lightweight fusion strategy, entirely avoiding fine-tuning or access to internal model parameters. Its core contribution is a zero-prior, low-boot-cost, black-box collaboration paradigm that eliminates reliance on expensive large models. Experiments across multiple benchmarks demonstrate up to a 25% accuracy improvement over widely used ensemble baselines and, relative to leading high-performing models, comparable accuracy at up to 27% lower inference cost, significantly enhancing cost-effectiveness and deployment flexibility.

📝 Abstract
The rapid advancement of large language models has unlocked remarkable capabilities across a diverse array of natural language processing tasks. However, the considerable differences among available LLMs, in terms of cost, performance, and computational demands, pose significant challenges for users aiming to identify the most suitable model for specific tasks. In this work, we present LightRouter, a novel framework designed to systematically select and integrate a small subset of LLMs from a larger pool, with the objective of jointly optimizing both task performance and cost efficiency. LightRouter leverages an adaptive selection mechanism to identify models that require only a minimal number of boot tokens, thereby reducing costs, and further employs an effective integration strategy to combine their outputs. Extensive experiments across multiple benchmarks demonstrate that LightRouter matches or outperforms widely used ensemble baselines, achieving up to a 25% improvement in accuracy. Compared with leading high-performing models, LightRouter achieves comparable performance while reducing inference costs by up to 27%. Importantly, our framework operates without any prior knowledge of individual models and relies exclusively on inexpensive, lightweight models. This work introduces a practical approach for efficient LLM selection and provides valuable insights into optimal strategies for model combination.
Problem

Research questions and friction points this paper is trying to address.

Selecting optimal LLMs for tasks balancing cost and performance
Reducing inference costs while maintaining high accuracy in LLMs
Integrating multiple LLMs efficiently without prior model knowledge
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adaptive selection mechanism for minimal boot tokens
Effective integration strategy for combined outputs
Lightweight models without prior knowledge requirement
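The probe-select-fuse pipeline behind these three ideas can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the model names, per-token costs, `query_model` helper, and length-based confidence heuristic are all hypothetical stand-ins, and the fusion step is a plain majority vote.

```python
import random
from collections import Counter

# Hypothetical per-model cost (USD per 1K tokens); names are illustrative only.
MODEL_COSTS = {"small-a": 0.1, "small-b": 0.2, "small-c": 0.3}

def query_model(model: str, prompt: str, max_tokens: int) -> str:
    """Stand-in for a black-box LLM API call (no access to internals)."""
    rng = random.Random(hash((model, prompt, max_tokens)))  # deterministic mock
    return rng.choice(["answer: 42", "answer: 41", "answer: 42"])

def boot_score(partial_output: str) -> float:
    """Placeholder confidence heuristic over the first few 'boot' tokens;
    the paper's actual scoring signal is not reproduced here."""
    return float(len(partial_output))  # toy proxy for output quality

def light_route(prompt: str, k: int = 2, boot_tokens: int = 8) -> str:
    # 1) Cheap probe: ask every candidate for only a few boot tokens.
    probes = {m: query_model(m, prompt, boot_tokens) for m in MODEL_COSTS}
    # 2) Select the k models with the best score-per-cost ratio.
    ranked = sorted(probes,
                    key=lambda m: boot_score(probes[m]) / MODEL_COSTS[m],
                    reverse=True)
    chosen = ranked[:k]
    # 3) Run full generation only on the selected subset, then fuse outputs
    #    by majority vote over the returned answers.
    answers = [query_model(m, prompt, 256) for m in chosen]
    return Counter(answers).most_common(1)[0][0]

print(light_route("What is 6 * 7?"))
```

The key cost lever is step 1: every candidate model is charged only for `boot_tokens` tokens, and full-length generation is paid for just the `k` selected models.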
Yifan Zhang
School of Software Technology, Zhejiang University, Hangzhou, China
Xinkui Zhao
School of Software Technology, Zhejiang University, Hangzhou, China
Zuxin Wang
School of Software Technology, Zhejiang University, Hangzhou, China
Guanjie Cheng
Assistant Professor, School of Software Technology, Zhejiang University
AIoT, Multi-Agent Collaboration, Edge Computing, Data Security and Blockchain, Privacy Protection
Yueshen Xu
Xidian University; Zhejiang University; UIC
Service Computing, Software Engineering, Software Service Engineering, Edge Computing
Shuiguang Deng
School of Computer Science, Zhejiang University, Hangzhou, China
Jianwei Yin
Professor of Computer Science and Technology, Zhejiang University
Service Computing, Computer Architecture, Distributed Computing, AI