🤖 AI Summary
This work addresses the open challenge of efficiently routing queries to large language models (LLMs) so as to balance quality and cost under practical constraints—namely, limited client-side compute resources, privacy requirements, decentralized data, and sparse local evaluations. We propose the first federated learning framework for LLM query routing, supporting both parametric (multilayer perceptron) and nonparametric (K-means) routers. The framework enables heterogeneous clients to collaboratively train a shared routing policy using only local offline evaluation data, without sharing raw inputs or model outputs. Our approach significantly outperforms locally trained baselines, effectively expanding model coverage and enhancing query generalization on two benchmarks. It achieves a superior trade-off between accuracy and inference cost, backed by theoretical guarantees that federated training reduces routing suboptimality.
📝 Abstract
Large language models (LLMs) are increasingly accessed as remotely hosted services by edge and enterprise clients that cannot run frontier models locally. Since models vary widely in capability and price, routing queries to models that balance quality and inference cost is essential. Existing routing approaches assume access to centralized query–model evaluation data. However, these data are often fragmented across clients, such as end users and organizations, and are privacy-sensitive, which makes centralization infeasible. Moreover, per-client router training is ineffective, since local evaluation data are limited and cover only a restricted query distribution and a biased subset of model evaluations. We introduce the first federated framework for LLM routing, enabling clients to learn a shared routing policy from local offline query–model evaluation data. Our framework supports both a parametric multilayer perceptron router and a nonparametric K-means router under heterogeneous client query distributions and non-uniform model coverage. Across two benchmarks, federated collaboration improves the accuracy–cost frontier over client-local routers, both through increased effective model coverage and improved query generalization. Our theoretical results further validate that federated training reduces routing suboptimality.
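To make the setup concrete, below is a minimal sketch of how federated training of a parametric router might look. It is not the paper's implementation: it assumes a simple logistic (one-layer) router over query embeddings, binary routing between a cheap and an expensive model, and plain FedAvg aggregation (each client runs local gradient steps on its own offline evaluation data, and the server averages the resulting weights by sample count). All names and hyperparameters here are hypothetical.

```python
import numpy as np

def local_update(w, X, y, lr=0.1, epochs=5):
    """One client's local training on its private offline evaluation data.

    X: (n, d) query embeddings; y: (n,) label 1 if the query needs the
    expensive model, 0 if the cheap model suffices (an assumed labeling).
    Runs a few logistic-regression gradient steps; raw data never leaves
    the client -- only the updated weights are returned.
    """
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-X @ w))      # routing probability per query
        w = w - lr * X.T @ (p - y) / len(y)   # logistic-loss gradient step
    return w

def fedavg_router(clients, dim, rounds=20):
    """Server loop: broadcast global weights, collect local updates,
    and average them weighted by each client's local sample count."""
    w = np.zeros(dim)
    n_total = sum(len(y) for _, y in clients)
    for _ in range(rounds):
        w = sum((len(y) / n_total) * local_update(w.copy(), X, y)
                for X, y in clients)
    return w

# Synthetic usage: two clients with local evaluation sets drawn from the
# same underlying rule (queries with positive embedding sum need the
# expensive model), illustrating that the shared router generalizes.
rng = np.random.default_rng(0)

def make_client(n):
    X = rng.normal(size=(n, 2))
    y = (X.sum(axis=1) > 0).astype(float)
    return X, y

clients = [make_client(50), make_client(80)]
w_global = fedavg_router(clients, dim=2)

X_test, y_test = make_client(200)
routed = (1.0 / (1.0 + np.exp(-X_test @ w_global)) > 0.5).astype(float)
accuracy = float(np.mean(routed == y_test))
```

In this toy setting, the federated router recovers the shared routing rule from fragmented client data that no single client could learn as well alone; the paper's actual routers (MLP and K-means) and aggregation details are, of course, more involved.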