INFERENCEDYNAMICS: Efficient Routing Across LLMs through Structured Capability and Knowledge Profiling

📅 2025-05-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
Routing queries across large pools of specialized large language models (LLMs) suffers from poor scalability and limited adaptability as the model pool grows and capabilities evolve. Method: We propose a structured capability–knowledge dual-dimensional modeling framework that constructs fine-grained model profiles, enabling query-driven dynamic routing. Our approach leverages the RouteMix dataset for multi-dimensional feature modeling, introduces a lightweight inference-time routing mechanism, and establishes a unified evaluation protocol across MMLU-Pro, GPQA, BigGenBench, and LiveBench. Contribution/Results: Experiments demonstrate significant improvements in task accuracy across multiple authoritative benchmarks while reducing average inference overhead. Notably, we provide the first empirical validation of group-level routing efficacy. The proposed framework delivers a scalable, adaptive, and plug-and-play intelligent scheduling infrastructure for LLM ecosystems.
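The dual-dimensional idea described above can be sketched as follows. This is a minimal, hypothetical illustration, not the paper's actual method: the dimension names, profile scores, cost penalty, and scoring function are all assumptions made for the example. It shows the core intuition of matching a query's capability and knowledge requirements against per-model profiles and selecting the best-scoring model.

```python
# Hypothetical sketch of capability-knowledge routing.
# All dimensions, profiles, and weights below are illustrative.

CAPABILITIES = ["reasoning", "coding", "instruction_following"]
KNOWLEDGE = ["math", "biology", "law"]

MODEL_PROFILES = {
    # per-dimension scores in [0, 1]; "cost" is relative inference cost
    "model_a": {"cap": [0.9, 0.6, 0.8], "know": [0.9, 0.4, 0.3], "cost": 1.0},
    "model_b": {"cap": [0.6, 0.9, 0.7], "know": [0.3, 0.2, 0.2], "cost": 0.4},
    "model_c": {"cap": [0.7, 0.5, 0.9], "know": [0.4, 0.9, 0.8], "cost": 0.7},
}

def route(query_cap, query_know, cost_weight=0.2):
    """Return the model whose capability/knowledge profile best matches
    the query's requirement vectors, penalized by inference cost."""
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    best, best_score = None, float("-inf")
    for name, profile in MODEL_PROFILES.items():
        score = (dot(query_cap, profile["cap"])
                 + dot(query_know, profile["know"])
                 - cost_weight * profile["cost"])
        if score > best_score:
            best, best_score = name, score
    return best

# A math-heavy reasoning query routes to the strong-reasoning, strong-math model:
print(route(query_cap=[1.0, 0.0, 0.2], query_know=[1.0, 0.0, 0.0]))  # → model_a
```

In practice the query's requirement vectors would themselves be predicted by a lightweight classifier or embedding model rather than supplied by hand, and profiles would be estimated from benchmark performance; this sketch only captures the scoring-and-selection step.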

📝 Abstract
Large Language Model (LLM) routing is a pivotal technique for navigating a diverse landscape of LLMs, aiming to select the best-performing LLM for the domain of each user query while managing computational resources. However, current routing approaches often face limitations in scalability when dealing with a large pool of specialized LLMs, or in their adaptability to an expanding model scope and evolving capability domains. To overcome these challenges, we propose InferenceDynamics, a flexible and scalable multi-dimensional routing framework that models the capability and knowledge of models. We operate it on our comprehensive dataset RouteMix, and demonstrate its effectiveness and generalizability in group-level routing on modern benchmarks including MMLU-Pro, GPQA, BigGenBench, and LiveBench, showcasing its ability to identify and leverage top-performing models for given tasks, leading to superior outcomes with efficient resource utilization. Broader adoption of InferenceDynamics can empower users to harness the full specialized potential of the LLM ecosystem, and our code will be made publicly available to encourage further research.
Problem

Research questions and friction points this paper is trying to address.

Efficiently routing user queries to best-performing LLMs
Scaling routing for large pools of specialized LLMs
Adapting to evolving model capabilities and domains
Innovation

Methods, ideas, or system contributions that make the work stand out.

Structured capability and knowledge profiling
Flexible multi-dimensional routing framework
Effective group-level routing with benchmarks