IRT-Router: Effective and Interpretable Multi-LLM Routing via Item Response Theory

๐Ÿ“… 2025-06-01
๐Ÿ“ˆ Citations: 0
โœจ Influential: 0
๐Ÿ“„ PDF
๐Ÿค– AI Summary
This work addresses the dynamic query-to-model routing problem in multi-LLM settings, proposing an interpretable matching method that jointly optimizes performance and cost. Methodologically, it introduces Item Response Theory (IRT), a psychometric framework, into LLM routing for the first time, modeling the intrinsic relationship between model capability and query difficulty to enable quantifiable, interpretable assessment of both. It further designs a semantic-embedding-based online warm-up mechanism to mitigate cold-start issues. The approach comprises four components: IRT-based modeling, maximum-likelihood capability estimation, semantic similarity computation, and adaptive warm-up scheduling. Extensive experiments across 20 LLMs and 12 benchmark datasets demonstrate significant improvements over state-of-the-art baselines, including up to 23.6% higher accuracy under cold-start conditions, while maintaining high efficiency, robustness, and strong interpretability.
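The IRT idea can be sketched with the standard two-parameter logistic (2PL) model, where a model's ability θ and a query's difficulty b and discrimination a determine the probability of a correct response. The paper's exact parameterization and cost term are not given on this page, so the `route` helper and its cost penalty below are illustrative assumptions:

```python
import math

def predict_correct(theta, b, a=1.0):
    """2PL IRT: probability that a model with ability `theta` answers
    a query with difficulty `b` and discrimination `a` correctly."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def route(query_params, models, cost_weight=0.3):
    """Illustrative router (not the paper's exact objective): pick the
    model maximizing predicted accuracy minus a weighted cost penalty.
    models: list of (name, theta, cost); query_params: (b, a)."""
    b, a = query_params
    best = max(models, key=lambda m: predict_correct(m[1], b, a) - cost_weight * m[2])
    return best[0]
```

With a strong-but-expensive and a weak-but-cheap model, the router sends hard queries (high b) to the strong model and easy ones to the cheap model, which is the performance/cost trade-off the paper targets.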

๐Ÿ“ Abstract
Large language models (LLMs) have demonstrated exceptional performance across a wide range of natural language tasks. However, selecting the optimal LLM to respond to a user query often necessitates a delicate balance between performance and cost. While powerful models deliver better results, they come at a high cost, whereas smaller models are more cost-effective but less capable. To address this trade-off, we propose IRT-Router, a multi-LLM routing framework that efficiently routes user queries to the most suitable LLM. Inspired by Item Response Theory (IRT), a psychological measurement methodology, IRT-Router explicitly models the relationship between LLM capabilities and user query attributes. This not only enables accurate prediction of response performance but also provides interpretable insights, such as LLM abilities and query difficulty. Additionally, we design an online query warm-up technique based on semantic similarity, further enhancing the online generalization capability of IRT-Router. Extensive experiments on 20 LLMs and 12 datasets demonstrate that IRT-Router outperforms most baseline methods in terms of effectiveness and interpretability. Its superior performance in cold-start scenarios further confirms the reliability and practicality of IRT-Router in real-world applications. Code is available at https://github.com/Mercidaiha/IRT-Router.
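The online warm-up can be pictured as estimating an unseen query's IRT parameters from semantically similar, already-calibrated queries. The nearest-neighbor weighting below is an assumption for illustration; the paper's actual scheduling and embedding model are not specified on this page:

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def warm_start_difficulty(new_emb, calibrated, k=3):
    """Illustrative warm-up: estimate a new query's difficulty as the
    similarity-weighted mean over its k most similar calibrated queries.
    calibrated: list of (embedding, difficulty) pairs."""
    scored = sorted(((cosine(new_emb, e), b) for e, b in calibrated), reverse=True)[:k]
    weights = np.array([max(s, 0.0) for s, _ in scored])  # ignore dissimilar neighbors
    diffs = np.array([b for _, b in scored])
    if weights.sum() == 0.0:
        return float(diffs.mean())  # fall back to an unweighted mean
    return float(np.dot(weights, diffs) / weights.sum())
```

A query whose embedding matches a calibrated query exactly inherits that query's difficulty, which is what lets the router act sensibly before any responses for the new query have been observed.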
Problem

Research questions and friction points this paper is trying to address.

Balancing performance and cost in LLM selection
Modeling LLM capabilities and query attributes
Enhancing routing accuracy and interpretability
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses Item Response Theory for LLM routing
Models LLM capabilities and query attributes
Online query warm-up enhances generalization
๐Ÿ”Ž Similar Papers
No similar papers found.
Authors

Wei Song — State Key Laboratory of Cognitive Intelligence, University of Science and Technology of China
Zhenya Huang — University of Science and Technology of China (Data Science · AI · Knowledge Representation · Cognitive Reasoning · Intelligent Education)
Cheng Cheng — State Key Laboratory of Cognitive Intelligence, University of Science and Technology of China
Weibo Gao — State Key Laboratory of Cognitive Intelligence, University of Science and Technology of China
Bihan Xu — University of Science and Technology of China
GuanHao Zhao — State Key Laboratory of Cognitive Intelligence, University of Science and Technology of China
Fei Wang — School of Computing, National University of Singapore
Runze Wu — Fuxi AI Lab, NetEase Games | University of Science and Technology of China (Data Mining · Machine Learning · Online Games)