CP-Router: An Uncertainty-Aware Router Between LLM and LRM

📅 2025-05-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large reasoning models (LRMs) produce redundant, inefficient outputs on simple queries, sometimes even degrading accuracy relative to standard LLMs. Method: the paper proposes a training-free, model-agnostic routing framework that dynamically dispatches each query to either an LLM or an LRM. Its core contribution is an uncertainty-aware routing mechanism that combines conformal prediction (CP), which supplies statistically grounded decision boundaries, with an adaptive Full and Binary Entropy (FBE) criterion for selecting the CP threshold. The method uses multiple-choice question-answering (MCQA) prompting and is compatible with diverse model pairings, enabling threshold adaptation and cross-task generalization. Results: on MCQA benchmarks spanning mathematics, logical reasoning, and Chinese chemistry, CP-Router substantially reduces token consumption while maintaining or improving accuracy. Further experiments on open-ended QA and heterogeneous model pairings demonstrate robustness and broad applicability.
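The routing idea described above can be sketched with split conformal prediction: calibrate a nonconformity threshold on held-out MCQA examples, build a prediction set for each new query, and escalate to the LRM whenever the set is not a confident singleton. This is a minimal illustration, not the paper's exact procedure; the function names and the toy calibration data are hypothetical, and the real method further adapts the threshold via FBE.

```python
import numpy as np

def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    """Split conformal prediction: calibrate a score threshold with
    ~(1 - alpha) coverage. Nonconformity score = 1 - p(true label)."""
    n = len(cal_labels)
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    q = np.ceil((n + 1) * (1 - alpha)) / n
    return np.quantile(scores, min(q, 1.0), method="higher")

def prediction_set(probs, qhat):
    """All answer options whose nonconformity stays under the threshold."""
    return np.where(1.0 - probs <= qhat)[0]

def route(probs, qhat):
    """Send the query to the cheap LLM only when the prediction set is a
    confident singleton; otherwise escalate to the LRM."""
    return "LLM" if len(prediction_set(probs, qhat)) == 1 else "LRM"

# Toy calibration: 4 MCQA options, consistently confident correct answers.
cal_probs = np.tile([0.85, 0.05, 0.05, 0.05], (200, 1))
cal_labels = np.zeros(200, dtype=int)
qhat = conformal_threshold(cal_probs, cal_labels, alpha=0.1)

print(route(np.array([0.9, 0.04, 0.03, 0.03]), qhat))   # confident -> LLM
print(route(np.array([0.4, 0.35, 0.15, 0.10]), qhat))   # uncertain -> LRM
```

Routing on prediction-set size, rather than raw top-1 probability, is what carries the CP coverage guarantee mentioned in the abstract.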

📝 Abstract
Recent advances in Large Reasoning Models (LRMs) have significantly improved long-chain reasoning capabilities over Large Language Models (LLMs). However, LRMs often produce unnecessarily lengthy outputs even for simple queries, leading to inefficiencies or even accuracy degradation compared to LLMs. To overcome this, we propose CP-Router, a training-free and model-agnostic routing framework that dynamically selects between an LLM and an LRM, demonstrated with multiple-choice question answering (MCQA) prompts. The routing decision is guided by the prediction uncertainty estimates derived via Conformal Prediction (CP), which provides rigorous coverage guarantees. To further refine the uncertainty differentiation across inputs, we introduce Full and Binary Entropy (FBE), a novel entropy-based criterion that adaptively selects the appropriate CP threshold. Experiments across diverse MCQA benchmarks, including mathematics, logical reasoning, and Chinese chemistry, demonstrate that CP-Router efficiently reduces token usage while maintaining or even improving accuracy compared to using LRM alone. We also extend CP-Router to diverse model pairings and open-ended QA, where it continues to demonstrate strong performance, validating its generality and robustness.
Problem

Research questions and friction points this paper is trying to address.

Dynamic selection between LLM and LRM for efficiency
Uncertainty-aware routing using Conformal Prediction guarantees
Reducing token usage while maintaining accuracy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Training-free model-agnostic LLM-LRM routing framework
Conformal Prediction for uncertainty-aware routing decisions
Full and Binary Entropy refines CP threshold selection
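The FBE criterion contrasts the entropy of the full option distribution with the binary entropy obtained by collapsing it into "top option vs. rest". The paper's exact selection rule is not reproduced on this page; the sketch below only computes the two entropies that such a rule would compare (helper names are hypothetical).

```python
import math

def full_entropy(probs):
    """Shannon entropy over the full answer-option distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def binary_entropy(probs):
    """Entropy after collapsing the distribution into two bins:
    the top option versus all remaining options combined."""
    p = max(probs)
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log(p) + (1 - p) * math.log(1 - p))

# A peaked distribution: low full entropy, low binary entropy.
probs = [0.7, 0.1, 0.1, 0.1]
print(full_entropy(probs), binary_entropy(probs))
```

Intuitively, comparing the two quantities indicates whether uncertainty is concentrated in a top-option-vs-rest decision or spread across all options, which can then steer the choice of CP threshold.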
Jiayuan Su
Zhejiang University
LLM · Post-Training · Reasoning
Fulin Lin
Zhejiang University
Zhaopeng Feng
Zhejiang University
Han Zheng
Zhejiang University
Teng Wang
The University of Hong Kong
Zhenyu Xiao
Tsinghua University
Xinlong Zhao
Peking University
Zuozhu Liu
Assistant Professor, Zhejiang University/University of Illinois Urbana-Champaign
deep learning · vision-language models · medical AI
Lu Cheng
Assistant Professor, UIC CS
Socially Responsible AI · Causal Machine Learning · Data Mining · AI for Good
H
Hongwei Wang
Zhejiang University