Optimizing Reasoning Efficiency through Prompt Difficulty Prediction

📅 2025-11-05
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address the challenge of balancing computational efficiency and accuracy in large language models (LLMs) for complex reasoning tasks, this paper proposes a difficulty-aware dynamic model routing framework. Methodologically, it trains a lightweight difficulty predictor on intermediate representations extracted from the S1.1-32B model to perform fine-grained difficulty estimation for mathematical reasoning problems; inputs are then dynamically routed to appropriately sized smaller models based on predicted difficulty. This work is the first to leverage intermediate representations for difficulty modeling and dynamic model allocation in complex reasoning tasks. Experiments across multiple mathematical reasoning benchmarks demonstrate that the approach achieves accuracy comparable to—or even exceeding—that of S1.1-32B, while reducing total computational cost by up to 67% in FLOPs. It significantly outperforms both random and static model allocation baselines.

📝 Abstract
Reasoning language models perform well on complex tasks but are costly to deploy due to their size and long reasoning traces. We propose a routing approach that assigns each problem to the smallest model likely to solve it, reducing compute without sacrificing accuracy. Using intermediate representations from s1.1-32B, we train lightweight predictors of problem difficulty or model correctness to guide routing across a pool of reasoning models. On diverse math benchmarks, routing improves efficiency over random assignment and matches s1.1-32B's performance while using significantly less compute. Our results demonstrate that difficulty-aware routing is effective for cost-efficient deployment of reasoning models.
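The abstract's core idea, training a lightweight difficulty predictor on intermediate representations, can be illustrated with a minimal sketch. Everything below is hypothetical: the feature vectors stand in for hidden states extracted from s1.1-32B, the labels stand in for observed problem difficulty, and the predictor is a plain logistic regression fit with gradient descent (the paper does not specify the predictor architecture).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for intermediate representations extracted from
# s1.1-32B: random feature vectors whose first coordinate correlates
# with problem difficulty (1 = hard, 0 = easy).
X = rng.normal(size=(200, 16))
difficulty = (X[:, 0] > 0).astype(float)

# Lightweight difficulty predictor: logistic regression trained with
# plain gradient descent (no framework dependency).
w = np.zeros(16)
b = 0.0
lr = 0.5
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted P(hard)
    w -= lr * (X.T @ (p - difficulty)) / len(X)
    b -= lr * np.mean(p - difficulty)

preds = (1.0 / (1.0 + np.exp(-(X @ w + b))) > 0.5).astype(float)
accuracy = float(np.mean(preds == difficulty))
```

Because the predictor is tiny relative to the reasoning models it routes between, its own inference cost is negligible next to the compute it saves.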
Problem

Research questions and friction points this paper is trying to address.

Predicting problem difficulty to optimize model selection
Reducing computational costs of large reasoning language models
Routing problems to smallest capable model while maintaining accuracy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses difficulty prediction for routing problems
Assigns problems to smallest capable model
Reduces compute while maintaining accuracy
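The routing rule implied by these bullets, send each problem to the smallest model likely to solve it, can be sketched as a threshold lookup over an ordered model pool. The model names, FLOPs figures, and difficulty ceilings below are illustrative assumptions, not values from the paper.

```python
# Hypothetical model pool, ordered smallest to largest. The "flops" and
# "max_difficulty" values are made-up placeholders for illustration.
MODEL_POOL = [
    {"name": "s1.1-1.5B", "flops": 1.0,  "max_difficulty": 0.3},
    {"name": "s1.1-7B",   "flops": 5.0,  "max_difficulty": 0.6},
    {"name": "s1.1-32B",  "flops": 20.0, "max_difficulty": 1.0},
]

def route(predicted_difficulty: float) -> dict:
    """Return the smallest model whose difficulty ceiling covers the
    predicted difficulty of the input problem."""
    for model in MODEL_POOL:  # smallest first, so first hit is cheapest
        if predicted_difficulty <= model["max_difficulty"]:
            return model
    return MODEL_POOL[-1]  # fall back to the largest model
```

Under this scheme, easy problems never touch the 32B model, which is where the reported compute savings come from; only problems the predictor deems hard pay the full cost.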
Bo Zhao
UC San Diego
B. Kapusuzoglu
Capital One
Kartik Balasubramaniam
Capital One
Sambit Sahu
Capital One
Generative AI · LLM Pre-training · Inference Optimization
Supriyo Chakraborty
Capital One
Genta Indra Winata
Capital One AI Foundations
Multilinguality · Language Modeling · Multimodal · Low-resource NLP · Code-Switching