TRIM: Hybrid Inference via Targeted Stepwise Routing in Multi-Step Reasoning Tasks

📅 2026-01-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of cascading failures in multi-step reasoning, where a single erroneous step can compromise the entire solution. Existing model routing approaches assign a whole query to one model and so treat all reasoning steps uniformly, which wastes large-model compute on routine steps. To overcome this, we propose TRIM, a method for dynamic, step-level routing: it uses a process reward model to identify high-uncertainty, critical steps and, under a computational budget, delegates only these error-prone steps to a large language model, while simpler steps are handled by a smaller, more efficient model. This strategy interrupts error propagation. Experiments show that on MATH-500 even the simplest thresholding strategy achieves 5× higher cost efficiency than prior routing methods, and more advanced routing strategies match the large model's performance using 80% fewer large-model tokens. On harder benchmarks such as AIME, TRIM delivers up to 6× higher cost efficiency, and the methods generalize across math reasoning tasks.

📝 Abstract
Multi-step reasoning tasks like mathematical problem solving are vulnerable to cascading failures, where a single incorrect step leads to complete solution breakdown. Current LLM routing methods assign entire queries to one model, treating all reasoning steps as equal. We propose TRIM (Targeted routing in multi-step reasoning tasks), which routes only critical steps (those likely to derail the solution) to larger models while letting smaller models handle routine continuations. Our key insight is that targeted step-level interventions can fundamentally transform inference efficiency by confining expensive calls to precisely those steps where stronger models prevent cascading errors. TRIM operates at the step level: it uses process reward models to identify erroneous steps and makes routing decisions based on step-level uncertainty and budget constraints. We develop several routing strategies within TRIM, ranging from a simple threshold-based policy to more expressive policies that reason about long-horizon accuracy-cost trade-offs and uncertainty in step-level correctness estimates. On MATH-500, even the simplest thresholding strategy surpasses prior routing methods with 5x higher cost efficiency, while more advanced policies match the strong, expensive model's performance using 80% fewer expensive model tokens. On harder benchmarks such as AIME, TRIM achieves up to 6x higher cost efficiency. All methods generalize effectively across math reasoning tasks, demonstrating that step-level difficulty represents fundamental characteristics of reasoning.
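
The threshold-based policy described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (route_steps, toy_small, toy_large, toy_prm), the 0.5 threshold, the "[DONE]" stop marker, and the budget expressed as a count of large-model calls are all assumptions made for illustration. The abstract states only that routing depends on step-level uncertainty and a budget constraint.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical interfaces (assumptions, not from the paper):
# a step generator maps (question, steps so far) -> next step text,
# a scorer maps (question, steps so far, candidate) -> score in [0, 1].
StepFn = Callable[[str, List[str]], str]
ScoreFn = Callable[[str, List[str], str], float]


@dataclass
class RoutingResult:
    steps: List[str]
    large_calls: int


def route_steps(question: str, small_step: StepFn, large_step: StepFn,
                prm_score: ScoreFn, threshold: float = 0.5,
                max_large_calls: int = 2, max_steps: int = 8,
                stop_token: str = "[DONE]") -> RoutingResult:
    """Draft each step with the small model; if the process-reward score
    falls below the threshold and budget remains, regenerate that step
    with the large model."""
    steps: List[str] = []
    large_calls = 0
    for _ in range(max_steps):
        candidate = small_step(question, steps)
        score = prm_score(question, steps, candidate)
        if score < threshold and large_calls < max_large_calls:
            candidate = large_step(question, steps)
            large_calls += 1
        steps.append(candidate)
        if candidate.endswith(stop_token):
            break
    return RoutingResult(steps, large_calls)


# Toy stand-ins for the two models and the process reward model.
def toy_small(question: str, steps: List[str]) -> str:
    if len(steps) == 0:
        return "2 + 3 = 5"
    if len(steps) == 1:
        return "5 * 4 = 21"  # deliberate arithmetic slip
    # The final answer copies the previous result, so an uncorrected
    # slip propagates to the end (a cascading failure).
    return "answer: " + steps[-1].split("= ")[1] + " [DONE]"


def toy_large(question: str, steps: List[str]) -> str:
    return "5 * 4 = 20"


def toy_prm(question: str, steps: List[str], candidate: str) -> float:
    return 0.1 if candidate == "5 * 4 = 21" else 0.9


if __name__ == "__main__":
    routed = route_steps("(2 + 3) * 4 = ?", toy_small, toy_large, toy_prm)
    print(routed.steps, routed.large_calls)
```

In this toy run, only the flagged second step is sent to the large model. Setting max_large_calls=0 leaves the slip uncorrected and it carries into the final answer, which is the failure mode that step-level routing is meant to interrupt.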
Problem

Research questions and friction points this paper is trying to address.

multi-step reasoning
cascading failures
LLM routing
step-level difficulty
mathematical problem solving
Innovation

Methods, ideas, or system contributions that make the work stand out.

step-level routing
cascading error prevention
hybrid inference
process reward model
cost-efficient reasoning