Speculative Reward Model Boosts Decision Making Ability of LLMs Cost-Effectively

📅 2025-05-31
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the imbalance between decision-making effectiveness and computational cost in large language models (LLMs) on complex reasoning tasks, this paper proposes a lightweight, plug-and-play Speculative Reward Model (SRM). SRM optimizes search trajectories through external reward modeling and speculative action verification, and introduces a confidence-based subtree pruning mechanism that decouples performance optimization from cost reduction. Crucially, SRM requires no LLM fine-tuning, imposes no self-evaluation overhead on the LLM, and integrates seamlessly with mainstream search paradigms such as Tree-of-Thought. The paper also introduces the 3E Criteria (Efficiency, Effectiveness, Economy) to systematically assess the cost-effectiveness of search strategies. Experiments on mathematical reasoning, planning, and domain-specific numerical reasoning show that SRM reduces average computational cost to one-tenth of baseline search frameworks while preserving decision quality.
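The selection mechanism described above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the function names, the softmax confidence measure, and the threshold are all hypothetical stand-ins for SRM's actual reward assigner and verification rule.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def speculative_select(candidates, reward_model, verify_fn, conf_threshold=0.6):
    """Score candidate actions with a cheap external reward model.

    If the top candidate's softmax confidence clears the threshold, accept it
    speculatively without involving the LLM; otherwise fall back to the
    expensive verifier (e.g. LLM self-evaluation). Returns (choice, used_llm).
    """
    scores = [reward_model(c) for c in candidates]
    probs = softmax(scores)
    best = max(range(len(candidates)), key=lambda i: probs[i])
    if probs[best] >= conf_threshold:
        return candidates[best], False  # accepted speculatively, no LLM call
    return verify_fn(candidates), True  # low confidence: verify expensively
```

The cost saving comes from the second return path being reached only on low-confidence splits, so most search steps never pay for an LLM evaluation.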

📝 Abstract
Effective decision-making in Large Language Models (LLMs) is essential for handling intricate tasks. However, existing approaches prioritize performance while often overlooking the balance between effectiveness and computational cost. To address this, we first introduce the 3E Criteria to systematically assess the cost-effectiveness of search strategies, revealing that existing methods often trade significant efficiency for marginal performance gains. To improve LLM decision-making while maintaining efficiency, we propose the Speculative Reward Model (SRM), a plug-and-play framework that seamlessly integrates with existing search strategies. Specifically, SRM employs an external reward assigner to predict optimal actions, reducing reliance on LLMs' internal self-evaluation, and a speculative verification mechanism to prune suboptimal choices and guide the search toward more promising steps. We evaluate SRM on several complex decision-making tasks, including mathematical reasoning, planning, and numerical reasoning in specialized domains. Experimental results show that SRM reduces costs to 1/10 of the original search framework on average while maintaining effectiveness.
Problem

Research questions and friction points this paper is trying to address.

Balancing decision-making effectiveness and computational cost in LLMs
Introducing Speculative Reward Model to optimize action selection
Reducing search costs while maintaining task performance in LLMs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces Speculative Reward Model (SRM) framework
Uses external reward assigner for action prediction
Employs speculative verification to prune suboptimal choices
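The pruning idea in the bullets above can be illustrated with a short sketch. This is an assumption-laden simplification, not the paper's exact rule: the function name, the score dictionary, and the `keep_ratio` heuristic are all hypothetical.

```python
def prune_subtrees(children_scores, keep_ratio=0.5):
    """Confidence-based subtree pruning (illustrative only).

    children_scores: {child_id: external reward-model score}
    Keeps only children scoring at least keep_ratio * the best sibling's
    score, so low-confidence subtrees are dropped before the LLM ever
    expands them.
    """
    best = max(children_scores.values())
    return {c: s for c, s in children_scores.items() if s >= keep_ratio * best}
```

In a tree search such as Tree-of-Thought, a step like this would run at each expansion, shrinking the frontier the LLM must process.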