🤖 AI Summary
Process Reward Models (PRMs) suffer from poor generalization in mathematical reasoning, particularly under out-of-distribution (OOD) conditions—namely, step-level OOD (arising from divergent reasoning paths generated by different models) and problem-level OOD (stemming from distributional shifts between training data and real-world problems). To address this, we propose a retrieval-augmented PRM framework centered on a novel two-stage semantic retrieval mechanism: (1) dense retrieval and semantic matching to identify relevant historical reasoning steps; and (2) dynamic fusion of the retrieved evidence to calibrate PRM scoring. Our approach significantly improves cross-model and cross-task reasoning consistency and generalization. Extensive evaluation across multiple real-world mathematical reasoning benchmarks demonstrates consistent superiority over existing PRM baselines. We publicly release a curated retrieval-augmented dataset, a training framework, and trained models, setting a new state of the art for PRMs.
📝 Abstract
As large language models (LLMs) have significantly advanced mathematical reasoning, Process Reward Models (PRMs) have been developed to evaluate the logical validity of individual reasoning steps. However, PRMs still struggle with out-of-distribution (OOD) challenges. This paper identifies two key OOD issues: step OOD, caused by differences in reasoning patterns across model types and sizes, and question OOD, which arises from dataset shifts between training data and real-world problems. To address these issues, we introduce the Retrieval-Augmented Process Reward Model (RetrievalPRM). Using a two-stage retrieval-enhanced mechanism, RetrievalPRM retrieves semantically similar questions and steps as a warmup, strengthening the PRM's ability to evaluate target steps and improving generalization and reasoning consistency across different models and problem types. Extensive experiments demonstrate that RetrievalPRM outperforms existing baselines across multiple real-world datasets. Our open-source contributions include a retrieval-enhanced dataset, a tuning framework for PRM training, and the RetrievalPRM model, establishing a new standard for PRM performance.
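The two-stage mechanism described above can be sketched roughly as follows. This is a minimal illustrative toy, not the paper's implementation: the bag-of-words `embed` stands in for a real dense encoder, and the function names (`retrieve`, `build_warmup_prompt`) and prompt tags are assumptions for illustration only.

```python
import numpy as np

# Shared toy vocabulary for the stand-in embedder.
VOCAB: dict[str, int] = {}

def embed(text: str) -> np.ndarray:
    """Toy bag-of-words embedding standing in for a dense encoder."""
    vec = np.zeros(64)
    for tok in text.lower().split():
        idx = VOCAB.setdefault(tok, len(VOCAB) % 64)
        vec[idx] += 1.0
    n = np.linalg.norm(vec)
    return vec / n if n else vec

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Stage 1: dense retrieval -- rank stored question/step records
    by cosine similarity to the target question and step."""
    q = embed(query)
    sims = [float(q @ embed(doc)) for doc in corpus]
    order = np.argsort(sims)[::-1][:k]
    return [corpus[i] for i in order]

def build_warmup_prompt(question: str, step: str, corpus: list[str]) -> str:
    """Stage 2: fuse retrieved evidence into the PRM input as warmup
    context, before the PRM scores the target step."""
    retrieved = retrieve(question + " " + step, corpus)
    context = "\n".join(f"[retrieved] {r}" for r in retrieved)
    return f"{context}\n[question] {question}\n[step] {step}"
```

In a real system the corpus would hold historical question/step pairs, the encoder would be a trained retrieval model, and the assembled prompt would be fed to the PRM for step-level scoring.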