Know What You Don't Know: Uncertainty Calibration of Process Reward Models

📅 2025-06-11

📈 Citations: 0

✨ Influential: 0

career value

188K/year

🤖 AI Summary

Existing Process Reward Models (PRMs) suffer from severe miscalibration of uncertainty, systematically overestimating the success probability of reasoning trajectories—leading to inefficient resource allocation. This work introduces, for the first time, uncertainty calibration into PRMs, proposing a quantile regression–based calibration method and an Instance-Adaptive Scaling (IAS) framework: IAS dynamically adjusts the rollout depth of each reasoning trajectory based on its calibrated confidence score. The approach enables on-demand inference, substantially reducing computational cost while preserving answer accuracy. Experiments demonstrate that our method significantly outperforms baselines in calibration error metrics (e.g., Expected Calibration Error), achieving a 32% average reduction in inference cost on mathematical reasoning tasks without sacrificing accuracy. Our core contributions are: (1) the first uncertainty calibration paradigm specifically designed for PRMs; and (2) a learnable, instance-aware adaptive inference scheduling mechanism.

Technology Category

Application Category

📝 Abstract

Process reward models (PRMs) play a central role in guiding inference-time scaling algorithms for large language models (LLMs). However, we observe that even state-of-the-art PRMs can be poorly calibrated and often overestimate success probabilities. To address this, we present a calibration approach, performed via quantile regression, that adjusts PRM outputs to better align with true success probabilities. Leveraging these calibrated success estimates and their associated confidence bounds, we introduce an emph{instance-adaptive scaling} (IAS) framework that dynamically adjusts the inference budget based on the estimated likelihood that a partial reasoning trajectory will yield a correct final answer. Unlike conventional methods that allocate a fixed number of reasoning trajectories per query, this approach successfully adapts to each instance and reasoning step when using our calibrated PRMs. Experiments on mathematical reasoning benchmarks show that (i) our PRM calibration method successfully achieves small calibration error, outperforming the baseline methods, (ii) calibration is crucial for enabling effective adaptive scaling, and (iii) the proposed IAS strategy reduces inference costs while maintaining final answer accuracy, utilizing less compute on more confident problems as desired.

Problem

Research questions and friction points this paper is trying to address.

Calibrate process reward models to align with true success probabilities

Introduce instance-adaptive scaling to dynamically adjust inference budget

Reduce inference costs while maintaining answer accuracy

Innovation

Methods, ideas, or system contributions that make the work stand out.

Quantile regression for PRM calibration

Instance-adaptive scaling framework

Dynamic inference budget adjustment

🔎 Similar Papers

Uncertainty-aware Reward Model: Teaching Reward Models to Know What is Unknown