RewardUQ: A Unified Framework for Uncertainty-Aware Reward Models

๐Ÿ“… 2026-02-27
๐Ÿ“ˆ Citations: 0
โœจ Influential: 0
๐Ÿ“„ PDF
๐Ÿค– AI Summary
Existing reward models predominantly rely on point estimates, neglecting the epistemic uncertainty arising from limited human feedback, which hampers both alignment performance and labeling efficiency. This work proposes RewardUQ, the first unified framework for systematically evaluating diverse uncertainty quantification methods for reward models. RewardUQ integrates accuracy and calibration into a single ranking strategy for comparing methods, and the resulting uncertainty estimates support active learning and mitigate reward over-optimization. Experimental results show that model scale and initialization have the largest influence on performance, and that most existing methods can be substantially improved with appropriate modifications. To foster community progress, the authors release an open-source, reproducible Python toolkit implementing the proposed framework.
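The summary highlights calibration as one of the two axes of the ranking strategy. A standard calibration metric in this setting is the expected calibration error (ECE): bin the predicted preference probabilities by confidence and compare each bin's average confidence to its empirical accuracy. A minimal NumPy sketch, purely illustrative and not the paper's implementation:

```python
import numpy as np

def expected_calibration_error(probs, labels, n_bins=10):
    """ECE for binary preference predictions: weighted average, over
    confidence bins, of |empirical accuracy - mean predicted probability|."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for i, (lo, hi) in enumerate(zip(edges[:-1], edges[1:])):
        # First bin is closed on the left so p = 0.0 is not dropped.
        mask = (probs >= lo) & (probs <= hi) if i == 0 else (probs > lo) & (probs <= hi)
        if mask.any():
            acc = labels[mask].mean()    # empirical frequency in this bin
            conf = probs[mask].mean()    # average predicted probability
            ece += mask.mean() * abs(acc - conf)
    return ece

# A constant p = 0.5 predictor on balanced labels is perfectly calibrated.
probs = np.full(100, 0.5)
labels = np.array([0, 1] * 50)
ece = expected_calibration_error(probs, labels)
print(ece)  # → 0.0
```

A well-calibrated but inaccurate model (as above) scores zero ECE, which is why the paper's ranking combines calibration with an accuracy metric rather than using either alone.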

๐Ÿ“ Abstract
Reward models are central to aligning large language models (LLMs) with human preferences. Yet most approaches rely on pointwise reward estimates that overlook the epistemic uncertainty in reward models arising from limited human feedback. Recent work suggests that quantifying this uncertainty can reduce the costs of human annotation via uncertainty-guided active learning and mitigate reward overoptimization in LLM post-training. However, uncertainty-aware reward models have so far been adopted without thorough comparison, leaving them poorly understood. This work introduces a unified framework, RewardUQ, to systematically evaluate uncertainty quantification for reward models. We compare common methods along standard metrics measuring accuracy and calibration, and we propose a new ranking strategy incorporating both dimensions for a simplified comparison. Our experimental results suggest that model size and initialization have the most meaningful impact on performance, and most prior work could have benefited from alternative design choices. To foster the development and evaluation of new methods and aid the deployment in downstream applications, we release our open-source framework as a Python package. Our code is available at https://github.com/lasgroup/rewarduq.
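The abstract contrasts pointwise reward estimates with uncertainty-aware ones. A common baseline for the epistemic uncertainty it describes is an ensemble of reward heads, where disagreement across heads serves as the uncertainty signal for active learning or for penalizing over-optimization. A minimal sketch with linear reward heads on toy features, assuming a bootstrap ensemble (illustrative only, not the paper's method or code):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: each "response" is a feature vector; the true reward is linear.
dim, n_train = 8, 40
w_true = rng.normal(size=dim)
X = rng.normal(size=(n_train, dim))
y = X @ w_true + 0.1 * rng.normal(size=n_train)

def fit_head(X, y, lam=1.0):
    """One ensemble member: ridge regression on a bootstrap resample."""
    idx = rng.integers(0, len(X), len(X))
    Xb, yb = X[idx], y[idx]
    return np.linalg.solve(Xb.T @ Xb + lam * np.eye(X.shape[1]), Xb.T @ yb)

heads = [fit_head(X, y) for _ in range(10)]

def reward_with_uncertainty(x):
    """Point estimate = ensemble mean; epistemic uncertainty = ensemble std."""
    preds = np.array([x @ w for w in heads])
    return preds.mean(), preds.std()

# Heads agree near the training data and disagree far from it.
_, std_in = reward_with_uncertainty(X[0])
_, std_ood = reward_with_uncertainty(10.0 * rng.normal(size=dim))
```

Here `std_ood` comes out larger than `std_in`: the disagreement flags out-of-distribution responses, which is exactly the signal used to select informative preference queries or to down-weight rewards the model is unsure about.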
Problem

Research questions and friction points this paper is trying to address.

reward models
epistemic uncertainty
uncertainty quantification
large language models
human feedback
Innovation

Methods, ideas, or system contributions that make the work stand out.

uncertainty quantification
reward models
active learning
calibration
LLM alignment