OpenRubrics: Towards Scalable Synthetic Rubric Generation for Reward Modeling and LLM Alignment

📅 2025-10-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing reward models rely predominantly on scalar scores or pairwise preferences, which fail to capture the multidimensional nature of human preferences. Structured natural-language rubrics have been introduced to address this limitation, but their automated generation remains hampered by reliability and scalability bottlenecks. To overcome these challenges, the authors propose the Contrastive Rubric Generation (CRG) framework, which derives explicit constraints and implicit qualities by contrasting preferred and rejected responses, and improves rubric reliability through rejection sampling. Leveraging CRG, they develop Rubric-RM, a rubric-based reward model trained on synthetically generated rubrics filtered for preference-label consistency. Across multiple reward-modeling benchmarks, Rubric-RM outperforms same-scale baselines by 6.8% and yields substantial improvements on instruction-following and biomedical reasoning tasks. The approach enables fine-grained, interpretable, and scalable multidimensional reward modeling.

📝 Abstract
Reward modeling lies at the core of reinforcement learning from human feedback (RLHF), yet most existing reward models rely on scalar or pairwise judgments that fail to capture the multifaceted nature of human preferences. Recent studies have explored rubrics-as-rewards (RaR), which use structured natural-language criteria to capture multiple dimensions of response quality. However, producing rubrics that are both reliable and scalable remains a key challenge. In this work, we introduce OpenRubrics, a diverse, large-scale collection of (prompt, rubric) pairs for training rubric-generation and rubric-based reward models. To elicit discriminative and comprehensive evaluation signals, we introduce Contrastive Rubric Generation (CRG), which derives both hard rules (explicit constraints) and principles (implicit qualities) by contrasting preferred and rejected responses. We further improve reliability by enforcing preference-label consistency via rejection sampling to remove noisy rubrics. Across multiple reward-modeling benchmarks, our rubric-based reward model, Rubric-RM, surpasses strong size-matched baselines by 6.8%. These gains transfer to policy models on instruction-following and biomedical benchmarks. Our results show that rubrics provide scalable alignment signals that narrow the gap between costly human evaluation and automated reward modeling, enabling a new principle-driven paradigm for LLM alignment.
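The abstract's "preference-label consistency via rejection sampling" step can be sketched as follows. This is a hypothetical illustration, not the paper's released code: `score_with_rubric` stands in for an LLM judge, and the filter keeps a candidate rubric only if judging with it reproduces the known human preference (chosen beats rejected).

```python
# Hypothetical sketch of the preference-label consistency filter described
# in the abstract. `score_with_rubric` is a placeholder for an LLM judge;
# here a trivial length-based stand-in is used so the sketch runs end to end.
from dataclasses import dataclass


@dataclass
class PreferencePair:
    prompt: str
    chosen: str    # human-preferred response
    rejected: str  # human-rejected response


def score_with_rubric(rubric: str, prompt: str, response: str) -> float:
    """Placeholder judge: a real system would ask an LLM to score
    `response` against `rubric`. Length is a stand-in for demonstration."""
    return float(len(response))


def consistency_filter(pairs, rubrics):
    """Keep a (pair, rubric) example only when the rubric-based scores
    agree with the human preference label; noisy rubrics are rejected."""
    kept = []
    for pair, rubric in zip(pairs, rubrics):
        s_chosen = score_with_rubric(rubric, pair.prompt, pair.chosen)
        s_rejected = score_with_rubric(rubric, pair.prompt, pair.rejected)
        if s_chosen > s_rejected:  # rubric reproduces the preference label
            kept.append((pair, rubric))
    return kept


pairs = [
    PreferencePair("Explain RLHF.", "A detailed, accurate answer...", "Short."),
    PreferencePair("Summarize.", "Ok.", "A long but off-topic ramble........"),
]
rubrics = ["Reward completeness and accuracy.", "Reward brevity and relevance."]
filtered = consistency_filter(pairs, rubrics)
print(len(filtered))  # prints 1: the second rubric disagrees and is dropped
```

In the paper's pipeline the judge is a language model rather than a heuristic, but the filtering logic is the same: disagreement with the human label is treated as evidence of a noisy rubric.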
Problem

Research questions and friction points this paper is trying to address.

Generating reliable and scalable rubrics for reward modeling remains challenging
Existing reward models fail to capture multifaceted human preferences adequately
Current methods lack structured evaluation criteria for LLM alignment
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generates rubrics via contrastive analysis of responses
Enhances reliability through preference-consistency filtering
Uses structured rubrics for scalable reward modeling
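The contrastive generation step above can be sketched as a prompt template: the rubric generator sees both the preferred and the rejected response and is asked for hard rules (explicit constraints) and principles (implicit qualities). The template wording below is illustrative only, not the paper's actual prompt.

```python
# Hypothetical CRG-style prompt construction. The exact instructions used
# by OpenRubrics are not reproduced here; this shows the contrastive setup
# the abstract describes (hard rules vs. principles from a response pair).
CRG_TEMPLATE = """You are writing an evaluation rubric.

Task prompt:
{prompt}

Preferred response:
{chosen}

Rejected response:
{rejected}

By contrasting the two responses, produce:
1. Hard rules: explicit constraints the preferred response satisfies
   and the rejected response violates.
2. Principles: implicit qualities that make the preferred response better.
"""


def build_crg_prompt(prompt: str, chosen: str, rejected: str) -> str:
    """Fill the contrastive template with one preference pair."""
    return CRG_TEMPLATE.format(prompt=prompt, chosen=chosen, rejected=rejected)


p = build_crg_prompt(
    "Explain RLHF.",
    "An accurate, step-by-step explanation.",
    "A vague one-liner.",
)
print("Hard rules" in p and "Principles" in p)  # prints True
```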
👥 Authors
Tianci Liu (Purdue University)
Ran Xu (Emory University)
Tony Yu (Georgia Institute of Technology)
Ilgee Hong (Georgia Institute of Technology)
Carl Yang (Waymo LLC, PhD at University of California, Davis)
Tuo Zhao (Georgia Institute of Technology)
Haoyu Wang (University at Albany)