CDRRM: Contrast-Driven Rubric Generation for Reliable and Interpretable Reward Modeling

📅 2026-03-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing reward models often suffer from poor interpretability, reliance on costly human annotations, and vulnerability to redundancy, noise, and systematic biases (such as verbosity and positional preference) in score-based LLM evaluations. To address these limitations, this work proposes CDRRM, a framework built on a novel "Contrast-then-Synthesis" paradigm: it applies multidimensional contrastive analysis to preference pairs to extract causal discriminative factors, then synthesizes them into concise, context-aware scoring rubrics that guide preference judgments. The approach substantially improves interpretability and data efficiency: trained on only 3k high-quality samples, a frozen judge model within CDRRM surpasses fully fine-tuned baselines and achieves state-of-the-art performance on the RewardBench, RMBench, and RMB benchmarks, effectively mitigating systematic biases and easing the trade-off between scalability and reliability.

📝 Abstract
Reward modeling is essential for aligning Large Language Models (LLMs) with human preferences, yet conventional reward models suffer from poor interpretability and heavy reliance on costly expert annotations. While recent rubric-based approaches enhance evaluation transparency, they lack systematic quality control, yielding noisy and redundant criteria, failing to mitigate persistent biases (e.g., verbosity, position) in LLM evaluators, and creating a scalability-reliability trade-off. To address these limitations, we propose CDRRM (Contrast-Driven Rubric Reward Model), a framework built on a novel Contrast-then-Synthesis paradigm for high-quality rubric generation and guided preference judgment. CDRRM first conducts multi-dimensional contrastive profiling on preference pairs to identify causal discriminative factors, then synthesizes these insights into compact, context-aware rubrics to guide preference judgments. Extensive experiments on three authoritative benchmarks (RewardBench, RMBench, RMB) demonstrate that CDRRM achieves state-of-the-art performance across diverse domains and effectively mitigates the aforementioned evaluation biases. Notably, our approach delivers exceptional data efficiency: training the rubric generator on only 3k high-quality samples empowers a frozen pre-trained judge model to outperform fully fine-tuned baselines. This work offers a scalable, interpretable, and data-efficient path for reward modeling.
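The two-stage pipeline described in the abstract can be sketched as follows. This is a hypothetical illustration, not the authors' implementation: `call_llm` is a stand-in for any chat-completion API (stubbed here so the flow is runnable), and the dimension list, prompt wording, and function names are all assumptions.

```python
# Minimal sketch of a Contrast-then-Synthesis pipeline, assuming a
# generic LLM call. Stage 1 contrasts a preference pair along several
# dimensions; Stage 2 compresses the findings into one compact rubric
# that a frozen judge then applies.

from dataclasses import dataclass

# Hypothetical profiling dimensions (the paper's actual set may differ).
DIMENSIONS = ["factual accuracy", "instruction following", "reasoning quality"]

@dataclass
class PreferencePair:
    prompt: str
    chosen: str
    rejected: str

def call_llm(prompt: str) -> str:
    # Stub standing in for a real LLM API call (assumption).
    if "Compare" in prompt:
        return "chosen response is more factually grounded"
    if "Synthesize" in prompt:
        return "Rubric: prefer the response with verifiable, grounded facts."
    return "A"

def contrastive_profile(pair: PreferencePair) -> list[str]:
    # Stage 1: contrast chosen vs. rejected along each dimension to
    # surface the factors that actually discriminate between them.
    return [
        call_llm(
            f"Compare along '{dim}'.\nPrompt: {pair.prompt}\n"
            f"A: {pair.chosen}\nB: {pair.rejected}"
        )
        for dim in DIMENSIONS
    ]

def synthesize_rubric(factors: list[str]) -> str:
    # Stage 2: synthesize the discriminative factors into a single
    # compact, context-aware rubric.
    return call_llm("Synthesize a scoring rubric from: " + "; ".join(factors))

def judge(pair: PreferencePair, rubric: str) -> str:
    # A frozen judge model scores candidates under the rubric instead
    # of relying on free-form, score-based criteria.
    return call_llm(
        f"Using rubric '{rubric}', pick A or B.\n"
        f"A: {pair.chosen}\nB: {pair.rejected}"
    )

pair = PreferencePair(
    prompt="What is the boiling point of water?",
    chosen="100 degrees C at 1 atm",
    rejected="About 90 degrees C",
)
rubric = synthesize_rubric(contrastive_profile(pair))
verdict = judge(pair, rubric)
```

Because the rubric is generated before judging, the judge model itself can stay frozen, which is how the paper reports strong data efficiency from only 3k training samples for the rubric generator.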
Problem

Research questions and friction points this paper is trying to address.

reward modeling
rubric generation
interpretability
evaluation bias
LLM alignment
Innovation

Methods, ideas, or system contributions that make the work stand out.

contrast-driven rubric generation
reward modeling
interpretable AI
data-efficient learning
bias mitigation