P-Check: Advancing Personalized Reward Model via Learning to Generate Dynamic Checklist

📅 2026-01-06
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing personalized reward models typically treat user context as a static or implicit conditioning signal, which makes it hard to capture the dynamic, multi-faceted nature of human preferences. P-Check addresses this by training a plug-and-play checklist generator that synthesizes dynamic evaluation criteria to guide reward prediction. A training strategy called Preference-Contrastive Criterion Weighting assigns each criterion a saliency score based on how well it discriminates personalized judgments. The authors report that P-Check improves reward accuracy, enhances downstream personalized generation, and remains robust in out-of-distribution (OOD) scenarios.

📝 Abstract
Recent approaches in personalized reward modeling have primarily focused on leveraging user interaction history to align model judgments with individual preferences. However, they largely treat user context as a static or implicit conditioning signal, failing to capture the dynamic and multi-faceted nature of human judgment. In this paper, we propose P-Check, a novel personalized reward modeling framework designed to train a plug-and-play checklist generator that synthesizes dynamic evaluation criteria for guiding reward prediction. To better align these checklists with personalized nuances, we introduce Preference-Contrastive Criterion Weighting, a training strategy that assigns saliency scores to criteria based on their discriminative power for personalized judgment. We conduct extensive experiments and demonstrate that P-Check not only improves reward accuracy but also enhances downstream personalized generation, and remains robust in out-of-distribution (OOD) scenarios.
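The abstract does not give the formula behind Preference-Contrastive Criterion Weighting, so the following is only an illustrative sketch of the general idea: criteria that separate a user's preferred response from a dispreferred one receive higher saliency. Everything here is an assumption made for illustration, including the `Criterion` class, the per-criterion scores in [0, 1], the softmax over score margins, and the `temperature` parameter. None of it is taken from the paper.

```python
import math
from dataclasses import dataclass


@dataclass
class Criterion:
    """One hypothetical checklist item with satisfaction scores in [0, 1]."""
    text: str
    chosen_score: float    # how well the user-preferred response satisfies it
    rejected_score: float  # how well the dispreferred response satisfies it


def saliency_weights(criteria, temperature=0.5):
    """Assumed weighting rule: softmax over (chosen - rejected) margins.

    A larger margin means the criterion discriminates the preference pair
    better, so it receives a larger weight. Weights sum to 1.
    """
    margins = [c.chosen_score - c.rejected_score for c in criteria]
    peak = max(margins)  # subtract the max for numerical stability
    exps = [math.exp((m - peak) / temperature) for m in margins]
    total = sum(exps)
    return [e / total for e in exps]


def checklist_reward(scores, weights):
    """Weighted sum of per-criterion scores for a single response."""
    return sum(w * s for w, s in zip(weights, scores))


# Toy checklist for a user who prefers short, plain-language answers.
checklist = [
    Criterion("Uses a concise, bullet-point style", 0.9, 0.2),
    Criterion("Avoids technical jargon", 0.6, 0.5),
    Criterion("Is grammatically correct", 0.8, 0.8),
]

weights = saliency_weights(checklist)
reward_chosen = checklist_reward([c.chosen_score for c in checklist], weights)
reward_rejected = checklist_reward([c.rejected_score for c in checklist], weights)
```

In this toy setup the grammar criterion scores both responses equally, so it carries the least weight, while the style criterion dominates the reward gap. A learned checklist generator and learned scores would replace the hand-written values used here.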
Problem

Research questions and friction points this paper is trying to address.

personalized reward modeling
dynamic evaluation criteria
user context
human judgment
Innovation

Methods, ideas, or system contributions that make the work stand out.

personalized reward modeling
dynamic checklist generation
Preference-Contrastive Criterion Weighting
plug-and-play framework
out-of-distribution robustness