RoleRMBench & RoleRM: Towards Reward Modeling for Profile-Based Role Play in Dialogue Systems

📅 2025-12-11
📈 Citations: 0 · Influential: 0
📄 PDF
🤖 AI Summary
Existing reward models struggle to capture character consistency and fine-grained stylistic preferences in subjective, open-domain tasks such as role play, leading to substantial misalignment with human judgments. To address this, we introduce RoleRMBench, the first systematic benchmark for reward modeling oriented toward role play, and propose RoleRM, a reward model grounded in Continuous Implicit Preference (CIP). RoleRM converts discrete, subjective human evaluations into continuous pairwise supervision that remains consistent across multiple structuring strategies, integrating structured annotations along three dimensions: character consistency, narrative coherence, and stylistic fidelity. Experiments show that RoleRM outperforms leading open-source and proprietary reward models on RoleRMBench by more than 24% on average, markedly narrowing the gap between model preferences and human evaluations.
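The summary describes CIP only at a high level, so the following is a minimal sketch of the core idea as stated above: discrete multi-annotator ratings along the three dimensions are collapsed into a scalar score per response, and the score gap between two candidate replies becomes a continuous preference label rather than a hard win/loss. The equal dimension weighting and the sigmoid squashing are assumptions for illustration, not the authors' recipe.

```python
# Hypothetical sketch of turning discrete per-dimension ratings into
# continuous pairwise preference labels, in the spirit of CIP.
# Equal weights and sigmoid squashing are assumptions, not the paper's method.
from dataclasses import dataclass
import math

DIMENSIONS = ("character_consistency", "narrative_coherence", "stylistic_fidelity")

@dataclass
class Annotation:
    """Discrete 1-5 ratings from one annotator for one response."""
    scores: dict[str, int]  # e.g. {"character_consistency": 4, ...}

def aggregate(annotations: list[Annotation], weights: dict[str, float]) -> float:
    """Collapse discrete multi-annotator, multi-dimension ratings into one scalar."""
    totals = {d: 0.0 for d in DIMENSIONS}
    for ann in annotations:
        for d in DIMENSIONS:
            totals[d] += ann.scores[d]
    n = len(annotations)
    return sum(weights[d] * totals[d] / n for d in DIMENSIONS)

def continuous_preference(score_a: float, score_b: float, tau: float = 1.0) -> float:
    """Map the score gap to a soft label in (0, 1) instead of a hard 0/1 choice."""
    return 1.0 / (1.0 + math.exp(-(score_a - score_b) / tau))

# Example: two candidate replies rated on the same dimensions.
weights = {d: 1.0 / len(DIMENSIONS) for d in DIMENSIONS}
a = [Annotation({"character_consistency": 5, "narrative_coherence": 4, "stylistic_fidelity": 4})]
b = [Annotation({"character_consistency": 3, "narrative_coherence": 4, "stylistic_fidelity": 2})]
label = continuous_preference(aggregate(a, weights), aggregate(b, weights))
print(f"P(A preferred over B) = {label:.2f}")  # soft pairwise supervision target
```

A soft label near 0.5 then encodes genuine annotator ambivalence instead of forcing an arbitrary winner, which is what a hard pairwise label would do.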

📝 Abstract
Reward modeling has become a cornerstone of aligning large language models (LLMs) with human preferences. Yet, when extended to subjective and open-ended domains such as role play, existing reward models exhibit severe degradation, struggling to capture nuanced and persona-grounded human judgments. To address this gap, we introduce RoleRMBench, the first systematic benchmark for reward modeling in role-playing dialogue, covering seven fine-grained capabilities from narrative management to role consistency and engagement. Evaluation on RoleRMBench reveals large and consistent gaps between general-purpose reward models and human judgment, particularly in narrative and stylistic dimensions. We further propose RoleRM, a reward model trained with Continuous Implicit Preferences (CIP), which reformulates subjective evaluation as continuous, consistent pairwise supervision under multiple structuring strategies. Comprehensive experiments show that RoleRM surpasses strong open- and closed-source reward models by over 24% on average, demonstrating substantial gains in narrative coherence and stylistic fidelity. Our findings highlight the importance of continuous preference representation and annotation consistency, establishing a foundation for subjective alignment in human-centered dialogue systems.
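The abstract does not spell out the scoring protocol, but reward-model benchmarks of this kind are typically reported as pairwise accuracy: how often the model's reward ordering over a (chosen, rejected) pair agrees with the human label, broken down per capability. A hypothetical scorer, with assumed field names that are not from the paper, might look like this:

```python
# Hypothetical pairwise-accuracy scorer for a RoleRMBench-style benchmark.
# The item fields ("capability", "chosen", "rejected") and the reward_fn
# interface are assumptions; the paper does not publish this exact format.
from collections import defaultdict
from typing import Callable

def pairwise_accuracy(items: list[dict],
                      reward_fn: Callable[[str], float]) -> dict[str, float]:
    """Per-capability fraction of pairs where reward(chosen) > reward(rejected)."""
    hits: defaultdict[str, int] = defaultdict(int)
    totals: defaultdict[str, int] = defaultdict(int)
    for item in items:
        cap = item["capability"]  # e.g. "narrative management", "role consistency"
        totals[cap] += 1
        if reward_fn(item["chosen"]) > reward_fn(item["rejected"]):
            hits[cap] += 1
    return {cap: hits[cap] / totals[cap] for cap in totals}
```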
Problem

Research questions and friction points this paper is trying to address.

Existing reward models degrade severely when extended to subjective, open-ended domains such as role play
General-purpose reward models fail to capture nuanced, persona-grounded human judgments, particularly along narrative and stylistic dimensions
No systematic benchmark previously existed for evaluating reward models on role-playing dialogue
Innovation

Methods, ideas, or system contributions that make the work stand out.

RoleRMBench, the first systematic benchmark for reward modeling in role-playing dialogue, covering seven fine-grained capabilities
Continuous Implicit Preferences (CIP) for subjective reward modeling
Reformulation of subjective evaluation as continuous, consistent pairwise supervision under multiple structuring strategies (see the sketch after this list)
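Given soft labels of this kind, a reward model can be trained with a Bradley-Terry-style pairwise objective whose cross-entropy target is the continuous preference rather than a hard 0/1 choice. This is one plausible reading of "continuous, consistent pairwise supervision", not necessarily the authors' exact loss; the minimal PyTorch sketch below assumes a scalar reward per response.

```python
# Minimal sketch of training on soft pairwise labels. Standard Bradley-Terry
# reward training uses a hard 0/1 preference; here the target is the
# continuous label. This is an assumed reading, not the paper's exact loss.
import torch
import torch.nn.functional as F

def cip_pairwise_loss(reward_a: torch.Tensor,
                      reward_b: torch.Tensor,
                      soft_label: torch.Tensor) -> torch.Tensor:
    """Binary cross-entropy between sigmoid(r_a - r_b) and a soft target in (0, 1)."""
    logits = reward_a - reward_b  # preference logit for "A over B"
    return F.binary_cross_entropy_with_logits(logits, soft_label)

# Toy usage: scalar rewards for a batch of three response pairs.
r_a = torch.tensor([1.2, 0.3, -0.5])
r_b = torch.tensor([0.4, 0.9, -0.7])
labels = torch.tensor([0.86, 0.35, 0.61])  # continuous preferences, not hard 0/1
loss = cip_pairwise_loss(r_a, r_b, labels)
print(loss.item())
```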
👥 Authors
Hang Ding (Shanghai Jiao Tong University)
Qiming Feng (Fudan University)
Dongqi Liu (Saarland University)
Qi Zhao (Tencent Youtu Lab)
Tao Yao (Alibaba)
Shuo Wang (Tencent Youtu Lab)
Dongsheng Chen (Technical University of Munich)
Jian Li (Tencent Youtu Lab)
Zhenye Gan (Tencent Youtu Lab)
Jiangning Zhang (Tencent Youtu Lab)
Chengjie Wang (Tencent Youtu Lab)
Yabiao Wang (Tencent Youtu Lab)