CRPO: Character-centric Group Relative Policy Optimization for Role-aware Reasoning in Role-playing Agents

📅 2026-05-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing reinforcement learning approaches for role-playing tasks often distort character traits or collapse into a single style, because task utility conflicts with role consistency. This work proposes Character-centric Group Relative Policy Optimization (CRPO), a framework that integrates character centrality into group relative policy optimization for the first time. CRPO decouples task and style rewards to mitigate gradient conflicts, applies a dynamic constraint mechanism that adapts to character complexity, and adopts a sampling strategy that treats generic responses as a negative baseline. Experimental results show that CRPO outperforms existing methods across multiple dimensions, including role consistency and emotional expressiveness, while preventing the model from degenerating into a generic response distribution.
📝 Abstract
Recent advancements in Reinforcement Learning (RL), particularly Group Relative Policy Optimization (GRPO), have significantly enhanced the reasoning capabilities of Large Language Models. However, applying these problem-centric optimization methods to role-playing agents often leads to a loss of character fidelity and style collapse, as they prioritize context-specific utility over persona alignment. To address this, we propose Character-Centric Group Relative Policy Optimization (CRPO), a framework designed to realign RL objectives with the role-playing task. CRPO improves character distinctiveness through three mechanisms: decoupling task logic from stylistic rewards to resolve gradient conflicts, dynamically adapting optimization constraints based on character complexity, and utilizing generic responses as negative baselines to prevent the model from reverting to a common distribution. Extensive experiments demonstrate that CRPO outperforms existing methods on consistency, emotion, and other evaluation dimensions.
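The abstract gives no formulas, so the following is only a minimal sketch of how two of the three mechanisms might look in a GRPO-style advantage computation. It is an illustration under stated assumptions, not the authors' method. The function names, the `style_weight` parameter, and the specific way the generic response enters the style term are all hypothetical. The dynamic constraint is left out because the abstract does not specify its form.

```python
from statistics import mean, pstdev


def group_normalize(rewards, eps=1e-8):
    """GRPO-style group-relative advantage: (r - group mean) / group std."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def decoupled_advantages(task_rewards, style_rewards,
                         generic_style_reward, style_weight=1.0):
    """Combine separately normalized task and style signals for one group.

    Assumptions (not taken from the paper):
    - task and style rewards are scored independently per sampled response;
    - the style term is centered on the style score of a generic,
      out-of-character reply instead of the group mean, so a response that
      is no more in-character than the generic reply gets a non-positive
      style advantage.
    """
    if len(task_rewards) != len(style_rewards):
        raise ValueError("task and style rewards must have the same length")

    # Task signal: ordinary group-relative normalization.
    task_adv = group_normalize(task_rewards)

    # Style signal: margin over the generic baseline, scaled by group spread.
    eps = 1e-8
    style_sigma = pstdev(style_rewards)
    style_adv = [(s - generic_style_reward) / (style_sigma + eps)
                 for s in style_rewards]

    # Normalizing each stream before mixing keeps one reward scale
    # from dominating the other.
    return [t + style_weight * s for t, s in zip(task_adv, style_adv)]


if __name__ == "__main__":
    # Four sampled responses to one prompt (made-up scores for illustration).
    adv = decoupled_advantages(
        task_rewards=[1.0, 0.0, 1.0, 0.0],
        style_rewards=[0.9, 0.8, 0.2, 0.1],
        generic_style_reward=0.3,
    )
    for i, a in enumerate(adv):
        print(f"response {i}: advantage {a:+.3f}")
```

In this toy group, the response that is both correct and in-character receives the largest advantage, while a correct but generic-sounding response is penalized on the style term even though its task advantage is positive.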
Problem

Research questions and friction points this paper is trying to address.

role-playing agents
character fidelity
style collapse
persona alignment
reinforcement learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Character-centric RL
Role-playing Agents
Style Preservation
Gradient Decoupling
Negative Baseline