Enhancing Persona Consistency for LLMs' Role-Playing using Persona-Aware Contrastive Learning

📅 2025-03-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current large language models (LLMs) suffer from weak persona consistency and insufficient fine-grained perception of emotions and character-specific traits in role-playing tasks. Moreover, prevailing human-alignment approaches—such as high-quality supervised annotation or reinforcement learning from human feedback (RLHF)—are costly and ill-suited to the inherent diversity of character behaviors. To address these challenges, we propose Persona-aware Contrastive Learning (PCL), an unsupervised, annotation-free framework that introduces a novel role-chain self-questioning and iterative contrastive learning paradigm for persona alignment. PCL jointly optimizes black-box and white-box LLMs to enable fine-grained persona modeling without human labels. Extensive evaluations—including CharEval benchmarking, GPT-4 automated assessment, and expert human evaluation—demonstrate that PCL significantly improves persona consistency and dialogue personalization, surpassing conventional supervised and RL-based paradigms.

📝 Abstract
In recent years, large language models (LLMs) have achieved breakthrough progress in many dialogue generation tasks. However, their lack of emotion and fine-grained role awareness further limits their ability to provide personalized and diverse interactions. Current methods face high costs in collecting high-quality annotated data for scenarios such as role-playing, and traditional human-alignment methods are difficult to deploy due to the inherent diversity of model behavior in role-playing scenarios. Inspired by the alignment of models to safety behaviors through RLHF (Reinforcement Learning from Human Feedback), in this paper we revisit model role-playing behavior from the perspective of persona alignment and propose a novel annotation-free framework named **P**ersona-Aware **C**ontrastive **L**earning (PCL) to align LLMs' behavior during role-playing and enhance role consistency. Specifically, we first design a role-chain method that encourages the model to self-question based on the role's characteristics and the dialogue context, adjusting for personality consistency. We then further enhance the model's role-playing strategy through iterative contrastive learning between responses generated with and without the role's characteristics. Experiments on both black-box and white-box LLMs show that LLMs equipped with PCL significantly outperform vanilla LLMs under automatic evaluation (CharEval & GPT-4) and human expert evaluation.
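The role-chain self-questioning described in the abstract can be illustrated with a minimal prompt-construction sketch. The function name, trait format, and self-check wording below are hypothetical illustrations of the idea, not the paper's actual prompts:

```python
def build_role_chain_prompt(persona: dict, dialogue: list) -> str:
    """Compose a self-questioning prompt from role traits and dialogue context.

    The model is asked to check its reply against the persona before
    answering — a rough analogue of the paper's role-chain idea.
    """
    traits = "; ".join(f"{k}: {v}" for k, v in persona.items())
    context = "\n".join(dialogue)
    questions = [
        "Does my reply match the character's speaking style?",
        "Is my reply consistent with the character's background and emotions?",
        "Would the character plausibly say this in the current context?",
    ]
    check_list = "\n".join(f"{i + 1}. {q}" for i, q in enumerate(questions))
    return (
        f"You are role-playing a character with these traits: {traits}\n"
        f"Dialogue so far:\n{context}\n"
        "Before answering, silently answer these self-check questions:\n"
        f"{check_list}\n"
        "Then produce a reply that stays in character."
    )

prompt = build_role_chain_prompt(
    {"name": "Hermione", "style": "precise, bookish"},
    ["Harry: Do you know the spell?"],
)
```

The resulting prompt interleaves persona traits, context, and self-check questions, so that consistency checking happens before generation rather than as a post-hoc filter.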
Problem

Research questions and friction points this paper is trying to address.

Enhancing persona consistency in LLMs for role-playing
Reducing reliance on annotated data for role-playing scenarios
Aligning LLM behavior using persona-aware contrastive learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Persona-Aware Contrastive Learning for alignment
Role chain method for self-questioning adjustment
Iterative contrastive learning enhances role-playing
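The iterative contrastive step above can be sketched as a DPO-style pairwise preference loss that rewards persona-conditioned responses over persona-free ones. This is a minimal illustration under that assumption; the paper's actual objective is not reproduced here, and the function name and `beta` parameter are illustrative:

```python
import math

def pcl_style_contrastive_loss(logp_with_persona: float,
                               logp_without_persona: float,
                               beta: float = 0.1) -> float:
    """Pairwise contrastive loss over two candidate responses.

    logp_with_persona / logp_without_persona: log-likelihoods the model
    assigns to the response generated with and without the role profile.
    The loss pushes the model to prefer the persona-conditioned response.
    """
    margin = beta * (logp_with_persona - logp_without_persona)
    # -log(sigmoid(margin)): near zero when the persona response dominates
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# When both responses are equally likely the loss is log(2); it shrinks
# as the persona-conditioned response becomes more likely.
```

Iterating this — regenerating response pairs with the updated model and re-optimizing — would give the kind of annotation-free, self-improving loop the summary describes.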