AI Summary
Large language models (LLMs) often fail to stay in role when simulating human users: they drift from the assigned persona, contradict earlier statements, and behave incoherently, which limits their use in high-stakes interactive domains such as healthcare and education. To address this, we propose a reinforcement learning (RL) framework designed for persona consistency in multi-turn dialogue. Our method defines three automatic consistency metrics (prompt-to-line consistency, line-to-line consistency, and Q&A consistency) and uses them as fine-grained reward signals in a multi-turn RL fine-tuning pipeline. The rewards are computed automatically, so fine-tuning needs no human-labeled feedback, although the metrics themselves are validated against human annotations. Evaluation on three simulated user roles (patient, student, and social chat partner) shows a reduction in inconsistency of over 55%, yielding more coherent and faithful simulated users.
Abstract
Large Language Models (LLMs) are increasingly used to simulate human users in interactive settings such as therapy, education, and social role-play. While these simulations enable scalable training and evaluation of AI agents, off-the-shelf LLMs often drift from their assigned personas, contradict earlier statements, or abandon role-appropriate behavior. We introduce a unified framework for evaluating and improving persona consistency in LLM-generated dialogue. We define three automatic metrics (prompt-to-line consistency, line-to-line consistency, and Q&A consistency) that capture different types of persona drift, and we validate each against human annotations. Using these metrics as reward signals, we apply multi-turn reinforcement learning to fine-tune LLMs for three user roles: a patient, a student, and a social chat partner. Our method reduces inconsistency by over 55%, resulting in more coherent and faithful simulated users.
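To make the idea of using consistency metrics as reward signals more concrete, here is a minimal sketch of how per-dialogue scores might be combined into a single scalar reward. It is an illustration under stated assumptions, not the paper's implementation: the `Judge` callable, the function names, the equal-weight average, and the keyword-based `toy_judge` are all hypothetical, and the abstract does not specify how each metric is computed or how Q&A consistency is defined (here it is simply accepted as a precomputed score).

```python
from typing import Callable, Optional, Sequence

# Hypothetical judge interface (an assumption, not from the paper):
# returns True if `statement` is consistent with `context`.
Judge = Callable[[str, str], bool]


def prompt_to_line(persona: str, lines: Sequence[str], judge: Judge) -> float:
    """Fraction of simulated-user lines judged consistent with the persona prompt."""
    if not lines:
        return 1.0
    return sum(judge(persona, line) for line in lines) / len(lines)


def line_to_line(lines: Sequence[str], judge: Judge) -> float:
    """Fraction of (earlier, later) line pairs judged mutually consistent."""
    pairs = [
        (lines[i], lines[j])
        for i in range(len(lines))
        for j in range(i + 1, len(lines))
    ]
    if not pairs:
        return 1.0
    return sum(judge(earlier, later) for earlier, later in pairs) / len(pairs)


def consistency_reward(
    persona: str,
    lines: Sequence[str],
    judge: Judge,
    qa_score: Optional[float] = None,
) -> float:
    """Equal-weight average of the available scores (the weighting is an assumption)."""
    scores = [prompt_to_line(persona, lines, judge), line_to_line(lines, judge)]
    if qa_score is not None:
        # Q&A consistency is passed in precomputed; its definition is not given here.
        scores.append(qa_score)
    return sum(scores) / len(scores)


def toy_judge(context: str, statement: str) -> bool:
    """Toy stand-in for a learned judge: flags one hard-coded contradiction."""
    return not ("vegetarian" in context.lower() and "steak" in statement.lower())


persona = "You are a vegetarian student."
lines = ["I am a vegetarian.", "I love tofu.", "I had steak for dinner."]
reward = consistency_reward(persona, lines, toy_judge)
print(round(reward, 3))  # 0.667
```

In this toy dialogue the third line contradicts both the persona and the first line, so each metric scores 2/3 and the averaged reward is about 0.667. In practice the judge would more plausibly be a model-based classifier or an LLM prompt, and the scalar would be fed to whichever RL algorithm is used for fine-tuning.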