🤖 AI Summary
This study investigates the persistent performance degradation of role-playing large language models (LLMs) in ultra-long dialogues (over 100 turns) along three dimensions: role fidelity, instruction adherence, and safety. It introduces a dialogue-conditioned long-horizon evaluation protocol and uses it to benchmark seven prominent open- and closed-weight models with multi-dimensional quantitative metrics tracked over the course of a dialogue. The analysis reveals a long-term trade-off between role fidelity and instruction adherence: all models exhibit significant erosion of role consistency as dialogue length increases, particularly in goal-directed scenarios, with responses progressively converging toward role-agnostic baselines; this points to a structural failure of long-term role persistence. These findings expose an intrinsic fragility in current role-playing paradigms and establish a reproducible benchmark with actionable insights for building trustworthy, role-aware LLMs for extended interactions.
📝 Abstract
Persona-assigned large language models (LLMs) are used in domains such as education, healthcare, and sociodemographic simulation. Yet they are typically evaluated only in short, single-round settings that do not reflect real-world usage. We introduce an evaluation protocol that combines long persona dialogues (over 100 rounds) with existing evaluation datasets to create dialogue-conditioned benchmarks that can robustly measure long-context effects. We then investigate the effects of dialogue length on the persona fidelity, instruction following, and safety of seven state-of-the-art open- and closed-weight LLMs. We find that persona fidelity degrades over the course of dialogues, especially in goal-oriented conversations, where models must sustain both persona fidelity and instruction following. We identify a trade-off between fidelity and instruction following: non-persona baselines initially outperform persona-assigned models, and as dialogues progress and fidelity fades, persona responses become increasingly similar to baseline responses. Our findings highlight the fragility of persona applications in extended interactions, and our protocol provides a way to systematically measure such failures.
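The dialogue-conditioning idea described above can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: it assumes a chat-style message format, and all names (`Turn`, `conditioned_query`, the example persona and prompts) are hypothetical. The point is that the same evaluation prompt is posed after increasingly long slices of a persona dialogue, so metrics can be tracked as a function of dialogue depth and compared against a non-persona baseline.

```python
# Hypothetical sketch of building a dialogue-conditioned benchmark item:
# ask an evaluation prompt after the first `depth` turns of a long persona
# dialogue, so fidelity/instruction-following can be measured at each depth.
from dataclasses import dataclass

@dataclass
class Turn:
    role: str      # "user" or "assistant"
    content: str

def conditioned_query(system_persona, dialogue, eval_prompt, depth):
    """Build a chat request that poses `eval_prompt` after the first
    `depth` turns of a persona dialogue (persona set via system message)."""
    messages = [{"role": "system", "content": system_persona}]
    for turn in dialogue[:depth]:
        messages.append({"role": turn.role, "content": turn.content})
    messages.append({"role": "user", "content": eval_prompt})
    return messages

# Toy 100-turn dialogue; probe the same item at depth 50. A non-persona
# baseline would use an empty or generic system message instead.
dialogue = [Turn("user" if i % 2 == 0 else "assistant", f"turn {i}")
            for i in range(100)]
probe = conditioned_query("You are a cautious medical tutor.",
                          dialogue,
                          "Summarize your persona's main goal.",
                          depth=50)
print(len(probe))  # system message + 50 history turns + 1 eval prompt = 52
```

Running the probe at several depths (e.g. 0, 50, 100) and scoring the responses yields the per-depth degradation curves the abstract describes.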