PersonaArena: Dynamic Simulation for Evaluating and Enhancing Persona-Level Role-Playing in Large Language Models

📅 2026-05-16

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses two problems: current large language models struggle to maintain coherent, authentic persona-consistent role-playing in realistic social settings, and conventional evaluation methods rely on static setups that fail to capture the dynamic complexity of everyday interactions. To bridge this gap, the authors propose PersonaArena, a persona-level dynamic evaluation framework that constructs a fine-grained persona repository from filtered real-world social corpora, generates multi-turn, context-rich interactions within simulated environments, and uses a multi-agent debate mechanism for holistic automated assessment. The authors report that the framework supports rigorous evaluation and improvement of persona consistency and behavioral authenticity, offering a paradigm for developing socially adaptive AI agents.

📝 Abstract

Large language models (LLMs) increasingly serve as interactive social agents, yet their ability to maintain coherent and authentic persona-level role-playing remains limited, particularly in realistic social scenarios. Existing research predominantly focuses on character-level settings and relies on static evaluation formats, failing to capture the complexity of everyday social interactions. In this work, we present PersonaArena, a dynamic simulation framework for evaluating and improving persona-level role-playing in LLMs. PersonaArena leverages a large, filtered corpus of user-generated social content to construct a nuanced persona bank, and elicits multi-turn, context-rich interactions within simulated social environments. Our framework features a multi-agent debating judge for holistic and unbiased assessment. Through extensive experiments, we demonstrate that PersonaArena enables rigorous evaluation and enhancement of LLMs' role-playing capabilities, advancing the development of more authentic and socially adept AI agents.
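The abstract names three stages: a persona bank, multi-turn simulated interaction, and a multi-agent debating judge. The sketch below is one possible reading of that pipeline, not the authors' implementation. Every name in it (`Persona`, `simulate`, `make_judge`, `debate_score`, `trait_coverage`) is hypothetical, the responder and judges are deterministic stubs standing in for LLM calls, and the "debate" is assumed to mean that each judge revises its score toward the panel's previous-round average.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, List


@dataclass
class Persona:
    """One entry of a (hypothetical) persona bank."""
    name: str
    traits: List[str]


# Assumed interfaces: a responder stands in for the role-playing LLM,
# a judge for one member of the debating panel.
Responder = Callable[[Persona, List[str]], str]
Judge = Callable[[Persona, List[str], List[float]], float]


def simulate(persona: Persona, responder: Responder, prompts: List[str]) -> List[str]:
    """Run a multi-turn exchange: environment prompt, then persona reply."""
    transcript: List[str] = []
    for prompt in prompts:
        transcript.append(f"env: {prompt}")
        transcript.append(f"{persona.name}: {responder(persona, transcript)}")
    return transcript


def trait_coverage(persona: Persona, transcript: List[str]) -> float:
    """Toy consistency metric: fraction of traits the persona mentions."""
    own_lines = [t for t in transcript if t.startswith(f"{persona.name}:")]
    text = " ".join(own_lines).lower()
    return sum(t.lower() in text for t in persona.traits) / len(persona.traits)


def make_judge(own_score: Callable[[Persona, List[str]], float]) -> Judge:
    """Wrap a scoring rule so it moves halfway toward the peers' mean."""
    def judge(persona: Persona, transcript: List[str], peers: List[float]) -> float:
        own = own_score(persona, transcript)
        return own if not peers else (own + mean(peers)) / 2
    return judge


def debate_score(persona: Persona, transcript: List[str],
                 judges: List[Judge], rounds: int = 2) -> float:
    """Round 1: independent scores. Later rounds: revise given peer scores."""
    scores = [j(persona, transcript, []) for j in judges]
    for _ in range(rounds - 1):
        scores = [j(persona, transcript, scores) for j in judges]
    return mean(scores)


def toy_responder(persona: Persona, history: List[str]) -> str:
    """Stub model: mentions one trait per turn, cycling through them."""
    turn = len(history) // 2
    return f"As a {persona.traits[turn % len(persona.traits)]}, I would say yes."


ava = Persona("Ava", ["hiker", "vegan"])
transcript = simulate(ava, toy_responder, ["Weekend plans?", "Dinner ideas?"])
panel = [make_judge(trait_coverage), make_judge(lambda p, t: 0.5)]
final_score = debate_score(ava, transcript, panel)
```

In a real system the stubs would be replaced by model calls and the judges would exchange arguments rather than bare numbers. The averaging rule here only illustrates how a panel can converge on a joint score across rounds.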

Problem

Research questions and friction points this paper is trying to address.

persona-level role-playing

large language models

social interaction

dynamic simulation

authenticity

Innovation

Methods, ideas, or system contributions that make the work stand out.

dynamic simulation

persona-level role-playing

multi-agent judging