🤖 AI Summary
Existing role-playing evaluation and optimization methods operate largely at the turn level, which makes it difficult to assess and improve an agent's role consistency and dialogue quality over extended multi-turn conversations. To address this limitation, this work proposes DynSess, a unified framework for session-level role-playing evaluation and optimization. Its rubric-based evaluator, DynSess-Eval, scores complete dialogue sessions and supplies the reward signals used to construct high-quality training trajectories through multi-turn lookahead search. With these trajectories, the framework trains a role-playing agent, DynSess-Character, using two complementary policy optimization variants: off-policy DSPO and on-policy GSRPO. Experiments show that DynSess-Eval aligns with human judgments substantially better than prior evaluators, and in blind human evaluation DynSess-Character matches the strongest character model despite using substantially fewer parameters, while maintaining strong role consistency and interactive ability.
📝 Abstract
Role-playing with large language models is fundamentally a session-level task, requiring agents to sustain character identity and interaction quality across extended multi-turn conversations. Yet existing evaluation and optimization methods remain largely turn-level, failing to capture long-horizon quality. We propose DynSess, a unified session-level framework for role-playing agents. DynSess-Eval scores complete dialogue sessions via rubrics targeting long-horizon behaviors. Leveraging its session-level rewards, we construct high-quality training trajectories through multi-turn lookahead search and train DynSess-Character with two complementary variants: DSPO (off-policy) and GSRPO (on-policy). Experiments show that DynSess-Eval aligns with human judgments substantially better than prior evaluators, and blind human evaluation further shows that DynSess-Character matches the strongest character model despite using substantially fewer parameters, while maintaining strong role consistency and interactive ability. Our dataset and code will be released to facilitate future research.
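To make the idea of session-level reward plus multi-turn lookahead concrete, the following is a minimal sketch, not the authors' implementation. It assumes a candidate reply is scored by simulating a few further turns and averaging a score computed over the whole session. All names here (`lookahead_select`, `toy_rollout`, `toy_session_score`, `depth`, `n_rollouts`) are hypothetical, and the toy simulator and toy rubric merely stand in for a dialogue model and for an evaluator such as DynSess-Eval.

```python
import random
from typing import Callable, List, Tuple

Session = List[str]  # a session is an ordered list of turns


def lookahead_select(
    history: Session,
    candidates: List[str],
    rollout: Callable[[Session, random.Random], str],
    score_session: Callable[[Session], float],
    depth: int = 3,
    n_rollouts: int = 4,
    seed: int = 0,
) -> Tuple[str, float]:
    """Return the candidate whose simulated continuations score best.

    Assumption: the value of a reply is the mean session-level score over
    `n_rollouts` simulated continuations, each `depth` turns long.
    """
    rng = random.Random(seed)
    best_reply, best_value = candidates[0], float("-inf")
    for reply in candidates:
        total = 0.0
        for _ in range(n_rollouts):
            session = history + [reply]
            for _ in range(depth):
                # Extend the session one simulated turn at a time.
                session = session + [rollout(session, rng)]
            # Score the complete session, not the single reply.
            total += score_session(session)
        value = total / n_rollouts
        if value > best_value:
            best_reply, best_value = reply, value
    return best_reply, best_value


def toy_rollout(session: Session, rng: random.Random) -> str:
    # Hypothetical simulator: the dialogue keeps the register of the last turn.
    if "arr" in session[-1].lower():
        return rng.choice(["Arr, aye!", "Arr, hoist the sails!"])
    return "I am a language model."


def toy_session_score(session: Session) -> float:
    # Hypothetical rubric: fraction of turns that stay in the pirate persona.
    return sum("arr" in turn.lower() for turn in session) / len(session)


history = ["Who are you?"]
candidates = ["I am an AI assistant.", "Arr, I be Captain Flint!"]
best_reply, best_value = lookahead_select(
    history, candidates, toy_rollout, toy_session_score
)
print(best_reply, round(best_value, 2))
```

In this toy setting the in-character reply wins because every simulated continuation stays in persona. A turn-level scorer would only ever see the single reply, which is the gap the session-level framing targets.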