AI Summary
Role-playing models (RPMs) suffer significant generalization degradation in real-world scenarios, primarily due to three types of distributional shifts: user, role, and dialogue composition shifts. Existing evaluation methods (e.g., LLM-as-a-judge) lack fine-grained diagnostic capability and a formal theoretical framework for generalization analysis.
Method: We propose R-EMID (Reasoning-enhanced Effective Mutual Information Difference), an interpretable information-theoretic metric that formally models generalization decay as a change in conditional mutual information. We derive a theoretical upper bound on R-EMID that quantifies the marginal impact of each shift on worst-case generalization. We also design a co-evolving reinforcement learning framework that adaptively models the relationship among user, role, and dialogue context, improving the estimation of response generation probabilities.
Contribution/Results: Empirical analysis reveals that user distribution shift is the most detrimental; R-EMID enables precise, fine-grained attribution of degradation sources; and our RL method achieves significantly superior generalization gains over mainstream baselines.
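To make the idea of a mutual information difference concrete, the following toy sketch computes an EMID-style score on small discrete tables. It assumes that effective mutual information (EMI) is I(X; model response) minus I(X; reference response), and that the difference subtracts EMI on shifted data from EMI on in-distribution data. The summary does not give R-EMID's exact formula, and the reasoning component is omitted entirely. The helper names `mutual_information`, `effective_mi`, and `emid` are hypothetical.

```python
import numpy as np

# Hypothetical helpers. This toy sketch ASSUMES an EMID-style definition,
# EMI = I(X; model response) - I(X; reference response), and omits R-EMID's
# reasoning component, which the summary does not specify.


def mutual_information(joint: np.ndarray) -> float:
    """Plug-in mutual information (nats) of a discrete joint table P(X, Y)."""
    joint = joint / joint.sum()              # normalize to a probability table
    px = joint.sum(axis=1, keepdims=True)    # marginal P(X), shape (|X|, 1)
    py = joint.sum(axis=0, keepdims=True)    # marginal P(Y), shape (1, |Y|)
    indep = px @ py                          # product of marginals P(X)P(Y)
    mask = joint > 0                         # treat 0 * log 0 as 0
    return float(np.sum(joint[mask] * np.log(joint[mask] / indep[mask])))


def effective_mi(px: np.ndarray, ref_cond: np.ndarray, model_cond: np.ndarray) -> float:
    """EMI: I(X; model response) minus I(X; reference response) under context mix P(X)."""
    # px is P(X) over contexts; ref_cond and model_cond are row-stochastic P(Y | X) tables.
    model_joint = px[:, None] * model_cond
    ref_joint = px[:, None] * ref_cond
    return mutual_information(model_joint) - mutual_information(ref_joint)


def emid(id_dist, shifted_dist, model_cond: np.ndarray) -> float:
    """EMI on in-distribution data minus EMI on shifted data (positive = degradation)."""
    return effective_mi(*id_dist, model_cond) - effective_mi(*shifted_dist, model_cond)


# Toy example: 2 contexts x 2 responses; the shift changes the context mix.
model_cond = np.array([[0.9, 0.1], [0.1, 0.9]])
id_dist = (np.array([0.5, 0.5]), model_cond.copy())  # references match the model
shifted_dist = (np.array([0.8, 0.2]), np.array([[1.0, 0.0], [0.0, 1.0]]))  # new mix, deterministic references
score = emid(id_dist, shifted_dist, model_cond)
```

Real RPM dialogues cannot be enumerated as small tables like these. In practice, the mutual-information terms would have to be estimated from the model's response probabilities rather than computed exactly.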
Abstract
Role-playing models (RPMs) are widely used in real-world applications but underperform when deployed in the wild. This degradation can be attributed to distribution shifts, including user, character, and dialogue compositional shifts. Existing methods such as LLM-as-a-judge fall short of providing a fine-grained diagnosis of how these shifts affect RPM generalization, and formal frameworks for characterizing RPM generalization behavior are lacking. To bridge these gaps, we introduce an information-theoretic metric, named reasoning-based effective mutual information difference (R-EMID), to measure RPM performance degradation in an interpretable way. We also derive an upper bound on R-EMID to predict the worst-case generalization performance of RPMs and theoretically reveal how various shifts contribute to RPM performance degradation. Moreover, we propose a co-evolving reinforcement learning framework that adaptively models the connection among user, character, and dialogue context, thereby improving the estimation of dialogue response generation probabilities, which is critical for calculating R-EMID. Finally, we evaluate the generalization performance of various RPMs using R-EMID and find that user shift poses the highest risk among all shifts and that reinforcement learning is the most effective approach for enhancing RPM generalization.
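The abstract notes that estimating dialogue response generation probabilities is critical for calculating R-EMID. The sketch below shows one generic way such conditional probabilities can feed a mutual-information estimate, which is not necessarily the paper's procedure. It takes log p(y_i | x_k) for N sampled contexts (user, character, dialogue history) and their N observed responses, then approximates the marginal p(y_i) by averaging over contexts. When the model's own log-likelihood serves as the critic, this matches an InfoNCE-style estimator, which cannot exceed log N. The function name `mi_from_response_logprobs` and the choice of estimator are assumptions made for illustration.

```python
import numpy as np


def mi_from_response_logprobs(logp: np.ndarray) -> float:
    """InfoNCE-style estimate of I(context; response) in nats (hypothetical helper).

    logp[k, i] = log p(y_i | x_k): the log-probability a response model assigns to
    the i-th sampled response given the k-th sampled context. Pair (x_i, y_i) is
    the observed context/response pair, so the diagonal holds matched scores.
    """
    n = logp.shape[0]
    if logp.shape != (n, n):
        raise ValueError("expected a square N x N matrix of log-probabilities")
    matched = np.diag(logp)  # log p(y_i | x_i)
    # log p(y_i) is approximated as the log of the mean of p(y_i | x_k) over the
    # N contexts, computed stably with a log-sum-exp reduction down each column.
    log_marginal = np.logaddexp.reduce(logp, axis=0) - np.log(n)
    return float(np.mean(matched - log_marginal))


# Usage: a model whose responses are highly specific to their own context,
# versus one that assigns the same probability regardless of context.
n = 4
sharp = np.full((n, n), -20.0)
np.fill_diagonal(sharp, 0.0)
flat = np.full((n, n), np.log(0.5))
sharp_mi = mi_from_response_logprobs(sharp)  # near the log(n) ceiling
flat_mi = mi_from_response_logprobs(flat)    # near zero
```

Because each term is bounded by log N, the estimate depends directly on how well log p(y | x) separates matched from mismatched contexts. This is consistent with the abstract's emphasis on improving that probability estimate.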