Understanding Generalization in Role-Playing Models via Information Theory

📅 2025-12-19
🤖 AI Summary
Role-playing models (RPMs) suffer significant generalization degradation in real-world scenarios, primarily due to three types of distribution shift: user, role, and dialogue composition shifts. Existing evaluation methods (e.g., LLM-as-a-judge) lack fine-grained diagnostic capability, and no formal theoretical framework exists for analyzing RPM generalization. Method: We propose R-EMID (reasoning-based effective mutual information difference), an interpretable information-theoretic metric that formally models generalization decay as a change in conditional mutual information. We derive its theoretical upper bound to quantify the impact of each shift on worst-case generalization, and design a co-evolving reinforcement learning framework that jointly models the connection among user, role, and dialogue context to improve the estimate of the response generation probability needed to compute R-EMID. Contribution/Results: Empirical analysis shows that user shift is the most detrimental of the three; R-EMID enables fine-grained attribution of degradation sources; and reinforcement learning proves the most effective approach for improving RPM generalization among those evaluated.

📝 Abstract
Role-playing models (RPMs) are widely used in real-world applications but underperform when deployed in the wild. This degradation can be attributed to distribution shifts, including user, character, and dialogue compositional shifts. Existing methods like LLM-as-a-judge fall short in providing a fine-grained diagnosis of how these shifts affect RPM generalization, and formal frameworks for characterizing RPM generalization behavior are lacking. To bridge these gaps, we introduce an information-theoretic metric, named reasoning-based effective mutual information difference (R-EMID), to measure RPM performance degradation in an interpretable way. We also derive an upper bound on R-EMID to predict the worst-case generalization performance of RPMs and theoretically reveal how various shifts contribute to RPM performance degradation. Moreover, we propose a co-evolving reinforcement learning framework to adaptively model the connection among user, character, and dialogue context and thus enhance the estimation of the dialogue response generation probability, which is critical for calculating R-EMID. Finally, we evaluate the generalization performance of various RPMs using R-EMID, finding that user shift poses the highest risk among all shifts and that reinforcement learning is the most effective approach for enhancing RPM generalization.
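
The abstract frames degradation as a difference in mutual information between an in-distribution setting and a shifted one. The precise definition of R-EMID is not given on this page, so the following is only a toy sketch of that general idea, not the authors' metric: it computes plain mutual information on two small, made-up joint probability tables and takes their difference. The function names (`mutual_information`, `mi_difference`) and the tables are hypothetical illustrations.

```python
import math


def mutual_information(joint):
    """Return I(X;Y) in nats for a discrete joint table joint[x][y]."""
    px = [sum(row) for row in joint]        # marginal over rows (X)
    py = [sum(col) for col in zip(*joint)]  # marginal over columns (Y)
    mi = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0:  # zero-probability cells contribute nothing
                mi += p * math.log(p / (px[i] * py[j]))
    return mi


def mi_difference(joint_in, joint_out):
    """Mutual information lost when moving from joint_in to joint_out."""
    return mutual_information(joint_in) - mutual_information(joint_out)


# Hypothetical toy tables (assumed for illustration, not data from the paper):
# X = dialogue input, Y = model response.
in_dist = [[0.45, 0.05],
           [0.05, 0.45]]  # responses strongly tied to inputs
shifted = [[0.25, 0.25],
           [0.25, 0.25]]  # responses independent of inputs

gap = mi_difference(in_dist, shifted)
print(f"MI difference: {gap:.4f} nats")
```

A larger positive gap means the shifted setting retains less information between inputs and responses than the in-distribution one. In practice the probabilities would have to be estimated from a model rather than read from a table, which is the estimation problem the abstract says the co-evolving reinforcement learning framework targets.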
Problem

Research questions and friction points this paper is trying to address.

Measuring RPM performance degradation due to distribution shifts
Predicting worst-case generalization using information-theoretic bounds
Enhancing response probability estimation via co-evolving reinforcement learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces R-EMID metric for interpretable performance measurement
Proposes co-evolving reinforcement learning to model dialogue connections
Derives upper bound to predict worst-case generalization performance
Yongqi Li
School of Computer Science, Wuhan University, Tongyi Lab
Hao Lang
Tongyi Lab
Fei Huang
Tongyi Lab
Tieyun Qian
Wuhan University
natural language processing, web data mining
Yongbin Li
Tongyi Lab