Understanding Generalization in Role-Playing Models via Information Theory

📅 2025-12-19

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Role-playing models (RPMs) generalize poorly in real-world deployment, mainly because of three distribution shifts: user, character, and dialogue compositional shifts. Existing evaluation methods (e.g., LLM-as-a-judge) cannot diagnose these shifts at a fine-grained level, and there is no formal framework for analyzing RPM generalization. Method: We propose R-EMID (reasoning-based effective mutual information difference), an interpretable information-theoretic metric for RPM performance degradation. We derive an upper bound on R-EMID to predict worst-case generalization and to show how each shift contributes to the degradation, and we design a co-evolving reinforcement learning framework that models the connection among user, character, and dialogue context to better estimate the response generation probability needed to compute R-EMID. Contribution/Results: Evaluating various RPMs with R-EMID shows that user shift poses the highest risk among the shifts and that reinforcement learning is the most effective approach for improving RPM generalization.

📝 Abstract

Role-playing models (RPMs) are widely used in real-world applications but underperform when deployed in the wild. This degradation can be attributed to distribution shifts, including user, character, and dialogue compositional shifts. Existing methods like LLM-as-a-judge fall short in providing a fine-grained diagnosis of how these shifts affect RPM generalization, and formal frameworks for characterizing RPM generalization behaviors are lacking. To bridge these gaps, we introduce an information-theoretic metric, named reasoning-based effective mutual information difference (R-EMID), to measure RPM performance degradation in an interpretable way. We also derive an upper bound on R-EMID to predict the worst-case generalization performance of RPMs and theoretically reveal how various shifts contribute to RPM performance degradation. Moreover, we propose a co-evolving reinforcement learning framework to adaptively model the connection among user, character, and dialogue context and thus enhance the estimation of dialogue response generation probability, which is critical for calculating R-EMID. Finally, we evaluate the generalization performance of various RPMs using R-EMID, finding that user shift poses the highest risk among all shifts and reinforcement learning is the most effective approach for enhancing RPM generalization.
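
The abstract does not state the formula for R-EMID, so the following is only a toy sketch of the general idea behind a "mutual information difference": compute the mutual information between two discrete variables under an in-distribution joint table and under a shifted one, then take the gap. The function names `mutual_information` and `mi_difference`, and the example tables, are hypothetical and are not taken from the paper.

```python
import math


def mutual_information(joint):
    """Mutual information I(X;Y) in nats for a discrete joint table joint[x][y].

    Assumes the entries are non-negative and sum to 1.
    """
    px = [sum(row) for row in joint]        # marginal over rows (X)
    py = [sum(col) for col in zip(*joint)]  # marginal over columns (Y)
    mi = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0:  # zero-probability cells contribute nothing
                mi += p * math.log(p / (px[i] * py[j]))
    return mi


def mi_difference(joint_in, joint_shifted):
    """Gap in mutual information between two distributions (illustrative only)."""
    return mutual_information(joint_in) - mutual_information(joint_shifted)


# Hypothetical tables: X and Y perfectly dependent in-distribution,
# fully independent after the shift.
in_dist = [[0.5, 0.0], [0.0, 0.5]]
shifted = [[0.25, 0.25], [0.25, 0.25]]
gap = mi_difference(in_dist, shifted)
```

In this toy case the gap equals `math.log(2)`, the largest possible drop for two binary variables; a gap near zero would mean the shift left the dependence between the two variables intact.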

Problem

Research questions and friction points this paper is trying to address.

Measuring RPM performance degradation due to distribution shifts

Predicting worst-case generalization using information-theoretic bounds

Enhancing response probability estimation via co-evolving reinforcement learning

Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces R-EMID metric for interpretable performance measurement

Proposes co-evolving reinforcement learning to model dialogue connections

Derives upper bound to predict worst-case generalization performance

🔎 Similar Papers

BEYOND DIALOGUE: A Profile-Dialogue Alignment Framework Towards General Role-Playing Language Model