🤖 AI Summary
This paper addresses the “interaction hallucination” problem in large language models (LLMs) during role-playing social interactions—i.e., the generation of factually inconsistent or role-incongruent content caused by stance drift. It proposes the first explicit, generalizable evaluation paradigm for this issue: the authors formally define interaction hallucination and introduce SHARP, a benchmark jointly driven by commonsense knowledge graphs and role-stance modeling. SHARP enables the first quantitative trade-off analysis between role fidelity and factual consistency, and it supports stable evaluation across 12 distinct worldviews, significantly improving hallucination detection accuracy and cross-role generalization. Through systematic analysis, the paper identifies seven critical influencing factors. This work establishes both a theoretical foundation and an evaluation infrastructure for developing trustworthy role-playing agents.
📝 Abstract
The advanced role-playing capabilities of Large Language Models (LLMs) have paved the way for developing Role-Playing Agents (RPAs). However, existing social-interaction benchmarks such as HPD and SocialBench have not investigated hallucination and suffer from limitations such as poor generalizability and implicit judgments of character fidelity. To address these issues, we propose a generalizable, explicit, and effective paradigm to unlock the interactive patterns in diverse worldviews. Specifically, we define interactive hallucination based on stance transfer and construct a benchmark, SHARP, by extracting relations from a general commonsense knowledge graph and leveraging the inherent hallucination properties of RPAs to simulate interactions across roles. Extensive experiments validate the effectiveness and stability of our paradigm. Our findings further explore the factors influencing these metrics and discuss the trade-off between blind loyalty to roles and adherence to facts in RPAs.