Guess What I am Thinking: A Benchmark for Inner Thought Reasoning of Role-Playing Language Agents

📅 2025-03-11
🤖 AI Summary
This work addresses the lack of research on inner-thought reasoning for role-playing language agents (RPLAs). It defines the task and introduces ROLETHINK, a benchmark built from literature with two evaluation sets: a gold set that uses original character monologues as references, and a silver set that uses expert-synthesized character analyses. It also proposes MIRROR, a chain-of-thought method that generates character thoughts in three steps: memory retrieval, reaction prediction, and motivation synthesis. In the reported experiments, MIRROR consistently outperforms existing methods on ROLETHINK. Resources are available at the repository linked in the abstract.

📝 Abstract
Recent advances in LLM-based role-playing language agents (RPLAs) have attracted broad attention in various applications. While chain-of-thought reasoning has shown importance in many tasks for LLMs, the internal thinking processes of RPLAs remain unexplored. Understanding characters' inner thoughts is crucial for developing advanced RPLAs. In this paper, we introduce ROLETHINK, a novel benchmark constructed from literature for evaluating character thought generation. We propose the task of inner thought reasoning, which includes two sets: the gold set that compares generated thoughts with original character monologues, and the silver set that uses expert synthesized character analyses as references. To address this challenge, we propose MIRROR, a chain-of-thought approach that generates character thoughts by retrieving memories, predicting character reactions, and synthesizing motivations. Through extensive experiments, we demonstrate the importance of inner thought reasoning for RPLAs, and MIRROR consistently outperforms existing methods. Resources are available at https://github.com/airaer1998/RPA_Thought.
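
The three steps named in the abstract (retrieving memories, predicting character reactions, synthesizing motivations) can be pictured as a small pipeline. The Python sketch below is only an illustration under stated assumptions, not the authors' implementation: the function names, the prompts, `mirror_sketch`, and the `llm` callable are all hypothetical, and simple word-overlap ranking stands in for whatever retrieval method the paper actually uses.

```python
from typing import Callable, Dict, List, Set

# Hypothetical interface: any function that maps a prompt string to generated text.
LLM = Callable[[str], str]

_PUNCT = ".,!?;:\"'"


def tokenize(text: str) -> Set[str]:
    """Lowercase the words of a text and strip surrounding punctuation."""
    words = (w.strip(_PUNCT).lower() for w in text.split())
    return {w for w in words if w}


def retrieve_memories(scene: str, memories: List[str], k: int = 2) -> List[str]:
    """Step 1 (assumed): keep the k memories sharing the most words with the scene."""
    scene_tokens = tokenize(scene)
    ranked = sorted(
        memories,
        key=lambda m: len(tokenize(m) & scene_tokens),
        reverse=True,
    )
    return ranked[:k]


def predict_reaction(llm: LLM, character: str, scene: str, recalled: List[str]) -> str:
    """Step 2 (assumed prompt): ask the model how the character would react."""
    prompt = (
        f"Character: {character}\nScene: {scene}\n"
        f"Relevant memories: {' | '.join(recalled)}\n"
        "How does the character react in this scene?"
    )
    return llm(prompt)


def synthesize_thought(
    llm: LLM, character: str, scene: str, recalled: List[str], reaction: str
) -> str:
    """Step 3 (assumed prompt): ask for the motivation behind the reaction."""
    prompt = (
        f"Character: {character}\nScene: {scene}\n"
        f"Relevant memories: {' | '.join(recalled)}\n"
        f"Predicted reaction: {reaction}\n"
        "Write the character's inner thought, explaining the motivation."
    )
    return llm(prompt)


def mirror_sketch(
    llm: LLM, character: str, scene: str, memories: List[str], k: int = 2
) -> Dict[str, object]:
    """Chain the three steps and return every intermediate result."""
    recalled = retrieve_memories(scene, memories, k)
    reaction = predict_reaction(llm, character, scene, recalled)
    thought = synthesize_thought(llm, character, scene, recalled, reaction)
    return {"memories": recalled, "reaction": reaction, "thought": thought}
```

Passing the model in as a plain callable keeps the sketch runnable with a stub; a real client would need to be wrapped to match that signature. Returning the intermediate results alongside the final thought makes each step inspectable on its own.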

Problem

Research questions and friction points this paper is trying to address.

Explores inner thought reasoning in role-playing language agents.
Introduces ROLETHINK benchmark for evaluating character thought generation.
Proposes MIRROR method for generating character thoughts effectively.

Innovation

Methods, ideas, or system contributions that make the work stand out.

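Introduces ROLETHINK, a benchmark constructed from literature for evaluating character thought generation.
Proposes MIRROR, which generates character thoughts by retrieving memories, predicting character reactions, and synthesizing motivations.
Demonstrates that MIRROR consistently outperforms existing methods at inner thought reasoning.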
Rui Xu
Fudan University, INF Technology (Shanghai) Co., Ltd.
MingYu Wang
Fudan University
Dakuan Lu
INF Technology (Shanghai) Co., Ltd.
Xiaoyu Tan
INF Technology (Shanghai) Co., Ltd.
Wei Chu
INF Technology (Shanghai) Co., Ltd.
Yinghui Xu
Research Scientist/Senior Director
machine learning, machine vision, optimization