PersuasiveToM: A Benchmark for Evaluating Machine Theory of Mind in Persuasive Dialogues

📅 2025-02-28
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Existing Theory of Mind (ToM) evaluation benchmarks predominantly focus on static physical perception and fail to capture the dynamically evolving mental states inherent in persuasive dialogue. To address this limitation, we introduce PersuasiveToM, the first ToM benchmark specifically designed for persuasive interactions. It features two kinds of tasks: ToM reasoning (e.g., tracking shifts in interlocutors' desires and intentions) and ToM application (e.g., selecting persuasion strategies and evaluating their effectiveness). Built on a high-quality, human-annotated dataset of multi-turn persuasive dialogues, PersuasiveToM extends ToM evaluation from static settings to dynamic social interactions, emphasizing both fine-grained mental-state modeling and strategic use of inferred mental states. Experiments across eight state-of-the-art LLMs show that while models perform well on many questions, they exhibit substantial deficits in tracking how mental states evolve and in understanding mental states across the whole dialogue, highlighting critical bottlenecks in current LLMs' ToM capabilities.

📝 Abstract
The ability to understand and predict the mental states of oneself and others, known as Theory of Mind (ToM), is crucial for effective social interaction. Although recent studies have evaluated whether Large Language Models (LLMs) exhibit a form of ToM, existing benchmarks focus predominantly on physical perception, following principles of the Sally-Anne test in synthetic stories and conversations, and fail to capture the complex psychological activities behind mental states in real-life social interactions. To address this gap, we propose PersuasiveToM, a benchmark designed to evaluate the ToM abilities of LLMs in persuasive dialogues. Our framework introduces two categories of questions: (1) ToM Reasoning, assessing the capacity of LLMs to track evolving mental states (e.g., desire shifts in persuadees), and (2) ToM Application, evaluating whether LLMs can use inferred mental states to select effective persuasion strategies (e.g., emphasizing rarity) and to judge the effectiveness of persuasion strategies. Experiments across eight state-of-the-art LLMs reveal that while models excel on multiple questions, they struggle with questions that require tracking the dynamics and shifts of mental states and with comprehensively understanding mental states across the whole dialogue. Our aim with PersuasiveToM is to enable an effective evaluation of the ToM reasoning ability of LLMs, with more focus on complex psychological activities. Our code is available at https://github.com/Yu-Fangxu/PersuasiveToM.
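Because the benchmark separates questions into two categories, results are most informative when scored per category rather than as one pooled number. The sketch below shows one way such scores could be aggregated. It is illustrative only: the record fields (`category`, `gold`, `pred`) and the toy records are assumptions made for this example and are not taken from the PersuasiveToM repository or data format.

```python
from collections import defaultdict


def accuracy_by_category(records):
    """Return accuracy per question category.

    `records` is an iterable of dicts with the hypothetical keys
    'category', 'gold' (reference answer), and 'pred' (model answer).
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for record in records:
        total[record["category"]] += 1
        if record["pred"] == record["gold"]:
            correct[record["category"]] += 1
    return {category: correct[category] / total[category] for category in total}


# Toy records invented for illustration; not real benchmark data.
records = [
    {"category": "ToM Reasoning", "gold": "B", "pred": "B"},
    {"category": "ToM Reasoning", "gold": "A", "pred": "C"},
    {"category": "ToM Application", "gold": "D", "pred": "D"},
]

scores = accuracy_by_category(records)
print(scores)
```

Reporting the two categories separately keeps a strong score on one kind of question from masking a weak score on the other.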
Problem

Research questions and friction points this paper is trying to address.

Evaluates LLMs' Theory of Mind in persuasive dialogues.
Assesses tracking and application of evolving mental states.
Focuses on complex psychological activities in social interactions.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces the PersuasiveToM benchmark for LLM evaluation.
Focuses on mental state tracking in persuasive dialogues.
Evaluates LLMs' ability to apply inferred mental states.
Fangxu Yu
University of Maryland
Natural Language Processing, Reasoning
Lai Jiang
Department of Computer Science and Engineering, Shanghai Jiao Tong University
Shenyi Huang
University of California San Diego
Zhen Wu
National Key Laboratory for Novel Software Technology, Nanjing University, China; School of Artificial Intelligence, Nanjing University, China
Xinyu Dai
Nanjing University