Active Confusion Expression in Large Language Models: Leveraging World Models toward Better Social Reasoning

📅 2025-10-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) frequently conflate objective facts with agents’ subjective beliefs during social reasoning, leading to cognitive confusion, logical inconsistencies, and infinite reasoning loops—particularly in multi-agent and multi-timeline scenarios. To address this, we propose a dynamic world model–enhanced reasoning framework. Our approach introduces, for the first time, an adaptive textual world model that explicitly tracks both the objective state trajectory and the belief trajectories of all agents. We further design a perplexity-signal detection module that actively identifies cognitive confusion and injects precise, context-aware state descriptions—mimicking human implicit mental modeling. Evaluated on DeepSeek-R1, our method achieves an average 10% accuracy gain across three major social reasoning benchmarks (including Hi-ToM), while reducing token consumption by up to 33.8%. This represents a substantive advance in both reasoning fidelity and computational efficiency for socially grounded LLM inference.

📝 Abstract
While large language models (LLMs) excel in mathematical and code reasoning, we observe they struggle with social reasoning tasks, exhibiting cognitive confusion, logical inconsistencies, and conflation between objective world states and subjective belief states. Through detailed analysis of DeepSeek-R1's reasoning trajectories, we find that LLMs frequently encounter reasoning impasses and tend to output confusion-signaling terms like "tricky" and "confused" when processing scenarios with multiple participants and timelines, leading to erroneous reasoning or infinite loops. The core issue is their inability to disentangle objective reality from agents' subjective beliefs. To address this, we propose an adaptive world model-enhanced reasoning mechanism that constructs a dynamic textual world model to track entity states and temporal sequences. It dynamically monitors reasoning trajectories for confusion indicators and promptly intervenes by providing clear world state descriptions, helping models navigate through cognitive dilemmas. The mechanism mimics how humans use implicit world models to distinguish between external events and internal beliefs. Evaluations on three social benchmarks demonstrate significant improvements in accuracy (e.g., +10% on Hi-ToM) while reducing computational costs (up to 33.8% token reduction), offering a simple yet effective solution for deploying LLMs in social contexts.
Problem

Research questions and friction points this paper is trying to address.

LLMs struggle with social reasoning tasks and exhibit cognitive confusion
They cannot separate objective reality from agents' subjective beliefs
This causes reasoning impasses and contradictions in multi-participant scenarios
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic textual world model tracks entity states
Monitors reasoning trajectories for confusion indicators
Provides clear world state descriptions during dilemmas
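The three steps above can be sketched as a monitor-and-inject loop. This is a minimal illustration, not the paper's implementation: the actual module uses perplexity signals from the model, which we approximate here with simple keyword matching over the reasoning trace, and all function and variable names are hypothetical.

```python
# Hedged sketch of a confusion-detection-and-intervention loop.
# The paper's detector is perplexity-based; keyword matching stands in here.
CONFUSION_MARKERS = {"tricky", "confused", "contradict", "wait"}

def detect_confusion(reasoning_step: str) -> bool:
    """Flag a reasoning step containing confusion indicators."""
    text = reasoning_step.lower()
    return any(marker in text for marker in CONFUSION_MARKERS)

def render_world_state(objective_state: dict, beliefs: dict) -> str:
    """Serialize the textual world model: objective facts and each
    agent's (possibly outdated) beliefs, kept strictly separate."""
    lines = ["[World state]"]
    lines += [f"  fact: {k} = {v}" for k, v in objective_state.items()]
    for agent, view in beliefs.items():
        lines += [f"  {agent} believes: {k} = {v}" for k, v in view.items()]
    return "\n".join(lines)

def maybe_intervene(step: str, objective_state: dict, beliefs: dict) -> str:
    """If the step shows confusion, append a clear state description."""
    if detect_confusion(step):
        return step + "\n" + render_world_state(objective_state, beliefs)
    return step

# Example: a Sally-Anne-style false-belief scenario.
objective = {"marble_location": "box"}
beliefs = {"Sally": {"marble_location": "basket"}}  # Sally left before the move
step = "Hmm, this is tricky - where does Sally think the marble is?"
print(maybe_intervene(step, objective, beliefs))
```

The key design point is that the injected description lists objective facts and per-agent beliefs on separate lines, so the model never has to infer which is which mid-reasoning.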