Who Sees What? Structured Thought-Action Sequences for Epistemic Reasoning in LLMs

📅 2025-08-20
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This study addresses key limitations of large language model (LLM) agents in active perception, collaborative reasoning, and perspective-taking: occluded spatial inference, knowledge-state modelling, and cognitive-cost trade-offs. We propose a planning-guided, structured reasoning enhancement method that integrates the Fast Downward planner with the ReAct framework to generate three types of thought-action exemplars (G-, E-, and L-type). Through prompt engineering, we explicitly elicit LLMs' reasoning justifications. Experiments show that L-type exemplars slightly reduce clarification requests and action steps; agents achieve basic attentional filtering but struggle with complex mentalising tasks, including multi-perspective occlusion reasoning and knowledge-cost balancing. Our core contribution is the first LLM agent enhancement framework that systematically maps symbolic planning solution graphs onto interpretable thought-action chains and rigorously evaluates the resulting improvements in perspective-taking capability.

📝 Abstract
Recent advances in large language models (LLMs) and reasoning frameworks have opened new possibilities for improving the perspective-taking capabilities of autonomous agents. However, tasks that involve active perception, collaborative reasoning, and perspective taking (understanding what another agent can see or knows) pose persistent challenges for current LLM-based systems. This study investigates the potential of structured examples derived from transformed solution graphs generated by the Fast Downward planner to improve the performance of LLM-based agents within a ReAct framework. We propose a structured solution-processing pipeline that generates three distinct categories of examples: optimal goal paths (G-type), informative node paths (E-type), and step-by-step optimal decision sequences contrasting alternative actions (L-type). These solutions are further converted into "thought-action" examples by prompting an LLM to explicitly articulate the reasoning behind each decision. While L-type examples slightly reduce clarification requests and overall action steps, they do not yield consistent improvements. Agents succeed in tasks requiring basic attentional filtering but struggle in scenarios that require mentalising about occluded spaces or weighing the costs of epistemic actions. These findings suggest that structured examples alone are insufficient for robust perspective-taking, underscoring the need for explicit belief tracking, cost modelling, and richer environments to enable socially grounded collaboration in LLM-based agents.
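The three example categories in the abstract can be illustrated with a small sketch. Everything below (the toy graph, node names, and selection rules) is an illustrative assumption, not the paper's code: G-type is taken as a shortest path to the goal, E-type as a path routed through epistemically informative nodes, and L-type as each chosen step paired with its rejected alternatives.

```python
# Hypothetical sketch of deriving G/E/L example material from a solution
# graph. The paper derives these from Fast Downward output; here we use a
# toy adjacency-list graph purely for illustration.
from collections import deque

def shortest_path(graph, start, goal):
    """Breadth-first search for a fewest-step path: the G-type goal path."""
    frontier = deque([[start]])
    seen = {start}
    while frontier:
        path = frontier.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(path + [nxt])
    return None

def informative_path(graph, start, goal, informative):
    """E-type: visit each informative waypoint in turn before the goal
    (a simplifying assumption about how such paths are selected)."""
    path, here = [start], start
    for waypoint in list(informative) + [goal]:
        leg = shortest_path(graph, here, waypoint)
        path += leg[1:]
        here = waypoint
    return path

def decision_steps(graph, path):
    """L-type: pair each chosen successor with the rejected alternatives."""
    return [(node, nxt, [a for a in graph.get(node, []) if a != nxt])
            for node, nxt in zip(path, path[1:])]

# Toy graph: 's' start, 'g' goal, 'look' an informative observation node.
graph = {"s": ["a", "b"], "a": ["g"], "b": ["look", "g"], "look": ["g"]}
g_path = shortest_path(graph, "s", "g")               # G-type
e_path = informative_path(graph, "s", "g", ["look"])  # E-type
l_steps = decision_steps(graph, g_path)               # L-type
```

The L-type steps carry the contrastive structure the paper highlights: each entry records not only what was done but what was passed over, which is what the thought-articulation prompt can then justify.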
Problem

Research questions and friction points this paper is trying to address.

Improving perspective-taking capabilities in LLM-based agents
Addressing challenges in active perception and collaborative reasoning
Enhancing epistemic reasoning through structured thought-action sequences
Innovation

Methods, ideas, or system contributions that make the work stand out.

Structured solution-processing pipeline for example generation
Thought-action examples from transformed planner graphs
Three distinct example categories for reasoning
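The conversion of a planner solution into a "thought-action" exemplar can be sketched as a formatting step. The function, data layout, and example thoughts below are hypothetical; in the paper, the thoughts are elicited from an LLM prompted to justify each planner decision.

```python
# Hypothetical sketch: format a justified plan as a ReAct-style exemplar.
# All names and the plan contents are illustrative assumptions.

def make_exemplar(plan, kind, alternatives=None):
    """Render a plan as interleaved Thought/Action lines.

    plan         -- ordered (thought, action) pairs, e.g. from prompting an
                    LLM to justify each step of a Fast Downward plan
    kind         -- 'G', 'E', or 'L' (L-type adds contrasted alternatives)
    alternatives -- for L-type: per-step lists of rejected actions
    """
    lines = []
    for i, (thought, action) in enumerate(plan, start=1):
        lines.append(f"Thought {i}: {thought}")
        if kind == "L" and alternatives:
            rejected = ", ".join(alternatives[i - 1]) or "none"
            lines.append(f"  (alternatives considered: {rejected})")
        lines.append(f"Action {i}: {action}")
    return "\n".join(lines)

plan = [
    ("The red block is occluded from the partner's view; I should check it.",
     "move-to(red_block)"),
    ("Now both agents can see the block, so I can hand it over.",
     "give(red_block, partner)"),
]
example = make_exemplar(plan, kind="L",
                        alternatives=[["ask-partner"], []])
```

Exemplars produced this way are then placed in the agent's prompt as few-shot demonstrations within the ReAct loop.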