PSALM-V: Automating Symbolic Planning in Interactive Visual Environments with Large Language Models

📅 2025-06-24
🤖 AI Summary
This work addresses the challenge of automatically inferring symbolic action semantics (preconditions and effects) in interactive visual environments. We propose the first autonomous neuro-symbolic system that does not require expert-defined actions, predefined problem files, full observability, or explicit error feedback. The system dynamically constructs and iteratively refines PDDL domain models and problem instances from visual interaction traces. It combines LLM-guided planning, a tree-structured belief over candidate action semantics, analysis of execution outcomes, and synthesis of possible error explanations, which together enable closed-loop symbolic learning and planning under partial observability. The key contribution is inducing PDDL problem files and action semantics end to end, with no prior action semantics supplied. In simulated ALFRED experiments under partial observability, the plan success rate rises from 37% (Claude-3.7) to 74%. On the 2D game environments RTFM and Overcooked-AI, the system improves step efficiency and succeeds at domain induction in multi-agent settings, and it correctly induces PDDL pre- and post-conditions for real-world robot BlocksWorld tasks.

📝 Abstract
We propose PSALM-V, the first autonomous neuro-symbolic learning system able to induce symbolic action semantics (i.e., pre- and post-conditions) in visual environments through interaction. PSALM-V bootstraps reliable symbolic planning without expert action definitions, using LLMs to generate heuristic plans and candidate symbolic semantics. Previous work has explored using large language models to generate action semantics for Planning Domain Definition Language (PDDL)-based symbolic planners. However, these approaches have primarily focused on text-based domains or relied on unrealistic assumptions, such as access to a predefined problem file, full observability, or explicit error messages. By contrast, PSALM-V dynamically infers PDDL problem files and domain action semantics by analyzing execution outcomes and synthesizing possible error explanations. The system iteratively generates and executes plans while maintaining a tree-structured belief over the possible semantics of each action, refining these beliefs until a goal state is reached. Simulated task-completion experiments in ALFRED demonstrate that PSALM-V increases the plan success rate from 37% (Claude-3.7) to 74% in partially observed setups. Results on two 2D game environments, RTFM and Overcooked-AI, show that PSALM-V improves step efficiency and succeeds in domain induction in multi-agent settings. PSALM-V also correctly induces PDDL pre- and post-conditions for real-world robot BlocksWorld tasks, despite low-level manipulation failures from the robot.
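To make "pre- and post-conditions" concrete, here is a minimal sketch of how a STRIPS-style action can be represented and applied to a symbolic state. It is illustrative only and not taken from the paper: the `Action` class, the `pick_up_mug` action, and all predicate names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """A grounded action with STRIPS-style semantics (hypothetical structure)."""
    name: str
    preconditions: frozenset  # facts that must hold before execution
    add_effects: frozenset    # facts that become true afterwards
    del_effects: frozenset    # facts that become false afterwards


def applicable(state, action):
    """An action is applicable when every precondition is in the state."""
    return action.preconditions <= state


def apply(state, action):
    """Return the successor state, or raise if a precondition is missing."""
    if not applicable(state, action):
        missing = action.preconditions - state
        raise ValueError(f"{action.name}: unmet preconditions {sorted(missing)}")
    return (state - action.del_effects) | action.add_effects


# Hypothetical example: facts are tuples of (predicate, argument, ...).
pick_up = Action(
    name="pick_up_mug",
    preconditions=frozenset({("hand_empty",), ("at", "mug", "table")}),
    add_effects=frozenset({("holding", "mug")}),
    del_effects=frozenset({("hand_empty",), ("at", "mug", "table")}),
)

state = frozenset({("hand_empty",), ("at", "mug", "table")})
next_state = apply(state, pick_up)
```

In this sketch the semantics are written by hand. The setting described in the abstract is the opposite one: the preconditions and effects are unknown at the start and have to be induced from interaction.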

Problem

Research questions and friction points this paper is trying to address.

Automating symbolic planning in interactive visual environments with LLMs
Inferring PDDL problem files and action semantics without expert-written definitions
Raising plan success rates under partial observability
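As a rough illustration of what inferring a PDDL problem file involves, the sketch below renders a problem definition from a set of objects, initial facts, and goal facts. It assumes those sets have already been extracted from observations. The function name `render_problem`, the domain name `blocks-sketch`, and the facts are hypothetical, and this is not the paper's pipeline.

```python
def render_problem(name, domain, objects, init, goal):
    """Render a minimal PDDL problem from sets of objects and facts.

    Facts are tuples such as ("on", "a", "b"). Typing and other PDDL
    requirements are deliberately omitted to keep the sketch small.
    """
    def fact(f):
        return "(" + " ".join(f) + ")"

    lines = [
        f"(define (problem {name})",
        f"  (:domain {domain})",
        "  (:objects " + " ".join(sorted(objects)) + ")",
        "  (:init " + " ".join(fact(f) for f in sorted(init)) + ")",
        "  (:goal (and " + " ".join(fact(f) for f in sorted(goal)) + "))",
        ")",
    ]
    return "\n".join(lines)


# Hypothetical BlocksWorld-style instance.
problem_text = render_problem(
    name="stack-two",
    domain="blocks-sketch",
    objects={"a", "b"},
    init={
        ("ontable", "a"),
        ("ontable", "b"),
        ("clear", "a"),
        ("clear", "b"),
        ("handempty",),
    },
    goal={("on", "a", "b")},
)
print(problem_text)
```

Under partial observability the `init` set would be incomplete at first and would need to be revised as new observations arrive, which is why a fixed, predefined problem file is an unrealistic assumption in this setting.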

Innovation

Methods, ideas, or system contributions that make the work stand out.

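LLMs generate heuristic plans and candidate symbolic action semantics
PDDL problem files are inferred dynamically from execution outcomes
Plans are executed iteratively while a tree-structured belief over action semantics is refined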
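The idea of refining a belief over action semantics from execution outcomes can be sketched with a toy example. The code below is a simplified stand-in, not PSALM-V's algorithm: it tracks candidate precondition sets for a single hypothetical action, branches a hypothesis when the action fails unexpectedly, and prunes hypotheses that contradict an observed success. It assumes fully observed states and a fixed candidate predicate list, whereas the paper uses LLM-proposed candidates and handles partial observability. All names are hypothetical.

```python
def consistent(hyp, history):
    """A hypothesis must predict success exactly when success was observed."""
    return all((hyp <= state) == ok for state, ok in history)


def refine(leaves, candidates, history):
    """Update the leaf hypotheses using the most recent observation."""
    state, ok = history[-1]
    new_leaves = set()
    for hyp in leaves:
        if (hyp <= state) == ok:
            new_leaves.add(hyp)  # prediction matched, keep as is
        elif not ok:
            # Predicted success but the action failed: branch on each
            # predicate absent from the state as a possible explanation.
            for pred in candidates - state:
                child = hyp | {pred}
                if consistent(child, history):
                    new_leaves.add(child)
        # Otherwise the hypothesis predicted failure but the action
        # succeeded, so it is too strict and is pruned.
    return new_leaves


# Hidden ground truth, used only by the simulated environment.
TRUE_PRECONDITIONS = frozenset({"has_key", "at_door"})


def execute(state):
    """Simulated environment: reports only success or failure."""
    return TRUE_PRECONDITIONS <= state


candidates = frozenset({"has_key", "at_door", "light_on"})
leaves = {frozenset()}  # root hypothesis: no preconditions
history = []

trial_states = [
    frozenset(),
    frozenset({"has_key", "light_on"}),
    frozenset({"at_door", "light_on"}),
    frozenset({"has_key", "at_door"}),
]
for state in trial_states:
    history.append((state, execute(state)))
    leaves = refine(leaves, candidates, history)

print(sorted(sorted(h) for h in leaves))
```

After the four trials the only surviving hypothesis is the set containing `has_key` and `at_door`. Note that the environment returns only a success flag, never an error message, so the learner has to attribute each failure to a candidate cause on its own.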