Plan Verification for LLM-Based Embodied Task Completion Agents

📅 2025-09-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
In embodied AI, LLM-generated plans frequently contain redundant actions, navigation errors, and logical inconsistencies that degrade policy quality. To address this, we propose an iterative plan verification framework: a Judge LLM identifies and critiques multiple error types in natural language, such as invalid, duplicate, or out-of-order actions, and a Planner LLM then refines the plan based on this critique, enabling generalizable error detection and correction while preserving human error-recovery patterns. Our method relies on natural language prompting rather than hand-written rules and is evaluated on manually annotated actions from the TEACh dataset. Across four state-of-the-art LLMs, it achieves up to 90% recall and 100% precision in error detection, with 96.5% of action sequences converging within three iterations. The framework improves action consistency, spatiotemporal efficiency, and the fidelity of imitation learning datasets.

📝 Abstract
Large language model (LLM) based task plans and corresponding human demonstrations for embodied AI may be noisy, with unnecessary actions, redundant navigation, and logical errors that reduce policy quality. We propose an iterative verification framework in which a Judge LLM critiques action sequences and a Planner LLM applies the revisions, yielding progressively cleaner and more spatially coherent trajectories. Unlike rule-based approaches, our method relies on natural language prompting, enabling broad generalization across error types including irrelevant actions, contradictions, and missing steps. On a set of manually annotated actions from the TEACh embodied AI dataset, our framework achieves up to 90% recall and 100% precision across four state-of-the-art LLMs (GPT o4-mini, DeepSeek-R1, Gemini 2.5, LLaMA 4 Scout). The refinement loop converges quickly, with 96.5% of sequences requiring at most three iterations, while improving both temporal efficiency and spatial action organization. Crucially, the method preserves human error-recovery patterns rather than collapsing them, supporting future work on robust corrective behavior. By establishing plan verification as a reliable LLM capability for spatial planning and action refinement, we provide a scalable path to higher-quality training data for imitation learning in embodied AI.
Problem

Research questions and friction points this paper is trying to address.

Verifying noisy LLM-based plans for embodied AI tasks
Correcting unnecessary actions and logical errors in trajectories
Improving spatial coherence and temporal efficiency of actions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Iterative LLM verification framework for plan refinement
Natural language prompting for error generalization
Preserves human error-recovery patterns in trajectories
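The critique-and-revise control flow described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the `judge` and `planner` functions below are hypothetical stand-ins for the two LLMs, hard-coded to handle a single error type (consecutive duplicate actions) so the iterative loop and its convergence condition are concrete.

```python
def judge(plan):
    """Stand-in for the Judge LLM: return natural-language critiques.

    An empty list means the plan is accepted. Here we only flag one of
    the paper's error types (duplicate actions) for illustration.
    """
    critiques = []
    for i in range(1, len(plan)):
        if plan[i] == plan[i - 1]:
            critiques.append(f"Step {i}: '{plan[i]}' duplicates the previous action.")
    return critiques


def planner(plan, critiques):
    """Stand-in for the Planner LLM: revise the plan given the critiques.

    A real Planner would condition on the critique text; this toy version
    simply drops consecutive duplicates.
    """
    revised = []
    for action in plan:
        if not revised or action != revised[-1]:
            revised.append(action)
    return revised


def verify(plan, max_iters=3):
    """Iterate judge -> planner until the plan is accepted or the budget runs out."""
    for iteration in range(max_iters):
        critiques = judge(plan)
        if not critiques:
            return plan, iteration  # converged: judge raised no objections
        plan = planner(plan, critiques)
    return plan, max_iters


noisy = ["goto kitchen", "goto kitchen", "pick up mug",
         "pick up mug", "place mug in sink"]
clean, iters = verify(noisy)
print(clean)  # ['goto kitchen', 'pick up mug', 'place mug in sink']
print(iters)  # 1 refinement pass before the judge accepted
```

The loop mirrors the paper's reported behavior: most sequences need few passes (96.5% converge within three iterations), and the judge's natural-language critiques, rather than fixed rules, are what let the approach generalize across error types.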