Embodied Task Planning via Graph-Informed Action Generation with Large Language Model

📅 2026-01-29

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses incoherent planning and hallucination in large language models (LLMs) deployed for long-horizon task planning in embodied agents. These failures stem from limited context windows and from generated transitions that violate environmental constraints. To mitigate them, the authors propose a Graph-in-Graph memory architecture that encodes environmental states as graph embeddings and organizes them into an execution trajectory graph. The framework integrates structure-aware prior retrieval with a symbolic bounded lookahead module to generate coherent, constraint-compliant plans. It combines clustering of structured graph embeddings with symbolic state-transition logic to strengthen LLMs’ long-horizon planning in dynamic environments. Evaluated on the Robotouille (synchronous and asynchronous) and ALFWorld benchmarks, the method achieves Pass@1 improvements of up to 22%, 37%, and 15%, respectively, with comparable or lower computational overhead.
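The structure-aware prior retrieval described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes state embeddings are already available as plain vectors (standing in for the GNN encoder's output), that cluster centroids have been computed beforehand, and that retrieval uses cosine similarity. The names `cosine`, `nearest_cluster`, and `retrieve_priors` are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def nearest_cluster(query, centroids):
    """Index of the centroid most similar to the query embedding."""
    return max(range(len(centroids)), key=lambda i: cosine(query, centroids[i]))

def retrieve_priors(query, memory, centroids, k=2):
    """Return up to k stored traces from the cluster closest to the query.

    memory is a list of (embedding, cluster_id, trace) tuples; the layout
    is an assumption made for this sketch.
    """
    cid = nearest_cluster(query, centroids)
    scored = [(cosine(query, emb), trace) for emb, c, trace in memory if c == cid]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [trace for _, trace in scored[:k]]

# Toy memory bank: two clusters of past experience (illustrative values).
centroids = [[1.0, 0.0], [0.0, 1.0]]
memory = [
    ([0.9, 0.1], 0, "slice then cook"),
    ([0.8, 0.3], 0, "cook then plate"),
    ([0.1, 0.9], 1, "open drawer then take"),
]
priors = retrieve_priors([1.0, 0.05], memory, centroids, k=1)
```

Restricting the search to one cluster keeps retrieval cost tied to the cluster size rather than to the whole memory bank, which is one plausible reason to cluster the embeddings first.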

📝 Abstract

While Large Language Models (LLMs) have demonstrated strong zero-shot reasoning capabilities, their deployment as embodied agents still faces fundamental challenges in long-horizon planning. Unlike open-ended text generation, embodied agents must decompose high-level intent into actionable sub-goals while strictly adhering to the logic of a dynamic, observed environment. Standard LLM planners frequently fail to maintain a coherent strategy over extended horizons because of context window limitations, or they hallucinate transitions that violate environmental constraints. We propose GiG, a novel planning framework that structures embodied agents' memory using a Graph-in-Graph architecture. Our approach employs a Graph Neural Network (GNN) to encode environmental states into embeddings, organizing these embeddings into action-connected execution trace graphs within an experience memory bank. By clustering these graph embeddings, the framework enables retrieval of structure-aware priors, allowing agents to ground current decisions in relevant past structural patterns. Furthermore, we introduce a novel bounded lookahead module that leverages symbolic transition logic to enhance the agents' planning capabilities through grounded action projection. We evaluate our framework on three embodied planning benchmarks: Robotouille Synchronous, Robotouille Asynchronous, and ALFWorld. Our method outperforms state-of-the-art baselines, achieving Pass@1 performance gains of up to 22% on Robotouille Synchronous, 37% on Robotouille Asynchronous, and 15% on ALFWorld with comparable or lower computational cost.
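To make the idea of a bounded lookahead over symbolic transition logic concrete, here is a minimal sketch under stated assumptions. It is not the GiG module itself: states are modeled as sets of facts, actions as STRIPS-style precondition/add/delete rules, and the lookahead as a breadth-first search capped at a fixed depth. The `Action` class, the helper names, and the toy kitchen facts are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Action:
    name: str
    pre: frozenset      # facts that must hold before the action
    add: frozenset      # facts made true by the action
    delete: frozenset   # facts made false by the action

def applicable(state, action):
    """An action is legal only if all its preconditions hold."""
    return action.pre <= state

def apply_action(state, action):
    """Symbolic transition: remove deleted facts, then add new ones."""
    return (state - action.delete) | action.add

def bounded_lookahead(state, goal, actions, depth):
    """Breadth-first search for a plan of at most `depth` steps.

    Returns the list of action names, or None if the goal is not
    reachable within the bound.
    """
    frontier = [(state, [])]
    seen = {state}
    for _ in range(depth + 1):
        next_frontier = []
        for s, plan in frontier:
            if goal <= s:
                return plan
            for a in actions:
                if applicable(s, a):
                    t = apply_action(s, a)
                    if t not in seen:
                        seen.add(t)
                        next_frontier.append((t, plan + [a.name]))
        frontier = next_frontier
    return None

# Toy kitchen domain, loosely in the spirit of a cooking benchmark.
pick = Action("pick_patty", frozenset({"hand_empty", "patty_on_table"}),
              frozenset({"holding_patty"}),
              frozenset({"hand_empty", "patty_on_table"}))
place = Action("place_on_stove", frozenset({"holding_patty"}),
               frozenset({"patty_on_stove", "hand_empty"}),
               frozenset({"holding_patty"}))
cook = Action("cook", frozenset({"patty_on_stove"}),
              frozenset({"patty_cooked"}), frozenset())

start = frozenset({"hand_empty", "patty_on_table"})
goal = frozenset({"patty_cooked"})
plan = bounded_lookahead(start, goal, [pick, place, cook], depth=3)
```

In this toy setting, a planner that proposed `cook` as the first action would be rejected by `applicable(start, cook)`, which is the kind of constraint-violating transition the abstract attributes to ungrounded LLM planners.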

Problem

Research questions and friction points this paper is trying to address.

Embodied Task Planning

Long-horizon Planning

Large Language Models

Action Generation

Environment Constraints

Innovation

Methods, ideas, or system contributions that make the work stand out.

Graph-in-Graph architecture

Embodied task planning

Graph Neural Network (GNN)