🤖 AI Summary
To address the challenge of robust task planning for embodied instruction following in few-shot, partially observable, and dynamic environments, this paper proposes the first few-shot closed-loop task planning framework. Methodologically, the problem is formulated as a partially observable Markov decision process (POMDP); large language models (LLMs) are used for instruction grounding and action planning; and a novel hindsight reasoning mechanism, coupled with a dynamic adaptation module, enables online policy refinement from real-time state feedback. The key contributions are threefold: (1) the method is the first to surpass fully supervised baselines under strict few-shot settings (≤5 demonstrations); (2) it significantly improves out-of-distribution generalization and state-recovery capability; and (3) on the ALFRED benchmark, it achieves a 23.6% absolute gain in task success rate over prior few-shot approaches, matching or exceeding fully supervised planners.
📝 Abstract
This work focuses on building a task planner for Embodied Instruction Following (EIF) using Large Language Models (LLMs). Previous works typically train a planner to imitate expert trajectories, treating planning as a supervised learning task. While these methods achieve competitive performance, they often lack robustness: once a suboptimal action is taken, the planner may land in an out-of-distribution state, which can lead to task failure. In contrast, we frame the task as a Partially Observable Markov Decision Process (POMDP) and aim to develop a robust planner under a few-shot assumption. To this end, we propose a closed-loop planner with an adaptation module and a novel hindsight method, designed to exploit as much information as possible to assist the planner. Our experiments on the ALFRED dataset show that our planner achieves competitive performance under the few-shot assumption. For the first time, a few-shot agent's performance approaches, and even surpasses, that of the fully supervised agent.
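To make the closed-loop idea concrete, the sketch below shows the general plan–act–observe–replan pattern the abstract describes: the planner proposes an action sequence, executes it step by step, and on failure feeds the observed failure back into the next planning call (a hindsight-style signal). This is a minimal toy illustration, not the paper's implementation; every function, action string, and the stand-in `llm_propose_plan` are hypothetical, and a real system would replace that stub with an actual LLM call.

```python
# Toy sketch of a closed-loop planner with failure feedback.
# All names and the environment logic are illustrative assumptions,
# not taken from the paper's codebase.

def llm_propose_plan(goal, history):
    """Stand-in for an LLM call: maps the goal plus past (action, feedback)
    pairs to a new action sequence. A real system would prompt an LLM here."""
    if ("open fridge", "blocked") in history:
        # Hindsight: a previous attempt failed, so insert a recovery step.
        return ["move to fridge", "clear obstacle", "open fridge", "grab apple"]
    return ["move to fridge", "open fridge", "grab apple"]

def execute(action, env_state):
    """Toy environment: 'open fridge' fails until the obstacle is cleared."""
    if action == "open fridge" and not env_state["obstacle_cleared"]:
        return "blocked"
    if action == "clear obstacle":
        env_state["obstacle_cleared"] = True
    return "ok"

def closed_loop_plan(goal, max_replans=3):
    """Plan, act, observe feedback, and replan on failure."""
    env_state = {"obstacle_cleared": False}
    history = []  # (action, feedback) pairs fed back to the planner
    for _ in range(max_replans):
        plan = llm_propose_plan(goal, history)
        for action in plan:
            feedback = execute(action, env_state)
            history.append((action, feedback))
            if feedback != "ok":
                break  # abort this plan; replan with the failure in context
        else:
            return history  # every action succeeded
    return history

trace = closed_loop_plan("grab apple")
```

In this toy run the first plan fails at "open fridge", and the second plan (generated with the failure in its history) recovers by clearing the obstacle first, which is the state-recovery behavior the closed-loop formulation is meant to provide over open-loop imitation of expert trajectories.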