AI Summary
This work addresses the challenge of efficiently integrating exploration and task execution for robots operating in partially observable environments. The authors propose EPoG, a novel framework that uniquely combines large language models (LLMs) with a scene graph-based belief update mechanism. By continuously maintaining a belief graph that represents both known and unknown objects, the system generates goal-directed action sequences through graph-editing operations and leverages the LLM for local replanning. This approach enables seamless joint planning of exploration and manipulation, supporting long-horizon, multi-objective autonomous decision-making. Evaluated across 46 household scenarios and five complex task categories, EPoG achieves a success rate of 91.3% and reduces average travel distance by 36.1%, with real-robot experiments further demonstrating its effectiveness in dynamic, unknown environments.
Abstract
In partially known environments, robots must combine exploration to gather information with task planning for efficient execution. To address this challenge, we propose EPoG, an Exploration-based sequential manipulation Planning framework on Scene Graphs. EPoG integrates a graph-based global planner with a Large Language Model (LLM)-based situated local planner, continuously updating a belief graph using observations and LLM predictions to represent known and unknown objects. Action sequences are generated by computing graph edit operations between the goal and belief graphs, ordered by temporal dependencies and movement costs. This approach seamlessly combines exploration and sequential manipulation planning. In ablation studies across 46 realistic household scenes and 5 long-horizon daily object transportation tasks, EPoG achieved a success rate of 91.3%, reducing travel distance by 36.1% on average. Furthermore, a physical mobile manipulator successfully executed complex tasks in unknown and dynamic environments, demonstrating EPoG's potential for real-world applications.
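The core idea of computing an action sequence as graph edit operations between the goal graph and the belief graph can be sketched as follows. This is an illustrative toy, not the authors' implementation: the dict-based graph representation, the `explore`/`move` operation names, and the cost-based ordering are simplified stand-ins for the paper's temporal-dependency and movement-cost ordering.

```python
# Illustrative sketch (hypothetical names, not EPoG's actual code):
# a scene graph is reduced to a dict mapping object -> container,
# where None means the object's location is not yet known (unknown object
# in the belief graph).

def graph_edit_ops(belief, goal):
    """Diff the belief graph against the goal graph.

    Returns a list of operations:
      ("explore", obj, None, target)  - object location unknown, search first
      ("move", obj, current, target)  - object observed, relocate it
    """
    ops = []
    for obj, target in goal.items():
        current = belief.get(obj)
        if current is None:
            ops.append(("explore", obj, None, target))
        elif current != target:
            ops.append(("move", obj, current, target))
    return ops

def order_by_cost(ops, travel_cost):
    """Order operations by estimated travel cost to the target location
    (a crude stand-in for the paper's dependency- and cost-aware ordering)."""
    return sorted(ops, key=lambda op: travel_cost.get(op[3], 0.0))

# Belief graph: cup observed on the table, plate not yet observed.
belief = {"cup": "table", "plate": None}
# Goal graph: both items belong in the cabinet.
goal = {"cup": "cabinet", "plate": "cabinet"}

plan = order_by_cost(graph_edit_ops(belief, goal), {"cabinet": 2.0})
for op in plan:
    print(op)
```

In the full framework, each such operation would be grounded by the LLM-based local planner as observations arrive, with the belief graph updated and the diff recomputed after every step.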