🤖 AI Summary
This work addresses the challenge of executing complex semantic tasks, specified via natural language instructions, in large-scale outdoor multi-robot systems. We propose an integrated 3D scene graph framework that unifies multi-robot SLAM, open-vocabulary object detection and mapping, LLM-driven semantic parsing, and hierarchical task and motion planning (TAMP). We introduce a novel language-guided PDDL goal generation mechanism that closes the loop from operator intent to executable actions, and design a view-invariant relocalization method with shared scene graph fusion for collaborative multi-robot operation. Evaluated in real-world large-scale outdoor environments, our system achieves significant improvements in natural language understanding accuracy and task success rate, reduces relocalization error by 42%, and keeps planning response latency under 800 ms. The core contribution is the first end-to-end 3D semantic planning system supporting open-set object recognition, language-guided goal generation, and multi-robot collaborative relocalization.
📝 Abstract
In this paper, we introduce a multi-robot system that integrates mapping, localization, and task and motion planning (TAMP) enabled by 3D scene graphs to execute complex instructions expressed in natural language. Our system builds a shared 3D scene graph incorporating an open-set object-based map, which is leveraged for multi-robot 3D scene graph fusion. This representation supports real-time, view-invariant relocalization (via the object-based map) and planning (via the 3D scene graph), allowing a team of robots to reason about their surroundings and execute complex tasks. Additionally, we introduce a planning approach that translates operator intent into Planning Domain Definition Language (PDDL) goals using a Large Language Model (LLM) by leveraging context from the shared 3D scene graph and robot capabilities. We provide an experimental assessment of the performance of our system on real-world tasks in large-scale, outdoor environments.
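The abstract describes translating operator intent into PDDL goals with an LLM, using context drawn from the shared 3D scene graph and the robots' capabilities. The following is a minimal sketch of how such a step might look; the names (`SceneGraphNode`, `generate_pddl_goal`, the stub `fake_llm`) are illustrative assumptions, not the authors' actual interface:

```python
# Hypothetical sketch of language-guided PDDL goal generation.
# A real system would query an actual LLM; here a stub stands in.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class SceneGraphNode:
    """An open-set object node in the shared 3D scene graph (assumed shape)."""
    label: str      # open-vocabulary label, e.g. "picnic table"
    node_id: str    # unique symbol usable in PDDL, e.g. "picnic_table_3"

def build_prompt(instruction: str, nodes: List[SceneGraphNode],
                 capabilities: List[str]) -> str:
    """Assemble LLM context from scene graph objects and robot capabilities."""
    objects = ", ".join(n.node_id for n in nodes)
    caps = ", ".join(capabilities)
    return (
        f"Known objects: {objects}\n"
        f"Robot capabilities: {caps}\n"
        f"Instruction: {instruction}\n"
        "Respond with a single PDDL goal expression."
    )

def generate_pddl_goal(instruction: str, nodes: List[SceneGraphNode],
                       capabilities: List[str],
                       llm: Callable[[str], str]) -> str:
    """Query the LLM, then lightly validate that the reply parses as a goal."""
    reply = llm(build_prompt(instruction, nodes, capabilities)).strip()
    if not (reply.startswith("(") and reply.endswith(")")):
        raise ValueError(f"LLM reply is not a PDDL expression: {reply!r}")
    return reply

# Stub LLM for demonstration only; its output is a fabricated example goal.
def fake_llm(prompt: str) -> str:
    return "(and (inspected picnic_table_3) (at-base robot_1))"

nodes = [SceneGraphNode("picnic table", "picnic_table_3")]
goal = generate_pddl_goal("Inspect the picnic table, then return home",
                          nodes, ["navigate", "inspect"], fake_llm)
print(goal)  # the PDDL goal handed to the TAMP layer
```

The key design point suggested by the abstract is that the scene graph grounds the LLM's output: the prompt restricts goal symbols to object IDs that actually exist in the shared map, so the generated goal is directly executable by the downstream planner.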