Language-Grounded Hierarchical Planning and Execution with Multi-Robot 3D Scene Graphs

📅 2025-06-09
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the challenge of executing complex semantic tasks in large-scale outdoor multi-robot systems via natural language instructions. We propose an integrated 3D scene graph framework that unifies multi-robot SLAM, open-vocabulary object detection and mapping, LLM-driven semantic parsing, and hierarchical Task-and-Motion Planning (TAMP). We introduce a novel language-guided PDDL goal generation mechanism to close the loop from intent to executable actions, and design a view-invariant relocalization method with shared scene graph fusion for collaborative multi-robot operation. Evaluated in real-world large-scale outdoor environments, our system achieves significant improvements in natural language understanding accuracy and task success rate, reduces relocalization error by 42%, and maintains planning response latency under 800 ms. The core contribution is the first end-to-end 3D semantic planning system supporting open-set object recognition, language-guided goal generation, and multi-robot collaborative relocalization.

Technology Category

Application Category

📝 Abstract
In this paper, we introduce a multi-robot system that integrates mapping, localization, and task and motion planning (TAMP) enabled by 3D scene graphs to execute complex instructions expressed in natural language. Our system builds a shared 3D scene graph incorporating an open-set object-based map, which is leveraged for multi-robot 3D scene graph fusion. This representation supports real-time, view-invariant relocalization (via the object-based map) and planning (via the 3D scene graph), allowing a team of robots to reason about their surroundings and execute complex tasks. Additionally, we introduce a planning approach that translates operator intent into Planning Domain Definition Language (PDDL) goals using a Large Language Model (LLM) by leveraging context from the shared 3D scene graph and robot capabilities. We provide an experimental assessment of the performance of our system on real-world tasks in large-scale, outdoor environments.
Problem

Research questions and friction points this paper is trying to address.

Integrates mapping, localization, and TAMP for multi-robot systems
Executes complex natural language instructions via 3D scene graphs
Translates operator intent into PDDL goals using LLM and context
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-robot 3D scene graph fusion
LLM translates intent into PDDL goals
Object-based map enables real-time relocalization
🔎 Similar Papers
No similar papers found.
Jared Strader
Jared Strader
Assistant Professor, Oakland University; formerly: MIT; JPL; WVU
RoboticsAutonomyPerception
Aaron Ray
Aaron Ray
MIT
Robotics
Jacob Arkin
Jacob Arkin
Postdoctoral Associate, Massachusetts Institute of Technology
RoboticsNatural Language UnderstandingHuman-Robot Communication
Mason B. Peterson
Mason B. Peterson
Graduate Student, Department of Aeronautics and Astronautics, MIT
perceptioncomputer visionroboticsSLAM
Yun Chang
Yun Chang
Massachusetts Institute of Technology
PerceptionRoboticsLocalizationMappingAutonomy
N
Nathan Hughes
Massachusetts Institute of Technology, Cambridge, MA 02139, USA
Christopher Bradley
Christopher Bradley
CSAIL, MIT
RoboticsArtificial Intelligence
Y
Yi Xuan Jia
Massachusetts Institute of Technology, Cambridge, MA 02139, USA
Carlos Nieto-Granda
Carlos Nieto-Granda
U.S. Army Research Laboratory (ARL)
Multi-robot and multi-agent systemsAutonomous Navigation & ExplorationSLAMHuman-Robot Teams
Rajat Talak
Rajat Talak
Research Scientist, SPARKlab, Massachusetts Institute of Technology
Robot PerceptionOptimization and LearningAutonomous SystemsCommunication Networks
Chuchu Fan
Chuchu Fan
Associate Professor of Aeronautics and Astronautics at MIT
Cyber-Physical SystemsAutonomous SystemsFormal MethodsControl
Luca Carlone
Luca Carlone
Associate Professor, Massachusetts Institute of Technology
RoboticsRobot PerceptionComputer VisionEstimation and InferenceOptimization and Learning
Jonathan P. How
Jonathan P. How
Ford Professor of Engineering, AA Dept., Massachusetts Institute of Technology
Control systemsMulti-agent systemsAerial RoboticsSensor FusionAutonomous Driving
Nicholas Roy
Nicholas Roy
MIT
RoboticsMachine LearningHuman-Robot InteractionMicro Air Vehicles