🤖 AI Summary
This work addresses the challenge of executing complex semantic tasks, specified via natural language instructions, in large-scale outdoor multi-robot systems. We propose an integrated 3D scene graph framework that unifies multi-robot SLAM, open-vocabulary object detection and mapping, LLM-driven semantic parsing, and hierarchical task and motion planning (TAMP). We introduce a novel language-guided PDDL goal generation mechanism that closes the loop from operator intent to executable actions, and design a view-invariant relocalization method with shared scene graph fusion for collaborative multi-robot operation. Evaluated in real-world large-scale outdoor environments, our system achieves significant improvements in natural language understanding accuracy and task success rate, reduces relocalization error by 42%, and keeps planning response latency under 800 ms. The core contribution is the first end-to-end 3D semantic planning system supporting open-set object recognition, language-guided goal generation, and multi-robot collaborative relocalization.
📝 Abstract
In this paper, we introduce a multi-robot system that integrates mapping, localization, and task and motion planning (TAMP) enabled by 3D scene graphs to execute complex instructions expressed in natural language. Our system builds a shared 3D scene graph incorporating an open-set object-based map, which is leveraged for multi-robot 3D scene graph fusion. This representation supports real-time, view-invariant relocalization (via the object-based map) and planning (via the 3D scene graph), allowing a team of robots to reason about their surroundings and execute complex tasks. Additionally, we introduce a planning approach that translates operator intent into Planning Domain Definition Language (PDDL) goals using a Large Language Model (LLM) by leveraging context from the shared 3D scene graph and robot capabilities. We provide an experimental assessment of the performance of our system on real-world tasks in large-scale, outdoor environments.
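The abstract describes translating operator intent into PDDL goals with an LLM, using context drawn from the shared 3D scene graph and the robots' capabilities. The following is a minimal sketch of how such a step might look; the names (`SceneGraphNode`, `generate_pddl_goal`, the stub `fake_llm`) are illustrative assumptions, not the authors' actual interface:

```python
# Hypothetical sketch of language-guided PDDL goal generation.
# A real system would query an actual LLM; here a stub stands in.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class SceneGraphNode:
    """An open-set object node in the shared 3D scene graph (assumed shape)."""
    label: str      # open-vocabulary label, e.g. "picnic table"
    node_id: str    # unique symbol usable in PDDL, e.g. "picnic_table_3"

def build_prompt(instruction: str, nodes: List[SceneGraphNode],
                 capabilities: List[str]) -> str:
    """Assemble LLM context from scene graph objects and robot capabilities."""
    objects = ", ".join(n.node_id for n in nodes)
    caps = ", ".join(capabilities)
    return (
        f"Known objects: {objects}\n"
        f"Robot capabilities: {caps}\n"
        f"Instruction: {instruction}\n"
        "Respond with a single PDDL goal expression."
    )

def generate_pddl_goal(instruction: str, nodes: List[SceneGraphNode],
                       capabilities: List[str],
                       llm: Callable[[str], str]) -> str:
    """Query the LLM, then lightly validate that the reply parses as a goal."""
    reply = llm(build_prompt(instruction, nodes, capabilities)).strip()
    if not (reply.startswith("(") and reply.endswith(")")):
        raise ValueError(f"LLM reply is not a PDDL expression: {reply!r}")
    return reply

# Stub LLM for demonstration only; its output is a fabricated example goal.
def fake_llm(prompt: str) -> str:
    return "(and (inspected picnic_table_3) (at-base robot_1))"

nodes = [SceneGraphNode("picnic table", "picnic_table_3")]
goal = generate_pddl_goal("Inspect the picnic table, then return home",
                          nodes, ["navigate", "inspect"], fake_llm)
print(goal)  # the PDDL goal handed to the TAMP layer
```

The key design point suggested by the abstract is that the scene graph grounds the LLM's output: the prompt restricts goal symbols to object IDs that actually exist in the shared map, so the generated goal is directly executable by the downstream planner.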