Relationship-Aware Hierarchical 3D Scene Graph for Task Reasoning

πŸ“… 2026-02-02
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
This work addresses a limitation of existing SLAM and semantic mapping approaches: they fail to capture high-level abstractions and reason about object relationships, which hinders task-level understanding and decision-making for agents in 3D environments. To overcome this, the authors propose a hierarchical 3D scene graph representation that integrates open-vocabulary semantics, leveraging Vision Language Models (VLMs) and Large Language Models (LLMs) in a synergistic manner for scene graph construction and task-oriented reasoning. The authors present this as the first approach to enable open-set comprehension of object semantics and their relational context through joint VLM–LLM collaboration. The method is validated on a quadrupedal robot platform across diverse environments and tasks, demonstrating significant improvements in the agent's high-level semantic understanding and interactive capabilities within complex 3D scenes.

πŸ“ Abstract
Representing and understanding 3D environments in a structured manner is crucial for autonomous agents to navigate and reason about their surroundings. While traditional Simultaneous Localization and Mapping (SLAM) methods generate metric reconstructions and can be extended to metric-semantic mapping, they lack a higher level of abstraction and relational reasoning. To address this gap, 3D scene graphs have emerged as a powerful representation for capturing hierarchical structures and object relationships. In this work, we propose an enhanced hierarchical 3D scene graph that integrates open-vocabulary features across multiple abstraction levels and supports object-relational reasoning. Our approach leverages a Vision Language Model (VLM) to infer semantic relationships. Notably, we introduce a task reasoning module that combines a Large Language Model (LLM) and a VLM to interpret the scene graph's semantic and relational information, enabling agents to reason about tasks and interact with their environment more intelligently. We validate our method by deploying it on a quadruped robot in multiple environments and tasks, highlighting its ability to reason about its surroundings and the tasks at hand.
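To make the representation concrete, the sketch below illustrates the kind of structure the abstract describes: a hierarchy of abstraction levels (building, room, object) with open-vocabulary labels plus explicit relation edges between objects that an agent can query for relational reasoning. This is a minimal illustrative mock-up, not the paper's implementation; all class and method names (`Node`, `SceneGraph`, `objects_in`, `related`) are hypothetical, and labels that would come from a VLM are hard-coded here.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """A node at one abstraction level of the hierarchical scene graph."""
    label: str                     # open-vocabulary label (in the paper, inferred by a VLM)
    level: str                     # hypothetical levels: "building" | "room" | "object"
    children: list = field(default_factory=list)

@dataclass
class SceneGraph:
    root: Node
    # Relation edges as (subject_node, predicate, object_node) triples.
    relations: list = field(default_factory=list)

    def objects_in(self, room_label: str) -> list:
        """List object labels contained in the named room (hierarchy query)."""
        for room in self.root.children:
            if room.label == room_label:
                return [obj.label for obj in room.children]
        return []

    def related(self, subject_label: str) -> list:
        """List (predicate, object_label) pairs for a subject (relational query)."""
        return [(pred, obj.label)
                for subj, pred, obj in self.relations
                if subj.label == subject_label]

# Build a toy graph: building -> kitchen -> {mug, table}, with one relation edge.
mug = Node("mug", "object")
table = Node("table", "object")
kitchen = Node("kitchen", "room", children=[mug, table])
graph = SceneGraph(root=Node("lab building", "building", children=[kitchen]))
graph.relations.append((mug, "on top of", table))

print(graph.objects_in("kitchen"))  # ['mug', 'table']
print(graph.related("mug"))         # [('on top of', 'table')]
```

A task reasoning module like the one described would traverse such a graph, handing the labels and relation triples to an LLM/VLM pair to ground an instruction such as "fetch the mug from the kitchen."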
Problem

Research questions and friction points this paper is trying to address.

3D scene graph
relational reasoning
task reasoning
hierarchical representation
autonomous agents
Innovation

Methods, ideas, or system contributions that make the work stand out.

3D scene graph
open-vocabulary representation
Vision Language Model (VLM)
Large Language Model (LLM)
task reasoning
πŸ”Ž Similar Papers
No similar papers found.