Hi-Dyna Graph: Hierarchical Dynamic Scene Graph for Robotic Autonomy in Human-Centric Environments

📅 2025-05-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
Service robots face dual challenges in autonomous operation within dynamic, human-centric environments: conventional topological maps lack the capability to model transient objects, while dense neural representations (e.g., NeRF) incur prohibitive computational overhead. To address this, we propose the Hierarchical Dynamic Scene Graph (Hi-Dyna Graph), introducing a novel layered coupling architecture that anchors dynamic interaction subgraphs to a static topological graph via semantic-spatial constraints. We further pioneer LLM-driven zero-shot scene graph reasoning and executable instruction generation, requiring neither fine-tuning nor reinforcement learning. Our method integrates RGB-D topological mapping, a lightweight NeRF-inspired representation, multi-view fusion, and LLM-based embodied reasoning. In a real-world dynamic cafeteria setting, a mobile manipulator achieves zero-shot execution of multi-step tasks, including food retrieval, pedestrian avoidance, and delivery, with a 37% improvement in scene understanding accuracy over baseline methods.

📝 Abstract
Autonomous operation of service robots in human-centric scenes remains challenging because it requires understanding of changing environments and context-aware decision-making. While existing approaches such as topological maps offer efficient spatial priors, they fail to model transient object relationships, whereas dense neural representations (e.g., NeRF) incur prohibitive computational costs. Inspired by work on hierarchical scene representation and video scene graph generation, we propose Hi-Dyna Graph, a hierarchical dynamic scene graph architecture that integrates persistent global layouts with localized dynamic semantics for embodied robotic autonomy. Our framework constructs a global topological graph from posed RGB-D inputs, encoding room-scale connectivity and large static objects (e.g., furniture), while environmental and egocentric cameras populate dynamic subgraphs with object position relations and human-object interaction patterns. A hybrid architecture is constructed by anchoring these subgraphs to the global topology using semantic and spatial constraints, enabling seamless updates as the environment evolves. An agent powered by large language models (LLMs) interprets the unified graph, infers latent task triggers, and generates executable instructions grounded in robotic affordances. Extensive experiments demonstrate Hi-Dyna Graph's superior scene representation effectiveness. Real-world deployments validate the system's practicality: a mobile manipulator autonomously completes complex tasks, with no further training or reward engineering, in a dynamic scene as a cafeteria assistant. See https://anonymous.4open.science/r/Hi-Dyna-Graph-B326 for video demonstrations and more details.
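The layered coupling the abstract describes can be sketched as a minimal data structure: a global topological graph of rooms and large static objects, plus dynamic subgraphs of transient objects that are re-anchored to the nearest global node under a spatial constraint. All class and function names below are illustrative assumptions, not the paper's actual implementation, and the semantic compatibility check is reduced to a distance threshold for brevity.

```python
import math
from dataclasses import dataclass, field

@dataclass
class GlobalNode:
    """Room-scale region or large static object in the global topological graph."""
    name: str
    position: tuple                                 # (x, y) in the map frame
    neighbors: list = field(default_factory=list)   # room-scale connectivity edges

@dataclass
class DynamicSubgraph:
    """Localized subgraph of transient objects and human-object interactions."""
    objects: dict = field(default_factory=dict)     # object name -> (x, y)
    relations: list = field(default_factory=list)   # (subject, predicate, object)
    anchor: "GlobalNode | None" = None              # global node it is attached to

def centroid(points):
    xs, ys = zip(*points)
    return (sum(xs) / len(xs), sum(ys) / len(ys))

def anchor_subgraph(sub, global_nodes, max_dist=3.0):
    """Re-anchor a subgraph to the nearest global node within max_dist
    (spatial constraint); semantic checks would be layered on top."""
    c = centroid(sub.objects.values())
    best = min(global_nodes, key=lambda n: math.dist(n.position, c))
    if math.dist(best.position, c) <= max_dist:
        sub.anchor = best
    return sub.anchor
```

Because anchoring is recomputed from the subgraph's current object positions, the attachment updates naturally as objects move, which is the "seamless updates" property the abstract claims for the hybrid architecture.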
Problem

Research questions and friction points this paper is trying to address.

Modeling dynamic object relationships in human-centric robotic environments
Balancing computational efficiency with detailed scene representation
Enabling context-aware decision-making for autonomous service robots
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical dynamic scene graph architecture
Hybrid global and dynamic subgraphs integration
LLM-powered agent for task interpretation
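The last contribution, LLM-powered task interpretation, can be illustrated by serializing scene-graph triples into a textual prompt and delegating planning to a language model. The `graph_to_prompt` helper, the skill vocabulary, and the `llm` callable are hypothetical sketches assuming a generic text-in/text-out LLM interface; the paper's actual prompting scheme is not reproduced here.

```python
def graph_to_prompt(relations, task):
    """Serialize (subject, predicate, object) triples from the unified scene
    graph into a planning prompt; the skill vocabulary is an assumption."""
    facts = "\n".join(f"- {s} {p} {o}" for s, p, o in relations)
    return (
        "Scene graph facts:\n" + facts +
        f"\nTask: {task}\n"
        "Respond with a numbered list of robot skills "
        "(navigate_to, pick, place, avoid)."
    )

def plan(llm, relations, task):
    """Ask an LLM backend (any callable: str -> str) for executable steps."""
    return llm(graph_to_prompt(relations, task))
```

Keeping the LLM behind a plain callable makes the reasoning step zero-shot in the sense the summary describes: no fine-tuning or reward design, only a structured prompt built from the current graph.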