3D-Mem: 3D Scene Memory for Embodied Exploration and Reasoning

📅 2024-11-23
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing 3D scene representations—such as object-centric scene graphs—oversimplify spatial relationships by relying on discrete textual descriptions, hindering fine-grained spatial reasoning; moreover, they lack mechanisms for active exploration and incremental memory, limiting embodied agents’ lifelong autonomy. This paper proposes a 3D scene memory framework tailored for embodied agents. It constructs multi-view image-based memory snapshots to encode explored regions and introduces a novel “known–frontier” dual-snapshot mechanism to guide active exploration. An incremental 3D memory construction pipeline and content-based efficient retrieval method are designed, overcoming the spatial modeling limitations of conventional scene graphs. Evaluated on three benchmarks, the framework achieves significant improvements in exploration coverage (+12.7%) and spatial reasoning accuracy (+18.3%), demonstrating its effectiveness for long-horizon autonomous operation.

Technology Category

Application Category

📝 Abstract
Constructing compact and informative 3D scene representations is essential for effective embodied exploration and reasoning, especially in complex environments over extended periods. Existing representations, such as object-centric 3D scene graphs, oversimplify spatial relationships by modeling scenes as isolated objects with restrictive textual relationships, making it difficult to address queries requiring nuanced spatial understanding. Moreover, these representations lack natural mechanisms for active exploration and memory management, hindering their application to lifelong autonomy. In this work, we propose 3D-Mem, a novel 3D scene memory framework for embodied agents. 3D-Mem employs informative multi-view images, termed Memory Snapshots, to represent the scene and capture rich visual information of explored regions. It further integrates frontier-based exploration by introducing Frontier Snapshots-glimpses of unexplored areas-enabling agents to make informed decisions by considering both known and potential new information. To support lifelong memory in active exploration settings, we present an incremental construction pipeline for 3D-Mem, as well as a memory retrieval technique for memory management. Experimental results on three benchmarks demonstrate that 3D-Mem significantly enhances agents' exploration and reasoning capabilities in 3D environments, highlighting its potential for advancing applications in embodied AI.
Problem

Research questions and friction points this paper is trying to address.

Constructing compact 3D scene representations for exploration.
Addressing limitations in existing object-centric 3D scene graphs.
Enabling lifelong memory management in active exploration settings.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Memory Snapshots capture rich visual information
Frontier Snapshots enable informed exploration decisions
Incremental construction supports lifelong memory management
🔎 Similar Papers
No similar papers found.