🤖 AI Summary
This work addresses a limitation of existing embodied agents: they typically build spatial memory with post-hoc, geometry-centric offline methods, which do not integrate high-level semantics and often overlook critical navigational landmarks. To overcome this, the authors propose ABot-Explorer, a framework that couples exploration with the online construction of a structured semantic graph memory (SG-Memo). Driven solely by RGB input, the agent uses a large vision-language model to extract Semantic Navigational Affordances (SNAs), which serve as cognitive anchors and guide it to prioritize structurally salient transition nodes during exploration. The authors also contribute a large-scale dataset that extends InteriorGS with SNA and SG-Memo annotations. Experiments show significant improvements over state-of-the-art methods in exploration efficiency and environment coverage, and the constructed SG-Memo effectively supports diverse downstream tasks.
📝 Abstract
Constructing structured spatial memory is essential for enabling long-horizon reasoning in complex embodied navigation tasks. Current memory construction predominantly relies on a decoupled, two-stage paradigm: agents first aggregate environmental data through exploration and then reconstruct spatial memory offline. However, this post-hoc, geometry-centric approach prevents agents from leveraging high-level semantic intelligence, often causing them to overlook navigationally critical landmarks (e.g., doorways and staircases) that serve as fundamental semantic anchors in human cognitive maps. To bridge this gap, we propose ABot-Explorer, a novel active exploration framework that unifies memory construction and exploration into an online, RGB-only process. At its core, ABot-Explorer leverages Large Vision-Language Models (VLMs) to distill Semantic Navigational Affordances (SNAs), which act as cognitively aligned anchors to guide the agent's movement. By dynamically integrating these SNAs into a hierarchical SG-Memo, ABot-Explorer mirrors human-like exploratory logic, prioritizing structural transit nodes to facilitate efficient coverage. To support this framework, we contribute a large-scale dataset extending InteriorGS with SNA and SG-Memo annotations. Experimental results demonstrate that ABot-Explorer significantly outperforms current state-of-the-art methods in both exploration efficiency and environment coverage, and that the resulting SG-Memo effectively supports diverse downstream tasks.
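
The idea of integrating SNAs into a hierarchical memory and visiting structural transit nodes first can be illustrated with a toy sketch. This is not the authors' implementation: the class name `SGMemo`, the affordance labels, the weights in `SNA_PRIORITY`, and the two-level room/node layout are all assumptions made for illustration, and the VLM that would detect SNAs from RGB frames is left out entirely.

```python
import heapq
from dataclasses import dataclass, field

# Hypothetical ranking of affordance types: a lower value is explored sooner.
# Labels and weights are illustrative assumptions, not values from the paper.
SNA_PRIORITY = {"staircase": 0, "doorway": 1, "corridor": 2, "open_space": 3}


@dataclass
class SGMemo:
    """Toy two-level memory: rooms group SNA nodes, edges link SNA nodes."""
    rooms: dict = field(default_factory=dict)      # room name -> set of node ids
    kinds: dict = field(default_factory=dict)      # node id -> SNA kind
    edges: set = field(default_factory=set)        # undirected links between nodes
    visited: set = field(default_factory=set)
    _frontier: list = field(default_factory=list)  # heap of (priority, order, id)
    _counter: int = 0                              # tie-breaker: insertion order

    def add_sna(self, node_id, kind, room, seen_from=None):
        """Insert an SNA node online, as soon as it is observed."""
        if node_id not in self.kinds:
            self.kinds[node_id] = kind
            self.rooms.setdefault(room, set()).add(node_id)
            priority = SNA_PRIORITY.get(kind, len(SNA_PRIORITY))
            heapq.heappush(self._frontier, (priority, self._counter, node_id))
            self._counter += 1
        if seen_from is not None:
            self.edges.add(frozenset((seen_from, node_id)))

    def next_target(self):
        """Pop the unvisited node with the best priority, or None if done."""
        while self._frontier:
            _, _, node_id = heapq.heappop(self._frontier)
            if node_id not in self.visited:
                self.visited.add(node_id)
                return node_id
        return None


memo = SGMemo()
memo.add_sna("n0", "open_space", "living_room")
memo.add_sna("n1", "doorway", "living_room", seen_from="n0")
memo.add_sna("n2", "staircase", "hallway", seen_from="n1")
order = [memo.next_target() for _ in range(4)]
print(order)  # ['n2', 'n1', 'n0', None]
```

In this sketch the staircase and doorway are selected before the open space even though they were observed later, which is the transit-node-first behavior the abstract describes; a real system would also need to weigh travel cost and detection uncertainty.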