🤖 AI Summary
This work addresses multi-goal, multi-modal, open-vocabulary visual navigation. To this end, we propose LagMemo, a navigation system built on a language 3D Gaussian Splatting memory that unifies cross-modal semantic and spatial-geometric representations. Our method integrates editable scene reconstruction via 3D Gaussian Splatting with language-vision alignment to support arbitrary textual goal descriptions. Additionally, we query this memory for candidate goal locations and apply a local perception-based verification mechanism to dynamically match and confirm the positions of multiple targets during navigation. We evaluate our approach on GOAT-Core, a core split curated from GOAT-Bench, where it outperforms state-of-the-art methods. Specifically, our method improves multi-goal localization accuracy and navigation success rate under open-vocabulary conditions. The proposed framework offers a scalable and interpretable cross-modal navigation paradigm for embodied intelligence.
📝 Abstract
Navigating to a designated goal using visual information is a fundamental capability for intelligent robots. Most classical visual navigation methods are restricted to single-goal, single-modality, and closed-set goal settings. To address the practical demands of multi-modal, open-vocabulary goal queries and multi-goal visual navigation, we propose LagMemo, a navigation system that leverages a language 3D Gaussian Splatting memory. During exploration, LagMemo constructs a unified 3D language memory. When a task goal arrives, the system queries the memory, predicts candidate goal locations, and applies a local perception-based verification mechanism to dynamically match and validate goals during navigation. For fair and rigorous evaluation, we curate GOAT-Core, a high-quality core split distilled from GOAT-Bench and tailored to multi-modal, open-vocabulary, multi-goal visual navigation. Experimental results show that LagMemo's memory module enables effective multi-modal open-vocabulary goal localization, and that LagMemo outperforms state-of-the-art methods in multi-goal visual navigation. Project page: https://weekgoodday.github.io/lagmemo
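The query-then-verify loop described in the abstract can be illustrated with a minimal sketch. This is not the LagMemo implementation: the abstract does not specify the similarity measure, the memory layout, or the verifier, so cosine similarity, the `MemoryPoint` class, and the `query_memory`, `navigate_to_goal`, and `verify` names are all hypothetical stand-ins chosen for illustration.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Position = Tuple[float, float, float]


@dataclass
class MemoryPoint:
    """Hypothetical memory entry: a 3D position plus a language feature."""
    position: Position
    feature: List[float]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity (an assumed metric, not stated in the abstract)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def query_memory(memory: List[MemoryPoint], goal: List[float],
                 top_k: int = 3) -> List[Tuple[float, Position]]:
    """Rank memory entries by similarity to the goal embedding."""
    scored = [(cosine(p.feature, goal), p.position) for p in memory]
    scored.sort(key=lambda s: s[0], reverse=True)
    return scored[:top_k]


def navigate_to_goal(memory: List[MemoryPoint], goal: List[float],
                     verify: Callable[[Position], bool],
                     top_k: int = 3) -> Optional[Position]:
    """Visit candidates in ranked order; accept the first one that the
    local verifier (a placeholder for on-site perception) confirms."""
    for _score, position in query_memory(memory, goal, top_k):
        if verify(position):
            return position
    return None


# Toy data: 2D features stand in for real language embeddings.
memory = [
    MemoryPoint((1.0, 0.0, 0.0), [1.0, 0.0]),
    MemoryPoint((0.0, 2.0, 0.0), [0.0, 1.0]),
    MemoryPoint((3.0, 3.0, 0.0), [0.9, 0.1]),
]
goal_embedding = [1.0, 0.0]
candidates = query_memory(memory, goal_embedding, top_k=2)

# The verifier rejects the top candidate, so the second one is chosen.
reached = navigate_to_goal(
    memory, goal_embedding,
    verify=lambda pos: pos == (3.0, 3.0, 0.0), top_k=2)
```

In this toy run the best-scoring candidate fails verification, so the agent falls back to the next candidate, which is the role the abstract assigns to the local verification step.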