LagMemo: Language 3D Gaussian Splatting Memory for Multi-modal Open-vocabulary Multi-goal Visual Navigation

📅 2025-10-28

📈 Citations: 0

✨ Influential: 0


🤖 AI Summary

This work addresses multi-goal, multi-modal, open-vocabulary visual navigation. To this end, we propose LagMemo, a navigation system built on a language 3D Gaussian Splatting memory that unifies semantic (language) and spatial-geometric scene representations. During exploration, the system reconstructs the scene with 3D Gaussian Splatting and aligns it with language features, so that goals can be specified through arbitrary textual descriptions as well as other modalities. When goals arrive, the system queries the memory for candidate goal locations and uses a local perception-based verification mechanism to match and confirm each target during navigation. We evaluate the approach on GOAT-Core, a core split curated from GOAT-Bench, where it outperforms state-of-the-art methods in both multi-modal open-vocabulary goal localization and multi-goal navigation.


📝 Abstract

Navigating to a designated goal using visual information is a fundamental capability for intelligent robots. Most classical visual navigation methods are restricted to single-goal, single-modality, and closed-set goal settings. To address the practical demands of multi-modal, open-vocabulary goal queries and multi-goal visual navigation, we propose LagMemo, a navigation system that leverages a language 3D Gaussian Splatting memory. During exploration, LagMemo constructs a unified 3D language memory. When task goals arrive, the system queries the memory, predicts candidate goal locations, and uses a local perception-based verification mechanism to dynamically match and validate goals during navigation. For fair and rigorous evaluation, we curate GOAT-Core, a high-quality core split distilled from GOAT-Bench and tailored to multi-modal open-vocabulary multi-goal visual navigation. Experimental results show that LagMemo's memory module enables effective multi-modal open-vocabulary goal localization, and that LagMemo outperforms state-of-the-art methods in multi-goal visual navigation. Project page: https://weekgoodday.github.io/lagmemo
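As a rough illustration of the memory query step, the sketch below assumes the memory can be reduced to Gaussian centers plus one language feature vector per Gaussian, already aligned with a text-embedding space (the encoder itself is not shown). It scores every Gaussian by cosine similarity to a goal embedding and keeps the best-scoring, spatially separated ones as candidate goal locations. The function name query_memory and the parameters top_k, min_sim, and merge_radius are hypothetical; LagMemo's actual query and candidate-prediction procedure may differ.

```python
import numpy as np

def query_memory(positions, features, text_embedding,
                 top_k=3, min_sim=0.2, merge_radius=0.5):
    """Hypothetical sketch: rank candidate goal locations in a language-embedded Gaussian memory.

    positions:      (N, 3) array of Gaussian centers.
    features:       (N, D) array of per-Gaussian language features, assumed to live
                    in the same space as the text embedding.
    text_embedding: (D,) embedding of the goal query.
    Returns up to top_k (position, similarity) pairs, best first.
    """
    eps = 1e-8  # avoid division by zero for all-zero feature vectors
    feats = features / (np.linalg.norm(features, axis=1, keepdims=True) + eps)
    query = text_embedding / (np.linalg.norm(text_embedding) + eps)
    sims = feats @ query  # cosine similarity of every Gaussian to the goal

    candidates = []
    for i in np.argsort(-sims):  # most similar Gaussians first
        if sims[i] < min_sim or len(candidates) == top_k:
            break
        p = positions[i]
        # Greedy suppression: skip Gaussians near an already-selected candidate,
        # so several hits on one object yield a single goal location.
        if all(np.linalg.norm(p - c) > merge_radius for c, _ in candidates):
            candidates.append((p, float(sims[i])))
    return candidates

positions = np.array([[0.0, 0.0, 0.0], [0.1, 0.0, 0.0], [3.0, 0.0, 0.0], [5.0, 5.0, 0.0]])
features = np.array([[1.0, 0.0], [0.9, 0.1], [0.8, 0.6], [0.0, 1.0]])
for pos, sim in query_memory(positions, features, np.array([1.0, 0.0])):
    print(pos, round(sim, 3))
```

In this toy memory, the two Gaussians near the origin collapse into one candidate, a second candidate appears at (3, 0, 0), and the Gaussian whose feature is orthogonal to the query falls below min_sim and is discarded.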

Problem

Research questions and friction points this paper is trying to address.

Addressing multi-modal open-vocabulary multi-goal visual navigation challenges

Constructing a unified 3D language memory for dynamic goal localization

Overcoming the limitations of single-goal, single-modality, closed-set navigation systems

Innovation

Methods, ideas, or system contributions that make the work stand out.

Language 3D Gaussian Splatting memory for navigation

Unified 3D language memory construction during exploration

Local perception-based dynamic goal verification mechanism
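The query-then-verify behavior for multiple goals can be sketched as a simple loop, assuming the memory yields an ordered list of candidate locations per goal and that local perception produces a confidence score at each visited candidate. The helpers get_candidates, go_to, and verify, along with the 0.5 threshold, are hypothetical placeholders rather than LagMemo's interfaces. The point is only the control flow: visit candidates in order, accept the first one that passes local verification, and otherwise report the goal as unresolved (where a real system might resume exploration).

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]

@dataclass
class GoalResult:
    goal: str
    reached: Optional[Point]  # None if no candidate passed verification
    attempts: int             # number of candidates visited for this goal

def navigate_goals(goals: Sequence[str],
                   get_candidates: Callable[[str], List[Point]],
                   go_to: Callable[[Point], None],
                   verify: Callable[[str, Point], float],
                   threshold: float = 0.5) -> List[GoalResult]:
    """Hypothetical sketch: pursue goals in order, verifying memory-predicted candidates locally."""
    results = []
    for goal in goals:
        reached, attempts = None, 0
        for cand in get_candidates(goal):  # memory-ranked candidates, best first
            attempts += 1
            go_to(cand)  # drive to the candidate location
            if verify(goal, cand) >= threshold:  # local perception confirms the goal
                reached = cand
                break
        results.append(GoalResult(goal, reached, attempts))
    return results

# Toy usage with stub helpers.
memory = {"chair": [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], "sofa": [(2.0, 0.0, 0.0)]}
visited: List[Point] = []
results = navigate_goals(
    ["chair", "sofa"],
    get_candidates=lambda g: memory.get(g, []),
    go_to=visited.append,
    verify=lambda g, p: 0.9 if (g, p) == ("chair", (1.0, 0.0, 0.0)) else 0.1,
)
for r in results:
    print(r)
```

In the toy run, the first chair candidate fails verification and the agent moves on to the second, which passes; the single sofa candidate is rejected, leaving that goal unresolved.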
