GaussNav: Gaussian Splatting for Visual Navigation

📅 2024-03-18
🏛️ arXiv.org
📈 Citations: 7
Influential: 1

🤖 AI Summary
This paper addresses the Instance ImageGoal Navigation (IIN) task, in which an embodied agent must locate a specific object in an unexplored environment given only a goal image, by proposing a 3D scene representation based on 3D Gaussian Splatting (3DGS). Conventional bird's-eye-view (BEV) maps lack texture and therefore support cross-view instance matching poorly. To overcome this, the paper introduces 3DGS to embodied navigation for the first time, constructing a compact, unified 3D representation that jointly encodes geometry, semantics, and photorealistic texture. The method combines neural rendering with cross-view feature matching to identify and navigate to the target. Evaluated on the HM3D dataset, the approach achieves a Success weighted by Path Length (SPL) of 0.578, up from 0.347, a relative improvement of roughly 67%. The implementation is publicly available.

📝 Abstract
In embodied vision, Instance ImageGoal Navigation (IIN) requires an agent to locate a specific object depicted in a goal image within an unexplored environment. The primary challenge of IIN arises from the need to recognize the target object across varying viewpoints while ignoring potential distractors. Existing map-based navigation methods typically use Bird's Eye View (BEV) maps, which lack a detailed texture representation of the scene. Consequently, while BEV maps are effective for semantic-level visual navigation, they struggle with instance-level tasks. To this end, we propose a new framework for IIN, Gaussian Splatting for Visual Navigation (GaussNav), which constructs a novel map representation based on 3D Gaussian Splatting (3DGS). The GaussNav framework enables the agent to memorize both the geometry and semantic information of the scene, as well as retain the textural features of objects. By matching renderings of similar objects with the target, the agent can accurately identify, ground, and navigate to the specified object. Our GaussNav framework demonstrates a significant performance improvement, with Success weighted by Path Length (SPL) increasing from 0.347 to 0.578 on the challenging Habitat-Matterport 3D (HM3D) dataset. The source code is publicly available at https://github.com/XiaohanLei/GaussNav.
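
To make the reported metric concrete, the sketch below computes SPL using its standard definition: the mean over episodes of the success flag times the ratio of the shortest-path length to the longer of the agent's path and the shortest path. The `Episode` class, the toy episode values, and the `relative_gain` helper are illustrative assumptions, not taken from the GaussNav code base; only the two SPL figures (0.347 and 0.578) come from the abstract.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    """One navigation episode (hypothetical container for illustration)."""
    success: bool         # did the agent stop at the goal?
    shortest_path: float  # shortest-path length from start to goal
    agent_path: float     # length of the path the agent actually took


def spl(episodes):
    """Success weighted by Path Length, averaged over all episodes."""
    if not episodes:
        raise ValueError("need at least one episode")
    total = 0.0
    for ep in episodes:
        if ep.success:
            # Efficiency is capped at 1 when the agent path equals the shortest path.
            total += ep.shortest_path / max(ep.agent_path, ep.shortest_path)
    return total / len(episodes)


def relative_gain(old, new):
    """Relative improvement of `new` over `old` as a fraction."""
    return (new - old) / old


# Toy episodes (made-up numbers): optimal success, inefficient success, failure.
episodes = [
    Episode(True, 5.0, 5.0),
    Episode(True, 4.0, 8.0),
    Episode(False, 6.0, 10.0),
]
print(spl(episodes))                          # (1.0 + 0.5 + 0.0) / 3 = 0.5
print(round(relative_gain(0.347, 0.578), 3))  # 0.666
```

Read this way, the move from 0.347 to 0.578 is a relative SPL gain of about two thirds; failed episodes contribute zero regardless of path length.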

Problem

Research questions and friction points this paper is trying to address.

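BEV maps lack the texture detail needed for instance-level navigation
The target object must be recognized across varying viewpoints while ignoring distractors
Scene maps need enough detail to tell similar objects apart, which motivates a 3D Gaussian Splatting map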

Innovation

Methods, ideas, or system contributions that make the work stand out.

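Map representation built on 3D Gaussian Splatting (3DGS)
Map memorizes scene geometry and semantic information
Map retains object texture, so renderings of candidate objects can be matched against the goal image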
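
The matching idea in the points above can be sketched as follows. This is a minimal illustration, not the paper's method: it assumes each rendered view of a candidate object and the goal image have already been reduced to a fixed-length descriptor by some unspecified feature extractor, and it uses cosine similarity as a stand-in for whatever matcher GaussNav actually employs. The names `cosine`, `best_candidate`, and the instance IDs are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length descriptor vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_candidate(goal_desc, candidates):
    """Pick the candidate instance whose best rendered view matches the goal.

    `candidates` maps an instance ID to a list of descriptors, one per
    rendered viewpoint (assumed to be computed elsewhere).
    """
    best_id, best_score = None, -1.0
    for instance_id, view_descs in candidates.items():
        # Score an instance by its most similar view, since the goal image
        # may show the object from only one side.
        score = max((cosine(goal_desc, d) for d in view_descs), default=-1.0)
        if score > best_score:
            best_id, best_score = instance_id, score
    return best_id, best_score


# Toy descriptors (made-up values): two chairs of the same semantic class.
goal = [1.0, 0.0, 1.0]
candidates = {
    "chair_0": [[0.0, 1.0, 0.0], [0.1, 0.9, 0.2]],
    "chair_1": [[0.9, 0.1, 1.0], [0.2, 0.8, 0.1]],
}
instance_id, score = best_candidate(goal, candidates)
print(instance_id)  # chair_1
```

Scoring each instance by its best view rather than its average is one possible design choice here; it reflects that distractors of the same class should be rejected on appearance while the true target only needs to match from one viewpoint.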