Uncertainty-Aware Gaussian Map for Vision-Language Navigation

📅 2026-05-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses unreliable decision-making in vision-language navigation (VLN) caused by geometric, semantic, and appearance ambiguity. The authors build a Semantic Gaussian Map (SGM) from differentiable 3D Gaussian primitives and explicitly model these three forms of perceptual uncertainty on top of it. Geometric uncertainty is estimated through variational perturbations of Gaussian position and scale, semantic uncertainty by perturbing Gaussian semantic attributes, and appearance uncertainty through Fisher information. Incorporating these estimates extends the SGM into a unified 3D Value Map that informs the agent's action prediction. Evaluations across multiple VLN benchmarks show that the approach is effective.
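The perturbation idea can be illustrated with a small toy, offered only as a sketch and not as the authors' method. It assumes a hypothetical 1D "renderer" (`render_1d`) made of weighted Gaussian bumps and uses plain Monte Carlo noise on position and scale in place of the paper's variational formulation; the function names, noise levels, and sample count are all illustrative assumptions.

```python
import numpy as np

def render_1d(mu, sigma, weight, xs):
    """Toy stand-in for a renderer: sum of weighted Gaussian bumps at points xs."""
    d = xs[:, None] - mu[None, :]
    return (weight[None, :] * np.exp(-0.5 * (d / sigma[None, :]) ** 2)).sum(axis=1)

def perturbation_uncertainty(mu, sigma, weight, xs,
                             pos_std=0.05, scale_std=0.05,
                             n_samples=64, seed=0):
    """Per-point variance of the render under random position/scale perturbations.

    Assumption: simple Monte Carlo sampling, not the variational scheme
    described in the paper.
    """
    rng = np.random.default_rng(seed)
    renders = []
    for _ in range(n_samples):
        # Additive noise on positions.
        mu_p = mu + rng.normal(0.0, pos_std, size=mu.shape)
        # Multiplicative log-normal noise keeps scales strictly positive.
        sigma_p = sigma * np.exp(rng.normal(0.0, scale_std, size=sigma.shape))
        renders.append(render_1d(mu_p, sigma_p, weight, xs))
    # High variance marks regions whose rendering is sensitive to geometry noise.
    return np.var(np.stack(renders), axis=0)

xs = np.linspace(0.0, 1.0, 20)
mu = np.array([0.3, 0.7])
sigma = np.array([0.1, 0.1])
weight = np.array([1.0, 1.0])
uncertainty = perturbation_uncertainty(mu, sigma, weight, xs)
```

In this toy, points whose rendered value changes a lot under small geometric noise receive a high variance, which is the intuition behind treating perturbation sensitivity as a proxy for structural reliability.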
📝 Abstract
Vision-Language Navigation (VLN) requires an agent to navigate 3D environments following natural language instructions. During navigation, existing agents commonly encounter perceptual uncertainty, such as insufficient evidence for reliable grounding or ambiguity in interpreting spatial cues, yet they typically ignore such information when predicting actions. In this work, we explicitly model three forms of perceptual uncertainty (i.e., geometric, semantic, and appearance uncertainty) and integrate them into the agent's observation space to enable informed decision-making. Concretely, our agent first constructs a Semantic Gaussian Map (SGM), composed of differentiable 3D Gaussian primitives initialized from panoramic observations, that encodes both the geometric structure and semantic content of the environment. On top of SGM, geometric uncertainty is estimated through variational perturbations of Gaussian position and scale to assess structural reliability; semantic uncertainty is captured by perturbing Gaussian semantic attributes to reveal ambiguous interpretations; and appearance uncertainty is characterized by Fisher Information, which measures the sensitivity of rendered observations to Gaussian-level variations. These uncertainties are incorporated into SGM, extending it into a unified 3D Value Map, which grounds them as affordances and constraints that support reliable navigation. Comprehensive evaluations across multiple VLN benchmarks show the effectiveness of our agent.
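To make the Fisher Information idea concrete, the following is a minimal sketch under stated assumptions, not the paper's implementation. It assumes a hypothetical 1D renderer (`render_1d`), independent Gaussian pixel noise with a known standard deviation, and central finite differences for the derivatives. Under that noise model the diagonal Fisher Information of a parameter is the summed squared sensitivity of the rendered values to that parameter, divided by the noise variance.

```python
import numpy as np

def render_1d(mu, sigma, weight, xs):
    """Toy stand-in for a renderer: sum of weighted Gaussian bumps at points xs."""
    d = xs[:, None] - mu[None, :]
    return (weight[None, :] * np.exp(-0.5 * (d / sigma[None, :]) ** 2)).sum(axis=1)

def fisher_diag_positions(mu, sigma, weight, xs, noise_std=0.1, eps=1e-5):
    """Diagonal Fisher Information of each Gaussian position.

    Assumption: observations are the render plus i.i.d. Gaussian noise with
    standard deviation noise_std, so F_ii = sum_pixels (d render / d mu_i)^2
    / noise_std^2. Derivatives are taken by central finite differences.
    """
    mu = np.asarray(mu, dtype=float)
    fisher = np.zeros_like(mu)
    for i in range(mu.size):
        step = np.zeros_like(mu)
        step[i] = eps
        grad = (render_1d(mu + step, sigma, weight, xs)
                - render_1d(mu - step, sigma, weight, xs)) / (2.0 * eps)
        fisher[i] = np.sum(grad ** 2) / noise_std ** 2
    return fisher

# The first Gaussian sits inside the observed range; the second lies far outside it.
xs = np.linspace(0.0, 1.0, 50)
mu = np.array([0.5, 5.0])
sigma = np.array([0.1, 0.1])
weight = np.array([1.0, 1.0])
fisher = fisher_diag_positions(mu, sigma, weight, xs)
```

A Gaussian that the observations barely constrain has low Fisher Information, so by the Cramér-Rao bound its parameters can only be estimated with high variance. In this toy the second Gaussian, which lies outside the observed range, ends up with far lower information than the first.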
Problem

Research questions and friction points this paper is trying to address.

Vision-Language Navigation
Perceptual Uncertainty
Geometric Uncertainty
Semantic Uncertainty
Appearance Uncertainty
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uncertainty-Aware Gaussian Map
Semantic Gaussian Map
Perceptual Uncertainty
Vision-Language Navigation
3D Value Map