AI Summary
3D Gaussian Splatting (3DGS) lacks fine-grained semantic understanding and physical executability, hindering its application in Vision-and-Language Navigation (VLN). Method: We propose SAGE-3D, the first semantically and physically aligned, executable 3DGS environment for VLN. It integrates object-level semantic annotation, physically grounded collision modeling of 3D Gaussian point clouds, and object-centric semantic grounding to jointly optimize semantic comprehension and embodied interaction. Contributions/Results: (1) We introduce InteriorGS, a large-scale dataset of 1K object-annotated indoor scenes rendered via 3DGS; (2) we release SAGE-Bench, the first 3DGS-based VLN benchmark; (3) on the VLN-CE Unseen task, SAGE-3D achieves a 31% relative improvement over state-of-the-art baselines, demonstrating strong zero-shot generalization and real-world transfer potential.
Abstract
3D Gaussian Splatting (3DGS), a 3D representation with photorealistic real-time rendering, is regarded as an effective tool for narrowing the sim-to-real gap. However, it lacks the fine-grained semantics and physical executability required for Vision-and-Language Navigation (VLN). To address this, we propose SAGE-3D (Semantically and Physically Aligned Gaussian Environments for 3D Navigation), a new paradigm that upgrades 3DGS into an executable, semantically and physically aligned environment. It comprises two components: (1) Object-Centric Semantic Grounding, which adds fine-grained object-level annotations to 3DGS; and (2) Physics-Aware Execution Jointing, which embeds collision objects into 3DGS and constructs rich physical interfaces. We release InteriorGS, a dataset of 1K object-annotated 3DGS indoor scenes, and introduce SAGE-Bench, the first 3DGS-based VLN benchmark, with 2M VLN samples. Experiments show that training on 3DGS scene data converges more slowly yet generalizes strongly, improving baseline performance by 31% on the VLN-CE Unseen task. The data and code will be released soon.