Plan in Sandbox, Navigate in Open Worlds: Learning Physics-Grounded Abstracted Experience for Embodied Navigation

📅 2026-05-11
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses two obstacles in embodied navigation: the scarcity of aligned open-world vision and robot control data, and the limited transferability of policies learned in photorealistic simulation. The authors propose SAGE, a framework that replaces photorealistic rendering with a physics-grounded semantic abstraction, mimicking the human capacity for mental simulation, in which plans are rehearsed in simplified environments before execution. SAGE operates in three phases (environment generation, experience distillation via reinforcement learning, and policy transfer to open-world control) and introduces an asymmetric adaptive clipping mechanism to stabilize policy updates. On the A-EQA task, the method achieves a 53.21% LLM-Match success rate, 9.7% above the baseline, and shows encouraging transfer to physical indoor robot deployment.
📝 Abstract
Vision-Language Models (VLMs) have demonstrated exceptional general reasoning capabilities. However, their performance in embodied navigation remains hindered by a scarcity of aligned open-world vision and robot control data. Although simulators provide a cost-effective alternative for data collection, their inherent reliance on photorealistic simulation often limits the transferability of learned policies. To this end, we propose Sandbox-Abstracted Grounded Experience (SAGE), a framework that enables agents to learn within a physics-grounded semantic abstraction rather than a photorealistic simulation, mimicking the human capacity for mental simulation, where plans are rehearsed in simplified physics abstractions before execution. The SAGE system operates via three synergistic phases: (1) Genesis: constructing diverse, physics-constrained semantic environments to bootstrap experience; (2) Evolution: distilling experiences through Reinforcement Learning (RL), utilizing a novel asymmetric adaptive clipping mechanism to stabilize updates; (3) Navigation: bridging the abstract policy to open-world control. We demonstrate that SAGE significantly improves planner-assisted embodied navigation, achieving a 53.21% LLM-Match Success Rate on A-EQA (+9.7% over baseline), while showing encouraging transfer to physical indoor robot deployment.
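The abstract does not spell out how the asymmetric adaptive clipping in the Evolution phase works. As a rough illustration of the general idea only, the sketch below shows a PPO-style clipped surrogate objective whose lower and upper clip bounds differ. The function name, the parameters `eps_low` and `eps_high`, and their default values are all assumptions made for this example, and the bounds are fixed here, whereas the paper's mechanism is described as adaptive; this should not be read as the authors' formulation.

```python
def asymmetric_clipped_objective(ratios, advantages, eps_low=0.2, eps_high=0.3):
    """Mean PPO-style clipped surrogate with separate lower/upper clip bounds.

    ratios:      new-policy / old-policy probability ratios, one per sample
    advantages:  advantage estimates, one per sample
    eps_low:     hypothetical bound; ratios are clipped below at 1 - eps_low
    eps_high:    hypothetical bound; ratios are clipped above at 1 + eps_high
    """
    if len(ratios) != len(advantages):
        raise ValueError("ratios and advantages must have the same length")
    if not ratios:
        raise ValueError("at least one sample is required")

    total = 0.0
    for ratio, advantage in zip(ratios, advantages):
        # Clip the ratio into the asymmetric interval [1 - eps_low, 1 + eps_high].
        clipped = min(max(ratio, 1.0 - eps_low), 1.0 + eps_high)
        # Pessimistic (minimum) choice between unclipped and clipped terms,
        # as in the standard PPO clipped surrogate.
        total += min(ratio * advantage, clipped * advantage)
    return total / len(ratios)


if __name__ == "__main__":
    # One sample whose ratio exceeds the upper bound, one below the lower bound.
    print(asymmetric_clipped_objective([1.5, 0.5], [1.0, -1.0]))
```

With `eps_high` larger than `eps_low`, the interval leaves more room for raising the probability of an action than for lowering it; an adaptive variant would adjust these bounds during training by some rule that the abstract does not specify.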
Problem

Research questions and friction points this paper is trying to address.

embodied navigation
vision-language models
open-world transfer
simulation-to-reality gap
physics-grounded abstraction
Innovation

Methods, ideas, or system contributions that make the work stand out.

physics-grounded abstraction
embodied navigation
semantic simulation
asymmetric adaptive clipping
vision-language models