NavDP: Learning Sim-to-Real Navigation Diffusion Policy with Privileged Information Guidance

📅 2025-05-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses robot navigation in dynamic open-world environments without requiring precise localization, mapping, or real-world demonstrations. Method: NavDP is an end-to-end diffusion-based policy trained exclusively in simulation and transferred zero-shot to diverse physical robots (quadrupedal, wheeled, and humanoid). It combines diffusion-based trajectory generation with a contrastive critic for trajectory selection, leverages privileged simulation information to synthesize high-quality trajectories at scale, and introduces a real-to-sim fine-tuning paradigm based on high-fidelity Gaussian Splatting reconstructions for improved domain alignment. Contribution/Results: The authors construct a large-scale 363.2 km navigation dataset and unify diffusion models, Transformer-based policy networks, and privilege-guided data augmentation. Experiments on real robots demonstrate state-of-the-art performance: adding real-to-sim fine-tuning data improves the success rate by 30% without hurting generalization, and NavDP remains robust across unseen environments and platforms.
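The generate-then-select loop described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: `sample_trajectories` stands in for the diffusion head and `critic_score` for the trained contrastive critic; all function names, shapes, and the progress-based scoring heuristic are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_trajectories(obs_tokens, num_candidates=8, horizon=12):
    # Stand-in for the diffusion head: each candidate is a sequence of
    # (x, y) waypoints; random walks here, denoised samples in the paper.
    return rng.normal(size=(num_candidates, horizon, 2)).cumsum(axis=1)

def critic_score(obs_tokens, trajectories):
    # Stand-in critic: score each candidate by forward progress along x.
    # The actual critic is a network trained with contrastive negatives.
    return trajectories[:, -1, 0]

obs_tokens = np.zeros(16)                      # placeholder observation encoding
candidates = sample_trajectories(obs_tokens)   # shape (8, 12, 2)
best = candidates[np.argmax(critic_score(obs_tokens, candidates))]
print(best.shape)  # (12, 2)
```

At inference time this pattern lets a single shared observation encoding condition both heads: the diffusion policy proposes several trajectories, and the critic picks the one to execute.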

📝 Abstract
Learning navigation in dynamic open-world environments is an important yet challenging skill for robots. Most previous methods rely on precise localization and mapping or learn from expensive real-world demonstrations. In this paper, we propose the Navigation Diffusion Policy (NavDP), an end-to-end framework trained solely in simulation that can zero-shot transfer to different embodiments in diverse real-world environments. The key ingredient of NavDP's network is the combination of diffusion-based trajectory generation and a critic function for trajectory selection, which are conditioned only on local observation tokens encoded by a shared policy transformer. Given privileged information about the global environment in simulation, we scale up high-quality demonstrations to train the diffusion policy and formulate the critic value function targets with contrastive negative samples. Our demonstration generation approach achieves about 2,500 trajectories/GPU per day, 20× more efficient than real-world data collection, and results in a large-scale navigation dataset with 363.2 km of trajectories across 1,244 scenes. Trained on this simulation dataset, NavDP achieves state-of-the-art performance and consistently strong generalization on quadruped, wheeled, and humanoid robots in diverse indoor and outdoor environments. In addition, we present a preliminary attempt at using Gaussian Splatting for in-domain real-to-sim fine-tuning to further bridge the sim-to-real gap. Experiments show that adding such real-to-sim data can improve the success rate by 30% without hurting generalization capability.
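The abstract mentions formulating critic value targets with contrastive negative samples. One plausible construction, sketched below under stated assumptions (the function name, the noise-perturbation scheme for negatives, and the binary labels are all hypothetical, not taken from the paper), is to treat the privileged-planner trajectory as the positive and perturbed copies as negatives:

```python
import numpy as np

rng = np.random.default_rng(1)

def contrastive_critic_targets(expert_traj, num_negatives=4, noise=0.5):
    # Hypothetical target construction: the trajectory from the privileged
    # planner is the positive; noise-perturbed copies serve as negatives.
    positives = expert_traj[None]                                # (1, T, 2)
    negatives = expert_traj[None] + rng.normal(
        scale=noise, size=(num_negatives, *expert_traj.shape))   # (4, T, 2)
    trajs = np.concatenate([positives, negatives], axis=0)
    labels = np.zeros(len(trajs))
    labels[0] = 1.0                                              # 1 = good
    return trajs, labels

expert = rng.normal(size=(12, 2)).cumsum(axis=0)  # toy expert trajectory
trajs, labels = contrastive_critic_targets(expert)
print(trajs.shape, labels.sum())  # (5, 12, 2) 1.0
```

A critic regressed or classified against such targets learns to rank the planner's trajectory above nearby corrupted ones, which is what the selection step at inference relies on.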
Problem

Research questions and friction points this paper is trying to address.

Learning navigation in dynamic open-world environments for robots
Zero-shot transfer from simulation to diverse real-world environments
Efficient large-scale demonstration generation for training
Innovation

Methods, ideas, or system contributions that make the work stand out.

Diffusion-based trajectory generation with critic selection
Sim-to-real transfer with privileged information guidance
Gaussian Splatting for real-to-sim fine-tuning
Authors
Wenzhe Cai — Shanghai AI Laboratory (Reinforcement Learning, Visual Navigation, Robotics)
Jiaqi Peng — Shanghai AI Lab, Tsinghua University
Yuqiang Yang — Shanghai AI Lab
Yujian Zhang — Zhejiang University
Meng Wei — Shanghai AI Lab, The University of Hong Kong
Hanqing Wang — Shanghai AI Lab
Yilun Chen — Shanghai AI Lab
Tai Wang — Shanghai AI Laboratory (Computer Vision, 3D Vision, Embodied AI, Deep Learning)
Jiangmiao Pang — Shanghai AI Lab