GRaD-Nav: Efficiently Learning Visual Drone Navigation with Gaussian Radiance Fields and Differentiable Dynamics

📅 2025-03-06
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Addressing core challenges in UAV visual navigation (high sample complexity, poor sim-to-real transfer, and limited runtime generalization to unseen scenes), which are compounded by strongly nonlinear dynamics and tight perception–control coupling, this paper proposes an end-to-end framework that integrates 3D Gaussian Splatting (3DGS) with differentiable deep reinforcement learning (DDRL). It introduces a Context-aided Estimator Network (CENet) for online adaptation to environmental variations, and a multi-environment curriculum training strategy for in-task generalization. Experiments demonstrate (i) markedly improved training sample efficiency over state-of-the-art RL baselines; (ii) zero-shot sim-to-real transfer to drone hardware without fine-tuning; and (iii) real-time adaptation to new instances of the same task class (e.g., flying through a gate at different locations with different distractors) without retraining. The framework couples high-fidelity geometric scene representations with differentiable control, enabling sample-efficient, physics-aware visual navigation that generalizes from simulation to reality.
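The summary's runtime-adaptation component can be illustrated with a hedged sketch in the spirit of a context estimator like CENet (the architecture, layer sizes, and variable names here are our assumptions, not the paper's code): an encoder compresses a short history of observations into a latent context vector, which is concatenated to the current observation before the policy acts, so the latent can track environmental variations at runtime without retraining the policy.

```python
import torch

# Hypothetical dimensions for illustration only.
HIST, OBS, CTX, ACT = 10, 8, 4, 4

# Context encoder: maps a flattened observation history to a latent context.
encoder = torch.nn.Sequential(
    torch.nn.Linear(HIST * OBS, 64), torch.nn.ELU(), torch.nn.Linear(64, CTX)
)
# Policy: acts on the current observation augmented with the latent context.
policy = torch.nn.Sequential(
    torch.nn.Linear(OBS + CTX, 64), torch.nn.ELU(), torch.nn.Linear(64, ACT)
)

history = torch.randn(HIST, OBS)   # rolling buffer of recent observations
obs = history[-1]                  # current observation
z = encoder(history.flatten())     # latent estimate of the environment context
action = policy(torch.cat([obs, z]))
```

At deployment, only the rolling history buffer is updated each step; the same trained encoder and policy weights are reused across environment variations.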

📝 Abstract
Autonomous visual navigation is an essential element of robot autonomy. Reinforcement learning (RL) offers a promising policy training paradigm. However, existing RL methods suffer from high sample complexity, poor sim-to-real transfer, and limited runtime adaptability to navigation scenarios not seen during training. These problems are particularly challenging for drones, which have complex, nonlinear, and unstable dynamics and strong dynamic coupling between control and perception. In this paper, we propose a novel framework that integrates 3D Gaussian Splatting (3DGS) with differentiable deep reinforcement learning (DDRL) to train vision-based drone navigation policies. By leveraging high-fidelity 3D scene representations and differentiable simulation, our method improves sample efficiency and sim-to-real transfer. Additionally, we incorporate a Context-aided Estimator Network (CENet) to adapt to environmental variations at runtime. Moreover, by curriculum training in a mixture of different surrounding environments, we achieve in-task generalization: the ability to solve new instances of a task not seen during training. Drone hardware experiments demonstrate our method's high training efficiency compared to state-of-the-art RL methods, zero-shot sim-to-real transfer for real robot deployment without fine-tuning, and ability to adapt to new instances within the same task class (e.g., flying through a gate at different locations with different distractors in the environment).
Problem

Research questions and friction points this paper is trying to address.

High sample complexity in drone navigation RL methods.
Poor sim-to-real transfer for drone navigation policies.
Limited runtime adaptability to unseen navigation scenarios.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates 3D Gaussian Splatting (3DGS) with differentiable deep reinforcement learning (DDRL).
Uses a Context-aided Estimator Network (CENet) for runtime adaptation.
Applies curriculum training across mixed environments for in-task generalization.
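The differentiable-dynamics ingredient above can be sketched with a toy example (our construction, not the paper's implementation): a double-integrator point mass stands in for the drone, and because the dynamics step is written in PyTorch, the rollout loss backpropagates through the trajectory directly into the policy weights, giving analytic first-order gradients instead of high-variance score-function estimates.

```python
import torch

torch.manual_seed(0)

# Toy policy: maps [position, velocity] to a commanded acceleration.
policy = torch.nn.Sequential(
    torch.nn.Linear(4, 32), torch.nn.Tanh(), torch.nn.Linear(32, 2)
)
opt = torch.optim.Adam(policy.parameters(), lr=1e-2)

def dynamics(pos, vel, acc, dt=0.05):
    # Differentiable point-mass step: gradients flow through the integration.
    vel = vel + acc * dt
    pos = pos + vel * dt
    return pos, vel

goal = torch.tensor([1.0, 1.0])
losses = []
for epoch in range(200):
    pos = torch.zeros(2)
    vel = torch.zeros(2)
    loss = torch.zeros(())
    for t in range(30):
        acc = policy(torch.cat([pos, vel]))
        pos, vel = dynamics(pos, vel, acc)
        # Tracking cost plus a small control-effort penalty.
        loss = loss + (pos - goal).pow(2).sum() + 1e-3 * acc.pow(2).sum()
    opt.zero_grad()
    loss.backward()   # backprop through the entire 30-step rollout
    opt.step()
    losses.append(loss.item())

final_dist = (pos - goal).norm().item()
```

In the paper's setting, the observation would instead include images rendered from the 3DGS scene and the dynamics would be the full drone model; this sketch only shows why differentiability of the simulator improves sample efficiency.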
Qianzhong Chen
Department of Mechanical Engineering, Stanford University, Stanford, CA 94305, USA
Naixiang Gao
Department of Mechanical Engineering, Stanford University, Stanford, CA 94305, USA
JunEn Low
Department of Mechanical Engineering, Stanford University, Stanford, CA 94305, USA
Jiankai Sun
Department of Aeronautics and Astronautics, Stanford University, Stanford, CA 94305, USA
Timothy Chen
Stanford University
Robotics, Perception, Control
Mac Schwager
Stanford University
Robotics, Control, Multi-Agent Systems, Machine Learning, Statistical Inference and Estimation