🤖 AI Summary
Addressing core challenges in UAV visual navigation (high sample complexity, poor sim-to-real transfer, and weak runtime scene generalization), which are exacerbated by strong nonlinear dynamics and tight perception-control coupling, this paper proposes the first end-to-end framework integrating 3D Gaussian Splatting (3DGS) with differentiable deep reinforcement learning. We introduce a Context-aided Estimator Network (CENet) for online adaptation to dynamic environments and a multi-environment curriculum training strategy to enhance in-task generalization. Experiments demonstrate: (i) significantly improved training sample efficiency; (ii) zero-shot sim-to-real transfer; and (iii) real-time, robust adaptation to novel task instances within the same class (e.g., traversing doorways under varying poses or disturbances) without retraining. The framework bridges geometric priors and differentiable control, enabling scalable, physics-aware visual navigation with strong generalization across simulation and reality.
📄 Abstract
Autonomous visual navigation is an essential element of robot autonomy. Reinforcement learning (RL) offers a promising policy training paradigm. However, existing RL methods suffer from high sample complexity, poor sim-to-real transfer, and limited runtime adaptability to navigation scenarios not seen during training. These problems are particularly challenging for drones, which have complex nonlinear and unstable dynamics and strong dynamic coupling between control and perception. In this paper, we propose a novel framework that integrates 3D Gaussian Splatting (3DGS) with differentiable deep reinforcement learning (DDRL) to train vision-based drone navigation policies. By leveraging high-fidelity 3D scene representations and differentiable simulation, our method improves sample efficiency and sim-to-real transfer. Additionally, we incorporate a Context-aided Estimator Network (CENet) to adapt to environmental variations at runtime. Moreover, by curriculum training in a mixture of different surrounding environments, we achieve in-task generalization: the ability to solve new instances of a task not seen during training. Drone hardware experiments demonstrate our method's high training efficiency compared to state-of-the-art RL methods, zero-shot sim-to-real transfer for real robot deployment without fine-tuning, and the ability to adapt to new instances within the same task class (e.g., flying through a gate at different locations with different distractors in the environment).