🤖 AI Summary
This work addresses autonomous navigation of drones in cluttered environments using only monocular RGB images. To overcome the policy-transfer failures caused by the sim-to-real perception gap, we propose an end-to-end reinforcement learning framework that integrates 3D Gaussian Splatting (3DGS) simulation with adversarial domain adaptation (ADA). We are the first to employ 3DGS as a high-fidelity, differentiable visual simulator and to couple it with ADA, achieving robust feature alignment under domain shifts such as illumination variation and texture scarcity. Our method requires no real-world data for fine-tuning and enables zero-shot deployment: the policy achieves safe, agile flight in previously unseen real-world cluttered scenes, significantly outperforming existing baselines in navigation success rate across diverse lighting conditions and textureless environments.
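The summary does not spell out how the adversarial feature alignment is implemented, so below is a minimal sketch of one standard realization (a DANN-style gradient reversal layer feeding a domain discriminator), written in PyTorch. All module names, dimensions, and the `lam` weight are illustrative assumptions, not the paper's API.

```python
# Sketch of adversarial domain adaptation via a gradient reversal layer (GRL).
# The encoder is trained to produce features the discriminator cannot
# classify as "simulated" vs. "real", i.e. domain-invariant features.
import torch
import torch.nn as nn

class GradientReversal(torch.autograd.Function):
    """Identity on the forward pass; flips (and scales) the gradient on the
    backward pass, so the upstream encoder learns to fool the discriminator."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.clone()

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None  # no gradient for lam

class DomainDiscriminator(nn.Module):
    """Predicts whether a feature came from simulation (0) or the real world (1)."""
    def __init__(self, feat_dim: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, 128), nn.ReLU(),
            nn.Linear(128, 1),
        )

    def forward(self, feat, lam: float = 1.0):
        return self.net(GradientReversal.apply(feat, lam))

# Usage sketch: features from 3DGS renders and unlabeled real frames
# (random tensors here stand in for encoder outputs).
disc = DomainDiscriminator(feat_dim=256)
bce = nn.BCEWithLogitsLoss()
sim_feat, real_feat = torch.randn(32, 256), torch.randn(32, 256)
logits = disc(torch.cat([sim_feat, real_feat]), lam=0.5)
labels = torch.cat([torch.zeros(32, 1), torch.ones(32, 1)])
domain_loss = bce(logits, labels)  # added to the RL policy objective
domain_loss.backward()
```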
📝 Abstract
Modern autonomous navigation systems predominantly rely on LiDAR and depth cameras. A fundamental question remains: can flying robots navigate in clutter using only monocular RGB images? Given the prohibitive cost of real-world data collection, learning policies in simulation offers a promising path; yet deploying such policies directly in the physical world is hindered by the significant sim-to-real perception gap. We therefore propose a framework that couples the photorealism of 3D Gaussian Splatting (3DGS) environments with adversarial domain adaptation. By training in high-fidelity simulation while explicitly minimizing the feature discrepancy between simulated and real observations, our method ensures that the policy relies on domain-invariant cues. Experimental results demonstrate that our policy achieves robust zero-shot transfer to the physical world, enabling safe and agile flight in unstructured environments under varying illumination.
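The abstract does not state the exact training objective, but couplings of a task loss with adversarial feature alignment are typically formalized as the following saddle-point problem (an assumed DANN-style formulation, not the paper's stated loss), where $E$ is the image encoder, $\pi$ the policy, $D$ the domain discriminator, and $\lambda$ a trade-off weight:

$$
\min_{\theta_E,\,\theta_\pi}\;\max_{\theta_D}\;\;
\mathcal{L}_{\mathrm{RL}}\!\big(\pi_{\theta_\pi}(E_{\theta_E}(x_{\mathrm{sim}}))\big)
\;-\;\lambda\,\mathcal{L}_{\mathrm{dom}}\!\big(D_{\theta_D}(E_{\theta_E}(x_{\mathrm{sim}})),\,D_{\theta_D}(E_{\theta_E}(x_{\mathrm{real}}))\big)
$$

The discriminator minimizes the domain-classification loss $\mathcal{L}_{\mathrm{dom}}$ while the encoder maximizes it; the gradient reversal layer in the sketch above lets ordinary gradient descent optimize this saddle point directly.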