GaussFly: Contrastive Reinforcement Learning for Visuomotor Policies in 3D Gaussian Fields

📅 2026-04-06
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses the low sample efficiency and sim-to-real transfer challenges in end-to-end learning for monocular vision-based autonomous flight. The authors propose a “real → simulation → real” paradigm that decouples representation learning from policy optimization. They introduce geometry-constrained 3D Gaussian Splatting to reconstruct high-fidelity simulation environments and leverage contrastive learning to extract robust, low-dimensional visual features for efficient visuomotor policy training. This study presents the first integration of 3D Gaussian Splatting with contrastive reinforcement learning, enabling zero-shot cross-domain transfer without fine-tuning. Experimental results demonstrate that the proposed method significantly outperforms baseline approaches in both simulated and real-world settings, effectively narrowing the domain performance gap and generalizing successfully to unseen, complex-textured environments.
📝 Abstract
Learning visuomotor policies for Autonomous Aerial Vehicles (AAVs) relying solely on monocular vision is an attractive yet highly challenging paradigm. Existing end-to-end learning approaches directly map high-dimensional RGB observations to action commands, which frequently suffer from low sample efficiency and severe sim-to-real gaps due to the visual discrepancy between simulation and physical domains. To address these long-standing challenges, we propose GaussFly, a novel framework that explicitly decouples representation learning from policy optimization through a cohesive real-to-sim-to-real paradigm. First, to achieve a high-fidelity real-to-sim transition, we reconstruct training scenes using 3D Gaussian Splatting (3DGS) augmented with explicit geometric constraints. Second, to ensure robust sim-to-real transfer, we leverage these photorealistic simulated environments and employ contrastive representation learning to extract compact, noise-resilient latent features from the rendered RGB images. By utilizing this pre-trained encoder to provide low-dimensional feature inputs, the computational burden on the visuomotor policy is significantly reduced while its resistance against visual noise is inherently enhanced. Extensive experiments in simulated and real-world environments demonstrate that GaussFly achieves superior sample efficiency and asymptotic performance compared to baselines. Crucially, it enables robust and zero-shot policy transfer to unseen real-world environments with complex textures, effectively bridging the sim-to-real gap.
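The abstract does not say which contrastive objective GaussFly uses. As an illustration only, the sketch below assumes the widely used InfoNCE loss: embeddings of two views of the same rendered frame are pulled together, while the other frames in the batch act as negatives. The function names, the temperature value, and the toy embeddings are hypothetical and are not taken from the paper.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def info_nce_loss(anchors, positives, temperature=0.1):
    """InfoNCE loss over a batch of paired embeddings.

    anchors[i] and positives[i] are embeddings of two views of the same
    frame; every positives[j] with j != i serves as a negative for anchors[i].
    Assumption: this is a generic contrastive loss, not the paper's exact one.
    """
    if len(anchors) != len(positives) or not anchors:
        raise ValueError("need equal-sized, non-empty batches")
    a = [l2_normalize(v) for v in anchors]
    p = [l2_normalize(v) for v in positives]
    total = 0.0
    for i, ai in enumerate(a):
        # Cosine similarity to every candidate, scaled by the temperature.
        logits = [sum(x * y for x, y in zip(ai, pj)) / temperature for pj in p]
        # Numerically stable log-sum-exp of the logits.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
        # Cross-entropy with the matching pair (index i) as the target class.
        total += log_denominator - logits[i]
    return total / len(a)


# Toy 2-D embeddings (made-up values for illustration).
views_a = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]   # each positive is close to its anchor
swapped = [[0.1, 0.9], [0.9, 0.1]]   # each positive is close to the wrong anchor

loss_aligned = info_nce_loss(views_a, aligned)
loss_swapped = info_nce_loss(views_a, swapped)
```

In this toy batch `loss_aligned` comes out lower than `loss_swapped`, which is the behaviour a contrastive encoder is trained toward. In a pipeline like the one described, such an encoder would be pre-trained on rendered images and then supply low-dimensional features to the policy.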
Problem

Research questions and friction points this paper is trying to address.

visuomotor policies
sim-to-real gap
monocular vision
Autonomous Aerial Vehicles
sample efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

3D Gaussian Splatting
contrastive representation learning
visuomotor policy
sim-to-real transfer
monocular vision
Yuhang Zhang
School of Mechanical and Aerospace Engineering, Nanyang Technological University, Singapore 639798, Singapore
Mingsheng Li
Bowling Green State University
Corporate finance, dividend policy, mutual fund, ETFs
Yujing Shang
School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore 639798, Singapore
Zhuoyuan Yu
College of Design and Engineering, National University of Singapore, Singapore 119077, Singapore
Chao Yan
Instructor at DBMI, VUMC; CS PhD from Vanderbilt U
AI for medicine, Synthetic health data, Privacy, Fairness
Jiaping Xiao
Nanyang Technological University
Cyber-Physical Systems, Intelligent Systems, Multirobot Learning, Artificial Intelligence
Mir Feroskhan
School of Mechanical and Aerospace Engineering, Nanyang Technological University
Flight Dynamics and Control, eVTOL, Unmanned Aerial Systems