GaussFly: Contrastive Reinforcement Learning for Visuomotor Policies in 3D Gaussian Fields

📅 2026-04-06

🤖 AI Summary

This work addresses low sample efficiency and the sim-to-real gap in end-to-end learning for monocular vision-based autonomous flight. The authors propose a real-to-sim-to-real paradigm that decouples representation learning from policy optimization. They reconstruct high-fidelity simulation environments with geometry-constrained 3D Gaussian Splatting and use contrastive learning to extract robust, low-dimensional visual features, on which the visuomotor policy is then trained efficiently. The work is presented as the first integration of 3D Gaussian Splatting with contrastive reinforcement learning, and its policies transfer zero-shot across domains without fine-tuning. In both simulated and real-world experiments, the method achieves better sample efficiency and asymptotic performance than the baselines, narrows the sim-to-real performance gap, and generalizes to unseen real-world environments with complex textures.
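
Rendering a training view from a 3DGS reconstruction comes down to depth-sorting the projected Gaussians and alpha-compositing them front to back at each pixel, C = Σ_i c_i α_i ∏_{j<i} (1 − α_j), where α_i is the Gaussian's opacity weighted by its 2D falloff at that pixel. The sketch below illustrates this standard compositing step for a single pixel. It assumes the Gaussians are already projected to screen space; the `Gaussian2D` fields and the cutoff values are illustrative choices, and it does not model GaussFly's geometric constraints, which the summary does not specify.

```python
import math
from dataclasses import dataclass


@dataclass
class Gaussian2D:
    """Hypothetical minimal record for a Gaussian already projected to screen space."""
    mean: tuple[float, float]              # pixel-space center (u, v)
    inv_cov: tuple[float, float, float]    # (a, b, c) of inverse covariance [[a, b], [b, c]]
    opacity: float                         # learned opacity in [0, 1]
    color: tuple[float, float, float]      # RGB in [0, 1]
    depth: float                           # view-space depth, used for sorting


def composite_pixel(gaussians, u, v, background=(0.0, 0.0, 0.0)):
    """Front-to-back alpha compositing of depth-sorted splats at pixel (u, v)."""
    color = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light not yet absorbed by nearer splats
    for g in sorted(gaussians, key=lambda g: g.depth):
        du, dv = u - g.mean[0], v - g.mean[1]
        a, b, c = g.inv_cov
        # Exponent of the 2D Gaussian falloff evaluated at this pixel.
        power = -0.5 * (a * du * du + 2.0 * b * du * dv + c * dv * dv)
        alpha = min(0.99, g.opacity * math.exp(power))
        if alpha < 1.0 / 255.0:
            continue  # illustrative cutoff: skip near-transparent contributions
        for k in range(3):
            color[k] += transmittance * alpha * g.color[k]
        transmittance *= 1.0 - alpha
        if transmittance < 1e-4:
            break  # pixel is effectively opaque; farther splats are hidden
    # Whatever transmittance remains lets the background show through.
    return tuple(color[k] + transmittance * background[k] for k in range(3))
```

Sorting inside the function means callers may pass splats in any order; a real rasterizer sorts once per tile rather than once per pixel.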

📝 Abstract

Learning visuomotor policies for Autonomous Aerial Vehicles (AAVs) that rely solely on monocular vision is an attractive yet highly challenging paradigm. Existing end-to-end learning approaches map high-dimensional RGB observations directly to action commands and frequently suffer from low sample efficiency and severe sim-to-real gaps caused by the visual discrepancy between simulated and physical domains. To address these long-standing challenges, we propose GaussFly, a novel framework that explicitly decouples representation learning from policy optimization through a cohesive real-to-sim-to-real paradigm. First, to achieve a high-fidelity real-to-sim transition, we reconstruct training scenes using 3D Gaussian Splatting (3DGS) augmented with explicit geometric constraints. Second, to ensure robust sim-to-real transfer, we leverage these photorealistic simulated environments and employ contrastive representation learning to extract compact, noise-resilient latent features from the rendered RGB images. Because this pre-trained encoder provides low-dimensional feature inputs, the computational burden on the visuomotor policy is significantly reduced and its robustness to visual noise is enhanced. Extensive experiments in simulated and real-world environments demonstrate that GaussFly achieves superior sample efficiency and asymptotic performance compared to baselines. Crucially, it enables robust, zero-shot policy transfer to unseen real-world environments with complex textures, effectively bridging the sim-to-real gap.
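
The abstract does not name the contrastive objective GaussFly uses. As an assumption, the sketch below uses InfoNCE, a common choice: embeddings of two augmented views of the same rendered frame form a positive pair, and the other frames in the batch act as negatives. Minimizing the loss pulls positive pairs together on the unit sphere and pushes negatives apart, which is intended to make the features insensitive to the nuisance variation introduced by the augmentations. The function names, batch layout, and temperature of 0.1 are hypothetical.

```python
import math


def _normalize(v):
    # Project onto the unit sphere so dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def info_nce_loss(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss. anchors[i] and positives[i] embed two augmented views
    of the same frame; every positives[j] with j != i acts as a negative."""
    if not anchors or len(anchors) != len(positives):
        raise ValueError("need a non-empty batch of paired embeddings")
    za = [_normalize(v) for v in anchors]
    zp = [_normalize(v) for v in positives]
    total = 0.0
    for i, a in enumerate(za):
        # Temperature-scaled cosine similarity to every candidate in the batch.
        logits = [sum(x * y for x, y in zip(a, p)) / temperature for p in zp]
        # Stable log-sum-exp: subtract the max logit before exponentiating.
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(s - m) for s in logits))
        # Cross-entropy with the matching view as the correct "class".
        total += log_sum_exp - logits[i]
    return total / len(za)


aligned = info_nce_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = info_nce_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

With `temperature=0.1`, the two aligned, mutually orthogonal pairs give a loss close to zero, while swapping the positives gives a loss near 10, so misaligned views are penalized heavily. A lower temperature sharpens this contrast further.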

Problem

Research questions and friction points this paper is trying to address.

visuomotor policies

sim-to-real gap

monocular vision

Autonomous Aerial Vehicles

sample efficiency

Innovation

Methods, ideas, or system contributions that make the work stand out.

3D Gaussian Splatting

contrastive representation learning

visuomotor policy
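
The decoupled design described in the abstract, where a pre-trained encoder feeds low-dimensional features to the policy, can be illustrated by comparing input sizes. In this hypothetical sketch, a frozen encoder (a fixed random projection standing in for the contrastively pre-trained network) maps a flattened 32×32 RGB frame to a 32-dimensional latent, and only a small linear policy with four bounded outputs sits on top of it. The image size, latent size, and four-dimensional action (for example, collective thrust plus three body rates) are assumptions, not values from the paper.

```python
import math
import random


class FrozenEncoder:
    """Hypothetical stand-in for a contrastively pre-trained encoder: a fixed
    random linear projection whose weights are never updated during RL."""

    def __init__(self, in_dim, latent_dim, seed=0):
        rng = random.Random(seed)
        scale = 1.0 / math.sqrt(in_dim)
        self.weights = [[rng.gauss(0.0, scale) for _ in range(in_dim)]
                        for _ in range(latent_dim)]

    def __call__(self, pixels):
        # Project a flattened RGB frame to a low-dimensional latent vector.
        return [sum(w * x for w, x in zip(row, pixels)) for row in self.weights]


class LinearPolicy:
    """Small trainable policy: latent features -> actions bounded in [-1, 1]."""

    def __init__(self, latent_dim, action_dim):
        # Zero initialization; an RL algorithm would update these values.
        self.weights = [[0.0] * latent_dim for _ in range(action_dim)]
        self.bias = [0.0] * action_dim

    def num_params(self):
        return sum(len(row) for row in self.weights) + len(self.bias)

    def __call__(self, features):
        return [math.tanh(sum(w * f for w, f in zip(row, features)) + b)
                for row, b in zip(self.weights, self.bias)]


# Hypothetical sizes: 32x32 RGB frame, 32-d latent, 4 action commands.
frame = [0.5] * (32 * 32 * 3)
encoder = FrozenEncoder(in_dim=len(frame), latent_dim=32)
policy = LinearPolicy(latent_dim=32, action_dim=4)
action = policy(encoder(frame))
```

Here the policy on the 32-dimensional latent has 132 trainable parameters, against 12,292 for the same linear policy reading the 3,072 raw pixel values, which is the kind of reduction the decoupling is meant to deliver.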