SOUS VIDE: Cooking Visual Drone Navigation Policies in a Gaussian Splatting Vacuum

📅 2024-12-20
🏛️ IEEE Robotics and Automation Letters
📈 Citations: 2
Influential: 1
🤖 AI Summary
End-to-end visual navigation for UAVs in complex environments remains challenging without a prior map or external localization. Method: the paper proposes a "sous vide" training paradigm built on Gaussian splatting: FiGS, a high-fidelity simulator coupling a simple drone dynamics model with a Gaussian splatting scene reconstruction, is used to distill an expert model predictive control (MPC) policy into a lightweight network, SV-Net. SV-Net fuses RGB, optical flow, and IMU streams and includes a learned low-level control module that adapts to drone dynamics at runtime. Contribution/Results: the policies transfer zero-shot from simulation to hardware using only onboard sensing and computation. Across 105 real-world flight experiments, the system is robust to 30% mass variations, 40 m/s wind gusts, 60% changes in ambient brightness, severe occlusions, and people moving aggressively through the visual field, while maintaining a real-time control rate of 20 Hz.

📝 Abstract
We propose a new simulator, training approach, and policy architecture, collectively called SOUS VIDE, for end-to-end visual drone navigation. Our trained policies exhibit zero-shot sim-to-real transfer with robust real-world performance using only onboard perception and computation. Our simulator, called FiGS, couples a computationally simple drone dynamics model with a high visual fidelity Gaussian Splatting scene reconstruction. FiGS can quickly simulate drone flights producing photorealistic images at up to 130 fps. We use FiGS to collect 100k-300k image/state-action pairs from an expert MPC with privileged state and dynamics information, randomized over dynamics parameters and spatial disturbances. We then distill this expert MPC into an end-to-end visuomotor policy with a lightweight neural architecture, called SV-Net. SV-Net processes color image, optical flow and IMU data streams into low-level thrust and body rate commands at 20 Hz onboard a drone. Crucially, SV-Net includes a learned module for low-level control that adapts at runtime to variations in drone dynamics. In a campaign of 105 hardware experiments, we show SOUS VIDE policies to be robust to 30% mass variations, 40 m/s wind gusts, 60% changes in ambient brightness, shifting or removing objects from the scene, and people moving aggressively through the drone's visual field. Code, data, and experiment videos can be found on our project page: https://stanfordmsl.github.io/SousVide/.
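The data-generation and distillation pipeline in the abstract (FiGS renders rollouts, an expert MPC with privileged state labels them, and a student policy is trained by supervised imitation) can be sketched as follows. This is a minimal stand-in, not the authors' implementation: `render_figs`, `expert_mpc`, and the linear "student" are hypothetical placeholders for photorealistic rendering, the privileged MPC, and SV-Net training.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins: "image features" are random vectors and the expert
# is a fixed linear map to 4D commands [thrust, rate_x, rate_y, rate_z].
FEAT_DIM, ACT_DIM, N = 32, 4, 5000
W_expert = rng.normal(size=(FEAT_DIM, ACT_DIM))

def render_figs(n):
    """Placeholder for photorealistic FiGS rollouts (returns feature vectors)."""
    return rng.normal(size=(n, FEAT_DIM))

def expert_mpc(feats):
    """Placeholder for the expert MPC with privileged state/dynamics info."""
    return feats @ W_expert + 0.01 * rng.normal(size=(feats.shape[0], ACT_DIM))

# Collect image/state-action pairs, as in the paper's data-generation phase.
X = render_figs(N)
Y = expert_mpc(X)

# "Distill" the expert into a student policy -- here ridge regression stands
# in for training SV-Net by supervised behavior cloning.
lam = 1e-3
W_student = np.linalg.solve(X.T @ X + lam * np.eye(FEAT_DIM), X.T @ Y)

# The student should closely imitate the expert on held-out rollouts.
X_test = render_figs(500)
err = np.abs(X_test @ W_student - expert_mpc(X_test)).mean()
print(f"mean imitation error: {err:.4f}")
```

The paper additionally randomizes dynamics parameters and spatial disturbances during data collection; in this sketch that would correspond to resampling `W_expert` and perturbing the rollouts between batches.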
Problem

Research questions and friction points this paper is trying to address.

Navigating a drone end-to-end from vision in complex environments, without a prior map or external localization.
Closing the sim-to-real gap so visuomotor policies transfer zero-shot using only onboard perception and computation.
Staying robust to variations in drone dynamics and to environmental disturbances.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Gaussian Splatting for high-fidelity drone simulation
Lightweight SV-Net for real-time drone control
Expert MPC distillation for robust policy training
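The second bullet's multimodal fusion with runtime dynamics adaptation can be illustrated with a toy forward pass: precomputed RGB and optical-flow features are concatenated with IMU data and a dynamics embedding produced by an adaptation module, then mapped to thrust and body-rate commands. All dimensions, layer sizes, and the use of a plain MLP are assumptions for illustration; the actual SV-Net architecture differs.

```python
import numpy as np

rng = np.random.default_rng(1)

def mlp(x, W1, b1, W2, b2):
    """Two-layer MLP with tanh hidden activation."""
    return np.tanh(x @ W1 + b1) @ W2 + b2

# Hypothetical feature dimensions -- the real SV-Net sizes are not given here.
RGB_DIM, FLOW_DIM, IMU_DIM, HID, ACT_DIM = 64, 32, 9, 128, 4

params = {k: rng.normal(scale=0.1, size=s) for k, s in {
    "W1": (RGB_DIM + FLOW_DIM + IMU_DIM + 8, HID), "b1": (HID,),
    "W2": (HID, ACT_DIM), "b2": (ACT_DIM,),
    # adaptation module: maps a short IMU history to an 8D dynamics embedding,
    # standing in for the learned low-level module that compensates dynamics.
    "Wa1": (IMU_DIM * 10, 64), "ba1": (64,),
    "Wa2": (64, 8), "ba2": (8,),
}.items()}

def sv_net(rgb_feat, flow_feat, imu, imu_history):
    """Fuse modalities plus a dynamics embedding into
    [thrust, body_rate_x, body_rate_y, body_rate_z]."""
    z_dyn = mlp(imu_history.ravel(), params["Wa1"], params["ba1"],
                params["Wa2"], params["ba2"])
    fused = np.concatenate([rgb_feat, flow_feat, imu, z_dyn])
    return mlp(fused, params["W1"], params["b1"], params["W2"], params["b2"])

cmd = sv_net(rng.normal(size=RGB_DIM), rng.normal(size=FLOW_DIM),
             rng.normal(size=IMU_DIM), rng.normal(size=(10, IMU_DIM)))
print(cmd.shape)  # (4,)
```

Because the dynamics embedding is recomputed from recent IMU history at every step, a change in mass or a wind gust shifts `z_dyn` and thereby the commands, which is the intuition behind the runtime adaptation described in the abstract.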