🤖 AI Summary
This work addresses the challenge of navigating drones through irregular, narrow gaps in unknown environments, where conventional approaches rely on explicit geometric modeling and existing end-to-end methods suffer from limited generalization. To overcome these limitations, the authors propose a purely vision-based, end-to-end control framework that directly predicts SE(3) flight commands from a single depth image. The approach integrates differentiable simulation, a stop-gradient operator, and a bimodal initialization distribution, while further enhancing safety and stability through a traversability predictor and a gap-crossing success classifier. Experiments demonstrate strong generalization, high traversal efficiency, and robust performance in both simulated and real-world environments.
📝 Abstract
Navigation through narrow and irregular gaps is an essential skill for autonomous drones in applications such as inspection, search-and-rescue, and disaster response. However, traditional planning and control methods rely on explicit gap extraction and measurement, while recent end-to-end approaches often assume regularly shaped gaps, leading to poor generalization and limited practicality. In this work, we present a fully vision-based, end-to-end framework that maps depth images directly to control commands, enabling drones to traverse complex gaps in unseen environments. Operating in the Special Euclidean group SE(3), where position and orientation are tightly coupled, the framework leverages differentiable simulation, a stop-gradient operator, and a bimodal initialization distribution to achieve stable traversal through consecutive gaps. Two auxiliary prediction modules, a gap-crossing success classifier and a traversability predictor, further enhance continuous navigation and safety. Extensive simulation and real-world experiments demonstrate the approach's effectiveness, generalization capability, and practical robustness.
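To make the role of the stop-gradient operator in differentiable-simulation training concrete, here is a minimal, hypothetical sketch in JAX. Everything in it is illustrative, not the paper's actual architecture: a linear "policy" stands in for the depth-to-SE(3) network, and toy point-mass dynamics stand in for the drone simulator. The key idea shown is truncating gradients through the simulated state between rollout steps, which is one common way a stop-gradient stabilizes backpropagation through a differentiable simulator.

```python
import jax
import jax.numpy as jnp

def policy(params, depth_features):
    # Illustrative stand-in for the depth-image policy: emits a
    # 6-DoF SE(3) command [dx, dy, dz, roll, pitch, yaw].
    return depth_features @ params

def rollout_loss(params, state, depth_features, steps=4):
    # Unroll toy differentiable dynamics; the loss pulls the state
    # toward the origin (a stand-in "gap center").
    loss = 0.0
    for _ in range(steps):
        cmd = policy(params, depth_features)
        state = state + 0.1 * cmd[:3]          # toy point-mass dynamics
        loss = loss + jnp.sum(state ** 2)
        # Stop-gradient: block gradients from flowing through the
        # simulated state into earlier steps of the rollout.
        state = jax.lax.stop_gradient(state)
    return loss

key = jax.random.PRNGKey(0)
params = jax.random.normal(key, (8, 6)) * 0.1   # toy policy weights
depth_features = jnp.ones(8)                    # stand-in depth encoding
state = jnp.zeros(3)                            # initial position

grads = jax.grad(rollout_loss)(params, state, depth_features)
```

With the stop-gradient in place, each policy call still receives a learning signal from its own step's loss, but the chain of state-to-state Jacobians that can destabilize long differentiable rollouts is cut.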