🤖 AI Summary
Autonomous flight of low-cost UAVs in GPS- and LiDAR-denied indoor environments remains challenging. Method: This paper proposes a vision-only navigation system featuring: (1) joint semantic segmentation and monocular depth estimation for obstacle avoidance, scene exploration, and safe landing; (2) an adaptive scale factor algorithm that leverages semantic ground-plane detection and camera intrinsics to calibrate non-metric depth into accurate metric distances; and (3) a knowledge distillation framework in which an SVM teacher model supervises a lightweight U-Net student network for real-time segmentation, together with end-to-end flight-policy distillation. Results: The system achieves 100% task success across 30 real-world and 100 digital-twin flights, with a mean distance estimation error of only 14.4 cm. Even when the full pipeline is distilled into a compact end-to-end student policy, the autonomous mission success rate remains 87.5%, while the combined approach increases surveillance coverage and reduces mission time.
📝 Abstract
This paper presents a vision-only autonomous flight system for small UAVs operating in controlled indoor environments. The system combines semantic segmentation with monocular depth estimation to enable obstacle avoidance, scene exploration, and autonomous safe landing without requiring GPS or expensive sensors such as LiDAR. A key innovation is an adaptive scale factor algorithm that converts non-metric monocular depth predictions into accurate metric distance measurements by leveraging semantic ground-plane detection and camera intrinsic parameters, achieving a mean distance error of 14.4 cm. The approach uses a knowledge distillation framework in which a color-based Support Vector Machine (SVM) teacher generates training data for a lightweight U-Net student network (1.6M parameters) capable of real-time semantic segmentation. For more complex environments, the SVM teacher can be replaced with a state-of-the-art segmentation model. Testing was conducted in a controlled 5 × 4 m laboratory environment with eight cardboard obstacles simulating urban structures. Extensive validation across 30 flight tests in a real-world environment and 100 flight tests in a digital-twin environment demonstrates that the combined segmentation and depth approach increases the distance traveled during surveillance and reduces mission time while maintaining 100% success rates. The system is further optimized through end-to-end learning, in which a compact student neural network learns complete flight policies from demonstration data generated by the best-performing method, achieving an 87.5% autonomous mission success rate. This work advances practical vision-based drone navigation in structured environments, addressing the metric depth estimation and computational efficiency challenges that enable deployment on resource-constrained platforms.
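The ground-plane calibration idea behind the adaptive scale factor can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: it assumes a pinhole camera with known intrinsics `K`, known camera height and orientation, a camera frame with x right, y down, z forward, and a world frame with z up. Function names and the median-based aggregation are assumptions for the sketch.

```python
import numpy as np

def ground_pixel_depth(u, v, K, R_wc, cam_height):
    """Metric z-depth of pixels known to lie on the ground plane.

    Back-projects each pixel to a ray, rotates it into the world frame
    (z up), and intersects it with the plane z_world = 0.
    u, v: pixel coordinates; K: 3x3 intrinsics;
    R_wc: camera-to-world rotation; cam_height: metres above ground.
    """
    rays_cam = np.linalg.inv(K) @ np.stack([u, v, np.ones_like(u)])
    rays_world = R_wc @ rays_cam
    t = cam_height / -rays_world[2]      # ray parameter at the intersection
    return t * rays_cam[2]               # rays_cam[2] == 1, so this is z-depth

def adaptive_scale_factor(pred_depth, ground_mask, K, R_wc, cam_height):
    """Scale mapping the network's non-metric depth to metres, estimated
    only from pixels the segmentation labels as ground."""
    v, u = np.nonzero(ground_mask)
    metric = ground_pixel_depth(u.astype(float), v.astype(float),
                                K, R_wc, cam_height)
    ratios = metric / pred_depth[v, u]
    return float(np.median(ratios))      # median resists mislabeled pixels
```

Metric depth for the whole frame is then `scale * pred_depth`; taking the median over all ground pixels keeps a few segmentation errors from corrupting the estimate.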
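The teacher side of the distillation pipeline can be illustrated with a toy color-based classifier. This sketch assumes per-pixel RGB features and scikit-learn's `SVC`; the paper's actual features, kernel, and annotation procedure are not specified here, so all names and parameters are illustrative.

```python
import numpy as np
from sklearn.svm import SVC

def fit_color_teacher(pixel_colors, pixel_labels):
    """Color-based SVM teacher: classifies individual RGB pixels
    (e.g. ground vs. obstacle) from a small set of annotated samples."""
    teacher = SVC(kernel="rbf", gamma="scale")
    teacher.fit(pixel_colors, pixel_labels)
    return teacher

def pseudo_label(teacher, image):
    """Dense per-pixel pseudo-labels; in the distillation setup these
    become the training targets for the lightweight U-Net student."""
    h, w, c = image.shape
    return teacher.predict(image.reshape(-1, c)).reshape(h, w)
```

The student network never sees hand-made masks: it is trained purely on the teacher's pseudo-labels, which is what lets the heavier (or, in complex scenes, state-of-the-art) teacher stay offline while the 1.6M-parameter student runs in real time.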