🤖 AI Summary
Autonomous UAVs face significant challenges in collision avoidance in resource-constrained onboard settings that lack heavy sensors (e.g., LiDAR or stereo cameras).
Method: This paper proposes a training-free, lightweight monocular visual-inertial metric depth estimation approach. It fuses monocular RGB frames with IMU measurements to construct a sparse 3D feature map, designs multiple zero-shot scale recovery strategies, and employs monotonic spline fitting for high-accuracy absolute scale recovery.
Contribution/Results: To the authors' knowledge, this is the first method to achieve real-time (15 Hz) metric depth estimation on computationally limited embedded platforms, sharply reducing reliance on labeled data and domain-specific fine-tuning. Experiments show that the estimated depth maps reliably drive a motion-primitive-based planner for real-time obstacle avoidance in realistic environments, pointing toward end-to-end autonomous navigation on resource-constrained platforms.
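The zero-shot scale recovery described above amounts to aligning dense relative depth with the sparse metric depths available at visual-inertial feature points. The paper does not spell out each strategy here, but two standard baselines of this kind are a global median scale ratio and a least-squares affine (scale-and-shift) fit; the sketch below is illustrative, with made-up anchor values, and is not the paper's exact formulation:

```python
import numpy as np

def median_scale(rel, met):
    """Global scale: median ratio of sparse metric depth to relative depth."""
    return np.median(met / rel)

def affine_fit(rel, met):
    """Least-squares scale and shift so that met ~ a * rel + b."""
    A = np.stack([rel, np.ones_like(rel)], axis=1)
    (a, b), *_ = np.linalg.lstsq(A, met, rcond=None)
    return a, b

# Hypothetical sparse anchors: relative depth at VINS feature pixels,
# paired with metric depth from the sparse 3D feature map.
rel = np.array([0.10, 0.25, 0.40, 0.60, 0.80])
met = np.array([1.3, 2.9, 4.8, 7.3, 9.9])

s = median_scale(rel, met)        # rescale: metric ~ s * relative
a, b = affine_fit(rel, met)       # rescale: metric ~ a * relative + b
```

Both fits are cheap enough to run per frame on an embedded CPU, which is consistent with the lightweight, training-free design the summary describes.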
📝 Abstract
This paper presents a methodology to predict metric depth from monocular RGB images and an inertial measurement unit (IMU). To enable collision avoidance during autonomous flight, prior works either leverage heavy sensors (e.g., LiDARs or stereo cameras) or data-intensive and domain-specific fine-tuning of monocular metric depth estimation methods. In contrast, we propose several lightweight zero-shot rescaling strategies to obtain metric depth from relative depth estimates via the sparse 3D feature map created using a visual-inertial navigation system. These strategies are compared for their accuracy in diverse simulation environments. The best performing approach, which leverages monotonic spline fitting, is deployed in the real world on a compute-constrained quadrotor. We obtain on-board metric depth estimates at 15 Hz and demonstrate successful collision avoidance after integrating the proposed method with a motion primitives-based planner.
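The best-performing strategy, monotonic spline fitting, can be pictured as fitting a monotone curve through sparse (relative depth, metric depth) anchor pairs and evaluating it over the dense relative depth map. The sketch below uses SciPy's PCHIP interpolant (monotone-preserving for monotone data) with made-up anchor values; the paper's exact spline construction may differ:

```python
import numpy as np
from scipy.interpolate import PchipInterpolator

# Hypothetical sparse anchors: relative depth predicted at VINS feature
# pixels, paired with metric depth from the sparse 3D feature map.
rel_anchor = np.array([0.10, 0.25, 0.40, 0.60, 0.80])
met_anchor = np.array([1.2, 2.0, 3.5, 5.0, 8.0])

# PCHIP yields a monotone interpolant when the anchors are monotone,
# so depth ordering is preserved during rescaling.
spline = PchipInterpolator(rel_anchor, met_anchor)

# Apply the fitted curve to a (synthetic) dense relative depth map.
relative_depth_map = np.random.uniform(0.10, 0.80, size=(48, 64))
metric_depth_map = spline(relative_depth_map)
```

Monotonicity matters here: a monotone fit guarantees that a pixel estimated as farther in relative depth is never mapped to a nearer metric depth, unlike an unconstrained polynomial fit.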