🤖 AI Summary
Autonomous UAVs face significant challenges in collision avoidance in resource-constrained onboard settings that lack heavy sensors (e.g., LiDAR or stereo cameras).
Method: This paper proposes a training-free, lightweight monocular visual-inertial metric depth estimation approach. It fuses monocular RGB frames with IMU measurements to construct a sparse 3D feature map, designs multiple zero-shot scale recovery strategies, and employs monotonic spline fitting for high-accuracy absolute scale recovery.
Contribution/Results: To the authors' knowledge, this is the first method to achieve real-time (15 Hz) metric depth estimation on computationally limited embedded platforms, sharply reducing reliance on labeled data and domain-specific fine-tuning. Experiments show that the estimated depth maps reliably drive a motion-primitive-based planner for real-time obstacle avoidance in realistic environments, pointing toward end-to-end autonomous navigation on resource-constrained platforms.
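The zero-shot scale recovery described above amounts to aligning dense relative depth with the sparse metric depths available at visual-inertial feature points. The paper does not spell out each strategy here, but two standard baselines of this kind are a global median scale ratio and a least-squares affine (scale-and-shift) fit; the sketch below is illustrative, with made-up anchor values, and is not the paper's exact formulation:

```python
import numpy as np

def median_scale(rel, met):
    """Global scale: median ratio of sparse metric depth to relative depth."""
    return np.median(met / rel)

def affine_fit(rel, met):
    """Least-squares scale and shift so that met ~ a * rel + b."""
    A = np.stack([rel, np.ones_like(rel)], axis=1)
    (a, b), *_ = np.linalg.lstsq(A, met, rcond=None)
    return a, b

# Hypothetical sparse anchors: relative depth at VINS feature pixels,
# paired with metric depth from the sparse 3D feature map.
rel = np.array([0.10, 0.25, 0.40, 0.60, 0.80])
met = np.array([1.3, 2.9, 4.8, 7.3, 9.9])

s = median_scale(rel, met)        # rescale: metric ~ s * relative
a, b = affine_fit(rel, met)       # rescale: metric ~ a * relative + b
```

Both fits are cheap enough to run per frame on an embedded CPU, which is consistent with the lightweight, training-free design the summary describes.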
📝 Abstract
This paper presents a methodology to predict metric depth from monocular RGB images and an inertial measurement unit (IMU). To enable collision avoidance during autonomous flight, prior works either leverage heavy sensors (e.g., LiDARs or stereo cameras) or data-intensive and domain-specific fine-tuning of monocular metric depth estimation methods. In contrast, we propose several lightweight zero-shot rescaling strategies to obtain metric depth from relative depth estimates via the sparse 3D feature map created using a visual-inertial navigation system. These strategies are compared for their accuracy in diverse simulation environments. The best performing approach, which leverages monotonic spline fitting, is deployed in the real world on a compute-constrained quadrotor. We obtain on-board metric depth estimates at 15 Hz and demonstrate successful collision avoidance after integrating the proposed method with a motion primitives-based planner.
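The best-performing strategy, monotonic spline fitting, can be pictured as fitting a monotone curve through sparse (relative depth, metric depth) anchor pairs and evaluating it over the dense relative depth map. The sketch below uses SciPy's PCHIP interpolant (monotone-preserving for monotone data) with made-up anchor values; the paper's exact spline construction may differ:

```python
import numpy as np
from scipy.interpolate import PchipInterpolator

# Hypothetical sparse anchors: relative depth predicted at VINS feature
# pixels, paired with metric depth from the sparse 3D feature map.
rel_anchor = np.array([0.10, 0.25, 0.40, 0.60, 0.80])
met_anchor = np.array([1.2, 2.0, 3.5, 5.0, 8.0])

# PCHIP yields a monotone interpolant when the anchors are monotone,
# so depth ordering is preserved during rescaling.
spline = PchipInterpolator(rel_anchor, met_anchor)

# Apply the fitted curve to a (synthetic) dense relative depth map.
relative_depth_map = np.random.uniform(0.10, 0.80, size=(48, 64))
metric_depth_map = spline(relative_depth_map)
```

Monotonicity matters here: a monotone fit guarantees that a pixel estimated as farther in relative depth is never mapped to a nearer metric depth, unlike an unconstrained polynomial fit.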