Zero-Shot Metric Depth Estimation via Monocular Visual-Inertial Rescaling for Autonomous Aerial Navigation

📅 2025-09-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
Collision avoidance is difficult for autonomous UAVs that must run everything onboard with limited compute and cannot carry heavy sensors such as LiDAR or stereo cameras. Method: The paper proposes a lightweight, zero-shot approach to metric depth estimation from monocular RGB images and IMU measurements. A visual-inertial navigation system builds a sparse 3D feature map, which is used to rescale relative depth estimates to metric depth. Several rescaling strategies are compared in simulation, and the most accurate one uses monotonic spline fitting. Contribution/Results: The method produces metric depth estimates at 15 Hz onboard a compute-constrained quadrotor, without the labeled data or domain-specific fine-tuning that monocular metric depth methods otherwise require. In real-world flights, the estimated depth maps are integrated with a motion-primitives-based planner and used for successful collision avoidance.

📝 Abstract
This paper presents a methodology to predict metric depth from monocular RGB images and an inertial measurement unit (IMU). To enable collision avoidance during autonomous flight, prior works either leverage heavy sensors (e.g., LiDARs or stereo cameras) or rely on data-intensive and domain-specific fine-tuning of monocular metric depth estimation methods. In contrast, we propose several lightweight zero-shot rescaling strategies to obtain metric depth from relative depth estimates via the sparse 3D feature map created using a visual-inertial navigation system. These strategies are compared for their accuracy in diverse simulation environments. The best-performing approach, which leverages monotonic spline fitting, is deployed in the real world on a compute-constrained quadrotor. We obtain on-board metric depth estimates at 15 Hz and demonstrate successful collision avoidance after integrating the proposed method with a motion-primitives-based planner.
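The rescaling idea in the abstract can be illustrated with a minimal sketch. It assumes the simplest plausible strategy, a least-squares scale and shift fitted at the sparse feature locations; the abstract does not say which strategies the paper actually evaluates, so this is not presented as one of them. The function names `fit_scale_shift` and `rescale_depth` and the `min_depth` clamp are hypothetical, and the relative depth is assumed to grow with distance (an inverse-depth prediction would need inverting first).

```python
import numpy as np

def fit_scale_shift(rel_sparse, metric_sparse):
    """Least-squares fit of metric ≈ s * rel + t at the sparse feature pixels.

    rel_sparse:    relative depth sampled where sparse 3D features project.
    metric_sparse: metric depth of those features (e.g., from a VINS map).
    """
    rel = np.asarray(rel_sparse, dtype=float)
    met = np.asarray(metric_sparse, dtype=float)
    # Design matrix with one column for the scale and one for the shift.
    A = np.stack([rel, np.ones_like(rel)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, met, rcond=None)
    return float(s), float(t)

def rescale_depth(rel_map, s, t, min_depth=1e-3):
    """Apply the fitted scale and shift to a dense relative depth map."""
    dense = s * np.asarray(rel_map, dtype=float) + t
    # Clamp so the result is always a valid positive depth (assumed floor).
    return np.maximum(dense, min_depth)

# Toy example: four sparse features and a 2x2 relative depth map.
rel_pts = [0.1, 0.4, 0.7, 0.9]
metric_pts = [1.5, 3.0, 4.5, 5.5]   # synthetic values, exactly 5 * rel + 1
s, t = fit_scale_shift(rel_pts, metric_pts)
metric_map = rescale_depth([[0.1, 0.9], [0.5, 0.2]], s, t)
```

A global scale and shift has only two parameters, so it is cheap to fit per frame, but it cannot correct a nonlinear distortion between relative and metric depth.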
Problem

Research questions and friction points this paper is trying to address.

Estimating metric depth from monocular images and IMU
Enabling collision avoidance for autonomous aerial navigation
Providing lightweight zero-shot depth rescaling strategies
Innovation

Methods, ideas, or system contributions that make the work stand out.

Monocular visual-inertial rescaling for depth
Zero-shot metric depth without fine-tuning
Monotonic spline fitting for accurate estimation
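As a rough illustration of a monotone relative-to-metric mapping, the sketch below fits a non-decreasing curve through sparse (relative, metric) depth pairs and applies it to a dense map. It is a stand-in, not the paper's method: it uses isotonic regression (pool-adjacent-violators) with piecewise-linear interpolation rather than a spline, and the names `isotonic_fit` and `monotone_rescale` are hypothetical. It also assumes relative depth increases with distance.

```python
import numpy as np

def isotonic_fit(x, y):
    """Pool-adjacent-violators: non-decreasing least-squares fit of y over x."""
    order = np.argsort(x)
    xs = np.asarray(x, dtype=float)[order]
    ys = np.asarray(y, dtype=float)[order]
    # Each block holds [sum of values, count]; merge while block means decrease.
    blocks = []
    for v in ys:
        blocks.append([v, 1])
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    # Expand each block back to one fitted value per input sample.
    fitted = np.concatenate([np.full(c, total / c) for total, c in blocks])
    return xs, fitted

def monotone_rescale(rel_map, rel_sparse, metric_sparse):
    """Map a dense relative depth map to metric depth with a monotone curve."""
    xs, fitted = isotonic_fit(rel_sparse, metric_sparse)
    # np.interp is piecewise linear and clamps outside the sampled range.
    return np.interp(np.asarray(rel_map, dtype=float), xs, fitted)

# Toy example: the third sparse point violates monotonicity and gets pooled.
rel_pts = [0.2, 0.4, 0.6, 0.8]
metric_pts = [1.0, 3.0, 2.0, 5.0]
knots_x, knots_y = isotonic_fit(rel_pts, metric_pts)
metric_map = monotone_rescale([[0.2, 0.8], [0.5, 0.3]], rel_pts, metric_pts)
```

Enforcing monotonicity preserves the depth ordering predicted by the relative depth network even when individual sparse features are noisy. A smoother curve could be obtained by passing the pooled knots to a shape-preserving interpolant such as SciPy's `PchipInterpolator`, provided the knots are first made strictly increasing in `x`.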
Steven Yang
Robotics Institute, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213 USA
Xiaoyu Tian
Chinese University of Hong Kong
Kshitij Goel
Carnegie Mellon University
Robotics
Wennie Tabib
Carnegie Mellon University
Robotics, Active Perception, SLAM