AI Summary
Existing monocular metric depth estimation methods exhibit limited generalization across diverse camera types such as fisheye and 360° cameras, hindering unified and accurate depth prediction. This work proposes a decoupling strategy that separates the task into relative depth prediction and spatially varying scale estimation. We introduce a lightweight depth-guided scale estimation module and a distortion-aware positional encoding, RoPE-φ, which incorporates latitude-weighted equirectangular projection (ERP) coordinates. By leveraging relative depth to guide scale map upsampling and employing distortion-aware positional encoding, our approach achieves cross-camera generalization within a single model, without requiring multi-domain training or camera-specific architectures. The method sets new state-of-the-art results across multiple datasets, significantly outperforming existing approaches.
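The decoupling described above can be illustrated with a minimal sketch: metric depth is recovered as the product of a relative depth map and a spatially varying scale map, where a coarse scale map is upsampled to full resolution. The function name and the nearest-neighbor upsampling are illustrative assumptions; the paper's Depth-Guided Scale Estimation module is a learned component that uses the relative depth as guidance.

```python
import numpy as np

def compose_metric_depth(relative_depth, coarse_scale):
    """Hypothetical sketch of the decoupling strategy:
    metric depth = relative depth * spatially varying scale.
    The coarse scale map is upsampled to the relative-depth
    resolution (naive nearest-neighbor here, standing in for
    the learned, depth-guided upsampling module)."""
    H, W = relative_depth.shape
    h, w = coarse_scale.shape
    # Map each full-resolution pixel back to a coarse-grid cell.
    rows = np.arange(H) * h // H
    cols = np.arange(W) * w // W
    scale_full = coarse_scale[rows[:, None], cols[None, :]]
    # Per-pixel scale turns relative depth into metric depth.
    return relative_depth * scale_full

rel = np.ones((4, 4))                        # toy relative depth
scale = np.array([[2.0, 3.0], [4.0, 5.0]])   # toy 2x2 coarse scale map
metric = compose_metric_depth(rel, scale)
```

Because the scale is spatially varying rather than a single global factor, each image region can be rescaled independently, which is what allows one model to handle cameras with very different local geometry.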
Abstract
Monocular metric depth estimation (MMDE) is a core challenge in computer vision, playing a pivotal role in real-world applications that demand accurate spatial understanding. Although prior works have shown promising zero-shot performance in MMDE, they often struggle with generalization across diverse camera types, such as fisheye and $360^\circ$ cameras. Recent advances have addressed this through unified camera representations or canonical representation spaces, but they require either including large-FoV camera data during training or separately trained models for different domains. We propose UniDAC, an MMDE framework that is robust across all domains and generalizes to diverse cameras with a single model. We achieve this by decoupling metric depth estimation into relative depth prediction and spatially varying scale estimation, enabling robust performance across different domains. We propose a lightweight Depth-Guided Scale Estimation module that upsamples a coarse scale map to high resolution using the relative depth map as guidance to account for local scale variations. Furthermore, we introduce RoPE-$\varphi$, a distortion-aware positional embedding that respects the spatial warping of the equirectangular projection (ERP) via latitude-aware weighting. UniDAC achieves state-of-the-art (SoTA) cross-camera generalization, consistently outperforming prior methods across all datasets.
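The latitude-aware weighting behind the distortion-aware positional embedding can be sketched as follows. In an ERP image, each row corresponds to a latitude $\varphi \in [-\pi/2, \pi/2]$, and horizontal distances on the sphere shrink by a factor of $\cos\varphi$ toward the poles; weighting the longitudinal coordinate accordingly gives positional coordinates that respect the ERP warping. The function name and the exact weighting scheme below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def latitude_weighted_erp_coords(H, W):
    """Hypothetical sketch of latitude-weighted ERP coordinates of the
    kind a distortion-aware positional embedding could consume.
    Rows map to latitude phi, columns to longitude lambda; the
    horizontal coordinate is scaled by cos(phi), reflecting how ERP
    stretches content near the poles."""
    # Latitude per row: +pi/2 at the top row, -pi/2 at the bottom.
    phi = (0.5 - (np.arange(H) + 0.5) / H) * np.pi
    # Longitude per column, spanning [-pi, pi].
    lam = ((np.arange(W) + 0.5) / W - 0.5) * 2.0 * np.pi
    # Latitude-weighted horizontal coordinate: lambda * cos(phi).
    u = np.cos(phi)[:, None] * lam[None, :]
    # Vertical coordinate is the latitude itself.
    v = np.tile(phi[:, None], (1, W))
    return u, v

u, v = latitude_weighted_erp_coords(8, 16)
```

These (u, v) maps could then be fed to a rotary positional embedding in place of raw pixel coordinates, so that two patches at the same angular distance on the sphere receive comparable positional offsets regardless of how much ERP has stretched them.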