Depth Any Camera: Zero-Shot Metric Depth Estimation from Any Camera

📅 2025-01-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
Problem: Zero-shot metric depth estimation for wide-angle, fisheye, and 360° cameras, which have large and heterogeneous fields of view (FoV), remains an open challenge because existing models fail to generalize across FoV domains without retraining. Method: We propose the first unified framework that generalizes zero-shot from perspective images to arbitrary-FoV cameras without fine-tuning. The approach introduces: (1) an FoV alignment mechanism that explicitly models lens distortion and FoV-induced geometric disparities; (2) pitch-aware equirectangular projection (ERP) online augmentation to make the spherical representation more robust; and (3) multi-resolution data augmentation coupled with a lightweight transfer architecture built on perspective-pretrained models. Results: On multiple fisheye and 360° depth benchmarks, the method improves δ₁ accuracy by up to 50% over state-of-the-art metric depth foundation models, establishing the first strong cross-FoV generalization capability for metric depth estimation.
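
For readers unfamiliar with the δ₁ metric cited in the results, the sketch below shows the definition commonly used in depth estimation: the fraction of valid pixels whose predicted-to-true depth ratio (or its inverse) is below 1.25. The function name `delta1_accuracy`, the flat-list inputs, and the rule that non-positive depths are invalid are assumptions made for illustration; the paper's evaluation code may mask and aggregate differently.

```python
def delta1_accuracy(pred, gt, threshold=1.25):
    """Fraction of valid pixels with max(pred/gt, gt/pred) < threshold.

    `pred` and `gt` are flat sequences of depths in the same metric unit.
    Pixels with non-positive depth in either input are skipped
    (an assumed validity rule, not taken from the paper).
    """
    hits = 0
    valid = 0
    for p, g in zip(pred, gt):
        if p <= 0 or g <= 0:
            continue  # skip invalid or missing depth
        valid += 1
        if max(p / g, g / p) < threshold:
            hits += 1
    # Return NaN when no pixel is valid so the caller can detect it.
    return hits / valid if valid else float("nan")
```

Because the ratio is taken in both directions, over- and under-estimates are penalized symmetrically, and because the depths are not rescaled before comparison, the score reflects metric (absolute-scale) accuracy.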

📝 Abstract
While recent depth estimation methods exhibit strong zero-shot generalization, achieving accurate metric depth across diverse camera types, particularly those with large fields of view (FoV) such as fisheye and 360-degree cameras, remains a significant challenge. This paper presents Depth Any Camera (DAC), a zero-shot metric depth estimation framework that extends a perspective-trained model to handle cameras with varying FoVs. The framework is designed so that all existing 3D data can be leveraged, regardless of the camera types used in new applications. Notably, DAC is trained exclusively on perspective images but generalizes to fisheye and 360-degree cameras without specialized training data. DAC employs Equi-Rectangular Projection (ERP) as a unified image representation, enabling consistent processing of images with diverse FoVs. Its key components are a pitch-aware Image-to-ERP conversion for efficient online augmentation in ERP space, an FoV alignment operation to support effective training across a wide range of FoVs, and multi-resolution data augmentation to address resolution disparities between training and testing. DAC achieves state-of-the-art zero-shot metric depth estimation, improving $\delta_1$ accuracy by up to 50% on multiple fisheye and 360-degree datasets compared to prior metric depth foundation models, demonstrating robust generalization across camera types.
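
To make the ERP representation in the abstract concrete, here is a minimal sketch of how one pixel of a pinhole (perspective) image could be mapped to ERP coordinates, with an optional camera pitch. It is not DAC's implementation: the function name `perspective_pixel_to_erp`, the camera frame (x right, y down, z forward), the pitch sign convention, and a full 360° by 180° ERP canvas are all assumptions chosen for illustration.

```python
import math


def perspective_pixel_to_erp(u, v, fx, fy, cx, cy, erp_w, erp_h, pitch=0.0):
    """Map pinhole pixel (u, v) to (col, row) on a full-sphere ERP image.

    Assumed conventions (hypothetical, not from the paper):
    - camera frame: x right, y down, z forward
    - positive `pitch` (radians) tilts the camera upward
    - ERP spans longitude [-pi, pi] and latitude [-pi/2, pi/2]
    """
    # Back-project the pixel to a viewing ray in the camera frame.
    x = (u - cx) / fx
    y = (v - cy) / fy
    z = 1.0

    # Rotate the ray about the x-axis to account for camera pitch.
    y_r = math.cos(pitch) * y - math.sin(pitch) * z
    z_r = math.sin(pitch) * y + math.cos(pitch) * z

    # Convert the ray to spherical angles.
    norm = math.sqrt(x * x + y_r * y_r + z_r * z_r)
    lon = math.atan2(x, z_r)
    lat = math.asin(max(-1.0, min(1.0, y_r / norm)))

    # Map the angles linearly onto the ERP pixel grid.
    col = (lon / (2.0 * math.pi) + 0.5) * erp_w
    row = (lat / math.pi + 0.5) * erp_h
    return col, row
```

Under these conventions, with zero pitch the principal point `(cx, cy)` lands at the center of the ERP image, and a non-zero pitch shifts the image content vertically along the latitude axis, which is the effect a pitch-aware conversion has to account for.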
Problem

Research questions and friction points this paper is trying to address.

Depth Estimation
Wide-Angle Cameras
Object Depth
Innovation

Methods, ideas, or system contributions that make the work stand out.

Depth Any Camera (DAC)
Equi-Rectangular Projection (ERP)
Multi-resolution Enhancement