Metric-Solver: Sliding Anchored Metric Depth Estimation from a Single Image

📅 2025-04-16
🤖 AI Summary
Single-image metric depth estimation still suffers from poor generalization across indoor and outdoor scenes of different scales, low near-field accuracy, and weak far-field robustness. To address these challenges, the paper proposes a sliding anchor mechanism that dynamically adapts to scene-specific depth distributions. It further introduces a depth decomposition framework that explicitly separates depth into a normalized near-field component and a contracted far-field component, enabling unified modeling from 0 to ∞. Based on this formulation, the authors design an end-to-end differentiable depth regression network that supports an anchor-driven decoupled depth representation and adaptive optimization. The method achieves state-of-the-art accuracy on the NYUv2, KITTI, and SUN RGB-D benchmarks while significantly improving cross-dataset generalization. Notably, it reduces absolute error in the near field (≤0.5 m) and relative error in the far field (≥50 m), improving performance in both critical regimes.

📝 Abstract
Accurate and generalizable metric depth estimation is crucial for many computer vision applications but remains challenging because depth scales differ widely between indoor and outdoor environments. In this paper, we introduce Metric-Solver, a novel sliding anchor-based metric depth estimation method that dynamically adapts to varying scene scales. Our approach uses an anchor-based representation in which a reference depth serves as an anchor that separates and normalizes the scene depth into two components: scaled near-field depth and tapered far-field depth. The anchor acts as a normalization factor: near-field depth is normalized to a consistent range, while far-field depth is mapped smoothly toward zero. As a result, any depth from zero to infinity can be expressed in a single unified representation, which removes the need to manually account for scene scale variations. More importantly, for the same scene, the anchor can slide along the depth axis, dynamically adjusting to different depth scales. A smaller anchor provides higher resolution in the near field, improving depth precision for closer objects, while a larger anchor improves depth estimation in far regions. This adaptability enables the model to handle depth predictions at varying distances and ensures strong generalization across datasets. The result is a unified and adaptive depth representation across diverse environments. Extensive experiments demonstrate that Metric-Solver outperforms existing methods in both accuracy and cross-dataset generalization.
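
The abstract describes the anchor-based representation only in words, so the following is a minimal sketch of one way such an encoding could look, not the paper's actual formulation. It assumes the near-field component is `min(d, a) / a` and the far-field component is `a / max(d, a)` for depth `d` and anchor `a`. These formulas, the function names `encode_depth`, `decode_depth`, and `near_field_step`, and the `levels` parameter are all hypothetical and chosen purely for illustration.

```python
import math


def encode_depth(depth_m, anchor_m):
    """Split a metric depth into (near, far) components relative to an anchor.

    Assumed form, not taken from the paper:
      near = min(depth, anchor) / anchor  -> in [0, 1], saturates at 1 beyond the anchor
      far  = anchor / max(depth, anchor)  -> in [0, 1], stays at 1 up to the anchor
                                             and tends to 0 as depth goes to infinity
    """
    if anchor_m <= 0:
        raise ValueError("anchor must be positive")
    if depth_m < 0:
        raise ValueError("depth must be non-negative")
    near = min(depth_m, anchor_m) / anchor_m
    far = 0.0 if math.isinf(depth_m) else anchor_m / max(depth_m, anchor_m)
    return near, far


def decode_depth(near, far, anchor_m):
    """Invert encode_depth: recover metric depth from the two components."""
    if near < 1.0:
        # Depth lies in front of the anchor, so the near component carries it.
        return near * anchor_m
    if far == 0.0:
        # A far component of zero represents infinite depth.
        return math.inf
    return anchor_m / far


def near_field_step(anchor_m, levels=256):
    """Metric size of one quantization step of the near component.

    Illustrates why a smaller anchor gives finer near-field resolution:
    the same number of levels is spread over a shorter metric range.
    """
    return anchor_m / levels


if __name__ == "__main__":
    for depth in (0.4, 2.0, 10.0, 40.0, math.inf):
        near, far = encode_depth(depth, anchor_m=10.0)
        print(depth, near, far, decode_depth(near, far, anchor_m=10.0))
    # Sliding the anchor from 80 m to 2 m shrinks the near-field step size.
    print(near_field_step(80.0), near_field_step(2.0))
```

Under these assumptions, every depth from zero to infinity maps into the unit square and round-trips through `decode_depth`, and sliding the anchor toward the camera shrinks the metric size of each near-field step, which matches the trade-off the abstract describes between small and large anchors.
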
Problem

Research questions and friction points this paper is trying to address.

Dynamic metric depth estimation across varying scene scales
Unified depth representation for near and far fields
Adaptive anchor sliding for improved depth precision
Innovation

Methods, ideas, or system contributions that make the work stand out.

Sliding anchor-based metric depth estimation
Dynamic adaptation to varying scene scales
Unified depth representation for all distances