AI Summary
This work tackles three limitations of relative depth estimation: the absence of metric scale, local inconsistencies, and low computational efficiency. It introduces a mathematically interpretable framework that converts relative depth into accurate metric depth using only extremely sparse 3D inputs. The method first recovers depth segment by segment through sparse graph optimization, then refines it pixel by pixel with a discontinuity-aware geodesic cost, all within a lightweight, plug-and-play design. This design improves global consistency and generalization, and the method outperforms previous approaches on multiple depth completion and depth estimation benchmarks. Its lightweight design also eases deployment and integration into diverse downstream 3D vision tasks.
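As a rough illustration of why piecewise (segment-wise) recovery helps, the Python sketch below fits a separate least-squares scale and shift, metric ≈ s · rel + t, for each segment using only the sparse metric samples that fall inside it, and falls back to one global fit for segments with too few samples. This is not MTD's algorithm. The per-segment affine model, the independent fitting of segments (the method itself couples segments through a sparse graph optimization), and the names `fit_scale_shift` and `segmentwise_metric_depth` are assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple


def fit_scale_shift(pairs: List[Tuple[float, float]]) -> Optional[Tuple[float, float]]:
    """Least-squares fit of metric ~= s * rel + t over (rel, metric) pairs.

    Returns None when the fit is under-determined (fewer than two
    distinct relative-depth values).
    """
    if not pairs:
        return None
    n = len(pairs)
    mean_r = sum(r for r, _ in pairs) / n
    mean_m = sum(m for _, m in pairs) / n
    var_r = sum((r - mean_r) ** 2 for r, _ in pairs)
    if var_r == 0.0:
        return None
    cov = sum((r - mean_r) * (m - mean_m) for r, m in pairs)
    s = cov / var_r
    return s, mean_m - s * mean_r


def segmentwise_metric_depth(
    rel: List[float], labels: List[int], sparse: Dict[int, float]
) -> List[float]:
    """Hypothetical helper: one (s, t) per segment instead of one per image.

    rel[i] and labels[i] are the relative depth and segment id of pixel i;
    sparse maps a pixel index to its measured metric depth.
    """
    global_fit = fit_scale_shift([(rel[i], d) for i, d in sparse.items()])
    if global_fit is None:
        raise ValueError("need >= 2 sparse points with distinct relative depth")
    # Group the sparse samples by the segment they fall in.
    groups: Dict[int, List[Tuple[float, float]]] = {}
    for i, d in sparse.items():
        groups.setdefault(labels[i], []).append((rel[i], d))
    # Fit each segment on its own; under-determined segments reuse the global fit.
    fits = {seg: fit_scale_shift(pairs) or global_fit for seg, pairs in groups.items()}
    out = []
    for r, seg in zip(rel, labels):
        s, t = fits.get(seg, global_fit)  # segments without samples use the global fit
        out.append(s * r + t)
    return out


# Toy 1-D "image": two segments whose true scale and shift differ.
rel_row = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0]
labels_row = [0, 0, 0, 1, 1, 1]
sparse_pts = {0: 3.0, 2: 7.0, 3: 3.0, 5: 9.0}
print(fit_scale_shift([(rel_row[i], d) for i, d in sparse_pts.items()]))  # -> (2.5, 0.5)
print(segmentwise_metric_depth(rel_row, labels_row, sparse_pts))  # -> [3.0, 5.0, 7.0, 3.0, 6.0, 9.0]
```

On this toy input, a single global fit over all four samples gives s = 2.5, t = 0.5, which matches neither segment, whereas fitting each segment separately recovers s = 2, t = 1 and s = 3, t = 0 exactly. This is the kind of local scale inconsistency that a purely global alignment cannot remove.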
Abstract
Recent advances have markedly improved the cross-scene generalization of relative depth estimation, yet its practical applicability remains limited by the absence of metric scale, local inconsistencies, and low computational efficiency. To address these issues, we present Midas Touch for Depth (MTD), a mathematically interpretable approach that converts relative depth into metric depth using only extremely sparse 3D data. To eliminate local scale inconsistencies, MTD applies a segment-wise recovery strategy via sparse graph optimization, followed by a pixel-wise refinement strategy using a discontinuity-aware geodesic cost. MTD exhibits strong generalization and achieves substantial accuracy improvements over previous depth completion and depth estimation methods. Moreover, its lightweight, plug-and-play design facilitates deployment and integration into diverse downstream 3D tasks. The project page is available at https://mias.group/MTD.
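The idea behind a discontinuity-aware geodesic cost can be illustrated with a small Python sketch: a multi-source Dijkstra search on a 4-connected pixel grid in which stepping between neighbours costs 1 + λ·|Δrel|, so paths that cross large relative-depth jumps become expensive. Each pixel then inherits the residual (measured minus coarse depth) of its geodesically nearest sparse sample. The edge cost, the nearest-sample residual update, the coarse input map (for example, the output of some segment-wise scale-and-shift fit), and the function name `geodesic_refine` are assumptions for illustration; MTD's actual cost and refinement rule are defined in the paper, not here.

```python
import heapq
from typing import Dict, List, Optional, Tuple

Pixel = Tuple[int, int]


def geodesic_refine(
    coarse: List[List[float]],
    rel: List[List[float]],
    sparse: Dict[Pixel, float],
    lam: float = 10.0,
) -> List[List[float]]:
    """Hypothetical pixel-wise correction of a coarse metric depth map.

    Each pixel takes the residual (measured - coarse) of the sparse sample
    that is closest under a discontinuity-aware geodesic distance.
    """
    if not sparse:
        raise ValueError("need at least one sparse sample")
    h, w = len(rel), len(rel[0])
    dist = [[float("inf")] * w for _ in range(h)]
    owner: List[List[Optional[Pixel]]] = [[None] * w for _ in range(h)]
    heap: List[Tuple[float, int, int]] = []
    # Multi-source Dijkstra: every sparse sample starts at distance 0.
    for (r, c) in sparse:
        dist[r][c] = 0.0
        owner[r][c] = (r, c)
        heapq.heappush(heap, (0.0, r, c))
    while heap:
        d, r, c = heapq.heappop(heap)
        if d > dist[r][c]:
            continue  # stale heap entry
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < h and 0 <= nc < w:
                # Assumed edge cost: unit step plus a penalty that grows with
                # the relative-depth jump, so paths avoid crossing depth edges.
                nd = d + 1.0 + lam * abs(rel[nr][nc] - rel[r][c])
                if nd < dist[nr][nc]:
                    dist[nr][nc] = nd
                    owner[nr][nc] = owner[r][c]
                    heapq.heappush(heap, (nd, nr, nc))
    residual = {p: m - coarse[p[0]][p[1]] for p, m in sparse.items()}
    return [[coarse[r][c] + residual[owner[r][c]] for c in range(w)] for r in range(h)]


# Toy single-row map with a relative-depth jump between columns 3 and 4.
rel_map = [[1.0, 1.0, 1.0, 1.0, 5.0, 5.0]]
coarse_map = [[2.0, 2.0, 2.0, 2.0, 10.0, 10.0]]
samples = {(0, 0): 3.0, (0, 5): 9.0}
print(geodesic_refine(coarse_map, rel_map, samples))  # -> [[3.0, 3.0, 3.0, 3.0, 9.0, 9.0]]
print(geodesic_refine(coarse_map, rel_map, samples, lam=0.0))  # -> [[3.0, 3.0, 3.0, 1.0, 9.0, 9.0]]
```

In the toy row (0-based columns), the relative depth jumps between columns 3 and 4. With `lam=10.0`, column 3 is corrected by the left sample, which lies on the same surface; with `lam=0.0` (plain grid distance), it is pulled toward the right sample across the discontinuity and receives the wrong correction.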