🤖 AI Summary
This work addresses the challenge of recovering metrically accurate dense depth maps from sparse observations of a single view by proposing an effective Transformer-based depth completion framework. The method uses the Poisson equation to generate a structure-aware depth initialization and introduces a point map head that directly regresses per-pixel 3D coordinates in camera space, yielding metrically consistent 3D point maps without requiring camera intrinsics. By combining monocular priors with a reformulated training objective, the framework outperforms existing approaches across multiple benchmark datasets and varying sparsity levels, showing higher accuracy and strong generalization in both depth completion and 3D point map estimation.
📝 Abstract
This work presents the Large Depth Completion Model (LDCM), a simple, effective, and robust framework for single-view metric depth estimation with sparse observations. Without relying on complex architectural designs, LDCM generates metric-accurate dense depth maps using a transformer. It outperforms existing approaches across diverse datasets and sparse observations. We achieve this from two key perspectives: (1) leveraging existing monocular foundation models to improve the quality of sparse depth inputs, and (2) reformulating training objectives to better capture geometric structure and metric consistency. Specifically, a Poisson-based depth initialization strategy is first introduced to generate a uniform coarse dense depth map from diverse sparse observations, providing a strong structural prior for the network. Regarding the training objective, we replace the conventional depth head with a point map head that regresses per-pixel 3D coordinates in camera space, enabling the model to directly learn the underlying 3D scene structure instead of performing pixel-wise depth map restoration. Moreover, this design eliminates the need for camera intrinsic parameters, allowing LDCM to naturally produce metric-scaled 3D point maps. Extensive experiments demonstrate that LDCM consistently outperforms state-of-the-art methods across multiple benchmarks and varying sparsity levels in both depth completion and point map estimation, showcasing its effectiveness and strong generalization to unseen data distributions.
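The abstract does not spell out the Poisson-based initialization, so the following is only a minimal sketch of one common way to pose it, not LDCM's actual implementation. It assumes the dense map D solves the discrete Poisson equation lap(D) = lap(G), where G is a guidance map (for example a monocular depth prediction already brought to a compatible scale), with the sparse metric samples held fixed. The function name `poisson_init`, its parameters, and the Jacobi solver are all hypothetical choices made for illustration.

```python
def poisson_init(sparse, guide, iters=2000):
    """Fill a coarse dense depth map from sparse metric samples.

    sparse: 2D list with metric depth where observed, None elsewhere.
    guide:  2D list with a dense guidance depth (assumed input, e.g. from
            a monocular model); only its local differences are used.
    Solves lap(D) = lap(guide) with the sparse samples as fixed values,
    using plain Jacobi iteration (an illustrative solver choice).
    """
    h, w = len(guide), len(guide[0])
    known = [v for row in sparse for v in row if v is not None]
    if not known:
        raise ValueError("need at least one sparse depth sample")
    mean = sum(known) / len(known)
    # Start from the observed values, with the sample mean everywhere else.
    d = [[mean if sparse[y][x] is None else sparse[y][x] for x in range(w)]
         for y in range(h)]
    for _ in range(iters):
        new = [row[:] for row in d]
        for y in range(h):
            for x in range(w):
                if sparse[y][x] is not None:
                    continue  # observed pixels stay fixed
                acc, n = 0.0, 0
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w:
                        # Neighbour depth plus the guide's difference to it.
                        acc += d[ny][nx] + guide[y][x] - guide[ny][nx]
                        n += 1
                new[y][x] = acc / n
        d = new
    return d


if __name__ == "__main__":
    # Toy 4x4 example: flat guide, two metric samples in opposite corners.
    guide = [[0.0] * 4 for _ in range(4)]
    sparse = [[None] * 4 for _ in range(4)]
    sparse[0][0], sparse[3][3] = 1.0, 2.0
    for row in poisson_init(sparse, guide):
        print(" ".join(f"{v:.2f}" for v in row))
```

With an all-zero guide this reduces to harmonic (Laplace) interpolation of the sparse samples. A non-trivial guide transfers its local structure into the filled regions, which is one plausible reading of the "strong structural prior" mentioned above. A practical version would operate on arrays and use a sparse linear solver rather than Jacobi sweeps.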