🤖 AI Summary
To resolve the accuracy–cost trade-off between low-cost LiDAR and high-resolution metric depth estimation, this paper introduces Prompt Depth Anything, a new paradigm that uses sparse, low-accuracy LiDAR point clouds as multi-scale geometric prompts to guide the Depth Anything foundation model toward metric depth prediction at up to 4K resolution. Methodologically, the authors design a lightweight prompt fusion architecture that injects LiDAR depth into the depth decoder at multiple scales, and build a scalable data pipeline that pairs LiDAR simulation on synthetic data with pseudo-ground-truth depth generation on real scenes. Evaluated on ARKitScenes and ScanNet++, the approach achieves state-of-the-art performance, reducing 4K depth error by a relative 21.3%. The resulting high-fidelity depth maps also benefit downstream applications, including photorealistic 3D reconstruction and generalized robotic grasping.
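The paper's implementation isn't reproduced here, but a minimal sketch can illustrate the fusion idea described above: resize the sparse LiDAR depth to each decoder scale, project it with a small convolution, and add it to the decoder features. Everything below (PyTorch, the `PromptFusionBlock` name, the channel sizes, the zero-initialization) is an illustrative assumption, not the paper's actual code.

```python
# Minimal sketch of multi-scale LiDAR prompt fusion, assuming a DPT-style
# decoder with intermediate feature maps at several resolutions. All names
# and hyperparameters here are hypothetical, not the paper's API.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PromptFusionBlock(nn.Module):
    """Projects a resized LiDAR depth map into feature space and adds it
    to the decoder features at one scale."""
    def __init__(self, channels: int):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Conv2d(1, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )
        # Zero-init the last conv so the pretrained decoder's behavior is
        # unchanged at the start of fine-tuning (an assumption borrowed from
        # common practice for conditioning pretrained models).
        nn.init.zeros_(self.proj[-1].weight)
        nn.init.zeros_(self.proj[-1].bias)

    def forward(self, feat: torch.Tensor, lidar_depth: torch.Tensor) -> torch.Tensor:
        # Resize the low-res LiDAR depth to this scale's spatial size.
        prompt = F.interpolate(lidar_depth, size=feat.shape[-2:],
                               mode="bilinear", align_corners=False)
        return feat + self.proj(prompt)

# Usage: fuse the same LiDAR prompt into each decoder stage.
blocks = nn.ModuleList(PromptFusionBlock(c) for c in (256, 128, 64, 32))
lidar = torch.rand(1, 1, 192, 256)  # e.g. a coarse iPhone-class LiDAR depth map
feats = [torch.rand(1, c, 48 * 2**i, 64 * 2**i)
         for i, c in enumerate((256, 128, 64, 32))]
fused = [blk(f, lidar) for blk, f in zip(blocks, feats)]
```

The zero-initialized projection is one plausible way to keep the foundation model's pretrained depth decoder intact while the prompt pathway is learned; the paper may make a different design choice.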
📝 Abstract
Prompts play a critical role in unleashing the power of language and vision foundation models for specific tasks. For the first time, we introduce prompting into depth foundation models, creating a new paradigm for metric depth estimation termed Prompt Depth Anything. Specifically, we use a low-cost LiDAR as the prompt to guide the Depth Anything model toward accurate metric depth output, achieving up to 4K resolution. Our approach centers on a concise prompt fusion design that integrates the LiDAR at multiple scales within the depth decoder. To address training challenges posed by limited datasets containing both LiDAR depth and precise GT depth, we propose a scalable data pipeline that includes LiDAR simulation for synthetic data and pseudo-GT depth generation for real data. Our approach sets a new state of the art on the ARKitScenes and ScanNet++ datasets and benefits downstream applications, including 3D reconstruction and generalized robotic grasping.
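To make the data-pipeline idea concrete, here is a hedged sketch of how low-cost LiDAR could be simulated from synthetic ground-truth depth: coarse downsampling, depth-dependent noise, and random dropout of returns. The function name, target resolution, and noise parameters below are illustrative assumptions, not the paper's actual simulation model.

```python
# A hedged sketch of simulating low-cost LiDAR from synthetic GT depth.
# The noise model and all parameters are illustrative assumptions.
import torch
import torch.nn.functional as F

def simulate_lidar(gt_depth: torch.Tensor,
                   out_hw=(192, 256),
                   noise_std_rel: float = 0.02,
                   dropout_prob: float = 0.1) -> torch.Tensor:
    """gt_depth: (B, 1, H, W) metric depth in meters -> noisy sparse depth."""
    # 1) Downsample to the LiDAR's coarse resolution (e.g. an iPhone-like grid).
    sparse = F.interpolate(gt_depth, size=out_hw, mode="nearest")
    # 2) Depth-dependent Gaussian noise: farther points are less accurate.
    sparse = sparse + torch.randn_like(sparse) * (noise_std_rel * sparse)
    # 3) Randomly drop returns to mimic missing measurements.
    mask = (torch.rand_like(sparse) > dropout_prob).float()
    return sparse * mask  # zeros mark invalid (missing) returns

# Example: a batch of synthetic GT depth maps in meters.
gt = torch.rand(2, 1, 768, 1024) * 5.0 + 0.5
noisy_sparse = simulate_lidar(gt)  # shape (2, 1, 192, 256)
```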