🤖 AI Summary
This work addresses a key limitation of existing depth completion methods: their reliance on task-specific encoders, which are prone to overfitting and poor generalization. The authors propose CAPA, a framework that, for the first time, combines parameter-efficient fine-tuning techniques (such as LoRA and Visual Prompt Tuning) with test-time optimization. By freezing a pretrained ViT-based 3D foundation model and adapting only a small subset of parameters using the sparse geometric observations available at inference, CAPA enables scene-adaptive depth completion without requiring a dedicated encoder. In video settings, it further improves temporal consistency through inter-frame parameter sharing. The approach is model-agnostic, supports multi-frame joint optimization, and achieves state-of-the-art accuracy and robustness across diverse indoor and outdoor datasets under various sparsity patterns.
📝 Abstract
We introduce CAPA, a parameter-efficient test-time optimization framework that adapts pre-trained 3D foundation models (FMs) for depth completion using sparse geometric cues. Unlike prior methods that train task-specific encoders for auxiliary inputs, which often overfit and generalize poorly, CAPA freezes the FM backbone and updates only a minimal set of parameters via Parameter-Efficient Fine-Tuning (e.g., LoRA or VPT), guided by gradients computed directly from the sparse observations available at inference time. This grounds the foundation model's geometric prior in the scene-specific measurements, correcting distortions and misplaced structures. For videos, CAPA introduces sequence-level parameter sharing, jointly adapting all frames to exploit temporal correlations, improve robustness, and enforce multi-frame consistency. CAPA is model-agnostic, compatible with any ViT-based FM, and achieves state-of-the-art results across diverse condition patterns on both indoor and outdoor datasets. Project page: research.nvidia.com/labs/dvl/projects/capa.
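To make the core idea concrete, here is a deliberately tiny sketch of test-time, LoRA-style adaptation. It is not the authors' CAPA implementation (which adapts a ViT-based 3D foundation model); all names, dimensions, and the linear "model" below are hypothetical. A frozen linear predictor stands in for the foundation model, a rank-1 adapter `a b^T` stands in for LoRA, and gradient descent fits only the adapter to a handful of sparse "depth" observations, mirroring how sparse geometric cues ground a frozen prior at inference time.

```python
import random

random.seed(0)
D, N = 3, 8  # feature dimension, number of sparse observations (toy sizes)

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

# Frozen stand-in for the foundation model: feature map W1 and readout w2.
# These are never updated during test-time adaptation.
W1 = [[random.uniform(-1, 1) for _ in range(D)] for _ in range(D)]
w2 = [random.uniform(-1, 1) for _ in range(D)]

# Hypothetical scene-specific shift the frozen model cannot capture:
# true depths come from W1 + outer(a_true, b_true).
a_true = [random.uniform(-1, 1) for _ in range(D)]
b_true = [random.uniform(-1, 1) for _ in range(D)]

feats = [[random.uniform(-1, 1) for _ in range(D)] for _ in range(N)]

def depth(f, a, b):
    # depth = w2 . (W1 + a b^T) f  =  base term + (w2 . a) * (b . f)
    h = [dot(row, f) for row in W1]
    return dot(w2, h) + dot(w2, a) * dot(b, f)

obs = [depth(f, a_true, b_true) for f in feats]  # sparse ground-truth depths

# Rank-1 LoRA-style adapter: a starts at zero (so the delta is zero at init),
# b starts small random; ONLY a and b are updated at test time.
a = [0.0] * D
b = [random.uniform(-0.1, 0.1) for _ in range(D)]

def loss(a, b):
    return sum((depth(f, a, b) - y) ** 2 for f, y in zip(feats, obs)) / N

loss_before = loss(a, b)
lr = 0.01
for _ in range(800):
    s = dot(w2, a)
    grad_a, grad_b = [0.0] * D, [0.0] * D
    for f, y in zip(feats, obs):
        t = dot(b, f)
        e = depth(f, a, b) - y  # residual on an observed sparse point
        for j in range(D):
            grad_a[j] += 2 * e * t * w2[j] / N  # d loss / d a_j
            grad_b[j] += 2 * e * s * f[j] / N   # d loss / d b_j
    a = [aj - lr * g for aj, g in zip(a, grad_a)]
    b = [bj - lr * g for bj, g in zip(b, grad_b)]

loss_after = loss(a, b)
print(loss_before, loss_after)  # the sparse-fit loss should drop
```

Only 2·D adapter values are optimized while the D² + D "backbone" weights stay frozen; this is the parameter-efficiency argument in miniature. The paper's sequence-level parameter sharing would correspond here to fitting one (a, b) pair against the pooled sparse observations of all frames rather than per frame.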