Enhancing Generalization of Depth Estimation Foundation Model via Weakly-Supervised Adaptation with Regularization

📅 2025-11-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the limited out-of-distribution (OOD) robustness of monocular depth estimation (MDE) foundation models, this paper proposes WeSTAR, a weakly supervised self-training framework. The method operates without ground-truth depth labels, using dense self-training as the primary source of structural supervision and inexpensive pairwise ordinal depth annotations as additional weak guidance. It introduces semantic segmentation–guided, instance-level multi-scale hierarchical normalization to enhance geometric consistency across scales and instances, and it combines LoRA-based parameter-efficient fine-tuning with a weight-regularization anchor to stabilize optimization and preserve pretrained knowledge. Evaluated on OOD benchmarks covering challenging variations in illumination, weather, and occlusion, the approach outperforms existing zero-shot and transfer-learning methods and achieves state-of-the-art cross-domain generalization on NYUv2, KITTI, and DDAD, demonstrating superior robustness and adaptability to unseen domains.
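
To make the normalization step concrete, here is a minimal PyTorch sketch of median/MAD normalization applied both image-wide and within segmentation-derived instance masks. It assumes pseudo-labels come from a frozen teacher and are already detached; all function and variable names are illustrative, not the authors' code.

```python
import torch

def median_mad_normalize(depth: torch.Tensor, mask: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Shift by the median and scale by the mean absolute deviation inside `mask`."""
    vals = depth[mask]
    med = vals.median()
    mad = (vals - med).abs().mean().clamp_min(eps)
    return (depth - med) / mad

def instance_norm_loss(pred: torch.Tensor, pseudo: torch.Tensor,
                       instance_masks: list[torch.Tensor]) -> torch.Tensor:
    """L1 between normalized prediction and (detached) pseudo-label,
    computed over the full image and again inside each instance mask."""
    full = torch.ones_like(pred, dtype=torch.bool)
    terms = [(median_mad_normalize(pred, full)
              - median_mad_normalize(pseudo, full)).abs().mean()]
    for m in instance_masks:          # masks from an off-the-shelf segmenter
        if m.sum() < 16:              # skip tiny instances for numerical stability
            continue
        diff = median_mad_normalize(pred, m)[m] - median_mad_normalize(pseudo, m)[m]
        terms.append(diff.abs().mean())
    return torch.stack(terms).mean()
```

Normalizing separately inside each instance removes the scale/shift ambiguity locally, so the L1 term compares relative structure rather than absolute depth values.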

📝 Abstract
The emergence of foundation models has substantially advanced zero-shot generalization in monocular depth estimation (MDE), as exemplified by the Depth Anything series. However, given access to some data from downstream tasks, a natural question arises: can the performance of these models be further improved? To this end, we propose WeSTAR, a parameter-efficient framework that performs Weakly supervised Self-Training Adaptation with Regularization, designed to enhance the robustness of MDE foundation models in unseen and diverse domains. We first adopt a dense self-training objective as the primary source of structural self-supervision. To further improve robustness, we introduce semantically-aware hierarchical normalization, which exploits instance-level segmentation maps to perform more stable and multi-scale structural normalization. Beyond dense supervision, we introduce a cost-efficient weak supervision in the form of pairwise ordinal depth annotations to further guide the adaptation process, which enforces informative ordinal constraints to mitigate local topological errors. Finally, a weight regularization loss is employed to anchor the LoRA updates, ensuring training stability and preserving the model's generalizable knowledge. Extensive experiments on both realistic and corrupted out-of-distribution datasets under diverse and challenging scenarios demonstrate that WeSTAR consistently improves generalization and achieves state-of-the-art performance across a wide range of benchmarks.
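
Where the abstract mentions anchoring LoRA updates with a weight regularization loss, one plausible realization is sketched below: a frozen base layer is augmented with a low-rank update, and an L2 penalty keeps the effective update near zero. The class name, rank, and the exact form of the anchor are assumptions for illustration; the paper's regularizer may differ.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen pretrained linear layer plus a trainable low-rank update."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():   # pretrained weights stay frozen
            p.requires_grad_(False)
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))  # zero-init:
        self.scale = alpha / rank          # adaptation starts from the pretrained function

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.A.t() @ self.B.t())

    def anchor_penalty(self) -> torch.Tensor:
        """L2 anchor on the effective update B @ A, discouraging drift
        away from the pretrained (generalizable) weights."""
        return (self.B @ self.A).pow(2).sum()
```

During adaptation, the total objective would add a weighted sum of `anchor_penalty()` over all adapted layers to the self-training and ordinal losses.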
Problem

Research questions and friction points this paper is trying to address.

Whether MDE foundation models can be further improved given limited downstream data, without ground-truth depth labels
Improving the robustness of foundation models in unseen and diverse domains
Mitigating local topological errors via inexpensive ordinal depth constraints
Innovation

Methods, ideas, or system contributions that make the work stand out.

Weakly supervised self-training adaptation with regularization
Semantically-aware hierarchical normalization using segmentation
Pairwise ordinal depth annotations for weak supervision (see the loss sketch after this list)
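
As referenced in the last bullet, the pairwise weak supervision can be sketched as a ranking loss in the spirit of ordinal depth-in-the-wild objectives. The annotation format (pixel pairs labeled closer/farther) is an assumption about the interface, not the paper's exact design.

```python
import torch
import torch.nn.functional as F

def ordinal_ranking_loss(pred: torch.Tensor, pairs: torch.Tensor,
                         labels: torch.Tensor) -> torch.Tensor:
    """
    pred:   (H, W) predicted depth map.
    pairs:  (N, 4) integer pixel coordinates (y1, x1, y2, x2) per annotated pair.
    labels: (N,) float, +1 if point 1 should be farther than point 2, -1 otherwise.
    """
    d1 = pred[pairs[:, 0], pairs[:, 1]]
    d2 = pred[pairs[:, 2], pairs[:, 3]]
    # softplus(x) = log(1 + exp(x)), a numerically stable ranking penalty:
    # pairs that violate the annotated ordering incur a large loss.
    return F.softplus(-labels * (d1 - d2)).mean()
```

Pairs that contradict the annotated ordering dominate the loss, which directly targets the local topological errors the Problem section mentions.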
Authors
Yan Huang, South China University of Technology
Yongyi Su, South China University of Technology (Computer Vision, Machine Learning, Test-Time Adaptation)
Xin Lin, Guangzhou University
Le Zhang, University of Electronic Science and Technology of China
Xun Xu, Institute for Infocomm Research (I2R), A*STAR