🤖 AI Summary
Monocular depth estimation faces significant challenges under adverse weather conditions (such as rain, fog, snow, and nighttime) due to the scarcity of ground-truth annotations and severe domain shift. To address this, we propose an effective-rank-guided parameter-efficient fine-tuning (PEFT) method that achieves multi-weather generalization using only a small set of high-visibility images. Our core contribution is the Selecting-Tuning-Maintaining (STM) strategy: it decomposes visual foundation model weights via entropy rank and stable rank, adaptively selects task-relevant singular directions, and enforces principal-direction regularization, marking the first extension of rank-aware fine-tuning to geometry-intensive dense prediction tasks. Evaluated on four real-world multi-weather benchmarks, our approach consistently outperforms existing PEFT and full fine-tuning methods, surpasses models trained on synthetic adverse-weather data, and even exceeds dedicated depth foundation models.
📝 Abstract
Monocular depth estimation under adverse weather conditions (e.g., rain, fog, snow, and nighttime) remains highly challenging due to the lack of reliable ground truth and the difficulty of learning from unlabeled real-world data. Existing methods often rely on synthetic adverse data with pseudo-labels, which suffer from domain gaps, or employ self-supervised learning, which violates photometric assumptions in adverse scenarios. In this work, we propose to achieve weather-generalized depth estimation by Parameter-Efficient Fine-Tuning (PEFT) of Vision Foundation Models (VFMs), using only a small amount of high-visibility (normal) data. While PEFT has shown strong performance in semantic tasks such as segmentation, it remains underexplored for geometry-centric tasks like depth estimation, especially in terms of balancing effective adaptation with the preservation of pretrained knowledge. To this end, we introduce the Selecting-Tuning-Maintaining (STM) strategy, which structurally decomposes the pretrained weights of VFMs based on two kinds of effective ranks (entropy-rank and stable-rank). In the tuning phase, we adaptively select the proper rank number as well as the task-aware singular directions for initialization, based on the entropy-rank and the fully fine-tuned weights; in the maintaining phase, we enforce a principal-direction regularization based on the stable-rank. This design guarantees flexible task adaptation while preserving the strong generalization capability of the pretrained VFM. Extensive experiments on four real-world benchmarks across diverse weather conditions demonstrate that STM not only outperforms existing PEFT methods and full fine-tuning but also surpasses methods trained with adverse synthetic data, and even the depth foundation model.
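To make the two effective-rank notions concrete, here is a minimal NumPy sketch of how they are commonly defined from the singular values of a weight matrix, together with a LoRA-style low-rank initialization from the top singular directions. This is an illustration of the standard definitions (entropy-based effective rank and stable rank), not the paper's actual implementation; the function names and the rank-selection rule are assumptions for the example.

```python
import numpy as np

def entropy_rank(W: np.ndarray) -> float:
    """Entropy-based effective rank: exp(H(p)), where p_i = s_i / sum_j s_j
    is the normalized singular-value distribution of W."""
    s = np.linalg.svd(W, compute_uv=False)
    p = s / s.sum()
    H = -(p * np.log(p + 1e-12)).sum()  # small epsilon guards log(0)
    return float(np.exp(H))

def stable_rank(W: np.ndarray) -> float:
    """Stable rank: ||W||_F^2 / ||W||_2^2 = sum_i s_i^2 / s_max^2."""
    s = np.linalg.svd(W, compute_uv=False)
    return float((s ** 2).sum() / s[0] ** 2)

def select_directions(W: np.ndarray, k: int):
    """Split the top-k singular directions of W into low-rank factors
    (A, B) so that A @ B reproduces the dominant rank-k subspace;
    such factors can serve as a task-aware PEFT initialization."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :k] * np.sqrt(s[:k])            # (m, k)
    B = np.sqrt(s[:k])[:, None] * Vt[:k]     # (k, n)
    return A, B

# Example: an identity matrix has all singular values equal, so both
# effective ranks coincide with the true rank.
W = np.eye(4)
print(entropy_rank(W), stable_rank(W))  # both approximately 4.0
```

In this sketch, `entropy_rank` could drive the choice of `k` (how many directions to tune), while `stable_rank` identifies the dominant principal directions whose drift a maintenance regularizer would penalize.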