🤖 AI Summary
In source-free domain adaptation (SFDA), pseudo-labels produced by large vision-language (ViL) models are noisy and unreliable because of domain shift and the absence of source data. Method: This paper proposes Proxy Denoising (ProDe), a framework that operates without access to source data. It introduces a theoretically grounded proxy confidence metric to characterize the systematic bias of ViL models relative to the latent domain-invariant space, designs a mechanism that models and corrects this proxy bias, and adds a mutual knowledge distillation regularization that uses the corrected pseudo-labels. Contribution/Results: ProDe improves the quality of pseudo-supervision and outperforms prior state-of-the-art methods across four SFDA settings: closed-set, open-set, partial-set, and generalized SFDA.
📝 Abstract
Source-free Domain Adaptation (SFDA) aims to adapt a pre-trained source model to an unlabeled target domain with no access to the source data. Inspired by the success of pre-trained large vision-language (ViL) models in many other applications, the latest SFDA methods have also validated the benefit of ViL models by leveraging their predictions as pseudo supervision. However, we observe that the ViL model's predictions can be noisy and inaccurate at an unknown rate, potentially introducing additional negative effects during adaptation. To address this thus-far ignored challenge, we introduce a novel Proxy Denoising (ProDe) approach. Specifically, we leverage the ViL model as a proxy to guide the adaptation process towards the latent domain-invariant space. Critically, we design a proxy denoising mechanism for correcting the ViL model's predictions. This mechanism is grounded in a novel proxy confidence theory that models the domain adaptation effect of the proxy's divergence from the domain-invariant space. To capitalize on the corrected proxy, we further derive a mutual knowledge distilling regularization. Extensive experiments show that ProDe significantly outperforms the current state-of-the-art alternatives under both the conventional closed-set setting and the more challenging open-set, partial-set, and generalized SFDA settings. The code will be released soon.
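The abstract describes two steps, correcting the proxy's predictions and then distilling between the corrected proxy and the target model, but it does not give the correction formula or the loss. The sketch below is therefore only an illustration of the general idea, not the ProDe method: the function names, the confidence-weighted subtraction, and the use of a symmetric KL divergence are all assumptions made for this example.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def denoise_proxy_logits(proxy_logits, bias_logits, confidence):
    """Hypothetical correction: subtract a confidence-weighted bias estimate.

    `bias_logits` stands in for an estimate of the proxy's systematic error,
    and `confidence` in [0, 1] controls how strongly it is applied.
    Both are assumptions of this sketch, not quantities defined in the abstract.
    """
    return [z - confidence * b for z, b in zip(proxy_logits, bias_logits)]


def symmetric_kl(target_logits, proxy_logits, eps=1e-12):
    """KL(p || q) + KL(q || p), used here as a stand-in mutual distillation term."""
    p = softmax(target_logits)
    q = softmax(proxy_logits)
    return sum(
        (pi - qi) * (math.log(pi + eps) - math.log(qi + eps))
        for pi, qi in zip(p, q)
    )


def pseudo_label(logits):
    """Index of the highest-scoring class."""
    return max(range(len(logits)), key=lambda i: logits[i])


# Toy example with three classes.
target_logits = [2.0, 0.5, -1.0]   # prediction of the model being adapted
proxy_logits = [1.0, 1.5, 0.0]     # raw ViL (proxy) prediction
bias_logits = [0.0, 1.0, 0.0]      # assumed bias toward class 1

corrected = denoise_proxy_logits(proxy_logits, bias_logits, confidence=1.0)
loss = symmetric_kl(target_logits, corrected)
```

In this toy case the raw proxy favors class 1, while the corrected proxy agrees with the target model on class 0, so the distillation term computed against the corrected logits is smaller than the one computed against the raw logits.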