🤖 AI Summary
This work addresses source-free active domain adaptation (SFADA), where no source-domain data is available and only sparse human annotations are permitted in the target domain. Method: We propose DAM, a dual active learning framework that integrates multimodal supervision from vision-language (ViL) models with sparse manual labeling to construct a dual supervision signal on the target domain. DAM leverages a multimodal foundation model to generate high-quality pseudo-labels, introduces a bidirectional knowledge distillation mechanism that enables mutual optimization between the target model and the dual supervision, and couples active sampling strategies with source-free domain adaptation training. Contribution/Results: DAM achieves state-of-the-art performance across multiple SFADA benchmarks, significantly outperforming existing methods. Our results empirically validate the effectiveness and generalizability of synergistically combining multimodal supervision with active learning to enhance cross-domain transfer under source-free, label-scarce conditions.
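The ViL-generated pseudo-labels mentioned above are commonly obtained by scoring each target image against text embeddings of class prompts (e.g., "a photo of a <class>"). The following is a minimal sketch of that standard zero-shot labeling step, assuming the image and text features have already been extracted by some ViL encoder; the function name, temperature value, and inputs are illustrative, not the paper's exact procedure.

```python
import numpy as np

def vil_pseudo_labels(image_feats, text_feats, temperature=0.01):
    """Assign each image the class whose prompt embedding is most similar.

    image_feats: (N, D) image embeddings from a ViL encoder (assumed given)
    text_feats:  (C, D) class-prompt embeddings (assumed given)
    Returns hard pseudo-labels (N,) and their softmax confidences (N,).
    """
    # L2-normalize so dot products become cosine similarities
    img = image_feats / np.linalg.norm(image_feats, axis=1, keepdims=True)
    txt = text_feats / np.linalg.norm(text_feats, axis=1, keepdims=True)
    logits = img @ txt.T / temperature            # (N, C) similarity logits
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    return probs.argmax(axis=1), probs.max(axis=1)
```

In practice the confidence scores are also useful: low-confidence samples are natural candidates for the sparse manual annotations selected by active learning.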
📝 Abstract
Source-free active domain adaptation (SFADA) enhances knowledge transfer from a source model to an unlabeled target domain using limited manual labels selected via active learning. While recent domain adaptation studies have introduced Vision-and-Language (ViL) models to improve pseudo-label quality or feature alignment, they often treat ViL-based supervision and labeled-data supervision as separate sources, lacking effective fusion. To overcome this limitation, we propose Dual Active learning with a Multimodal foundation model (DAM), a novel framework that integrates multimodal supervision from a ViL model to complement sparse human annotations, thereby forming a dual supervisory signal. DAM initializes stable ViL-guided targets and employs a bidirectional distillation mechanism to foster mutual knowledge exchange between the target model and the dual supervision during iterative adaptation. Extensive experiments demonstrate that DAM consistently outperforms existing methods and sets a new state-of-the-art across multiple SFADA benchmarks and active learning strategies.
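The bidirectional distillation described in the abstract is often realized as a symmetric divergence between the target model's predictions and the dual-supervision targets, so that knowledge flows in both directions during iterative adaptation. Below is a minimal sketch of one such symmetric-KL loss; the function names and the `alpha` weighting are assumptions for illustration, and the paper's exact objective may differ.

```python
import numpy as np

def kl_div(p, q, eps=1e-8):
    """Mean KL(p || q) over a batch of probability vectors."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return float(np.mean(np.sum(p * (np.log(p) - np.log(q)), axis=1)))

def bidirectional_distillation_loss(student_probs, teacher_probs, alpha=0.5):
    """Symmetric KL between target-model (student) predictions and the
    dual-supervision (teacher) targets: one term distills teacher -> student,
    the other lets the student's predictions refine the teacher side.
    A common realization of mutual distillation, not the paper's exact loss.
    """
    return (alpha * kl_div(teacher_probs, student_probs)
            + (1 - alpha) * kl_div(student_probs, teacher_probs))
```

When the two distributions agree the loss vanishes, so the gradient pressure concentrates on samples where the target model and the dual supervision disagree, which is exactly where mutual correction is wanted.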