DAM: Dual Active Learning with Multimodal Foundation Model for Source-Free Domain Adaptation

📅 2025-09-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses source-free active domain adaptation (SFADA), where no source-domain data is available and only sparse human annotations are permitted in the target domain. Method: We propose DAM, a dual active learning framework that integrates multimodal supervision from vision-language (ViL) models with sparse manual labeling to construct dual supervision signals on the target domain. DAM leverages multimodal foundation models to generate high-quality pseudo-labels, introduces a bidirectional knowledge distillation mechanism that enables mutual optimization between the target model and the dual supervision, and integrates active sampling strategies into source-free domain adaptation training. Contribution/Results: DAM achieves state-of-the-art performance across multiple SFADA benchmarks, significantly outperforming existing methods. Our results empirically validate the effectiveness and generalizability of synergistically combining multimodal supervision with active learning to enhance cross-domain transfer under source-free, label-scarce conditions.
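The paper does not publish reference code, but the dual-supervision idea in the summary can be sketched: ViL pseudo-label distributions cover all target samples, and a small active-learning budget replaces the most uncertain of them with one-hot human labels. The function names (`dual_supervision`), the entropy-based query criterion, and the `oracle_labels`/`budget` parameters are illustrative assumptions, not the authors' actual API.

```python
import math

def softmax(logits):
    """Convert a list of logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def entropy(probs):
    """Shannon entropy; higher means a less confident prediction."""
    return -sum(p * math.log(p + 1e-12) for p in probs)

def dual_supervision(vil_logits, oracle_labels, budget):
    """Build a dual supervisory signal on the target domain.

    vil_logits:    per-sample class logits from the ViL model.
    oracle_labels: ground-truth class indices (the human annotator).
    budget:        number of samples the annotator may label.
    Returns per-sample target distributions and the queried indices.
    """
    probs = [softmax(z) for z in vil_logits]
    # Active query: spend the budget on the most uncertain ViL predictions.
    order = sorted(range(len(probs)), key=lambda i: -entropy(probs[i]))
    queried = set(order[:budget])
    targets = []
    for i, p in enumerate(probs):
        if i in queried:
            one_hot = [0.0] * len(p)
            one_hot[oracle_labels[i]] = 1.0  # sparse human annotation
            targets.append(one_hot)
        else:
            targets.append(p)                # ViL pseudo-label
    return targets, queried
```

Entropy sampling is only one of the active learning strategies the paper reports compatibility with; any acquisition score could replace it in the query step.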

📝 Abstract
Source-free active domain adaptation (SFADA) enhances knowledge transfer from a source model to an unlabeled target domain using limited manual labels selected via active learning. While recent domain adaptation studies have introduced Vision-and-Language (ViL) models to improve pseudo-label quality or feature alignment, they often treat ViL-based and data supervision as separate sources, lacking effective fusion. To overcome this limitation, we propose DAM (Dual Active learning with Multimodal foundation model), a novel framework that integrates multimodal supervision from a ViL model to complement sparse human annotations, thereby forming a dual supervisory signal. DAM initializes stable ViL-guided targets and employs a bidirectional distillation mechanism to foster mutual knowledge exchange between the target model and the dual supervision during iterative adaptation. Extensive experiments demonstrate that DAM consistently outperforms existing methods and sets a new state-of-the-art across multiple SFADA benchmarks and active learning strategies.
Problem

Research questions and friction points this paper is trying to address.

Integrating multimodal supervision to complement sparse human annotations
Fusing Vision-and-Language models with active learning for domain adaptation
Enhancing knowledge transfer using dual supervisory signals in SFADA
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates multimodal supervision with human annotations
Employs bidirectional distillation for mutual knowledge exchange
Initializes stable Vision-and-Language guided targets
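The bidirectional distillation listed above can be illustrated as a symmetric KL objective between the target model's prediction and the dual-supervision distribution, so knowledge flows in both directions. This is a minimal sketch under that assumption; the function name, the `alpha` weighting, and the use of plain symmetric KL are hypothetical, not the paper's exact loss.

```python
import math

def softmax(logits):
    """Convert a list of logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kl(p, q, eps=1e-12):
    """KL divergence KL(p || q) between two distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def bidirectional_distill_loss(model_logits, target_probs, alpha=0.5):
    """Weighted symmetric KL between the target model's prediction
    and the dual-supervision distribution (pseudo- or human label)."""
    p = softmax(model_logits)
    forward = kl(target_probs, p)   # supervision -> model
    backward = kl(p, target_probs)  # model -> supervision
    return alpha * forward + (1.0 - alpha) * backward
```

The loss is zero when model and supervision agree and grows as they diverge, which is the property the mutual-optimization claim relies on.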