Reviving In-domain Fine-tuning Methods for Source-Free Cross-domain Few-shot Learning

📅 2026-05-12

🤖 AI Summary

This work addresses a performance gap in source-free cross-domain few-shot learning (CDFSL), where prompt-based fine-tuning methods for vision-language models such as CLIP significantly underperform adapter-based approaches. The study finds that adapters such as LoRA improve modality alignment and class separability by mitigating the attention collapse of the visual [CLS] token. Building on this insight, the authors propose Semantic Probe, a plug-and-play attention rectification framework that improves both prompt-based (e.g., MaPLe) and adapter-based methods. Across four CDFSL benchmarks, Semantic Probe achieves state-of-the-art results, supporting its effectiveness for both fine-tuning paradigms.
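The summary describes attention collapse of the visual [CLS] token only qualitatively. One common way to quantify how concentrated an attention map is, assumed here rather than taken from the paper, is the normalized entropy of the [CLS] query's softmax attention over patch tokens. The sketch below uses NumPy with random keys and the hypothetical helpers `cls_attention` and `normalized_entropy`; it does not reproduce Semantic Probe or the paper's actual metric.

```python
import numpy as np

def cls_attention(q_cls, k_patches):
    """Softmax attention weights of a single [CLS] query over patch keys.

    q_cls: shape (d,); k_patches: shape (N, d). Returns shape (N,).
    """
    d = q_cls.shape[-1]
    logits = k_patches @ q_cls / np.sqrt(d)  # scaled dot-product scores
    logits = logits - logits.max()           # numerical stability
    w = np.exp(logits)
    return w / w.sum()

def normalized_entropy(weights, eps=1e-12):
    """Entropy of an attention distribution divided by log(N).

    Assumed proxy for collapse: near 1.0 means attention is spread across
    patches, near 0.0 means it is concentrated on very few patches.
    """
    w = np.clip(weights, eps, 1.0)
    return float(-(w * np.log(w)).sum() / np.log(len(w)))

rng = np.random.default_rng(0)
d, n = 64, 49  # e.g. a 7x7 patch grid; illustrative sizes only
k = rng.normal(size=(n, d))

# "Collapsed" case: the query is strongly aligned with a single patch key.
q_collapsed = 10.0 * k[3]
# Diffuse case: a zero query yields exactly uniform attention.
q_diffuse = np.zeros(d)

print(normalized_entropy(cls_attention(q_collapsed, k)))
print(normalized_entropy(cls_attention(q_diffuse, k)))
```

Under this entropy assumption, comparing the score for a frozen model and a LoRA-tuned model on the same images would be one way to probe the claimed rectification; the paper may measure collapse differently.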

📝 Abstract

Cross-Domain Few-Shot Learning (CDFSL) aims to adapt large-scale pretrained models to specialized target domains with limited samples, yet few-shot fine-tuning of vision-language models such as CLIP remains underexplored. By establishing multiple CLIP fine-tuning baselines for CDFSL, we find that adapter-based methods (e.g., LoRA) consistently outperform prompt-based ones (e.g., MaPLe), the reverse of the in-domain trend. To make these in-domain methods competitive again in CDFSL, we analyze this phenomenon and discover that LoRA's superiority stems from rectifying the collapsed attention of the visual CLS token: by focusing on text-related visual regions, it improves modality alignment and class separation. Further, we find that the textual EOS token exhibits much better attention to visual samples, and that CLIP's standard contrastive loss only weakly constrains modality alignment. Based on these insights, we propose Semantic Probe, a plug-and-play attention rectification framework for both adapter- and prompt-based methods. Extensive experiments on four CDFSL benchmarks validate our rationale, achieving state-of-the-art performance and benefiting both fine-tuning paradigms. Code will be released.
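For reference, here is a minimal NumPy sketch of CLIP's standard symmetric contrastive (InfoNCE) loss, which the abstract says only weakly constrains modality alignment. It assumes L2-normalized embeddings, a fixed logit scale of 100 (roughly the learned value in released CLIP checkpoints), and matched image-text pairs on the diagonal; the function names are hypothetical. The last lines illustrate one possible reading of "weakly constrains" (our illustration, not necessarily the paper's argument): the loss depends only on relative similarities, so shifting every image-text similarity by the same amount leaves it unchanged.

```python
import numpy as np

def l2_normalize(x, axis=-1):
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def log_softmax(z, axis):
    z = z - z.max(axis=axis, keepdims=True)  # numerical stability
    return z - np.log(np.exp(z).sum(axis=axis, keepdims=True))

def clip_contrastive_loss_from_sim(sim, logit_scale=100.0):
    """Symmetric InfoNCE over a (B, B) cosine-similarity matrix.

    Row i of images and column i of texts form the matched pair, so the
    diagonal holds the positives. logit_scale=100 is an assumption.
    """
    logits = logit_scale * sim
    idx = np.arange(sim.shape[0])
    loss_i2t = -log_softmax(logits, axis=1)[idx, idx].mean()  # image -> text
    loss_t2i = -log_softmax(logits, axis=0)[idx, idx].mean()  # text -> image
    return float((loss_i2t + loss_t2i) / 2)

def clip_contrastive_loss(img_emb, txt_emb, logit_scale=100.0):
    sim = l2_normalize(img_emb) @ l2_normalize(txt_emb).T
    return clip_contrastive_loss_from_sim(sim, logit_scale)

rng = np.random.default_rng(0)
img = rng.normal(size=(4, 16))
txt = img + 0.5 * rng.normal(size=(4, 16))  # toy paired, roughly aligned embeddings

sim = l2_normalize(img) @ l2_normalize(txt).T
print(clip_contrastive_loss(img, txt))
# Lowering every image-text similarity by the same constant does not change the loss:
print(clip_contrastive_loss_from_sim(sim), clip_contrastive_loss_from_sim(sim - 0.3))
```

In the few-shot classification setting, fine-tuning methods usually apply only the image-to-text term against one text embedding per class; the symmetric batch form above is the original CLIP pretraining objective.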

Problem

Research questions and friction points this paper is trying to address.

Cross-Domain Few-Shot Learning

Source-Free

Fine-tuning

Vision-Language Models

CLIP

Innovation

Methods, ideas, or system contributions that make the work stand out.

Source-Free Cross-domain Few-shot Learning

CLIP Fine-tuning

Attention Rectification