Multiple Stochastic Prompt Tuning for Practical Cross-Domain Few-Shot Learning

📅 2025-06-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses cross-domain few-shot classification under extreme domain shift—where only a few labeled samples per class are available in the target domain, no source-domain data is accessible, and all classes are entirely unseen during training. We propose MIST, a multimodal stochastic prompt tuning framework. MIST learns multiple semantically complementary, learnable prompts per class, modeling prompt weights via differentiable Gaussian distributions parameterized by both mean and variance to jointly encourage semantic diversity and robust optimization. It integrates CLIP-based prompt tuning, multi-prompt ensembling, and cross-domain feature alignment. Evaluated on four newly constructed cross-domain few-shot benchmarks, MIST significantly outperforms state-of-the-art methods, effectively mitigating few-shot overfitting and maintaining strong generalization under severe domain shifts. Notably, MIST achieves lightweight adaptation of large vision-language models without requiring episodic training or access to source-domain data—the first method to do so.

📝 Abstract
In this work, we propose a practical cross-domain few-shot learning (pCDFSL) task, where a large-scale pre-trained model like CLIP can be easily deployed on a target dataset. The goal is to simultaneously classify all unseen classes under extreme domain shifts, using only a few labeled samples per class. The pCDFSL paradigm is source-free and moves beyond the artificially created episodic training and testing regimes followed by existing CDFSL frameworks, making it more challenging and relevant to real-world applications. Towards that goal, we propose a novel framework, termed MIST (MultIple STochastic Prompt tuning), in which multiple stochastic prompts are utilized to handle significant domain and semantic shifts. Specifically, multiple prompts are learned for each class, effectively capturing multiple peaks in the input data distribution. Furthermore, instead of representing the weights of the multiple prompts as point estimates, we model them as learnable Gaussian distributions with two different strategies, encouraging efficient exploration of the prompt parameter space, which mitigates overfitting to the few labeled training samples. Extensive experiments and comparisons with state-of-the-art methods on four CDFSL benchmarks adapted to this setting show the effectiveness of the proposed framework.
Problem

Research questions and friction points this paper is trying to address.

Proposes practical cross-domain few-shot learning for large-scale models
Handles extreme domain shifts with few labeled samples per class
Introduces stochastic prompt tuning to mitigate overfitting in few-shot scenarios
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multiple stochastic prompts handle domain shifts
Learnable Gaussian distributions model prompt weights
Captures multiple peaks in input data
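The core idea in the innovation bullets above, mixing several learnable prompts per class with weights drawn from a learnable Gaussian via the reparameterization trick, can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the sizes `K` and `D`, the softmax mixing, and the `sample_class_prompt` helper are all assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

K, D = 4, 8  # hypothetical: prompts per class, prompt embedding dimension

# K learnable prompt embeddings for one class (randomly initialized here)
prompts = rng.normal(size=(K, D))

# Learnable Gaussian over the K mixing weights: mean and log-variance,
# instead of a single point-estimate weight vector
mu = np.zeros(K)
log_var = np.full(K, -2.0)

def sample_class_prompt(prompts, mu, log_var, rng):
    """Reparameterization trick: w = mu + sigma * eps with eps ~ N(0, I),
    then softmax the sampled weights and form a convex combination
    of the K prompts. Each forward pass sees a slightly different mix,
    which regularizes few-shot training."""
    eps = rng.normal(size=mu.shape)
    w = mu + np.exp(0.5 * log_var) * eps
    w = np.exp(w - w.max())        # stable softmax
    w /= w.sum()
    return w @ prompts             # (D,) mixed class prompt

class_prompt = sample_class_prompt(prompts, mu, log_var, rng)
print(class_prompt.shape)
```

Because the sampled weights pass through a softmax, the mixed prompt stays in the convex hull of the K prompts, while the variance term keeps the optimization from collapsing onto a single point estimate.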