AmPLe: Supporting Vision-Language Models via Adaptive-Debiased Ensemble Multi-Prompt Learning

📅 2025-12-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current vision-language models suffer from two types of matching bias in multi-prompt learning: (i) model–prompt bias—semantic inconsistency of identical prompts across different models; and (ii) sample–prompt bias—irrelevant semantic content in input samples interfering with weight computation. To address this, we propose an adaptive debiased ensemble framework—the first to systematically identify and jointly mitigate both biases. Our method introduces an information-theoretic mechanism for extracting prompt-relevant semantics, leveraging mutual information for dynamic, bias-aware ensemble weight learning. It further integrates cross-model collaborative inference (CLIP-ViT-B/16 & B/32) with causal validation. Extensive experiments demonstrate significant improvements over state-of-the-art methods on three few-shot tasks: novel-class generalization, cross-dataset transfer, and unseen-domain shift. We further provide theoretical guarantees of effectiveness grounded in causal reasoning.

📝 Abstract
Multi-prompt learning methods have emerged as an effective approach for facilitating the rapid adaptation of vision-language models to downstream tasks with limited resources. Existing multi-prompt learning methods primarily focus on utilizing various meticulously designed prompts within a single foundation vision-language model to achieve superior performance. However, the overlooked model-prompt matching bias hinders the development of multi-prompt learning, i.e., the same prompt can convey different semantics across distinct vision-language models, such as CLIP-ViT-B/16 and CLIP-ViT-B/32, resulting in inconsistent predictions for an identical prompt. To mitigate the impact of this bias on downstream tasks, we explore an ensemble learning approach to sufficiently aggregate the benefits of diverse predictions. Additionally, we further disclose the presence of sample-prompt matching bias, which originates from the prompt-irrelevant semantics encapsulated in the input samples. Thus, directly utilizing all information from the input samples for generating ensemble weights can lead to suboptimal performance. In response, we extract prompt-relevant semantics from input samples under the guidance of an information-theoretic analysis, adaptively calculating debiased ensemble weights. Overall, we propose Adaptive-Debiased Ensemble Multi-Prompt Learning, abbreviated as AmPLe, to mitigate the two types of bias simultaneously. Extensive experiments on three representative tasks, i.e., generalization to novel classes, new target datasets, and unseen domain shifts, show that AmPLe can widely outperform existing methods. Theoretical validation from a causal perspective further supports the effectiveness of AmPLe.
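The abstract describes aggregating predictions from multiple backbones with sample-adaptive, debiased ensemble weights. The paper's actual mechanism (mutual-information-guided extraction of prompt-relevant semantics) is not detailed here, so the following is only a minimal sketch of the general idea: per-model relevance scores (a hypothetical stand-in for the information-theoretic extraction step) are turned into softmax ensemble weights that combine per-model class logits. The function name `debiased_ensemble` and all inputs are illustrative, not from the paper.

```python
import numpy as np

def debiased_ensemble(logits_per_model, relevance_scores, temperature=1.0):
    """Illustrative sketch of adaptive ensemble weighting (not the paper's exact method).

    logits_per_model: (M, C) array of class logits from M backbones
                      (e.g., CLIP-ViT-B/16 and CLIP-ViT-B/32).
    relevance_scores: (M,) array estimating how well the prompt-relevant
                      semantics of the input match each backbone — a
                      stand-in for the mutual-information-guided scores.
    """
    scores = np.asarray(relevance_scores, dtype=float)
    # Softmax over relevance scores yields sample-adaptive ensemble weights.
    w = np.exp(scores / temperature)
    w /= w.sum()
    # Weighted aggregation of per-model predictions.
    return (w[:, None] * np.asarray(logits_per_model, dtype=float)).sum(axis=0)

# Toy usage: two backbones, three classes.
logits = [[2.0, 0.5, 0.1],   # e.g., from CLIP-ViT-B/16
          [1.0, 1.5, 0.2]]   # e.g., from CLIP-ViT-B/32
ensembled = debiased_ensemble(logits, relevance_scores=[0.8, 0.2])
print(ensembled.argmax())  # → 0
```

With equal relevance scores the weights reduce to a uniform average, so the relevance extraction is what makes the ensemble adaptive rather than static.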
Problem

Research questions and friction points this paper is trying to address.

Mitigates model-prompt matching bias in vision-language models
Addresses sample-prompt matching bias from irrelevant semantics
Enhances multi-prompt learning via adaptive debiased ensemble weights
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adaptive-debiased ensemble learning for multi-prompt integration
Extracting prompt-relevant semantics using information theory guidance
Mitigating model-prompt and sample-prompt matching biases simultaneously
Fei Song
National Key Laboratory of Space Integrated Information System, Institute of Software, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
Yi Li
National Key Laboratory of Space Integrated Information System, Institute of Software, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
Jiangmeng Li
Institute of Software, Chinese Academy of Sciences
Multi-modal learning, Self-supervised learning, Domain generalization, Causal learning
Rui Wang
National Key Laboratory of Space Integrated Information System, Institute of Software, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
Changwen Zheng
Institute of Software, Chinese Academy of Sciences
Machine learning, computer simulation
Fanjiang Xu
National Key Laboratory of Space Integrated Information System, Institute of Software, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
Hui Xiong
Senior Scientist, Candela Corporation
Ultrafast dynamics, atomic molecular physics, free electron laser