AmPLe: Supporting Vision-Language Models via Adaptive-Debiased Ensemble Multi-Prompt Learning

📅 2025-12-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current vision-language models suffer from two types of matching bias in multi-prompt learning: (i) model–prompt bias—semantic inconsistency of identical prompts across different models; and (ii) sample–prompt bias—irrelevant semantic content in input samples interfering with weight computation. To address this, we propose an adaptive debiased ensemble framework—the first to systematically identify and jointly mitigate both biases. Our method introduces an information-theoretic mechanism for extracting prompt-relevant semantics, leveraging mutual information for dynamic, bias-aware ensemble weight learning. It further integrates cross-model collaborative inference (CLIP-ViT-B/16 & B/32) with causal validation. Extensive experiments demonstrate significant improvements over state-of-the-art methods on three few-shot tasks: novel-class generalization, cross-dataset transfer, and unseen-domain shift. We further provide theoretical guarantees of effectiveness grounded in causal reasoning.

📝 Abstract
Multi-prompt learning methods have emerged as an effective approach for facilitating the rapid adaptation of vision-language models to downstream tasks with limited resources. Existing multi-prompt learning methods primarily focus on utilizing various meticulously designed prompts within a single foundation vision-language model to achieve superior performance. However, the overlooked model-prompt matching bias hinders the development of multi-prompt learning, i.e., the same prompt can convey different semantics across distinct vision-language models, such as CLIP-ViT-B/16 and CLIP-ViT-B/32, resulting in inconsistent predictions for an identical prompt. To mitigate the impact of this bias on downstream tasks, we explore an ensemble learning approach to sufficiently aggregate the benefits of diverse predictions. Additionally, we further disclose the presence of sample-prompt matching bias, which originates from the prompt-irrelevant semantics encapsulated in the input samples. Thus, directly utilizing all information from the input samples for generating ensemble weights can lead to suboptimal performance. In response, we extract prompt-relevant semantics from input samples under the guidance of an information-theoretic analysis, adaptively calculating debiased ensemble weights. Overall, we propose Adaptive-Debiased Ensemble Multi-Prompt Learning, abbreviated as AmPLe, to mitigate the two types of bias simultaneously. Extensive experiments on three representative tasks, i.e., generalization to novel classes, new target datasets, and unseen domain shifts, show that AmPLe can widely outperform existing methods. Theoretical validation from a causal perspective further supports the effectiveness of AmPLe.
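The abstract describes aggregating predictions from multiple backbones with sample-adaptive, debiased ensemble weights. The paper's actual mechanism (mutual-information-guided extraction of prompt-relevant semantics) is not detailed here, so the following is only a minimal sketch of the general idea: per-model relevance scores (a hypothetical stand-in for the information-theoretic extraction step) are turned into softmax ensemble weights that combine per-model class logits. The function name `debiased_ensemble` and all inputs are illustrative, not from the paper.

```python
import numpy as np

def debiased_ensemble(logits_per_model, relevance_scores, temperature=1.0):
    """Illustrative sketch of adaptive ensemble weighting (not the paper's exact method).

    logits_per_model: (M, C) array of class logits from M backbones
                      (e.g., CLIP-ViT-B/16 and CLIP-ViT-B/32).
    relevance_scores: (M,) array estimating how well the prompt-relevant
                      semantics of the input match each backbone — a
                      stand-in for the mutual-information-guided scores.
    """
    scores = np.asarray(relevance_scores, dtype=float)
    # Softmax over relevance scores yields sample-adaptive ensemble weights.
    w = np.exp(scores / temperature)
    w /= w.sum()
    # Weighted aggregation of per-model predictions.
    return (w[:, None] * np.asarray(logits_per_model, dtype=float)).sum(axis=0)

# Toy usage: two backbones, three classes.
logits = [[2.0, 0.5, 0.1],   # e.g., from CLIP-ViT-B/16
          [1.0, 1.5, 0.2]]   # e.g., from CLIP-ViT-B/32
ensembled = debiased_ensemble(logits, relevance_scores=[0.8, 0.2])
print(ensembled.argmax())  # → 0
```

With equal relevance scores the weights reduce to a uniform average, so the relevance extraction is what makes the ensemble adaptive rather than static.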
Problem

Research questions and friction points this paper is trying to address.

Mitigates model-prompt matching bias in vision-language models
Addresses sample-prompt matching bias from irrelevant semantics
Enhances multi-prompt learning via adaptive debiased ensemble weights
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adaptive-debiased ensemble learning for multi-prompt integration
Extracting prompt-relevant semantics using information theory guidance
Mitigating model-prompt and sample-prompt matching biases simultaneously
Fei Song
National Key Laboratory of Space Integrated Information System, Institute of Software, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
Yi Li
National Key Laboratory of Space Integrated Information System, Institute of Software, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
Jiangmeng Li
Institute of Software, Chinese Academy of Sciences
Multi-modal learning, Self-supervised learning, Domain generalization, Causal learning
Rui Wang
National Key Laboratory of Space Integrated Information System, Institute of Software, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
Changwen Zheng
Institute of Software, Chinese Academy of Sciences
Machine learning, computer simulation
Fanjiang Xu
National Key Laboratory of Space Integrated Information System, Institute of Software, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
Hui Xiong
Senior Scientist, Candela Corporation
Ultrafast dynamics, atomic molecular physics, free electron laser