Decoupling Augmentation Bias in Prompt Learning for Vision-Language Models

📅 2025-10-01
🏛️ Pattern Recognition
🤖 AI Summary
Existing vision-language models generalize poorly in zero-shot tasks, particularly to unseen classes, because of two key limitations: (1) biases introduced by image augmentations are not explicitly modeled, so prompts overfit to augmentation artifacts; and (2) without semantic guidance, prompts fail to focus on intrinsic visual semantics. To address these limitations, we propose the first prompt learning framework that *decouples augmentation bias*: it combines causal intervention with contrastive learning to explicitly separate semantic features from augmentation-related ones, and it incorporates a learnable prompt network. Our method requires no additional annotations and achieves significant performance gains across multiple zero-shot and cross-domain benchmarks, with notably better generalization in low-data and out-of-distribution settings. This work offers a new approach to robust, semantics-aware prompt learning in vision-language models.

Problem

Research questions and friction points this paper is trying to address.

Decoupling augmentation bias from semantic representations in prompt learning
Improving generalization of vision-language models to unseen categories
Addressing limitations of existing prompt learning methods like CoCoOp

Innovation

Methods, ideas, or system contributions that make the work stand out.

AAPL (Adding Attributes to Prompt Learning) introduces adversarial token embeddings for prompts
Decouples augmentation bias from semantic representations
Focuses prompts on visually discriminative category features
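
To make the idea of separating augmentation-related features from semantic ones more concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: the summary does not specify the loss, so the sketch assumes a margin-based triplet loss on feature differences as one common form of contrastive learning. All function names, the toy feature vectors, and the assumption that an augmentation acts as an additive shift in feature space are hypothetical.

```python
import math

def subtract(u, v):
    """Element-wise difference u - v."""
    return [a - b for a, b in zip(u, v)]

def normalize(v, eps=1e-12):
    """Scale v to (approximately) unit L2 norm."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / (norm + eps) for a in v]

def distance(u, v):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def augmentation_delta(feat_original, feat_augmented):
    """Difference between augmented and original features.

    Assumption: subtracting the original feature cancels the class
    semantics and leaves mostly what the augmentation changed.
    """
    return normalize(subtract(feat_augmented, feat_original))

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge loss: the anchor should be closer to the positive than to the negative."""
    return max(distance(anchor, positive) - distance(anchor, negative) + margin, 0.0)

# Toy stand-ins for image-encoder features of two different classes.
class_a = [1.0, 0.0, 2.0]
class_b = [0.0, 3.0, 1.0]

# Hypothetical feature-space effects of two different augmentations.
shift_flip = [0.5, 0.0, 0.0]
shift_blur = [0.0, 0.5, 0.0]

def apply_shift(feat, shift):
    return [a + b for a, b in zip(feat, shift)]

delta_a_flip = augmentation_delta(class_a, apply_shift(class_a, shift_flip))
delta_b_flip = augmentation_delta(class_b, apply_shift(class_b, shift_flip))
delta_a_blur = augmentation_delta(class_a, apply_shift(class_a, shift_blur))

# Same augmentation, different class -> positive pair.
# Different augmentation, same class -> negative pair.
loss_decoupled = triplet_loss(delta_a_flip, delta_b_flip, delta_a_blur)

# Swapping positive and negative simulates deltas that still track class identity.
loss_entangled = triplet_loss(delta_a_flip, delta_a_blur, delta_b_flip)

print(loss_decoupled, loss_entangled)
```

In this idealized setup the deltas for the same augmentation coincide across classes, so the first loss is zero, while the swapped triplet is penalized. Real encoder features would not separate this cleanly, which is why a learned objective would be needed in practice.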