Diversity Covariance-Aware Prompt Learning for Vision-Language Models

📅 2025-03-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the weak cross-modal alignment and limited generalization of vision-language models in few-shot learning. Methodologically, it introduces a covariance-aware and diversity-aware prompt learning framework: (i) it proposes an anisotropic Mahalanobis distance metric based on covariance modeling to enhance geometric representation in feature space; (ii) it designs a multi-center, multi-attribute soft prompt generation mechanism to explicitly diversify decision boundaries; and (iii) it incorporates a distribution-aware prompt optimization strategy enabling independent alignment of multiple prompts. Evaluated on 11 few-shot vision-language tasks, the framework consistently outperforms state-of-the-art prompt learning methods, demonstrating superior robustness, generalization capability, and adaptability to modality heterogeneity.

📝 Abstract
Prompt tuning can further enhance the performance of vision-language models across various downstream tasks (e.g., few-shot learning), enabling them to better adapt to specific applications and needs. In this paper, we present a Diversity Covariance-Aware framework that learns distributional information from the data to enhance the few-shot ability of the prompt model. First, we propose a covariance-aware method that models the covariance relationships between visual features and uses an anisotropic Mahalanobis distance, instead of the suboptimal cosine distance, to measure the similarity between the two modalities. We rigorously derive and prove the validity of this modeling process. Then, we propose a diversity-aware method, which learns multiple diverse soft prompts to capture different attributes of categories and aligns them independently with the visual modality. This achieves multi-centered covariance modeling, leading to more diverse decision boundaries. Extensive experiments on 11 datasets across various tasks demonstrate the effectiveness of our method.
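The covariance-aware scoring described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes a single shared covariance matrix estimated from visual features, and the function name, shapes, and shrinkage regularizer are hypothetical.

```python
import numpy as np

def mahalanobis_similarity(img_feat, class_means, visual_cov, eps=1e-4):
    """Score one image embedding against each class prototype with an
    anisotropic Mahalanobis distance rather than cosine similarity.

    img_feat:    (d,)  image embedding
    class_means: (C, d) per-class prompt/text embeddings (prototypes)
    visual_cov:  (d, d) covariance estimated from visual features
    Returns negated Mahalanobis distances, usable as class logits.
    """
    d = visual_cov.shape[0]
    # Shrinkage regularization keeps the covariance invertible.
    cov_inv = np.linalg.inv(visual_cov + eps * np.eye(d))
    diffs = class_means - img_feat                      # (C, d)
    # d_M(x, mu)^2 = (x - mu)^T Sigma^{-1} (x - mu), per class.
    sq_dists = np.einsum('cd,de,ce->c', diffs, cov_inv, diffs)
    return -np.sqrt(sq_dists)
```

With an identity covariance this reduces to negative Euclidean distance; the learned (anisotropic) covariance is what reshapes the feature-space geometry relative to cosine scoring.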
Problem

Research questions and friction points this paper is trying to address.

Enhances few-shot learning in vision-language models
Models covariance relationships between visual features
Learns diverse soft prompts for multi-centered covariance modeling
Innovation

Methods, ideas, or system contributions that make the work stand out.

Covariance-aware method using Mahalanobis distance
Diversity-aware method with multiple soft prompts
Multi-centered covariance modeling for diverse boundaries
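The multi-center idea above can be sketched in a few lines: each class holds K soft-prompt embeddings that are scored independently against the image, and a diversity regularizer pushes a class's prompts apart so they cover different attributes. This is an illustrative sketch under assumed shapes; the max-over-centers scoring and the pairwise-similarity penalty are common choices, not necessarily the paper's exact formulation.

```python
import numpy as np

def multi_prompt_logits(img_feat, prompt_feats):
    """Multi-center scoring: each class has K prompt embeddings,
    and the class logit is the best match among its K centers.

    img_feat:     (d,)      L2-normalized image embedding
    prompt_feats: (C, K, d) L2-normalized per-class prompt embeddings
    """
    sims = prompt_feats @ img_feat      # (C, K) cosine scores
    return sims.max(axis=1)             # nearest center per class

def diversity_penalty(prompt_feats):
    """Penalize pairwise similarity among each class's K prompts so
    they specialize to different attributes (one standard diversity
    loss; returns the mean absolute off-diagonal cosine similarity)."""
    C, K, _ = prompt_feats.shape
    gram = np.einsum('ckd,cld->ckl', prompt_feats, prompt_feats)  # (C, K, K)
    mask = 1.0 - np.eye(K)              # zero out self-similarity
    return float((np.abs(gram) * mask).sum() / (C * K * (K - 1)))
```

Training would minimize a classification loss on the logits plus a small multiple of `diversity_penalty`, which is what yields the multiple decision boundaries per class.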