🤖 AI Summary
This paper addresses the performance degradation in downstream adaptation of vision-language models (VLMs) caused by severe label imbalance in pseudo-labeling. We systematically identify two root causes: **concept mismatch**—cross-modal semantic shift between the vision and language modalities—and **concept confusion**—ambiguous discriminability between classes. To tackle these, we propose a unified framework integrating **concept alignment and confusion-aware margin calibration**: (1) a contrastive learning–driven concept alignment module mitigates cross-modal semantic shift; (2) an adaptive margin calibration mechanism, grounded in confusion matrix estimation, dynamically refines decision boundaries for ambiguous samples; and (3) class-weighted pseudo-label reweighting is coupled with multi-paradigm collaborative training. Evaluated across six benchmark datasets and three learning paradigms, our method significantly improves pseudo-label accuracy and class balance, achieving an average 6.29% relative gain over state-of-the-art methods. Code is publicly available.
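The paper itself does not publish its update equations in this summary, but the two mechanisms described above—confusion-aware margin calibration and class-weighted pseudo-label reweighting—can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the function name, the use of off-diagonal confusion mass as a per-class margin, and the inverse-frequency weights are hypothetical, not the authors' actual implementation.

```python
import numpy as np

def calibrated_pseudo_labels(logits, confusion, tau=1.0):
    """Hypothetical sketch of confusion-aware margin calibration.

    logits:    (N, C) model scores for N unlabeled samples.
    confusion: (C, C) estimated confusion matrix; entry [i, j] counts
               samples of true class i predicted as class j.
    tau:       margin temperature (illustrative hyperparameter).
    """
    # Classes that attract many wrong predictions (large off-diagonal
    # column mass) get a larger margin, making them harder to win.
    off_diag = confusion - np.diag(np.diag(confusion))
    margin = tau * off_diag.sum(axis=0) / np.maximum(confusion.sum(), 1e-8)

    # Calibrated pseudo-labels: subtract per-class margins before argmax.
    labels = np.argmax(logits - margin, axis=1)

    # Class-weighted reweighting: rarer pseudo-labels get larger weights,
    # pushing the downstream loss toward balanced predictions.
    counts = np.bincount(labels, minlength=logits.shape[1]).astype(float)
    weights = 1.0 / np.maximum(counts[labels], 1.0)
    return labels, weights
```

In this sketch, a sample whose top two calibrated scores are close can flip from an over-predicted class to an under-predicted one, which is the intended balancing effect; the inverse-frequency weights then further damp the loss contribution of majority-class pseudo-labels.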
📝 Abstract
Adapting vision-language models (VLMs) to downstream tasks with pseudo-labels has gained increasing attention. A major obstacle is that the pseudo-labels generated by VLMs tend to be imbalanced, leading to inferior performance. While existing methods have explored various strategies to address this, the underlying causes of the imbalance remain insufficiently investigated. To fill this gap, we delve into imbalanced pseudo-labels and identify two primary contributing factors: concept mismatch and concept confusion. To mitigate these two issues, we propose a novel framework incorporating concept alignment and confusion-aware calibrated margin mechanisms. The core of our approach lies in enhancing underperforming classes and promoting balanced predictions across categories, thus mitigating imbalance. Extensive experiments on six benchmark datasets with three learning paradigms demonstrate that the proposed method effectively enhances the accuracy and balance of pseudo-labels, achieving a relative improvement of 6.29% over the state-of-the-art method. Our code is available at https://anonymous.4open.science/r/CAP-C642/