Handling Imbalanced Pseudolabels for Vision-Language Models with Concept Alignment and Confusion-Aware Calibrated Margin

📅 2025-05-04
🤖 AI Summary
This paper addresses the performance degradation in downstream adaptation of vision-language models (VLMs) caused by severe label imbalance in pseudo-labeling. The authors identify two root causes: **concept mismatch**, a cross-modal semantic shift between the vision and language modalities, and **concept confusion**, ambiguity in inter-class discriminability. To tackle these, they propose a unified framework combining **concept alignment and confusion-aware margin calibration**: (1) a contrastive learning–driven concept alignment module mitigates cross-modal semantic shift; (2) an adaptive margin calibration mechanism, grounded in confusion matrix estimation, dynamically refines decision boundaries for ambiguous samples; and (3) class-weighted pseudo-label reweighting is coupled with multi-paradigm collaborative training. Evaluated across six benchmark datasets and three learning paradigms, the method significantly improves pseudo-label accuracy and class balance, achieving an average 6.29% relative gain over state-of-the-art methods. Code is publicly available.
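The margin-calibration idea in the summary can be made concrete with a toy sketch: estimate how often each class is confused from a confusion matrix, then subtract a larger margin from the logits of frequently confused classes before assigning pseudolabels. All function names and the specific scaling rule below are illustrative assumptions, not the paper's actual formulation.

```python
def calibrated_margins(confusion, base_margin=0.1):
    """Derive per-class margins from an estimated confusion matrix.

    confusion[i][j] counts class-i samples predicted as class j.
    Classes with low diagonal mass (often mispredicted) get a larger
    margin. The (1 + ambiguity) scaling is a hypothetical choice.
    """
    margins = []
    for i, row in enumerate(confusion):
        accuracy = row[i] / sum(row)        # diagonal mass = per-class accuracy
        ambiguity = 1.0 - accuracy          # how often this class is confused
        margins.append(base_margin * (1.0 + ambiguity))
    return margins

def pseudo_label(logits, margins, threshold=0.0):
    """Pick a pseudolabel after subtracting per-class margins from logits."""
    adjusted = [z - m for z, m in zip(logits, margins)]
    best = max(range(len(adjusted)), key=lambda k: adjusted[k])
    keep = adjusted[best] > threshold       # discard low-confidence samples
    return best, keep

# Toy example: 3 classes, class 2 is the most frequently confused.
confusion = [[8, 1, 1],
             [1, 8, 1],
             [2, 3, 5]]
margins = calibrated_margins(confusion)     # class 2 receives the largest margin
label, keep = pseudo_label([2.0, 0.5, 1.9], margins)
```

In effect, ambiguous classes must win by a wider logit gap before a sample is pseudolabeled with them, which is one simple way to push decision boundaries away from confused regions.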

📝 Abstract
Adapting vision-language models (VLMs) to downstream tasks with pseudolabels has gained increasing attention. A major obstacle is that the pseudolabels generated by VLMs tend to be imbalanced, leading to inferior performance. While existing methods have explored various strategies to address this, the underlying causes of imbalance remain insufficiently investigated. To fill this gap, we delve into imbalanced pseudolabels and identify two primary contributing factors: concept mismatch and concept confusion. To mitigate these two issues, we propose a novel framework incorporating concept alignment and confusion-aware calibrated margin mechanisms. The core of our approach lies in enhancing underperforming classes and promoting balanced predictions across categories, thus mitigating imbalance. Extensive experiments on six benchmark datasets with three learning paradigms demonstrate that the proposed method effectively enhances the accuracy and balance of pseudolabels, achieving a relative improvement of 6.29% over the SoTA method. Our code is available at https://anonymous.4open.science/r/CAP-C642/
Problem

Research questions and friction points this paper is trying to address.

Addressing imbalanced pseudolabels in vision-language models
Mitigating concept mismatch and confusion in pseudolabels
Improving accuracy and balance of pseudolabels across categories
Innovation

Methods, ideas, or system contributions that make the work stand out.

Concept alignment mitigates pseudolabel imbalance
Confusion-aware calibrated margin enhances class balance
Framework improves underperforming classes effectively
👥 Authors

Yuchen Wang
School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen)

Xuefeng Bai
Harbin Institute of Technology (Shenzhen)
Natural language processing, Semantics, Dialogue

Xiucheng Li
Harbin Institute of Technology
Spatiotemporal Learning, Graph Learning, AI4PDE, AI4Science

Weili Guan
School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen)

Liqiang Nie
School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen)

Xinyang Chen
Associate Professor, Harbin Institute of Technology (Shenzhen)
machine learning, multimodal learning, transfer learning