Semi-Supervised Few-Shot Adaptation of Vision-Language Models

📅 2026-03-03

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses medical image classification when labeled data are extremely scarce and imbalanced. Because expert annotation is costly, only a handful of labeled examples are typically available, and underrepresented classes degrade the performance of few-shot-adapted vision-language models. The authors propose a semi-supervised few-shot adaptation method that propagates text-informed pseudo-labels over unlabeled data while adapting a pre-trained vision-language model through a multimodal linear probe. According to the abstract, this reduces labeling effort by more than 50% in low-shot regimes and helps counter the effect of class imbalance.

📝 Abstract

Vision-language models (VLMs) pre-trained on large, heterogeneous data sources are becoming increasingly popular, providing rich multi-modal embeddings that enable efficient transfer to new tasks. A particularly relevant application is few-shot adaptation, where only a handful of annotated examples are available to adapt the model through multi-modal linear probes. In medical imaging, specialized VLMs have shown promising performance in zero- and few-shot image classification, which is valuable for mitigating the high cost of expert annotations. However, challenges remain in extremely low-shot regimes: the inherent class imbalances in medical tasks often lead to underrepresented categories, penalizing overall model performance. To address this limitation, we propose leveraging unlabeled data by introducing an efficient semi-supervised solver that propagates text-informed pseudo-labels during few-shot adaptation. The proposed method enables lower-budget annotation pipelines for adapting VLMs, reducing labeling effort by >50% in low-shot regimes.
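To make the idea of text-informed pseudo-labels concrete, here is a minimal sketch in Python with NumPy. It is not the solver from the paper, whose details the abstract does not give. It assumes pre-computed image and text embeddings from some VLM and uses a simple prototype-based self-training loop as a stand-in. The function name `adapt_with_pseudo_labels` and the parameters `temperature`, `n_rounds`, and `mix` are hypothetical choices made for illustration.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length along the given axis."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def softmax(logits, axis=-1):
    """Numerically stable softmax."""
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def adapt_with_pseudo_labels(text_emb, x_lab, y_lab, x_unl,
                             temperature=0.05, n_rounds=5, mix=0.5):
    """Illustrative stand-in, NOT the paper's solver.

    text_emb: (C, D) class text embeddings (assumed given by a VLM)
    x_lab:    (N, D) embeddings of the few labeled images
    y_lab:    (N,)   integer labels in [0, C)
    x_unl:    (M, D) embeddings of unlabeled images
    Returns class prototypes (C, D) and soft pseudo-labels (M, C).
    """
    n_classes = text_emb.shape[0]
    text_emb = l2_normalize(text_emb)
    x_lab = l2_normalize(x_lab)
    x_unl = l2_normalize(x_unl)
    onehot = np.eye(n_classes)[y_lab]

    # Start from the text embeddings, i.e. the zero-shot classifier.
    prototypes = text_emb.copy()
    for _ in range(n_rounds):
        # Soft pseudo-labels for unlabeled images from current prototypes.
        q = softmax(x_unl @ prototypes.T / temperature)
        # Per-class visual mean over labeled shots and pseudo-labeled images.
        sums = onehot.T @ x_lab + q.T @ x_unl
        counts = onehot.sum(axis=0) + q.sum(axis=0)
        visual = l2_normalize(sums / (counts[:, None] + 1e-12))
        # Blend with the text embedding so that classes with few or no
        # labeled shots stay anchored to their text description.
        prototypes = l2_normalize(mix * text_emb + (1.0 - mix) * visual)

    # Final pseudo-labels under the adapted prototypes.
    q = softmax(x_unl @ prototypes.T / temperature)
    return prototypes, q
```

In this sketch, a class with no labeled shots still receives a prototype, because its text embedding and the pseudo-labeled images assigned to it both contribute. That mirrors, under the stated assumptions, how unlabeled data could support underrepresented categories in an imbalanced few-shot setting.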

Problem

Research questions and friction points this paper is trying to address.

few-shot adaptation

class imbalance

medical imaging

vision-language models

semi-supervised learning

Innovation

Methods, ideas, or system contributions that make the work stand out.

semi-supervised learning

few-shot adaptation

vision-language models