🤖 AI Summary
Existing pretrained biomedical vision-language models generalize poorly under cross-modality, few-shot, and distribution-shift conditions, and degrade most when class boundaries are ambiguous or acquisition protocols differ markedly from the pretraining data. This work proposes BioVLM, a prompt-learning framework that adapts the model through dynamic prompt selection rather than extensive backbone fine-tuning. It learns a diverse prompt bank and, for each input, selects the most discriminative prompts using a low-entropy criterion on the predictive distribution. It also distills high-confidence attributes derived from large language models as semantic priors and enforces consistency between weakly and strongly augmented views. Together these allow adaptation to unseen classes and modalities while keeping training lightweight and inference efficient. On 11 MedMNIST+ 2D datasets, BioVLM achieves state-of-the-art performance across three generalization settings.
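The summary does not say how weak/strong consistency is computed. As a rough illustration only, one common formulation treats the prediction on the weakly augmented view as a fixed target and penalizes the KL divergence from it to the prediction on the strongly augmented view. The function names and example logits below are hypothetical, and the choice of KL divergence is an assumption rather than the authors' confirmed loss.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def consistency_loss(weak_logits, strong_logits):
    """Assumed consistency term: KL from weak-view to strong-view prediction.

    The weak-view distribution plays the role of the target; in a real
    training loop it would typically be detached from the gradient graph.
    """
    p_weak = softmax(weak_logits)
    p_strong = softmax(strong_logits)
    return kl_divergence(p_weak, p_strong)


# Hypothetical class logits for two augmented views of the same image.
agreeing = consistency_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
disagreeing = consistency_loss([3.0, 0.0, 0.0], [0.0, 3.0, 0.0])
```

Under this sketch the loss is zero when the two views produce the same distribution and grows as they disagree, which is the behavior a consistency term is meant to encourage.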
📝 Abstract
Pretrained biomedical vision-language models (VLMs) such as BioMedCLIP perform well on average but often degrade on challenging modalities where inter-class margins are small and acquisition-specific variations are pronounced, especially under few-shot supervision and when modality priors differ substantially from the pretraining corpora. We propose BioVLM, a prompt-learning framework that improves cross-domain generalization without extensive backbone fine-tuning. BioVLM learns a diverse prompt bank and introduces dynamic prompt selection: for each input, it selects the most discriminative prompts via a low-entropy criterion on the predictive distribution, effectively coupling sparse few-shot evidence with rich LLM semantic priors. To strengthen this coupling, we distill high-confidence LLM-derived attributes and enforce robust knowledge transfer through strong/weak augmentation consistency. At test time, BioVLM adapts by choosing modality-appropriate prompts, enabling transfer to unseen categories and domains while keeping training lightweight and inference efficient. On 11 MedMNIST+ 2D datasets, BioVLM achieves a new state of the art across three distinct generalization settings. Code is available at https://github.com/mainaksingha01/BioVLM.
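The low-entropy selection criterion can be illustrated with a small sketch. It assumes each prompt in the bank yields its own vector of class logits for a given image, keeps the `k` prompts whose softmax distributions have the lowest Shannon entropy, and averages their predictions. The function `select_prompts`, the parameter `k`, the averaging step, and the example logits are all hypothetical; the abstract does not specify these details, so this is not the authors' implementation.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_prompts(logits_per_prompt, k=1):
    """Pick the k prompts with the lowest predictive entropy.

    logits_per_prompt: one list of class logits per prompt in the bank.
    Returns (indices of chosen prompts, averaged class probabilities).
    """
    probs = [softmax(row) for row in logits_per_prompt]
    ents = [entropy(p) for p in probs]
    order = sorted(range(len(probs)), key=lambda i: ents[i])
    chosen = order[:k]
    n_classes = len(probs[0])
    avg = [sum(probs[i][c] for i in chosen) / len(chosen) for c in range(n_classes)]
    return chosen, avg


# Hypothetical logits for one image under three prompts and three classes.
logits = [
    [2.0, 1.9, 1.8],  # prompt 0: nearly uniform, high entropy
    [5.0, 0.5, 0.1],  # prompt 1: confident in class 0, low entropy
    [0.2, 3.0, 2.9],  # prompt 2: torn between classes 1 and 2
]
chosen, avg = select_prompts(logits, k=1)
top2, _ = select_prompts(logits, k=2)
```

Here prompt 1 is selected because its prediction is the most peaked. Low entropy measures confidence rather than correctness, which is presumably why the method pairs selection with distilled LLM priors and augmentation consistency.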