vMFCoOp: Towards Equilibrium on a Unified Hyperspherical Manifold for Prompting Biomedical VLMs

πŸ“… 2025-11-12
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
This work addresses three key bottlenecks in prompt learning for biomedical vision-language models (VLMs): semantic misalignment between LLMs and CLIP variants, poor scalability as families of foundation models evolve, and the inadequacy of Euclidean-space optimization for modeling multimodal geometric structure. To this end, the authors propose a unified prompt learning framework grounded in spherical manifold geometry. The core contributions are: (1) inverse estimation of von Mises–Fisher (vMF) distributions to enforce cross-modal semantic alignment on a shared unit hypersphere; (2) a semantic anchor mechanism with three geometric constraints that ensures few-shot stability and cross-model scalability; and (3) integration of LLM-distilled prior-guided in-context learning with multimodal prompt tuning. Evaluated across 14 biomedical datasets, 12 imaging modalities, and 13 anatomical regions, the method significantly outperforms state-of-the-art approaches in accuracy, generalization, and clinical applicability.
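To make the core building block concrete, here is a minimal sketch of the vMF density and its standard maximum-likelihood fit on the unit hypersphere, using only NumPy/SciPy. It illustrates the textbook distribution (with the Banerjee et al., 2005 closed-form approximation for the concentration), not the paper's inverse-estimation procedure.

```python
import numpy as np
from scipy.special import ive  # exponentially scaled modified Bessel I_v

def vmf_log_density(x, mu, kappa):
    """Log-density of vMF(mu, kappa) on S^{d-1} for unit vectors x, mu.

    Assumes kappa > 0. Uses the exponentially scaled Bessel function for
    numerical stability: log I_v(k) = log ive(v, k) + k.
    """
    d = mu.shape[-1]
    v = d / 2.0 - 1.0
    log_norm = v * np.log(kappa) - (d / 2.0) * np.log(2.0 * np.pi) \
        - (np.log(ive(v, kappa)) + kappa)
    return log_norm + kappa * (x @ mu)

def vmf_fit(samples):
    """MLE mean direction and Banerjee et al. (2005) approximation of
    kappa from unit-norm samples of shape (n, d)."""
    n, d = samples.shape
    resultant = samples.sum(axis=0)
    r_bar = np.linalg.norm(resultant) / n        # mean resultant length
    mu = resultant / np.linalg.norm(resultant)   # normalized resultant
    kappa = r_bar * (d - r_bar ** 2) / (1.0 - r_bar ** 2)
    return mu, kappa
```

For d = 2 this reduces to the von Mises distribution on the circle, and as kappa approaches 0 the density approaches the uniform distribution on the sphere.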

πŸ“ Abstract
Recent advances in context optimization (CoOp) guided by large language model (LLM)-distilled medical semantic priors offer a scalable alternative to manual prompt engineering and full fine-tuning for adapting biomedical CLIP-based vision-language models (VLMs). However, prompt learning in this context is challenged by semantic misalignment between LLMs and CLIP variants due to divergent training corpora and model architectures; it further lacks scalability across continuously evolving families of foundation models. More critically, pairwise multimodal alignment via conventional Euclidean-space optimization lacks the capacity to model unified representations or apply localized geometric constraints, which tends to amplify modality gaps in complex biomedical imaging and destabilize few-shot adaptation. In this work, we propose vMFCoOp, a framework that inversely estimates von Mises-Fisher (vMF) distributions on a shared Hyperspherical Manifold, aligning semantic biases between arbitrary LLMs and CLIP backbones via Unified Semantic Anchors to achieve robust biomedical prompting and superior few-shot classification. Grounded in three complementary constraints, vMFCoOp demonstrates consistent improvements across 14 medical datasets, 12 medical imaging modalities, and 13 anatomical regions, outperforming state-of-the-art methods in accuracy, generalization, and clinical applicability. This work is intended to expand to further downstream applications over time, and the corresponding resources will be shared through https://github.com/VinyehShaw/UniEqui.
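A hedged sketch of the anchor idea described above: per-class embeddings from an LLM prior and a CLIP text encoder are projected onto a shared unit hypersphere, their vMF mean directions are estimated, and the two directions are fused into one unit-norm anchor per class. The fusion weight `alpha`, the helper names, and the simple convex combination are illustrative assumptions, not the paper's constraint-based construction.

```python
import numpy as np

def unit(v, axis=-1):
    """Project vectors onto the unit hypersphere (L2-normalize)."""
    return v / np.linalg.norm(v, axis=axis, keepdims=True)

def unified_anchor(llm_embs, clip_embs, alpha=0.5):
    """Fuse per-class LLM-prior and CLIP text embeddings, shapes (n, d),
    into a single unit-norm anchor.

    Each modality's vMF maximum-likelihood mean direction is the
    normalized resultant of its unit-norm samples; the two directions
    are blended with an assumed weight `alpha` and re-projected onto
    the sphere.
    """
    mu_llm = unit(unit(llm_embs).sum(axis=0))
    mu_clip = unit(unit(clip_embs).sum(axis=0))
    return unit(alpha * mu_llm + (1.0 - alpha) * mu_clip)
```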
Problem

Research questions and friction points this paper is trying to address.

Aligns semantic biases between LLMs and CLIP models for biomedical prompting
Models unified representations on a hyperspherical manifold to reduce modality gaps
Enables robust few-shot classification across diverse medical imaging modalities (a minimal sketch follows this list)
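Below is a minimal sketch of how hyperspherical few-shot classification can work once per-class anchors exist: the image embedding is scored by concentration-scaled cosine similarity, i.e., the log-density term of a vMF likelihood. A shared concentration `kappa` and a uniform class prior are simplifying assumptions.

```python
import numpy as np

def classify(image_emb, anchors, kappa=30.0):
    """Softmax over kappa-scaled cosine similarities between an image
    embedding (d,) and per-class unit-norm anchors (C, d).

    With a shared concentration kappa and a uniform class prior, this
    equals the vMF posterior p(c | x) up to the common normalizer.
    """
    x = image_emb / np.linalg.norm(image_emb)
    logits = kappa * (anchors @ x)
    probs = np.exp(logits - logits.max())  # numerically stable softmax
    return probs / probs.sum()
```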
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses vMF distributions on hyperspherical manifold
Aligns LLM and CLIP biases via unified anchors
Applies three constraints for robust biomedical prompting
πŸ”Ž Similar Papers
No similar papers found.
Minye Shao
Department of Computer Science, Durham University, Durham, UK
Sihan Guo
Department of Computer Science, Durham University, Durham, UK
Xinrun Li
Department of Computer Science, Durham University, Durham, UK
Xingyu Miao
Department of Computer Science, Durham University, Durham, UK
Haoran Duan
Tsinghua/Newcastle/Durham University
Multimodal AI · Generative AI
Yang Long
Department of Computer Science, Durham University
Computer Vision · Machine Learning · Artificial Intelligence