Medical Knowledge Intervention Prompt Tuning for Medical Image Classification

📅 2025-11-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing prompt-tuning methods struggle to accurately model disease-specific features in medical image classification, particularly overlooking imaging modality-specific characteristics in cross-modal settings. To address this, we propose a medical-knowledge-driven conditional prompt-tuning framework. Our method is the first to inject structured disease representations—encoded by large language models (LLMs)—into the prompt generation process of vision-language models (VLMs) via a low-rank linear subspace intervention mechanism. Additionally, we introduce an instance-adaptive conditioning mechanism to enable dynamic, single-image-level prompt construction. This design significantly enhances fine-grained discrimination of medical concepts. Extensive experiments on multiple public medical image benchmarks demonstrate that our approach consistently outperforms state-of-the-art prompt-tuning methods, validating the substantial performance gains achieved by integrating external medical priors into lightweight tuning strategies.

📝 Abstract
Vision-language foundation models (VLMs) have shown great potential in feature transfer and generalization across a wide spectrum of medical-related downstream tasks. However, fine-tuning these models is resource-intensive due to their large number of parameters. Prompt tuning has emerged as a viable solution to mitigate memory usage and reduce training time while maintaining competitive performance. Nevertheless, existing prompt tuning methods cannot precisely distinguish different kinds of medical concepts, and thus miss essential disease-specific features across various medical imaging modalities in medical image classification tasks. We find that Large Language Models (LLMs), trained on extensive text corpora, are particularly adept at providing this specialized medical knowledge. Motivated by this, we propose incorporating LLMs into the prompt tuning process. Specifically, we introduce CILMP, Conditional Intervention of Large Language Models for Prompt Tuning, a method that bridges LLMs and VLMs to facilitate the transfer of medical knowledge into VLM prompts. CILMP extracts disease-specific representations from LLMs, intervenes within a low-rank linear subspace, and utilizes them to create disease-specific prompts. Additionally, a conditional mechanism is incorporated to condition the intervention process on each individual medical image, generating instance-adaptive prompts and thus enhancing adaptability. Extensive experiments across diverse medical image datasets show that CILMP consistently outperforms state-of-the-art prompt tuning methods, demonstrating its effectiveness. Code is available at https://github.com/usr922/cilmp.
Problem

Research questions and friction points this paper is trying to address.

Fine-tuning vision-language models is resource-intensive for medical tasks
Existing prompt tuning methods miss disease-specific features in medical images
Medical knowledge from LLMs is not effectively transferred to VLM prompts
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses LLMs to extract disease-specific medical knowledge
Intervenes in low-rank subspace for prompt creation
Generates instance-adaptive prompts with conditional mechanism
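The abstract gives no implementation details, but the core idea of the innovations above can be sketched roughly: an LLM-derived disease embedding is edited inside a low-rank subspace, with the edit gated by the features of the individual image. The sketch below is a minimal, hypothetical illustration of such a mechanism; all names (`U`, `V`, `intervene`) and dimensions are assumptions for exposition, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
D, r = 64, 4  # embedding dimension and subspace rank (illustrative values)

# Hypothetical pieces: an LLM-derived disease embedding and learned
# low-rank subspace bases U, V (stand-ins, not the paper's parameters).
disease_emb = rng.standard_normal(D)
U = rng.standard_normal((D, r))  # projects into the rank-r subspace
V = rng.standard_normal((r, D))  # maps subspace edits back to prompt space

def intervene(disease_emb, image_feat, U, V):
    """Apply an instance-conditioned, low-rank edit to the disease embedding."""
    z = disease_emb @ U              # coordinates in the low-rank subspace
    gate = np.tanh(image_feat @ U)   # instance-adaptive gating (illustrative)
    z_edit = z * gate                # modulate the subspace coordinates
    return disease_emb + z_edit @ V  # add the low-rank edit back

image_feat = rng.standard_normal(D)  # a single image's visual feature
prompt_vec = intervene(disease_emb, image_feat, U, V)
print(prompt_vec.shape)  # (64,)
```

Because the edit `z_edit @ V` lies in the row space of `V`, the intervention can only move the embedding within an `r`-dimensional subspace, which is what keeps this style of tuning lightweight relative to full fine-tuning.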
Ye Du
Department of Biomedical Engineering, The Hong Kong Polytechnic University, Hong Kong SAR, China
Nanxi Yu
Department of Biomedical Engineering, The Hong Kong Polytechnic University, Hong Kong SAR, China
Shujun Wang
The Hong Kong Polytechnic University
AI for Healthcare · Smart Ageing · AI for Science