CBVLM: Training-free Explainable Concept-based Large Vision Language Models for Medical Image Classification

📅 2025-01-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the dual bottlenecks of scarce labeled data and poor interpretability in medical image classification, this paper proposes CBVLM, a training-free, concept-based vision-language framework. CBVLM uses a two-stage concept-guided prompting strategy combined with retrieval-augmented in-context learning, leveraging human-understandable medical concepts as interpretable intermediate reasoning steps to enable zero-shot and few-shot explainable diagnosis. Its core innovations are: (i) the first training-free, concept-driven interpretability paradigm for LVLMs; and (ii) the ability to add new concepts without retraining. Extensive experiments across four medical imaging datasets and twelve LVLM backbones show that CBVLM outperforms existing concept bottleneck models and supervised methods, achieving state-of-the-art accuracy with only a handful of labeled examples while delivering clinically meaningful, concept-level explanations.

📝 Abstract
The main challenges limiting the adoption of deep learning-based solutions in medical workflows are the availability of annotated data and the lack of interpretability of such systems. Concept Bottleneck Models (CBMs) tackle the latter by constraining the final disease prediction on a set of predefined and human-interpretable concepts. However, the increased interpretability achieved through these concept-based explanations implies a higher annotation burden. Moreover, if a new concept needs to be added, the whole system needs to be retrained. Inspired by the remarkable performance shown by Large Vision-Language Models (LVLMs) in few-shot settings, we propose a simple, yet effective, methodology, CBVLM, which tackles both of the aforementioned challenges. First, for each concept, we prompt the LVLM to answer if the concept is present in the input image. Then, we ask the LVLM to classify the image based on the previous concept predictions. Moreover, in both stages, we incorporate a retrieval module responsible for selecting the best examples for in-context learning. By grounding the final diagnosis on the predicted concepts, we ensure explainability, and by leveraging the few-shot capabilities of LVLMs, we drastically lower the annotation cost. We validate our approach with extensive experiments across four medical datasets and twelve LVLMs (both generic and medical) and show that CBVLM consistently outperforms CBMs and task-specific supervised methods without requiring any training and using just a few annotated examples. More information on our project page: https://cristianopatricio.github.io/CBVLM/.
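The two-stage procedure described in the abstract — first querying the LVLM for each concept, then grounding the diagnosis on the predicted concepts, with a retrieval module supplying in-context examples at both stages — can be sketched as follows. This is a hypothetical illustration, not the authors' code: `query_lvlm` is a trivial stub standing in for a real vision-language model API, and `retrieve_examples` is a placeholder for the paper's similarity-based retrieval module.

```python
# Hypothetical sketch of CBVLM's two-stage concept prompting (illustration only).

def query_lvlm(prompt: str, image: dict, examples=None) -> str:
    """Stub LVLM. A real system would send the image, the prompt, and the
    retrieved in-context examples to a vision-language model."""
    # Toy heuristic so the pipeline runs end-to-end for demonstration.
    if "Is the concept" in prompt:
        concept = prompt.split("'")[1]
        return "yes" if concept in image["visible_concepts"] else "no"
    return image["true_label"]  # stand-in for the model's free-text diagnosis

def retrieve_examples(image: dict, pool: list, k: int = 2) -> list:
    """Retrieval module: select the k annotated examples most similar to the
    input image for in-context learning (ranking is application-specific)."""
    return pool[:k]  # placeholder: a real system would rank by image similarity

def cbvlm_classify(image: dict, concepts: list, example_pool: list):
    # Stage 1: ask, for each human-interpretable concept, whether it is present.
    concept_preds = {}
    for c in concepts:
        examples = retrieve_examples(image, example_pool)
        answer = query_lvlm(f"Is the concept '{c}' present in this image?",
                            image, examples)
        concept_preds[c] = (answer == "yes")

    # Stage 2: ground the final diagnosis on the predicted concepts.
    present = [c for c, p in concept_preds.items() if p]
    prompt = f"Given that the image shows {present}, classify the image."
    diagnosis = query_lvlm(prompt, image, retrieve_examples(image, example_pool))
    return diagnosis, concept_preds  # diagnosis plus concept-level explanation

# Toy example: a skin-lesion "image" annotated with its visible concepts.
img = {"visible_concepts": ["asymmetry", "blue-whitish veil"],
       "true_label": "melanoma"}
label, explanation = cbvlm_classify(
    img, ["asymmetry", "blue-whitish veil", "regular streaks"], [])
```

Because the diagnosis prompt contains only the predicted concepts, the returned `explanation` dictionary doubles as the concept-based justification for the final label, and adding a new concept only requires extending the `concepts` list — no retraining.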
Problem

Research questions and friction points this paper is trying to address.

Data Scarcity
Model Interpretability
Concept Learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

CBVLM
Concept-based Large Model
Medical Image Analysis