Visual concept ranking uncovers medical shortcuts used by large multimodal models

📅 2026-02-04
🤖 AI Summary
This study addresses the tendency of large multimodal models (LMMs) used for medical image classification to rely on non-pathological visual shortcuts, which leads to inconsistent performance across demographic subgroups and undermines clinical reliability. To tackle this issue, the authors propose Visual Concept Ranking (VCR), a method that combines prompting, feature importance analysis, and manual intervention experiments to systematically identify the visual concepts underpinning model predictions. Experiments on clinical dermatology images, chest X-rays, and natural images show that VCR uncovers reliance on spurious visual cues, quantifies performance disparities across demographic subgroups, and validates the generated hypotheses through targeted interventions. The result is an interpretable and verifiable auditing approach for LMMs in medical imaging.
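To make "performance disparities across demographic subgroups" concrete, here is a minimal sketch that computes per-subgroup accuracy and the largest gap between subgroups. It assumes plain accuracy as the metric and uses invented toy labels and group names ("A", "B"); the paper may use other metrics (e.g. sensitivity or balanced accuracy), and the function names are hypothetical.

```python
from collections import defaultdict

def subgroup_accuracy(labels, preds, groups):
    """Return accuracy for each subgroup (e.g. each skin-tone category)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for y, p, g in zip(labels, preds, groups):
        total[g] += 1
        correct[g] += int(y == p)
    return {g: correct[g] / total[g] for g in total}

def max_accuracy_gap(acc_by_group):
    """Largest accuracy difference between any two subgroups."""
    values = list(acc_by_group.values())
    return max(values) - min(values)

# Hypothetical toy data: 1 = malignant, 0 = benign
labels = [1, 0, 1, 0, 1, 0, 1, 0]
preds  = [1, 0, 1, 0, 0, 1, 1, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

acc = subgroup_accuracy(labels, preds, groups)
print(acc)                    # {'A': 1.0, 'B': 0.5}
print(max_accuracy_gap(acc))  # 0.5
```

A gap like this only flags that something differs between subgroups; explaining why is the job of a concept-level audit such as VCR.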

📝 Abstract
Ensuring the reliability of machine learning models in safety-critical domains such as healthcare requires auditing methods that can uncover model shortcomings. We introduce a method for identifying important visual concepts within large multimodal models (LMMs) and use it to investigate the behaviors these models exhibit when prompted with medical tasks. We primarily focus on the task of classifying malignant skin lesions from clinical dermatology images, with supplemental experiments including both chest radiographs and natural images. After showing how LMMs display unexpected gaps in performance between different demographic subgroups when prompted with demonstration examples, we apply our method, Visual Concept Ranking (VCR), to these models and prompts. VCR generates hypotheses related to different visual feature dependencies, which we are then able to validate with manual interventions.
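The abstract does not say how VCR scores concepts, so the following is only an illustration of the general idea of ranking visual concepts by how much an intervention on each one changes a prediction. It uses one-at-a-time ablation against a hypothetical linear "model"; the concept names, weights, and functions are invented for illustration, are not findings from the paper, and this is not the authors' algorithm.

```python
from typing import Callable, Dict, List, Tuple

Concepts = Dict[str, float]  # concept name -> presence strength in [0, 1]

def rank_concepts_by_intervention(
    score_fn: Callable[[Concepts], float],
    concepts: Concepts,
) -> List[Tuple[str, float]]:
    """Rank concepts by how much removing each one lowers the score.

    Each concept is set to 0 in turn (a stand-in for a manual image edit),
    and the effect is recorded as baseline_score - intervened_score.
    """
    baseline = score_fn(concepts)
    effects = []
    for name in concepts:
        intervened = dict(concepts)
        intervened[name] = 0.0
        effects.append((name, baseline - score_fn(intervened)))
    # Largest effect first: the concepts the prediction leans on most
    return sorted(effects, key=lambda item: item[1], reverse=True)

# Hypothetical toy "model": a linear malignancy score over concept presence
WEIGHTS = {"irregular_border": 0.3, "surgical_marking": 0.5, "ruler": 0.1}

def toy_malignancy_score(c: Concepts) -> float:
    return sum(WEIGHTS[k] * v for k, v in c.items())

image_concepts = {"irregular_border": 1.0, "surgical_marking": 1.0, "ruler": 1.0}
ranking = rank_concepts_by_intervention(toy_malignancy_score, image_concepts)
for name, effect in ranking:
    print(f"{name}: {effect:.2f}")
# surgical_marking: 0.50
# irregular_border: 0.30
# ruler: 0.10
```

In this toy setup a non-pathological cue ranks above a pathological one, which is the kind of hypothesis the abstract describes validating with manual interventions on real images.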
Problem

Research questions and friction points this paper is trying to address.

visual concept ranking
medical shortcuts
large multimodal models
model reliability
demographic bias
Innovation

Methods, ideas, or system contributions that make the work stand out.

Visual Concept Ranking
large multimodal models
medical shortcuts
model auditing
demographic bias
Joseph D. Janizek
Stanford University
Medicine, Machine Learning, Computational Biology
Sonnet Xu
Department of Computer Science, Stanford University
Junayd Lateef
Department of Biomedical Engineering, University of California, Berkeley
Roxana Daneshjou
Department of Biomedical Data Science, Department of Dermatology, Stanford University