BiasICL: In-Context Learning and Demographic Biases of Vision Language Models

📅 2025-03-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work identifies context-induced performance disparities across demographic subgroups in vision-language models (VLMs) applied to medical diagnosis tasks, specifically skin lesion malignancy prediction and pneumothorax detection. Through experiments on real-world medical imaging data, subgroup sensitivity analysis, prompt manipulation, and ablation studies controlling for baseline disease prevalence, the authors demonstrate that in-context learning (ICL) not only induces reliance on subgroup-specific disease base rates but also introduces systematic, base-rate-independent biases that substantially amplify inter-subgroup performance gaps. They propose a "subgroup-matched label distribution" prompting principle, wherein demonstration examples in ICL prompts are sampled to mirror the label distribution of each target subgroup, and empirically validate its effectiveness in mitigating bias. The study provides actionable, evidence-based prompting guidelines to improve the fairness and generalizability of VLMs in clinical deployment.
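The subgroup-matched sampling idea described above can be sketched in a few lines. The snippet below is a minimal illustration, not the authors' implementation: it assumes a pool of labeled demonstration candidates and a known per-subgroup positive base rate, then draws ICL demonstrations whose label mix matches that rate. The function name, data layout, and `base_rates` argument are hypothetical.

```python
import random
from collections import defaultdict

def sample_matched_demos(pool, target_subgroup, base_rates, k, seed=0):
    """Sample k ICL demonstration examples whose fraction of positive
    labels matches the target subgroup's disease base rate.

    pool: list of dicts with keys 'image', 'label' (0 or 1), 'subgroup'
    base_rates: dict mapping subgroup name -> positive base rate in [0, 1]
    """
    rng = random.Random(seed)
    rate = base_rates[target_subgroup]
    n_pos = round(k * rate)          # positives needed to match the base rate
    n_neg = k - n_pos

    # Bucket the candidate pool by label so we can sample each class directly.
    by_label = defaultdict(list)
    for ex in pool:
        by_label[ex["label"]].append(ex)

    demos = rng.sample(by_label[1], n_pos) + rng.sample(by_label[0], n_neg)
    rng.shuffle(demos)               # avoid ordering all positives first
    return demos
```

For example, with `k=10` and a subgroup base rate of 0.3, the returned demonstrations contain exactly 3 positive and 7 negative examples; the same routine with the bulk base rate gives the "matching at a bulk level" variant mentioned in the abstract.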

📝 Abstract
Vision language models (VLMs) show promise in medical diagnosis, but their performance across demographic subgroups when using in-context learning (ICL) remains poorly understood. We examine how the demographic composition of demonstration examples affects VLM performance in two medical imaging tasks: skin lesion malignancy prediction and pneumothorax detection from chest radiographs. Our analysis reveals that ICL influences model predictions through multiple mechanisms: (1) ICL allows VLMs to learn subgroup-specific disease base rates from prompts and (2) ICL leads VLMs to make predictions that perform differently across demographic groups, even after controlling for subgroup-specific disease base rates. Our empirical results inform best-practices for prompting current VLMs (specifically examining demographic subgroup performance, and matching base rates of labels to target distribution at a bulk level and within subgroups), while also suggesting next steps for improving our theoretical understanding of these models.
Problem

Research questions and friction points this paper is trying to address.

Examines demographic bias in vision language models (VLMs) using in-context learning (ICL).
Analyzes VLM performance in medical imaging tasks across demographic subgroups.
Explores how ICL affects predictions and suggests improvements for VLM prompting.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Examining demographic bias in vision language models.
Analyzing ICL impact on medical imaging tasks.
Proposing best practices for VLM prompting strategies.
Authors

Sonnet Xu — Stanford University, Stanford, CA
Joseph D. Janizek — Stanford University (Medicine, Machine Learning, Computational Biology)
Yixing Jiang — Stanford
Roxana Daneshjou — Stanford University, Stanford, CA