🤖 AI Summary
This work addresses fairness risks in multimodal large language models (MLLMs) arising from performance disparities across demographic groups in high-stakes clinical settings. To mitigate these inequities, the authors propose a parameter-efficient fair fine-tuning approach that integrates low-rank adaptation (LoRA) with mutual information minimization during visual instruction tuning, thereby learning representations invariant to demographic attributes. This method is the first to incorporate fairness-aware regularization into the LoRA framework, enabling an architecture-agnostic, plug-and-play solution for fair visual instruction following. Experimental results on chest X-ray report generation and dermoscopic visual question answering demonstrate that the proposed approach significantly reduces inter-group performance gaps while simultaneously improving fairness-weighted clinical performance and generation quality.
📝 Abstract
While powerful in image-conditioned generation, multimodal large language models (MLLMs) can display uneven performance across demographic groups, raising fairness concerns. In safety-critical clinical settings, such disparities risk producing unequal diagnostic narratives and eroding trust in AI-assisted decision-making. Although fairness has been studied extensively in vision-only and language-only models, it remains largely underexplored in MLLMs. To address these biases, we introduce FairLLaVA, a parameter-efficient fine-tuning method that mitigates group disparities in visual instruction tuning without compromising overall performance. By minimizing the mutual information between learned representations and sensitive demographic attributes, FairLLaVA regularizes the model's representations to be demographic-invariant. The method can be incorporated as a lightweight plug-in, maintains efficiency through low-rank adapter (LoRA) fine-tuning, and provides an architecture-agnostic approach to fair visual instruction following. Extensive experiments on large-scale chest radiology report generation and dermoscopy visual question answering benchmarks show that FairLLaVA consistently reduces inter-group disparities while improving both equity-scaled clinical performance and natural language generation quality across diverse medical imaging modalities. Code can be accessed at https://github.com/bhosalems/FairLLaVA.
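To make the training objective concrete, the sketch below shows the general shape of a fairness-regularized loss: a task loss plus a penalty proportional to the mutual information between model representations and a demographic attribute. This is an illustrative sketch only — the function names, the weight `lam`, and the plug-in MI estimator over discretized representations are assumptions for clarity; the paper's actual estimator (operating on continuous MLLM representations, presumably via a variational or neural bound) is not specified in the abstract.

```python
import numpy as np

def mutual_information(z, a):
    """Plug-in estimate of I(z; a) in nats for two discrete arrays.

    `z` stands in for (discretized) model representations and `a` for a
    demographic attribute; both names are illustrative, not from the paper.
    """
    z, a = np.asarray(z), np.asarray(a)
    _, z_idx = np.unique(z, return_inverse=True)
    _, a_idx = np.unique(a, return_inverse=True)
    # Empirical joint distribution p(z, a).
    joint = np.zeros((z_idx.max() + 1, a_idx.max() + 1))
    np.add.at(joint, (z_idx, a_idx), 1.0)
    joint /= joint.sum()
    # Marginals p(z) and p(a).
    pz = joint.sum(axis=1, keepdims=True)
    pa = joint.sum(axis=0, keepdims=True)
    nz = joint > 0  # avoid log(0) on empty cells
    return float((joint[nz] * np.log(joint[nz] / (pz @ pa)[nz])).sum())

def fair_loss(task_loss, reps, attrs, lam=0.1):
    """Total objective: task loss + lam * I(reps; attrs)."""
    return task_loss + lam * mutual_information(reps, attrs)
```

When the representation carries no information about the attribute, the penalty vanishes and training reduces to ordinary visual instruction tuning; as the representation becomes predictive of the attribute, the penalty grows, pushing the adapters toward demographic-invariant features.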