🤖 AI Summary
This work addresses fairness disparities in AI-driven medical text generation that arise from societal biases across demographic groups. We systematically uncover biases across race, sex, and age, including intersectional groups, in clinical language models. To mitigate these biases without requiring sensitive-attribute annotations, we propose a group-selective optimization framework that combines bias-aware fine-tuning with a fairness evaluation protocol spanning model scale, dataset composition, and modality. Our method selectively enhances generation quality for underserved populations while preserving clinical fidelity. Evaluated across multiple medical text generation benchmarks, it improves average fairness metrics by 23.6% for marginalized subgroups, with no degradation in clinical relevance. Key contributions include: (1) the first systematic characterization of intersectional bias in medical text generation; and (2) a lightweight, annotation-free, and scalable fairness optimization mechanism applicable across diverse model architectures and data regimes.
📝 Abstract
Artificial intelligence (AI) systems, particularly those based on deep learning models, have increasingly achieved expert-level performance in medical applications. However, there is growing concern that such AI systems may reflect and amplify human bias, reducing the quality of their performance in historically underserved populations. Fairness has attracted considerable research interest in medical image classification, yet it remains understudied in text generation. In this study, we investigate fairness in medical text generation and observe substantial performance discrepancies across races, sexes and age groups, including intersectional groups; these discrepancies appear across model scales and evaluation metrics. To mitigate them, we propose an algorithm that selectively optimizes for underserved groups to reduce bias. Our evaluations across multiple backbones, datasets and modalities demonstrate that the proposed algorithm enhances fairness in text generation without compromising overall performance.
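To make the idea of a performance discrepancy across demographic and intersectional groups concrete, here is a minimal Python sketch of a per-group audit. It is an illustration under stated assumptions, not the paper's algorithm or evaluation protocol: the record fields (`race`, `sex`, `score`), the toy scores, the max-minus-min gap, and the "below the overall mean" rule for flagging underserved groups are all hypothetical choices made for this example.

```python
from collections import defaultdict
from statistics import mean


def group_means(records, keys):
    """Average the per-sample score within each group.

    Passing several keys, e.g. ("race", "sex"), yields intersectional groups.
    """
    buckets = defaultdict(list)
    for rec in records:
        group = tuple(rec[k] for k in keys)
        buckets[group].append(rec["score"])
    return {group: mean(scores) for group, scores in buckets.items()}


def fairness_gap(means):
    """Gap between the best- and worst-served group (one possible disparity measure)."""
    return max(means.values()) - min(means.values())


def underserved_groups(means, records):
    """Flag groups whose mean score falls below the overall mean.

    Assumption: this threshold is illustrative; the paper's selection rule may differ.
    """
    overall = mean(rec["score"] for rec in records)
    return sorted(group for group, m in means.items() if m < overall)


# Toy data: 'score' stands in for any per-sample text-generation quality metric.
records = [
    {"race": "A", "sex": "F", "score": 0.5},
    {"race": "A", "sex": "M", "score": 0.75},
    {"race": "B", "sex": "F", "score": 0.25},
    {"race": "B", "sex": "M", "score": 0.5},
]

by_race = group_means(records, ("race",))
by_race_sex = group_means(records, ("race", "sex"))

print(fairness_gap(by_race))
print(fairness_gap(by_race_sex))
print(underserved_groups(by_race_sex, records))
```

In this toy data the gap measured over intersectional groups is larger than the gap measured over race alone, which is why single-attribute audits can understate disparities. A selective mitigation step could then concentrate additional training signal on the flagged groups; how the paper actually does this is not specified in the abstract.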