Examining Imbalance Effects on Performance and Demographic Fairness of Clinical Language Models

📅 2024-12-23
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study investigates how data imbalance affects both predictive performance and demographic fairness (across gender, age, ethnicity, and social determinants of health) in clinical language models for ICD coding. The authors statistically probe the imbalance–performance–fairness relationship by combining imbalance measures, subgroup fairness evaluations (e.g., TPR difference, equalized-odds gap), and statistical attribution analyses, applied to state-of-the-art biomedical language models. The central finding is that feature similarity to the majority class, rather than class-level sample imbalance alone, may be the more critical driver of fairness disparities. This insight shifts the diagnostic focus in clinical NLP from aggregate class distributions toward the fine-grained feature structure of majority groups, suggesting actionable levers for fairness-aware model design and intervention.
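The summary's key claim, that similarity of samples to the majority class matters more than raw class counts, can be made concrete. Below is a minimal sketch; the function name and the centroid-cosine measure are our own illustrative choices, not the paper's metric. It scores each class by the mean cosine similarity of its samples to the majority-class centroid:

```python
import numpy as np

def majority_centroid_similarity(features, labels):
    """Illustrative measure (not the paper's): mean cosine similarity of
    each class's samples to the centroid of the majority class."""
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)

    # Identify the majority class by sample count.
    classes, counts = np.unique(labels, return_counts=True)
    majority = classes[np.argmax(counts)]

    # Unit-normalized centroid of the majority class.
    centroid = features[labels == majority].mean(axis=0)
    centroid /= np.linalg.norm(centroid)

    # Mean cosine similarity of each class's samples to that centroid.
    sims = {}
    for c in classes:
        x = features[labels == c]
        x = x / np.linalg.norm(x, axis=1, keepdims=True)
        sims[c] = float((x @ centroid).mean())
    return sims
```

Under this toy measure, a minority class whose features point away from the majority centroid scores low, separating "few samples" from "dissimilar samples" as distinct sources of disparity.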

📝 Abstract
Data imbalance is a fundamental challenge in applying language models to biomedical applications, particularly in ICD code prediction tasks where label and demographic distributions are uneven. While state-of-the-art language models have been increasingly adopted for biomedical tasks, few studies have systematically examined how data imbalance affects model performance and fairness across demographic groups. This study fills the gap by statistically probing the relationship between data imbalance and model performance in ICD code prediction. We analyze imbalances in a standard benchmark dataset across gender, age, ethnicity, and social determinants of health using state-of-the-art biomedical language models. Deploying diverse performance metrics and statistical analyses, we examine the influence of data imbalance on performance variation and demographic fairness. Our study shows that data imbalance significantly impacts model performance and fairness, but that feature similarity to the majority class may be a more critical factor. We believe this study provides valuable insights for developing more equitable and robust language models in healthcare applications.
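The fairness metrics the paper evaluates, TPR difference and the equalized-odds gap, can be computed from group-wise confusion statistics. A minimal sketch follows; the function names and the max-minus-min aggregation over groups are our own illustrative choices, not the paper's implementation:

```python
import numpy as np

def tpr_fpr(y_true, y_pred):
    """True-positive and false-positive rates for binary predictions."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tpr = (y_pred & y_true).sum() / max(y_true.sum(), 1)
    fpr = (y_pred & ~y_true).sum() / max((~y_true).sum(), 1)
    return tpr, fpr

def fairness_gaps(y_true, y_pred, groups):
    """TPR gap and equalized-odds gap across demographic groups.

    TPR gap: max - min TPR over groups. Equalized-odds gap (one common
    convention): the larger of the TPR gap and the FPR gap."""
    y_true, y_pred, groups = map(np.asarray, (y_true, y_pred, groups))
    rates = {g: tpr_fpr(y_true[groups == g], y_pred[groups == g])
             for g in np.unique(groups)}
    tprs = [r[0] for r in rates.values()]
    fprs = [r[1] for r in rates.values()]
    tpr_gap = max(tprs) - min(tprs)
    eo_gap = max(tpr_gap, max(fprs) - min(fprs))
    return tpr_gap, eo_gap
```

For multi-label ICD coding these gaps would typically be computed per code and then aggregated; the sketch shows the single-label building block.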
Problem

Research questions and friction points this paper is trying to address.

Impact of data imbalance on model performance
Influence of data imbalance on demographic fairness
Feature similarity effects in ICD code prediction
Innovation

Methods, ideas, or system contributions that make the work stand out.

Analyzing data imbalance effects
Exploring demographic fairness impacts
Deploying diverse performance metrics
Precious Jones
Department of Computer Science, University of Memphis
Weisi Liu
PhD Student, University of Memphis
I-Chan Huang
Department of Epidemiology and Cancer Control, St. Jude Children’s Research Hospital
Xiaolei Huang
University of Memphis
Machine Learning · Natural Language Processing · Health Informatics · LLM for Sciences