Disparate Model Performance and Stability in Machine Learning Clinical Support for Diabetes and Heart Diseases

📅 2024-12-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
Clinical machine learning models for diabetes and heart disease exhibit significant performance disparities and instability across gender and age subgroups, particularly manifesting as poor prediction consistency and low accuracy in elderly patients. The common attribution of such inequities solely to insufficient data representativeness is overly reductive. Method: We propose a novel framework integrating systematic arbitrariness analysis with conventional evaluation, explicitly linking data complexity (e.g., manifold dimensionality, class overlap), model instability, and clinical fairness. Using >25,000 real-world electronic health records, we conduct cross-dataset stability analysis and fairness sensitivity testing. Contribution/Results: Models consistently underperform on elderly and female patients; superior performance is observed in male and younger cohorts. Critically, prediction bias in older adults strongly correlates with high data complexity—challenging the “representativeness implies fairness” paradigm. Our work establishes a new theoretical lens for clinical AI fairness and delivers a reproducible, complexity-aware evaluation toolkit.
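The class-overlap notion of data complexity mentioned above can be illustrated with a short sketch. This is my own illustration on synthetic data, not the paper's released toolkit: the function name `fisher_ratio_f1` is an assumption, and the measure shown is the maximum per-feature Fisher's discriminant ratio (commonly called F1 in the data-complexity literature), where low values indicate heavy class overlap.

```python
# Sketch of a class-overlap complexity measure: the maximum per-feature
# Fisher's discriminant ratio (F1). Synthetic cohorts only; not the
# authors' actual pipeline.
import numpy as np

def fisher_ratio_f1(X, y):
    """Max per-feature Fisher's discriminant ratio for a binary problem.
    Higher values mean more separable classes; values near 0 mean heavy
    overlap (i.e., high data complexity)."""
    X0, X1 = X[y == 0], X[y == 1]
    num = (X0.mean(axis=0) - X1.mean(axis=0)) ** 2   # squared mean gap
    den = X0.var(axis=0) + X1.var(axis=0)            # within-class spread
    return float(np.max(num / den))

rng = np.random.default_rng(42)
# Well-separated toy cohort vs. a heavily overlapping one.
X_easy = np.vstack([rng.normal(0, 1, (100, 3)), rng.normal(5, 1, (100, 3))])
X_hard = np.vstack([rng.normal(0, 1, (100, 3)), rng.normal(0.3, 1, (100, 3))])
y = np.array([0] * 100 + [1] * 100)

f1_easy = fisher_ratio_f1(X_easy, y)
f1_hard = fisher_ratio_f1(X_hard, y)
print(f1_easy, f1_hard)  # the overlapping cohort scores far lower
```

Under the paper's framing, a subgroup (e.g., older patients) whose records score low on such measures is harder to separate regardless of how well represented it is in the training data.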

📝 Abstract
Machine Learning (ML) algorithms are vital for supporting clinical decision-making in biomedical informatics. However, their predictive performance can vary across demographic groups, often due to the underrepresentation of historically marginalized populations in training datasets. This investigation reveals widespread sex- and age-related inequities in chronic disease datasets and the ML models derived from them. A novel analytical framework is therefore introduced, combining systematic arbitrariness with traditional metrics such as accuracy and with data complexity measures. Analysis of data from over 25,000 individuals with chronic diseases revealed mild sex-related disparities favoring predictive accuracy for males, and significant age-related differences, with better accuracy for younger patients. Notably, older patients showed inconsistent predictive accuracy across seven datasets, linked to higher data complexity and lower model performance. These findings highlight that representativeness in training data alone does not guarantee equitable outcomes, and that model arbitrariness must be addressed before deploying models in clinical settings.
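The two evaluation ingredients the abstract combines, subgroup accuracy and prediction arbitrariness, can be sketched as follows. This is a hedged illustration with synthetic predictions, not the authors' code: the function names and the self-consistency definition (mean pairwise agreement among an ensemble of bootstrap-trained models, where 1.0 means full agreement and values near 0.5 mean near-arbitrary predictions) are assumptions on my part.

```python
# Sketch of two diagnostics: per-subgroup accuracy and per-patient
# self-consistency across an ensemble of models. Synthetic data only.
import numpy as np

def subgroup_accuracy(y_true, y_pred, groups):
    """Accuracy per demographic subgroup (e.g., sex or age band)."""
    return {g: float(np.mean(y_pred[groups == g] == y_true[groups == g]))
            for g in np.unique(groups)}

def self_consistency(pred_matrix):
    """Mean pairwise agreement across models for each patient.
    pred_matrix: (n_models, n_patients) array of 0/1 predictions.
    1.0 = every model agrees; ~0.5 = coin-flip (maximal arbitrariness)."""
    m = pred_matrix.shape[0]
    k = pred_matrix.sum(axis=0)                      # models predicting 1
    pairs_agree = k * (k - 1) + (m - k) * (m - k - 1)
    return pairs_agree / (m * (m - 1))               # agreeing / total pairs

rng = np.random.default_rng(0)
y = rng.integers(0, 2, 200)
groups = np.array(["young"] * 100 + ["old"] * 100)

# Assumed scenario mirroring the abstract: 10 bootstrap models agree on
# younger patients but effectively flip coins on older ones.
preds = np.tile(y, (10, 1))
preds[:, 100:] = rng.integers(0, 2, (10, 100))

y_pred = (preds.mean(axis=0) >= 0.5).astype(int)     # majority vote
acc = subgroup_accuracy(y, y_pred, groups)
cons = self_consistency(preds)
print(acc)
print(cons[:100].mean(), cons[100:].mean())
```

The point of pairing the two diagnostics is the one the abstract makes: a subgroup can look acceptable on aggregate accuracy while its individual predictions are near-arbitrary across equally plausible models.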
Problem

Research questions and friction points this paper is trying to address.

Machine Learning Bias
Algorithm Fairness
Demographic Disparity
Innovation

Methods, ideas, or system contributions that make the work stand out.

Fairness in ML predictions
Data complexity assessment
Healthcare algorithm reliability
Ioannis Bilionis
Adhera Health, Santa Cruz, USA; Universitat Pompeu Fabra, Barcelona, Spain
Ricardo C. Berrios
Adhera Health, Santa Cruz, USA
Luis Fernandez-Luque
Adhera Health, Santa Cruz, USA
Carlos Castillo
ICREA Research Professor at Universitat Pompeu Fabra
Responsible Computing · Algorithmic Fairness · Crisis Informatics