Subgroup Validity in Machine Learning for Echocardiogram Data

📅 2025-11-30
🤖 AI Summary
Empirical evidence is lacking on how AI models for cardiac ultrasound perform across gender, sex, race, and ethnicity subgroups. Method: This study assesses sociodemographic reporting across six open echocardiography datasets, improves that reporting for TMED-2 and MIMIC-IV-ECHO, and performs an exploratory subgroup analysis of two published deep learning models for aortic stenosis detection on TMED-2. Results: The datasets show no consideration of gender-diverse patients and contain too few patients in many racial and ethnic groups. The subgroup analysis finds insufficient evidence for subgroup validity across sex, racial, and ethnic subgroups. Contribution: The study shows that current open echocardiogram datasets cannot resolve subgroup validity concerns, and it calls for more data on underrepresented subgroups, improved demographic reporting, and subgroup-focused analyses before model deployment.

📝 Abstract
Echocardiogram datasets enable training deep learning models to automate interpretation of cardiac ultrasound, thereby expanding access to accurate readings of diagnostically useful images. However, the gender, sex, race, and ethnicity of the patients in these datasets are underreported and subgroup-specific predictive performance is unevaluated. These reporting deficiencies raise concerns about subgroup validity that must be studied and addressed before model deployment. In this paper, we show that current open echocardiogram datasets are unable to assuage subgroup validity concerns. We improve sociodemographic reporting for two datasets: TMED-2 and MIMIC-IV-ECHO. Analysis of six open datasets reveals no consideration of gender-diverse patients and insufficient patient counts for many racial and ethnic groups. We further perform an exploratory subgroup analysis of two published aortic stenosis detection models on TMED-2. We find insufficient evidence for subgroup validity for sex, racial, and ethnic subgroups. Our findings highlight that more data for underrepresented subgroups, improved demographic reporting, and subgroup-focused analyses are needed to prove subgroup validity in future work.
Problem

Research questions and friction points this paper is trying to address.

Assess subgroup validity in echocardiogram machine learning models
Address underreported patient demographics in cardiac ultrasound datasets
Evaluate predictive performance across sex, racial, and ethnic subgroups
Innovation

Methods, ideas, or system contributions that make the work stand out.

Improved sociodemographic reporting in datasets
Exploratory subgroup analysis for model evaluation
Highlighted need for underrepresented subgroup data
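
As a rough illustration of what an exploratory subgroup analysis can involve, the sketch below computes per-subgroup accuracy with a percentile bootstrap interval and flags subgroups with too few patients. It is not the authors' procedure: the function name `subgroup_report`, the choice of accuracy as the metric, and the `min_count=30` threshold are all assumptions made for this example.

```python
import random
from collections import defaultdict


def subgroup_report(records, min_count=30, n_boot=1000, seed=0):
    """Summarize accuracy per subgroup.

    records: iterable of (subgroup, y_true, y_pred) tuples (hypothetical layout).
    min_count: illustrative threshold below which a subgroup is flagged as too small.
    """
    rng = random.Random(seed)

    # Collect a 0/1 correctness indicator for each prediction, keyed by subgroup.
    by_group = defaultdict(list)
    for group, y_true, y_pred in records:
        by_group[group].append(int(y_true == y_pred))

    lo_idx = int(0.025 * n_boot)
    hi_idx = min(n_boot - 1, int(0.975 * n_boot))

    report = {}
    for group, hits in by_group.items():
        n = len(hits)
        accuracy = sum(hits) / n
        # Percentile bootstrap: resample with replacement and recompute accuracy.
        boots = sorted(sum(rng.choices(hits, k=n)) / n for _ in range(n_boot))
        report[group] = {
            "n": n,
            "accuracy": accuracy,
            "ci95": (boots[lo_idx], boots[hi_idx]),
            "sufficient_n": n >= min_count,
        }
    return report


if __name__ == "__main__":
    # Toy data only: 40 patients in group "A" (30 correct), 5 in group "B" (all correct).
    toy = [("A", 1, 1)] * 30 + [("A", 1, 0)] * 10 + [("B", 0, 0)] * 5
    for name, row in subgroup_report(toy).items():
        print(name, row)
```

A small subgroup can show perfect accuracy while still offering little evidence, which is why the sketch reports the patient count and a flag next to the point estimate.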
Cynthia Feeney
Department of Computer Science, Tufts University, Medford, MA, USA
Shane Williams
Department of Computer Science, Tufts University, Medford, MA, USA
Benjamin S. Wessler
Division of Cardiology, Tufts Medical Center, Boston, MA, USA
Michael C. Hughes
Assistant Professor of Computer Science, Tufts University
Machine Learning, Clinical Informatics