Worst-Group Equalized Odds Regularization for Multi-Attribute Fair Medical Image Classification

📅 2026-05-18
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses systematic performance disparities of medical AI across demographic subgroups (such as age, sex, and race). Aggregate metrics like AUC often obscure these disparities because they do not capture behaviour at a fixed decision threshold. The authors propose a worst-group equalized-odds regularizer that, at each update, identifies the subgroups with the largest deviations on the true positive and false positive sides and penalizes them, optimizing fairness across multiple attributes without requiring explicit intersectional constraints. The approach targets multi-label medical image classification and, on two medical imaging datasets, consistently reduces equalized odds and equal opportunity gaps with minimal impact on AUC.
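The threshold-level disparity described above can be illustrated with a small NumPy sketch. It is not the authors' evaluation code: the function names, the toy data, and the choice to measure each subgroup against the pooled TPR/FPR are assumptions made for illustration.

```python
import numpy as np

def tpr_fpr(y_true, y_pred):
    """True and false positive rates for binary labels and hard predictions."""
    return y_pred[y_true == 1].mean(), y_pred[y_true == 0].mean()

def subgroup_deviations(y_true, scores, attributes, threshold=0.5):
    """Signed (TPR, FPR) deviation of each subgroup from the pooled rates.

    `attributes` maps an attribute name (e.g. "sex") to an array of group
    labels, one per sample. Each attribute is handled on its own, so no
    intersectional groups are enumerated (an assumption of this sketch).
    """
    y_pred = (scores >= threshold).astype(float)
    tpr_all, fpr_all = tpr_fpr(y_true, y_pred)
    deviations = {}
    for name, values in attributes.items():
        for group in np.unique(values):
            mask = values == group
            yt, yp = y_true[mask], y_pred[mask]
            if yt.min() == yt.max():
                continue  # group lacks positives or negatives; rates undefined
            tpr, fpr = tpr_fpr(yt, yp)
            deviations[(name, str(group))] = (float(tpr - tpr_all),
                                              float(fpr - fpr_all))
    return deviations

def worst_group(deviations):
    """Subgroup whose larger absolute deviation (TPR or FPR side) is greatest."""
    return max(deviations.items(),
               key=lambda kv: max(abs(kv[1][0]), abs(kv[1][1])))

# Toy data (hypothetical): one binary finding, one demographic attribute.
y_true = np.array([1, 1, 0, 0, 1, 1, 0, 0])
scores = np.array([0.9, 0.8, 0.7, 0.2, 0.4, 0.6, 0.1, 0.3])
attributes = {"sex": np.array(["F", "F", "F", "F", "M", "M", "M", "M"])}

deviations = subgroup_deviations(y_true, scores, attributes)
print(deviations)
print(worst_group(deviations))
```

In this toy data, group F sits 0.25 above the pooled TPR and FPR (an over-diagnostic pattern) while group M sits 0.25 below both (an under-diagnostic pattern), which is the kind of opposing shift at a fixed threshold that a ranking metric such as AUC does not report directly.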
📝 Abstract
Diagnostic performance in medical AI varies systematically across demographic groups, yet subgroup AUC can mask clinically important disparities. At a fixed inference-time operating point, some groups may exhibit over-diagnostic behaviour, characterized by elevated true and false positive rates, while others show under-diagnostic patterns with reduced true and false positive rates. These opposing tendencies can cancel in aggregate AUCs while producing meaningful inequities in clinical decision-making. Motivated by the need to assess and mitigate such disparities at the operating point and across multiple demographic attributes simultaneously, we propose a worst-group equalized-odds margin regularizer. The proposed regularizer explicitly targets subgroup-level deviations on both the true positive and false positive sides at inference. At each update, the method identifies subgroups defined by explicit demographic attributes (e.g., age, sex, and race) that exhibit the most extreme margin deviations and applies a unified penalty, enabling fairness optimization across multiple demographic axes without requiring explicit intersectional constraints. Across two medical imaging datasets in realistic multi-label settings, our method consistently reduces disparities in Equalized Odds and Equalized Opportunity with minimal impact on AUC, preserving diagnostic performance while improving fairness.
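One way to picture a worst-group margin penalty of this kind is the forward-pass sketch below. It is an illustration under stated assumptions, not the paper's formulation: the margin is taken to be the raw logit, the deviation is measured against the pooled mean logit on each side, a single binary label is used, and the function name `worst_group_margin_penalty` is hypothetical.

```python
import numpy as np

def worst_group_margin_penalty(logits, y_true, attributes):
    """Sum of the worst subgroup margin deviations on the TP and FP sides.

    Assumptions of this sketch (not taken from the paper):
      * the "margin" of a sample is its raw logit;
      * a subgroup's deviation is |mean group logit - mean pooled logit|,
        computed separately over positives (TP side) and negatives (FP side);
      * subgroups come from each attribute on its own, so no intersectional
        groups are enumerated.
    """
    penalty = 0.0
    for label in (1, 0):  # 1: true-positive side, 0: false-positive side
        side = y_true == label
        pooled = logits[side].mean()
        worst = 0.0
        for values in attributes.values():
            for group in np.unique(values):
                mask = side & (values == group)
                if mask.any():
                    worst = max(worst, abs(logits[mask].mean() - pooled))
        penalty += worst
    return float(penalty)

# Toy batch (hypothetical): group F receives systematically higher logits.
y_true = np.array([1, 1, 0, 0, 1, 1, 0, 0])
logits = np.array([2.0, 1.0, 0.0, -1.0, 0.0, 1.0, -2.0, -1.0])
attributes = {"sex": np.array(["F", "F", "F", "F", "M", "M", "M", "M"])}

print(worst_group_margin_penalty(logits, y_true, attributes))
```

In training, a term like this would typically be written in an autodiff framework and added to the classification loss with a weighting coefficient (a hypothetical hyperparameter here); for multi-label outputs, one option is to compute it per label and average. Because only the worst subgroup on each side contributes at a given step, the penalty can move between demographic axes from one update to the next.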
Problem

Research questions and friction points this paper is trying to address.

fairness
medical image classification
equalized odds
demographic disparities
operating point
Innovation

Methods, ideas, or system contributions that make the work stand out.

worst-group fairness
equalized odds
medical image classification
multi-attribute fairness
margin regularization
Nikhil Cherian Kurian
Australian Institute for Machine Learning, Adelaide University, Adelaide, Australia
Victor Caquilpan Parra
Australian Institute for Machine Learning, Adelaide University, Adelaide, Australia
Abin Shoby
Australian Institute for Machine Learning, Adelaide University, Adelaide, Australia
Luke Whitbread
Australian Institute for Machine Learning, Adelaide University, Adelaide, Australia
Lauren Oakden-Rayner
Australian Institute for Machine Learning. University of Adelaide. Royal Adelaide Hospital.
Radiology, Image Analysis, Machine Learning, Deep Learning, Medical Informatics
Robert Vandersluis
GlaxoSmithKline (GSK)
Jessica Schrouff
DeepMind
Machine Learning, Deep Learning, Health, Signal Processing, Medical Imaging
Lyle J. Palmer
Australian Institute for Machine Learning, Adelaide University, Adelaide, Australia
Mark Jenkinson
Professor of Neuroimaging
medical image analysis, neuroimaging, deep learning