Critical Appraisal of Fairness Metrics in Clinical Predictive AI

📅 2025-06-20
🤖 AI Summary
Fairness evaluation of clinical predictive AI has long suffered from conceptual ambiguity, strong threshold dependence, lack of clinical validation, and insufficient quantification of intersectionality and uncertainty. To address these gaps, we conducted a scoping review (2014–2024) across five major academic databases, systematically identifying and analyzing 62 fairness metrics from 41 studies. We propose the first three-dimensional taxonomy—structured along *performance dependence*, *output level*, and *benchmark type*—revealing severe fragmentation: only 18 metrics are healthcare-specific, and merely one is explicitly oriented toward clinical utility. Our analysis identifies three critical deficiencies: (1) absence of clinically meaningful fairness metrics; (2) inadequate uncertainty quantification; and (3) limited intersectional modeling and real-world applicability. This work establishes a theoretical foundation and practical roadmap for transitioning fairness assessment from methodological abstraction to clinical implementation.

📝 Abstract
Predictive artificial intelligence (AI) offers an opportunity to improve clinical practice and patient outcomes, but risks perpetuating biases if fairness is inadequately addressed. However, the definition of "fairness" remains unclear. We conducted a scoping review to identify and critically appraise fairness metrics for clinical predictive AI. We defined a "fairness metric" as a measure quantifying whether a model discriminates (societally) against individuals or groups defined by sensitive attributes. We searched five databases (2014-2024), screening 820 records, to include 41 studies, and extracted 62 fairness metrics. Metrics were classified by performance dependency, model output level, and base performance metric, revealing a fragmented landscape with limited clinical validation and overreliance on threshold-dependent measures. Eighteen metrics were explicitly developed for healthcare, including only one clinical utility metric. Our findings highlight conceptual challenges in defining and quantifying fairness, and identify gaps in uncertainty quantification, intersectionality, and real-world applicability. Future work should prioritise clinically meaningful metrics.
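The abstract's concern about threshold-dependent fairness measures can be illustrated with a minimal sketch. The example below (not from the paper; all scores, outcomes, and group labels are synthetic assumptions) computes the true-positive-rate gap between two patient subgroups, a common group fairness quantity, at two decision thresholds, showing how the same model can look fair at one threshold and unfair at another:

```python
def tpr(scores, labels, threshold):
    """True positive rate: fraction of true positives classified
    positive, i.e. P(score >= threshold | label == 1)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    if not positives:
        return 0.0
    return sum(s >= threshold for s in positives) / len(positives)

def tpr_gap(group_a, group_b, threshold):
    """Absolute TPR difference between two (scores, labels) groups --
    a threshold-dependent group fairness quantity."""
    return abs(tpr(*group_a, threshold) - tpr(*group_b, threshold))

# Synthetic risk scores and binary outcomes for two patient subgroups.
group_a = ([0.9, 0.7, 0.4, 0.2], [1, 1, 1, 0])
group_b = ([0.8, 0.5, 0.3, 0.1], [1, 1, 1, 0])

for t in (0.3, 0.6):
    print(f"threshold={t}: TPR gap = {tpr_gap(group_a, group_b, t):.2f}")
# At threshold 0.3 both groups have TPR 1.0 (gap 0.00); at 0.6 the
# gap is 0.33 -- the fairness verdict depends on the chosen threshold.
```

This sensitivity is why the review flags overreliance on threshold-dependent measures: a fairness audit at a single operating point may not generalise to the thresholds actually used in clinical deployment.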
Problem

Research questions and friction points this paper is trying to address.

Defining and quantifying fairness in clinical predictive AI
Assessing biases in AI models using sensitive attributes
Identifying gaps in clinical validation of fairness metrics
Innovation

Methods, ideas, or system contributions that make the work stand out.

Scoping review of fairness metrics in AI
Classification by performance and model output
Prioritize clinically meaningful fairness metrics
Joao Matos
Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford, UK
Ben Van Calster
Professor of Medical Statistics, KU Leuven
Prediction modeling, biostatistics
Leo Anthony Celi
Massachusetts Institute of Technology
Paula Dhiman
Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford, UK
Judy Wawira Gichoya
Emory University
Health informatics, Radiology, Artificial Intelligence, Global Health, FAIR AI
Richard D. Riley
Department of Applied Health Sciences, School of Health Sciences, College of Medicine and Health, University of Birmingham, Birmingham, UK; National Institute for Health and Care Research (NIHR) Birmingham Biomedical Research Centre, Birmingham, UK
Chris Russell
Associate Professor, University of Oxford
Ethical Machine Learning, Computer Vision, Optimisation, Ethical AI
Sara Khalid
University of Oxford, UK
signal processing, machine learning, remote monitoring, biomedical data science, planetary health
Gary S. Collins
Professor of Medical Statistics, University of Oxford
medical statistics, statistics, biostatistics, machine learning, metascience