Critical Appraisal of Fairness Metrics in Clinical Predictive AI

📅 2025-06-20
🤖 AI Summary
Fairness evaluation of clinical predictive AI has long suffered from conceptual ambiguity, strong threshold dependence, lack of clinical validation, and insufficient quantification of intersectionality and uncertainty. To address these gaps, we conducted a scoping review (2014–2024) across five major academic databases, systematically identifying and analyzing 62 fairness metrics from 41 studies. We propose the first three-dimensional taxonomy—structured along *performance dependence*, *output level*, and *benchmark type*—revealing severe fragmentation: only 18 metrics are healthcare-specific, and merely one is explicitly oriented toward clinical utility. Our analysis identifies three critical deficiencies: (1) absence of clinically meaningful fairness metrics; (2) inadequate uncertainty quantification; and (3) limited intersectional modeling and real-world applicability. This work establishes a theoretical foundation and practical roadmap for transitioning fairness assessment from methodological abstraction to clinical implementation.

📝 Abstract
Predictive artificial intelligence (AI) offers an opportunity to improve clinical practice and patient outcomes, but risks perpetuating biases if fairness is inadequately addressed. However, the definition of "fairness" remains unclear. We conducted a scoping review to identify and critically appraise fairness metrics for clinical predictive AI. We defined a "fairness metric" as a measure quantifying whether a model discriminates (societally) against individuals or groups defined by sensitive attributes. We searched five databases (2014-2024), screening 820 records, to include 41 studies, and extracted 62 fairness metrics. Metrics were classified by performance dependency, model output level, and base performance metric, revealing a fragmented landscape with limited clinical validation and overreliance on threshold-dependent measures. Eighteen metrics were explicitly developed for healthcare, including only one clinical utility metric. Our findings highlight conceptual challenges in defining and quantifying fairness, and identify gaps in uncertainty quantification, intersectionality, and real-world applicability. Future work should prioritise clinically meaningful metrics.
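The abstract's concern about threshold-dependent fairness measures can be illustrated with a minimal sketch. The example below (not from the paper; all scores, outcomes, and group labels are synthetic assumptions) computes the true-positive-rate gap between two patient subgroups, a common group fairness quantity, at two decision thresholds, showing how the same model can look fair at one threshold and unfair at another:

```python
def tpr(scores, labels, threshold):
    """True positive rate: fraction of true positives classified
    positive, i.e. P(score >= threshold | label == 1)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    if not positives:
        return 0.0
    return sum(s >= threshold for s in positives) / len(positives)

def tpr_gap(group_a, group_b, threshold):
    """Absolute TPR difference between two (scores, labels) groups --
    a threshold-dependent group fairness quantity."""
    return abs(tpr(*group_a, threshold) - tpr(*group_b, threshold))

# Synthetic risk scores and binary outcomes for two patient subgroups.
group_a = ([0.9, 0.7, 0.4, 0.2], [1, 1, 1, 0])
group_b = ([0.8, 0.5, 0.3, 0.1], [1, 1, 1, 0])

for t in (0.3, 0.6):
    print(f"threshold={t}: TPR gap = {tpr_gap(group_a, group_b, t):.2f}")
# At threshold 0.3 both groups have TPR 1.0 (gap 0.00); at 0.6 the
# gap is 0.33 -- the fairness verdict depends on the chosen threshold.
```

This sensitivity is why the review flags overreliance on threshold-dependent measures: a fairness audit at a single operating point may not generalise to the thresholds actually used in clinical deployment.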
Problem

Research questions and friction points this paper is trying to address.

Defining and quantifying fairness in clinical predictive AI
Assessing biases in AI models using sensitive attributes
Identifying gaps in clinical validation of fairness metrics
Innovation

Methods, ideas, or system contributions that make the work stand out.

Scoping review of fairness metrics in AI
Classification by performance and model output
Prioritize clinically meaningful fairness metrics
Joao Matos
Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford, UK
Ben Van Calster
Professor of Medical Statistics, KU Leuven
Prediction modeling, biostatistics
Leo Anthony Celi
Massachusetts Institute of Technology
Paula Dhiman
Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford, UK
Judy Wawira Gichoya
Emory University
Health informatics, Radiology, Artificial Intelligence, Global Health, FAIR AI
Richard D. Riley
Department of Applied Health Sciences, School of Health Sciences, College of Medicine and Health, University of Birmingham, Birmingham, UK; National Institute for Health and Care Research (NIHR) Birmingham Biomedical Research Centre, Birmingham, UK
Chris Russell
Associate Professor, University of Oxford
Ethical Machine Learning, Computer Vision, Optimisation, Ethical AI
Sara Khalid
University of Oxford, UK
signal processing, machine learning, remote monitoring, biomedical data science, planetary health
Gary S. Collins
Professor of Medical Statistics, University of Oxford
medical statistics, statistics, biostatistics, machine learning, metascience