🤖 AI Summary
Existing fairness tools are often limited to single demographic attributes and struggle to capture the compounded biases faced by intersecting groups, such as combinations of race and gender, in clinical machine learning. This work proposes Fairlogue, a Python toolkit that extends observational fairness metrics, including demographic parity and equalized odds, to intersectional subgroups, while integrating two counterfactual fairness frameworks to evaluate intervention-based equity. Applied to electronic health record data in a glaucoma surgery prediction task using logistic regression, the approach uncovers substantial intersectional unfairness, including a demographic parity gap of 0.20. Crucially, the disparities identified through intersectional analysis markedly exceed those detected by single-axis assessments, underscoring the necessity and value of this method for auditing fairness in clinical algorithms.
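The core observational idea, extending demographic parity from a single attribute to intersectional subgroups, can be sketched as below. This is not Fairlogue's actual API; the function name, signature, and toy data are illustrative assumptions, and the random data stands in for (and is unrelated to) the All of Us cohort. The gap is simply the spread of positive-prediction rates across all race × gender cells rather than across one attribute at a time.

```python
import numpy as np
import pandas as pd

def intersectional_demographic_parity_gap(df, pred_col, attrs):
    """Largest minus smallest positive-prediction rate across the
    intersectional subgroups defined by `attrs` (e.g. race x gender)."""
    rates = df.groupby(attrs)[pred_col].mean()
    return float(rates.max() - rates.min())

# Toy data (illustrative only; hypothetical attributes and predictions).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "race": rng.choice(["A", "B"], size=400),
    "gender": rng.choice(["F", "M"], size=400),
})
# Bias the toy predictions so one intersectional subgroup is favored.
base = 0.3 + 0.2 * ((df["race"] == "A") & (df["gender"] == "F"))
df["y_hat"] = (rng.random(400) < base).astype(int)

gap = intersectional_demographic_parity_gap(df, "y_hat", ["race", "gender"])
print(f"intersectional DP gap: {gap:.3f}")
```

Because each single-axis rate is a weighted average of the intersectional subgroup rates it contains, the intersectional gap is always at least as large as any single-axis gap, which is consistent with the summary's observation that intersectional analysis surfaces disparities single-axis audits understate.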
📝 Abstract
Objective: Algorithmic fairness is essential for equitable and trustworthy machine learning in healthcare. Most fairness tools emphasize single-axis demographic comparisons and may miss compounded disparities affecting intersectional populations. This study introduces Fairlogue, a toolkit designed to operationalize intersectional fairness assessment in observational and counterfactual contexts within clinical settings.

Methods: Fairlogue is a Python-based toolkit composed of three components: 1) an observational framework extending demographic parity, equalized odds, and equal opportunity difference to intersectional populations; 2) a counterfactual framework evaluating fairness in treatment-based contexts; and 3) a generalized counterfactual framework assessing fairness under interventions on intersectional group membership. The toolkit was evaluated on electronic health record data from the All of Us Controlled Tier V8 dataset, in a glaucoma surgery prediction task using logistic regression with race and gender as protected attributes.

Results: Observational analysis identified substantial intersectional disparities despite moderate model performance (AUROC = 0.709; accuracy = 0.651). Intersectional evaluation revealed larger fairness gaps than single-axis analyses, including a demographic parity difference of 0.20 and equalized odds true positive and false positive rate gaps of 0.33 and 0.15, respectively. Counterfactual analysis using permutation-based null distributions produced unfairness ("u-value") estimates near zero, suggesting the observed disparities were consistent with chance after conditioning on covariates.

Conclusion: Fairlogue provides a modular toolkit integrating observational and counterfactual methods for quantifying and evaluating intersectional bias in clinical machine learning workflows.
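The abstract does not define the "u-value", so the sketch below shows one plausible construction of a permutation-based unfairness estimate: the observed disparity minus its expectation under a null distribution obtained by shuffling group labels, so that values near zero indicate the gap is consistent with chance. The function names and this exact definition are assumptions, and the paper's method additionally conditions on covariates, which this toy version omits.

```python
import numpy as np

def dp_gap(preds, groups):
    """Spread of positive-prediction rates across group labels."""
    rates = [preds[groups == g].mean() for g in np.unique(groups)]
    return max(rates) - min(rates)

def permutation_u_value(preds, groups, n_perm=500, seed=0):
    """Hypothetical u-value: observed gap minus its mean under a
    permutation null. Near zero => disparity consistent with chance."""
    rng = np.random.default_rng(seed)
    observed = dp_gap(preds, groups)
    null_mean = np.mean([dp_gap(preds, rng.permutation(groups))
                         for _ in range(n_perm)])
    return float(observed - null_mean)

# Strongly biased toy predictions: group A always flagged, B never.
preds = np.array([1] * 100 + [0] * 100)
groups = np.array(["A"] * 100 + ["B"] * 100)
u = permutation_u_value(preds, groups)
print(f"u-value: {u:.3f}")
```

Under this reading, the near-zero u-values reported in the Results would mean the observed intersectional gaps shrink to roughly what label-shuffling alone produces once covariates are accounted for, whereas the strongly biased toy data above yields a u-value well above zero.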