🤖 AI Summary
Prediction models for post-spinal fusion complications often exhibit bias due to demographic disparities; existing fairness approaches either rely on explicit sensitive attributes or apply post-hoc corrections, limiting their ability to capture clinical heterogeneity. To address this, we propose FAIR-MTL, a fairness-aware multi-task learning framework that performs data-driven task decomposition via unsupervised latent subgroup discovery (k-means), integrated with inverse-frequency weighting and regularization, without requiring access to sensitive attributes. FAIR-MTL enables fine-grained, interpretable risk stratification while ensuring equitable performance across subpopulations. Evaluated on four complication severity prediction tasks, it achieves an AUC of 0.86 and accuracy of 75%, while substantially reducing fairness gaps: ΔEO and ΔDP decrease by over 40% across gender and age groups. The framework thus advances both predictive performance and clinical fairness.
📝 Abstract
Fairness in clinical prediction models remains a persistent challenge, particularly in high-stakes applications such as spinal fusion surgery for scoliosis, where patient outcomes exhibit substantial heterogeneity. Many existing fairness approaches rely on coarse demographic adjustments or post-hoc corrections, which fail to capture the latent structure of clinical populations and may unintentionally reinforce bias. We propose FAIR-MTL, a fairness-aware multi-task learning framework designed to provide equitable and fine-grained prediction of postoperative complication severity.
Instead of relying on explicit sensitive attributes during model training, FAIR-MTL employs a data-driven subgroup inference mechanism. We extract a compact demographic embedding and apply k-means clustering to uncover latent patient subgroups that may be differentially affected by traditional models. These inferred subgroup labels determine task routing within a shared multi-task architecture. During training, subgroup imbalance is mitigated through inverse-frequency weighting, and regularization prevents overfitting to smaller groups.
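The subgroup-inference step above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the toy embedding, the cluster count `k`, and the weighting formula are assumptions chosen to show the mechanism (k-means on a demographic embedding, then inverse-frequency per-sample weights).

```python
# Hypothetical sketch of FAIR-MTL's subgroup inference: cluster a compact
# demographic embedding with k-means, then derive inverse-frequency weights.
# The embedding, k, and weight normalization are illustrative assumptions.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Toy stand-in for a compact demographic embedding (e.g. age, weight, labs).
demo_embed = rng.normal(size=(200, 4))

# Unsupervised latent subgroup discovery via k-means (k is a hyperparameter).
k = 3
subgroups = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(
    StandardScaler().fit_transform(demo_embed)
)

# Inverse-frequency weights: rarer subgroups receive larger training weight,
# mitigating subgroup imbalance in the shared multi-task loss.
counts = np.bincount(subgroups, minlength=k)
weights = counts.sum() / (k * counts)   # average weight is roughly 1
sample_weights = weights[subgroups]     # one weight per training example
```

In a full pipeline, `subgroups` would route each patient to a task-specific head and `sample_weights` would scale the per-example loss; regularization on the heads (not shown) would guard against overfitting to small subgroups.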
Applied to postoperative complication prediction with four severity levels, FAIR-MTL achieves an AUC of 0.86 and an accuracy of 75%, outperforming single-task baselines while substantially reducing bias. For gender, the demographic parity difference decreases to 0.055 and the equalized odds difference to 0.094; for age, these gaps decrease to 0.056 and 0.148, respectively. Model interpretability is ensured through SHAP and Gini importance analyses, which consistently highlight clinically meaningful predictors such as hemoglobin, hematocrit, and patient weight. Our findings show that incorporating unsupervised subgroup discovery into a multi-task framework enables more equitable, interpretable, and clinically actionable predictions for surgical risk stratification.
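For readers unfamiliar with the two gap metrics reported above, the following sketch shows one common binary-case formulation (the fairlearn-style max-gap definitions); the paper's exact multi-class computation is not specified here, and the toy labels are illustrative.

```python
# Illustrative definitions of the demographic parity difference (DDP) and
# equalized odds difference (DEO) for binary predictions across groups.
# These are standard max-gap formulations, assumed for illustration.
import numpy as np

def demographic_parity_diff(y_pred, group):
    # Largest gap in positive-prediction rate between any two groups.
    rates = [y_pred[group == g].mean() for g in np.unique(group)]
    return max(rates) - min(rates)

def equalized_odds_diff(y_true, y_pred, group):
    # Largest gap in TPR (y_true == 1) or FPR (y_true == 0) across groups.
    gaps = []
    for y in (0, 1):
        mask = y_true == y
        rates = [y_pred[(group == g) & mask].mean() for g in np.unique(group)]
        gaps.append(max(rates) - min(rates))
    return max(gaps)

y_true = np.array([0, 0, 1, 1, 0, 1, 1, 0])
y_pred = np.array([0, 1, 1, 1, 0, 0, 1, 1])
group  = np.array([0, 0, 0, 0, 1, 1, 1, 1])
print(demographic_parity_diff(y_pred, group))          # 0.25
print(equalized_odds_diff(y_true, y_pred, group))      # 0.5
```

Lower values indicate smaller disparities between groups, so the reported reductions (e.g. ΔDP of 0.055 for gender) correspond to more equitable predictions under these definitions.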