Shapley Regression for Rare Disease Diagnosis Support: a case study on APDS

📅 2026-05-09

🤖 AI Summary
This study addresses diagnostic delay in Activated PI3Kδ Syndrome (APDS), a rare disease whose highly heterogeneous symptoms overlap with those of other conditions. The authors propose Shapley regression, a model that replaces the linear predictor of logistic regression with a k-additive cooperative game so that symptom co-occurrence is modeled explicitly. The approach aims to sit between the expressive power of deep models and the interpretability of linear models, while keeping the transparency and convexity of logistic regression. A lightweight second-order (2-additive) Shapley regression with L2 regularization is implemented and evaluated on eight public biomedical datasets. Applied to electronic health records from a real-world cohort of 222 patients, the method distinguishes APDS cases from matched controls, recovers phenotypes already known to be associated with APDS, and surfaces pairwise symptom interactions that were validated by clinical experts.
📝 Abstract
Activated PI3Kδ Syndrome (APDS) is a rare genetic immune disorder caused by variants in PIK3CD or PIK3R1, with highly heterogeneous symptoms that often delay diagnosis. Early recognition is hampered by overlapping clinical presentations and limited clinician awareness, motivating systematic, data-driven approaches to detect APDS-associated phenotypic patterns in routine electronic health records. Traditional linear scoring systems cannot capture complex symptom interactions, while deep learning models, though expressive, often lack interpretability. To bridge this gap, we propose Shapley regression, a novel game-theoretic model that replaces the linear predictor with a k-additive cooperative game, explicitly modeling the co-occurrence of symptoms while maintaining the transparency and convexity of logistic regression. We carry out an empirical study of our lightweight method on eight public biomedical datasets, showing that a 2-additive model with $\ell_2$ regularization achieves an optimal trade-off between predictive power and noise robustness. We also apply it to a real-world cohort of 222 patients, on which Shapley regression accurately distinguishes APDS cases from matched controls, confirms phenotypes known to be associated with APDS, and facilitates the exploration of pairwise interactions between symptoms, validated by clinical experts.
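To make the idea in the abstract concrete, the following is a minimal sketch of a 2-additive model fitted as a logistic regression. It is not the authors' implementation. It assumes binary symptom indicators and a Möbius-style parametrization in which the predictor is a sum of singleton weights m_i and pairwise weights m_ij over co-occurring symptoms; under that assumption the Shapley value of symptom i is m_i plus half the sum of its pairwise weights. The function names (expand_pairs, fit, shapley_values), the hyperparameters, the plain gradient-descent solver, and the toy data are all hypothetical choices made for illustration.

```python
import itertools
import math

def expand_pairs(x):
    """Append pairwise co-occurrence indicators x_i * x_j to a binary vector."""
    pairs = [x[i] * x[j] for i, j in itertools.combinations(range(len(x)), 2)]
    return list(x) + pairs

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit(X, y, lam=0.1, lr=0.5, epochs=2000):
    """Gradient descent on the L2-penalised mean logistic loss (convex in w, b)."""
    Z = [expand_pairs(x) for x in X]
    n, d = len(Z), len(Z[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wk for wk in w]  # gradient of (lam / 2) * ||w||^2
        grad_b = 0.0                     # the intercept is not penalised
        for z, t in zip(Z, y):
            err = sigmoid(b + sum(wk * zk for wk, zk in zip(w, z))) - t
            grad_b += err / n
            for k in range(d):
                grad_w[k] += err * z[k] / n
        w = [wk - lr * gk for wk, gk in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def shapley_values(w, n_features):
    """Shapley value per feature of the fitted 2-additive game:
    phi_i = m_i + 0.5 * sum_j m_ij (assumes the Moebius parametrization)."""
    phi = list(w[:n_features])
    pairs = itertools.combinations(range(n_features), 2)
    for (i, j), m_ij in zip(pairs, w[n_features:]):
        phi[i] += 0.5 * m_ij
        phi[j] += 0.5 * m_ij
    return phi

# Invented toy data: three symptoms, label is 1 only when symptoms 0 and 1 co-occur.
X = [list(bits) for bits in itertools.product([0, 1], repeat=3)]
y = [x[0] * x[1] for x in X]

w, b = fit(X, y)
phi = shapley_values(w, n_features=3)
print("singleton weights:", [round(v, 3) for v in w[:3]])
print("pair weights (0,1), (0,2), (1,2):", [round(v, 3) for v in w[3:]])
print("Shapley values:", [round(v, 3) for v in phi])
```

On this toy problem the weight attached to the pair (0, 1) is the quantity a clinician would inspect as a candidate symptom interaction, while the Shapley values summarize each symptom's overall contribution. Because the expanded model is still a logistic regression in its weights, the penalised objective stays convex, which is the property the abstract highlights.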
Problem

Research questions and friction points this paper is trying to address.

rare disease diagnosis
APDS
phenotypic patterns
electronic health records
symptom heterogeneity
Innovation

Methods, ideas, or system contributions that make the work stand out.

Shapley regression
k-additive cooperative game
interpretable machine learning
rare disease diagnosis
symptom interaction modeling
Safa Alsaidi
Inria, Inserm, Université Paris Cité, Heka
artificial intelligence, machine learning
Tomás Brogueira
Técnico, University of Lisbon, INESC-ID, Lisbon, Portugal
Nizar Mahlaoui
Necker Enfants Malades University Hospital, AP-HP, Paris, France
Marc Vincent
Data Science Platform, INSERM UMR1163, Imagine Institute, UPC, Paris, France
Guilherme Pelegrina
Mackenzie Presbyterian University, São Paulo, Brazil
Nicolas Garcelon
Data Science Platform, INSERM UMR1163, Imagine Institute, UPC, Paris, France
Adrien Coulet
Inria Paris
Biomedical Informatics, Bioinformatics, Knowledge discovery, Semantic Web, Pharmacogenomics
Miguel Couceiro
Full Professor at IST, U.Lisboa, INESC-ID
Knowledge discovery, Analogy based reasoning, Decision making, Fair and explainable models