🤖 AI Summary
This study addresses diagnostic delay in Activated PI3Kδ Syndrome (APDS), a rare disease with highly heterogeneous symptoms and overlapping clinical manifestations. The authors propose Shapley regression, a novel model that combines Shapley values with k-additive cooperative games to model symptom co-occurrence explicitly. The approach aims for more expressive power than linear scoring systems while retaining the transparency and convexity of logistic regression. A lightweight 2-additive (second-order) variant with L2 regularization is evaluated on eight public biomedical datasets. Applied to electronic health records from a real-world cohort of 222 patients, the method distinguishes APDS cases from matched controls, recovers known APDS phenotypes, and surfaces pairwise symptom interactions that were validated by clinical experts.
📝 Abstract
Activated PI3Kδ Syndrome (APDS) is a rare genetic immune disorder caused by variants in PIK3CD or PIK3R1, with highly heterogeneous symptoms that often delay diagnosis. Early recognition is hampered by overlapping clinical presentations and limited clinician awareness, motivating systematic, data-driven approaches to detect APDS-associated phenotypic patterns in routine electronic health records. Traditional linear scoring systems cannot capture complex symptom interactions, while deep learning models, though expressive, often lack interpretability. To bridge this gap, we propose Shapley regression, a novel game-theoretic model that replaces the linear predictor with a k-additive cooperative game, explicitly modeling the co-occurrence of symptoms while maintaining the transparency and convexity of logistic regression. We carry out an empirical study of our lightweight method on eight public biomedical datasets, showing that a 2-additive model with $\ell_2$ regularization achieves the best trade-off between predictive power and noise robustness. We also apply it to a real-world cohort of 222 patients, on which Shapley regression accurately distinguishes APDS cases from matched controls, confirms phenotypes known to be associated with APDS, and facilitates the exploration of pairwise interactions between symptoms, which were validated by clinical experts.
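To make the idea of a 2-additive predictor concrete, the sketch below fits a logistic model whose score is a sum of single-symptom terms plus pairwise co-occurrence terms, with an L2 penalty. It is only an illustration under stated assumptions, not the authors' implementation: the abstract does not give the exact Shapley-based parametrization, so the coefficients here are plain weights on binary indicators and their products rather than Shapley values or interaction indices. The function names, hyperparameters, and toy data are all hypothetical.

```python
import math
from itertools import combinations

def two_additive_features(x):
    """Expand a binary symptom vector into singleton terms followed by
    pairwise co-occurrence terms (products of two indicators)."""
    pairs = [x[i] * x[j] for i, j in combinations(range(len(x)), 2)]
    return list(x) + pairs

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic_l2(X, y, lam=0.01, lr=0.5, epochs=2000):
    """Batch gradient descent on the L2-penalised logistic loss.
    The objective is convex in (w, b); the intercept b is not penalised."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err / n
            for j in range(d):
                grad_w[j] += err * xi[j] / n
        w = [wj - lr * (gj + lam * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict_proba(w, b, x):
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))

# Hypothetical toy data: three binary symptoms; the label is 1 only when
# symptoms 0 and 1 co-occur, a pattern no purely additive score can isolate.
raw = [(a, b_, c) for a in (0, 1) for b_ in (0, 1) for c in (0, 1)]
labels = [a * b_ for a, b_, _ in raw]

X = [two_additive_features(r) for r in raw]
w, b = fit_logistic_l2(X, labels)

# Feature order: s0, s1, s2, (s0,s1), (s0,s2), (s1,s2)
pair_01_weight = w[3]
p_both = predict_proba(w, b, two_additive_features((1, 1, 0)))
p_one = predict_proba(w, b, two_additive_features((1, 0, 0)))
```

Because the model stays linear in the expanded features, each fitted weight can be read directly as the contribution of one symptom or one symptom pair, which is the kind of transparency the abstract attributes to the 2-additive formulation.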