A data-driven approach to discover and quantify systemic lupus erythematosus etiological heterogeneity from electronic health records

📅 2025-01-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
Problem: Systemic lupus erythematosus (SLE) exhibits high etiological heterogeneity, and its diagnosis is hindered by incomplete, noisy electronic health records (EHRs). Method: We propose an unsupervised framework integrating independent component analysis (ICA) with causal representation learning, modeling multimodal clinical observations in EHRs as generated by latent exogenous factors in a causal graph to interpretably decompose SLE’s etiological heterogeneity. Contribution/Results: Applied to real-world EHRs, our method automatically discovers 19 probabilistically independent, clinically interpretable latent etiologic sources, serving as patient-level explainable representations. Lightweight supervised validation and multi-dimensional clinical evaluation confirm that these sources significantly improve SLE discrimination (AUC ↑8.2%) and support clinicians in performing etiology-guided decision trade-offs for complex cases.

📝 Abstract
Systemic lupus erythematosus (SLE) is a complex heterogeneous disease with many manifestational facets. We propose a data-driven approach to discover probabilistically independent sources from multimodal, imperfect EHR data. These sources represent exogenous variables in the causal graph of the data-generating process, and they estimate latent root causes of the presence of SLE in the health record. We objectively evaluated the sources against the original variables from which they were discovered by training supervised models to discriminate SLE from negative health records using a reduced set of labelled instances. We found 19 predictive sources with high clinical validity whose EHR signatures define independent factors of SLE heterogeneity. Using the sources as the input patient data representation enables models to provide rich explanations that better capture the clinical reasons why a particular record is (not) an SLE case. Providers may be willing to trade patient-level interpretability for discrimination, especially in challenging cases.
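The core idea of recovering independent sources from mixed observations can be illustrated with a toy example. The sketch below is not the authors' pipeline: the abstract does not say which ICA algorithm or preprocessing was used, so this assumes the simplest textbook variant (whitening followed by a rotation search that maximizes absolute excess kurtosis) on two synthetic signals. All function names, the mixing weights, and the synthetic "sources" are hypothetical and chosen only for illustration.

```python
import math
import random


def rotate(a, b, angle):
    """Rotate the paired signals (a, b) by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return ([c * u + s * v for u, v in zip(a, b)],
            [-s * u + c * v for u, v in zip(a, b)])


def whiten_2d(x1, x2):
    """Center two signals and transform them to identity sample covariance."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    a = [v - m1 for v in x1]
    b = [v - m2 for v in x2]
    caa = sum(u * u for u in a) / n
    cbb = sum(v * v for v in b) / n
    cab = sum(u * v for u, v in zip(a, b)) / n
    # Closed-form angle that diagonalizes a symmetric 2x2 covariance matrix.
    theta = 0.5 * math.atan2(2 * cab, caa - cbb)
    p, q = rotate(a, b, theta)
    sp = math.sqrt(sum(u * u for u in p) / n)
    sq = math.sqrt(sum(v * v for v in q) / n)
    return [u / sp for u in p], [v / sq for v in q]


def excess_kurtosis(xs):
    """Excess kurtosis of a zero-mean, unit-variance sample."""
    return sum(x ** 4 for x in xs) / len(xs) - 3.0


def ica_2d(x1, x2, steps=180):
    """Toy ICA: whiten, then pick the rotation with the most non-Gaussian outputs."""
    z1, z2 = whiten_2d(x1, x2)
    best_score, best_pair = -1.0, None
    for k in range(steps):
        # Rotations in [0, pi/2) cover every demixing up to order and sign.
        y1, y2 = rotate(z1, z2, (math.pi / 2) * k / steps)
        score = abs(excess_kurtosis(y1)) + abs(excess_kurtosis(y2))
        if score > best_score:
            best_score, best_pair = score, (y1, y2)
    return best_pair


def corr(a, b):
    """Pearson correlation of two equal-length samples."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)


# Hypothetical latent sources: one uniform, one Laplace-distributed.
rng = random.Random(0)
n = 2000
s1 = [rng.uniform(-1, 1) for _ in range(n)]
s2 = [rng.expovariate(1.0) * rng.choice((-1, 1)) for _ in range(n)]

# Observed variables are linear mixtures of the sources (weights are made up).
x1 = [0.8 * u + 0.6 * v for u, v in zip(s1, s2)]
x2 = [0.3 * u - 0.9 * v for u, v in zip(s1, s2)]

y1, y2 = ica_2d(x1, x2)

# Each true source should match one recovered component up to sign and order.
match1 = max(abs(corr(s1, y1)), abs(corr(s1, y2)))
match2 = max(abs(corr(s2, y1)), abs(corr(s2, y2)))
print("observed correlation:", round(corr(x1, x2), 3))
print("source recovery:", round(match1, 3), round(match2, 3))
```

In this toy setting the observed variables are strongly correlated while the recovered components are uncorrelated and each aligns with one hidden source. In the setting the abstract describes, the recovered sources (rather than the raw EHR variables) would then serve as the input representation for a supervised SLE classifier; that step is omitted here.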
Problem

Research questions and friction points this paper is trying to address.

Systemic Lupus Erythematosus
Causal Inference
Medical Record Analysis
Innovation

Methods, ideas, or system contributions that make the work stand out.

Imperfect Medical Records
Systemic Lupus Erythematosus (SLE) Prediction
Minimal Labeled Cases
Marco Barbero Mota
Vanderbilt University School of Medicine Department of Biomedical Informatics
Machine learning, Causal AI, Causal Inference, Precision Medicine, Representation Learning
J. M. Still
Vanderbilt University Medical Center, Department of Biomedical Informatics
J. L. Gamboa
Vanderbilt University Medical Center, Department of Medicine
Eric V. Strobl
University of Pittsburgh
Causal Discovery, Causal Inference, Translational Bioinformatics, Computational Psychiatry
Charles M. Stein
Vanderbilt University Medical Center, Department of Medicine
V. Kawai
Vanderbilt University Medical Center, Department of Medicine
T. Lasko
Vanderbilt University Medical Center, Department of Computer Science