🤖 AI Summary
This study addresses the challenge of jointly modeling structured diagnostic codes and unstructured clinical notes in electronic health records (EHRs), with particular focus on directional, hierarchical causal relationships: specifically, how narrative observations trigger diagnoses and how diagnosis-associated risks propagate across sequential hospital admissions. We propose a multimodal causal graph framework that (1) aligns heterogeneous modalities via proposition extraction from clinical text and semantic mapping to ICD codes; (2) explicitly models “text → diagnosis” triggering and “diagnosis → subsequent risk” temporal propagation through a hierarchical causal discovery mechanism; and (3) introduces a conformal calibration method tailored to multi-label ICD coding for fine-grained confidence quantification and reliability guarantees. Evaluated on MIMIC-III and MIMIC-IV, our model improves multi-label clinical risk prediction (+3.2% AUC) and calibration (41% reduction in expected calibration error) while maintaining strong interpretability and clinical utility.
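The following minimal sketch makes item (3) more concrete. It is standard split conformal prediction run independently for each ICD code, not the authors' method. It assumes a trained model that outputs a separate probability p(c) for every code. For each calibration patient who truly carries code c, the nonconformity score is 1 - p(c), and the per-code threshold is the ceil((n + 1)(1 - α))-th smallest such score. At test time, a code enters the predicted set when its score is at or below that threshold. Under exchangeability, this gives each code's true positives coverage of at least 1 - α, but it ignores code co-occurrence, which the summary says the proposed method handles. The function names `per_code_thresholds` and `predict_set` and the toy data are hypothetical.

```python
import math
from typing import Dict, List, Set


def per_code_thresholds(cal_probs: List[Dict[str, float]],
                        cal_labels: List[Set[str]],
                        alpha: float = 0.1) -> Dict[str, float]:
    """Split conformal calibration run independently for each ICD code.

    For every calibration patient who truly has code c, the nonconformity
    score is 1 - p(c). The threshold q_c is the ceil((n + 1)(1 - alpha))-th
    smallest score, where n is the number of such patients.
    """
    codes = {c for probs in cal_probs for c in probs}
    thresholds: Dict[str, float] = {}
    for code in codes:
        scores = sorted(1.0 - probs.get(code, 0.0)
                        for probs, labels in zip(cal_probs, cal_labels)
                        if code in labels)
        n = len(scores)
        k = math.ceil((n + 1) * (1 - alpha))
        # If k > n the conformal quantile is +infinity; 1.0 is the largest
        # possible score, so the code is then always included.
        thresholds[code] = scores[k - 1] if k <= n else 1.0
    return thresholds


def predict_set(probs: Dict[str, float], thresholds: Dict[str, float]) -> Set[str]:
    """Include code c when its score 1 - p(c) is within the calibrated threshold."""
    return {c for c, p in probs.items()
            if c in thresholds and 1.0 - p <= thresholds[c]}


# Toy calibration data (hypothetical model outputs for four patients).
cal_probs = [
    {"I10": 0.9, "E11": 0.2},
    {"I10": 0.7, "E11": 0.8},
    {"I10": 0.4, "E11": 0.6},
    {"I10": 0.8, "E11": 0.1},
]
cal_labels = [{"I10"}, {"I10", "E11"}, {"E11"}, {"I10"}]

thr = per_code_thresholds(cal_probs, cal_labels, alpha=0.25)
print(sorted(predict_set({"I10": 0.75, "E11": 0.05}, thr)))  # ['E11', 'I10']
```

With α = 0.25, code E11 has only two positive calibration patients. Because ceil(3 × 0.75) = 3 exceeds n = 2, the sketch falls back to always including E11. This shows why naive per-code calibration produces uninformative sets for rare codes.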
📝 Abstract
Automated clinical risk prediction from electronic health records (EHRs) demands modeling both structured diagnostic codes and unstructured narrative notes. However, most prior approaches either handle these modalities separately or rely on simplistic fusion strategies that ignore the directional, hierarchical causal interactions by which narrative observations precipitate diagnoses and propagate risk across admissions. In this paper, we propose THCM-CAL, a Temporal-Hierarchical Causal Model with Conformal Calibration. Our framework constructs a multimodal causal graph where nodes represent clinical entities from two modalities: textual propositions extracted from notes and ICD codes mapped to textual descriptions. Through hierarchical causal discovery, THCM-CAL infers three clinically grounded interactions: intra-slice same-modality sequencing, intra-slice cross-modality triggers, and inter-slice risk propagation. To enhance prediction reliability, we extend conformal prediction to multi-label ICD coding, calibrating per-code confidence intervals under complex co-occurrences. Experimental results on MIMIC-III and MIMIC-IV demonstrate the superiority of THCM-CAL.
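A small, hypothetical data structure can illustrate the three interaction types. The sketch assumes that each node records the admission (time slice) it belongs to, its modality, and its order within the slice. It then labels a candidate directed edge with one of the three types. It rejects edges that point backward in time, ICD-to-text edges within a slice, and same-modality edges that run against within-slice order. These admissibility rules are illustrative assumptions about how a temporal hierarchy could constrain causal discovery, not THCM-CAL's actual procedure. The names `Node`, `Modality`, and `edge_type` are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Modality(Enum):
    TEXT = "text"  # proposition extracted from a clinical note
    ICD = "icd"    # ICD code represented by its textual description


@dataclass(frozen=True)
class Node:
    slice_idx: int       # admission (time slice) the entity belongs to
    modality: Modality
    order: int           # hypothetical: position of the entity within its slice
    label: str


def edge_type(src: Node, dst: Node) -> Optional[str]:
    """Classify a candidate directed edge src -> dst, or return None if the
    assumed temporal-hierarchical constraints forbid it."""
    if dst.slice_idx < src.slice_idx:
        return None  # causes never point backward in time
    if dst.slice_idx > src.slice_idx:
        return "inter_slice_propagation"  # risk carried to a later admission
    if src.modality == dst.modality:
        # Same slice, same modality: respect within-slice order.
        return "intra_slice_sequencing" if src.order < dst.order else None
    if src.modality is Modality.TEXT and dst.modality is Modality.ICD:
        return "intra_slice_cross_modal_trigger"  # observation triggers diagnosis
    return None  # ICD -> text within one slice: not one of the three types


# Toy example with two admissions.
pain = Node(0, Modality.TEXT, 0, "chest pain on exertion")
trop = Node(0, Modality.TEXT, 1, "troponin elevated")
mi = Node(0, Modality.ICD, 2, "I21 acute myocardial infarction")
hf = Node(1, Modality.ICD, 0, "I50 heart failure")

for s, d in [(pain, trop), (trop, mi), (mi, trop), (mi, hf), (hf, mi)]:
    print(f"{s.label} -> {d.label}: {edge_type(s, d)}")
```

Encoding such rules up front shrinks the set of candidate edges that a causal discovery step must score. The abstract does not say whether THCM-CAL restricts its search this way.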