UdonCare: Hierarchy Pruning for Unseen Domain Discovery in Predictive Healthcare

📅 2025-06-08
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Clinical prediction models suffer from degraded generalization due to patient population distribution shifts across healthcare sites, while existing domain generalization (DG) methods are hindered by the absence of explicit domain labels and lack of clinical knowledge integration. To address this, we propose a medical knowledge–driven unsupervised domain discovery framework. Our approach introduces, for the first time, an iterative pruning mechanism grounded in the ICD-9-CM hierarchical ontology to enable flexible, clinically semantically aligned domain partitioning. It jointly employs an ontology-guided domain encoder and a Siamese-style disentangled inference architecture to synergistically model clinical priors and data-driven features. Extensive experiments on MIMIC-III and MIMIC-IV demonstrate that our method significantly outperforms state-of-the-art DG baselines, achieving up to a 3.2% AUC improvement under large domain shifts. This work provides the first empirical validation that structured medical ontologies enhance both robustness and interpretability in healthcare DG.

Technology Category

Application Category

📝 Abstract
Domain generalization has become a critical challenge in clinical prediction, where patient cohorts often exhibit shifting data distributions that degrade model performance. Typical domain generalization approaches struggle in real-world healthcare settings for two main reasons: (1) patient-specific domain labels are typically unavailable, making domain discovery especially difficult; (2) purely data-driven approaches overlook key clinical insights, leading to a gap in medical knowledge integration. To address these problems, we leverage hierarchical medical ontologies like the ICD-9-CM hierarchy to group diseases into higher-level categories and discover more flexible latent domains. In this paper, we introduce UdonCare, a hierarchy-guided framework that iteratively prunes fine-grained domains, encodes these refined domains, and applies a Siamese-type inference mechanism to separate domain-related signals from patient-level features. Experimental results on clinical datasets (MIMIC-III and MIMIC-IV) show that the proposed model achieves higher performance compared to other domain generalization baselines when substantial domain gaps presents, highlighting the untapped potential of medical knowledge for enhancing domain generalization in practical healthcare applications.
Problem

Research questions and friction points this paper is trying to address.

Addressing domain generalization challenges in clinical prediction models
Overcoming lack of patient-specific domain labels in healthcare data
Integrating medical ontologies to improve domain discovery and model performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages hierarchical medical ontologies for domain discovery
Prunes fine-grained domains iteratively for refinement
Uses Siamese-type inference to separate domain signals
🔎 Similar Papers
No similar papers found.
P
Pengfei Hu
Department of Computer Science, Stevens Institute of Technology, Hoboken, United States
Xiaoxue Han
Xiaoxue Han
Ph.D. Candidate, Stevens Institute of Technology
graph learningdeep learning
F
Fei Wang
Weill Cornell Medical College, Cornell University, New York, United States
Yue Ning
Yue Ning
Stevens Institute of Technology
Data AnalyticsKnowledge DiscoveryMachine Learning