🤖 AI Summary
Problem: Biomedical knowledge graphs lack formal causal semantics, and structural causal models (SCMs) struggle to incorporate domain-specific prior knowledge. Method: We propose the Causal Knowledge Graph (CKG) framework, the first to formally embed causal semantics into knowledge graphs while retaining deductive reasoning. CKG integrates drug–disease domain knowledge to support context-driven causal hypothesis generation and deconfounding adjustment. It combines SCMs with graph-based confounder control and mediation analysis for large-scale causal inference on UK Biobank and MIMIC-IV. Results: Experiments replicate known adverse drug reactions with high accuracy (AUC > 0.92) and identify multiple novel candidate causal effects. Side-effect similarity validation shows a substantial improvement in indication co-occurrence prediction (ΔAUPRC = +0.18), supporting the clinical relevance and generalizability of the findings.
📝 Abstract
Knowledge graphs and structural causal models have each proven valuable for organizing biomedical knowledge and estimating causal effects, but they remain largely disconnected. Knowledge graphs encode qualitative relationships, focusing on facts and deductive reasoning without formal probabilistic semantics, while causal models are not integrated with the background knowledge held in knowledge graphs and cannot use the deductive reasoning that knowledge graphs provide. To bridge this gap, we introduce a novel formulation of Causal Knowledge Graphs (CKGs), which extend knowledge graphs with formal causal semantics, preserving their deductive capabilities while enabling principled causal inference. CKGs support deconfounding via explicitly marked causal edges and facilitate hypothesis formulation aligned with both encoded and entailed background knowledge. We constructed a Drug-Disease CKG (DD-CKG) that integrates disease progression pathways, drug indications, side effects, and hierarchical disease classification to enable automated large-scale mediation analysis. Using UK Biobank and MIMIC-IV cohorts, we tested whether drugs mediate effects between indications and downstream disease progression, adjusting for confounders inferred from the DD-CKG. Our approach reproduced known adverse drug reactions with high precision while identifying previously undocumented significant candidate adverse effects. Further validation through side effect similarity analysis demonstrated that combining our predicted drug effects with established databases significantly improves the prediction of shared drug indications, supporting the clinical relevance of our novel findings. These results demonstrate that our methodology provides a generalizable, knowledge-driven framework for scalable causal inference.
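
To make the idea of deconfounding via explicitly marked causal edges concrete, the following is a minimal sketch, not the authors' implementation. It assumes a knowledge graph stored as (subject, relation, object) triples, a hypothetical set of relation names treated as causal, and a simple heuristic that takes the common causal ancestors of an exposure and an outcome as candidate confounders. The triples, relation names, and function names are all illustrative assumptions; the abstract does not specify how the DD-CKG derives its adjustment sets.

```python
from collections import defaultdict

# Hypothetical toy triples (subject, relation, object); not taken from the DD-CKG.
TRIPLES = [
    ("nicotine_dependence", "causes", "smoking"),
    ("smoking", "causes", "copd"),
    ("smoking", "causes", "lung_cancer"),
    ("tiotropium", "indicated_for", "copd"),   # non-causal in this sketch
    ("copd", "subclass_of", "lung_disease"),   # hierarchy edge, non-causal
]

# Assumption: only these relation types carry causal semantics.
CAUSAL_RELATIONS = {"causes"}


def build_causal_parents(triples, causal_relations):
    """Map each node to the set of its direct causes, ignoring non-causal edges."""
    parents = defaultdict(set)
    for subj, rel, obj in triples:
        if rel in causal_relations:
            parents[obj].add(subj)
    return parents


def causal_ancestors(node, parents):
    """Return every node that reaches `node` through causal edges only."""
    seen = set()
    stack = [node]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def candidate_confounders(exposure, outcome, parents):
    """Heuristic: common causal ancestors of exposure and outcome.

    This is a simplification, not a full backdoor-criterion check.
    """
    common = causal_ancestors(exposure, parents) & causal_ancestors(outcome, parents)
    return common - {exposure, outcome}


parents = build_causal_parents(TRIPLES, CAUSAL_RELATIONS)
print(sorted(candidate_confounders("copd", "lung_cancer", parents)))
# prints ['nicotine_dependence', 'smoking']
```

In this toy graph the `indicated_for` and `subclass_of` edges are ignored when collecting ancestors, which is the point of marking causal edges explicitly: only relations with causal meaning contribute to the adjustment set. The common-ancestor set is valid here but not minimal, since adjusting for `smoking` alone would already block the path from `nicotine_dependence`.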