🤖 AI Summary
Current medical foundation models struggle to quantify intervention effects, detect conflicting evidence, or verify claims from the literature, thereby limiting clinical auditability. This work proposes a novel paradigm—causal compilation—that automatically transforms narrative evidence from medical literature into structured, executable causal estimands, enabling six classes of causal queries including do-calculus and counterfactual reasoning. Through effect standardization, conflict-aware graph construction, and real-world validation, the system compiles 754 studies into 1,445 effect kernels, achieving 98.5% canonicalization accuracy and 80.5% query executability on the Human Phenotype Project (n=10,000). This represents the first end-to-end pipeline translating unstructured text into auditable, verifiable causal inference.
📝 Abstract
Medical foundation models generate narrative explanations but cannot quantify intervention effects, detect evidence conflicts, or validate literature claims, limiting clinical auditability. We propose causal compilation, a paradigm that transforms medical evidence from narrative text into executable code. The paradigm standardizes heterogeneous research evidence into structured estimand objects, each explicitly specifying intervention contrast, effect scale, time horizon, and target population, supporting six executable causal queries: do-calculus, counterfactual reasoning, temporal trajectories, heterogeneous effects, mechanistic decomposition, and joint interventions. We instantiate this paradigm in DoAtlas-1, compiling 1,445 effect kernels from 754 studies through effect standardization, conflict-aware graph construction, and real-world validation (Human Phenotype Project, 10,000 participants). The system achieves 98.5% canonicalization accuracy and 80.5% query executability. This paradigm shifts medical AI from text generation to executable, auditable, and verifiable causal reasoning.
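The estimand objects the abstract describes — each specifying an intervention contrast, effect scale, time horizon, and target population — can be sketched as structured records over which queries execute. The schema, field names, and all numeric values below are illustrative assumptions for exposition, not DoAtlas-1's actual format or data:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EffectKernel:
    """One standardized estimand compiled from a study.

    Field names are hypothetical, not DoAtlas-1's real schema.
    """
    intervention: str   # treatment arm of the contrast, e.g. "drug_X"
    comparator: str     # reference arm, e.g. "placebo"
    outcome: str        # measured endpoint
    effect_scale: str   # e.g. "mean_difference", "risk_ratio"
    effect_size: float  # point estimate on that scale
    ci_low: float       # 95% CI lower bound
    ci_high: float      # 95% CI upper bound
    horizon_days: int   # time horizon of the reported effect
    population: str     # target population descriptor

def query_do(kernels, intervention, outcome):
    """Minimal do-style lookup: return all kernels matching an
    intervention/outcome pair, and flag a conflict when matching
    studies report their effects on incompatible scales."""
    hits = [k for k in kernels
            if k.intervention == intervention and k.outcome == outcome]
    scale_conflict = len({k.effect_scale for k in hits}) > 1
    return hits, scale_conflict

# Made-up placeholder kernels (numbers carry no medical meaning).
kernels = [
    EffectKernel("drug_X", "placebo", "biomarker_Y", "mean_difference",
                 -12.0, -18.0, -6.0, 180, "adults, condition Z"),
    EffectKernel("drug_X", "placebo", "biomarker_Y", "percent_change",
                 -20.0, -28.0, -12.0, 180, "adults, condition Z"),
]
hits, conflict = query_do(kernels, "drug_X", "biomarker_Y")
```

Making the contrast, scale, horizon, and population explicit fields is what lets a query engine detect that two studies answer subtly different questions — here, the two kernels agree on the contrast but report on different effect scales, so the lookup surfaces a conflict instead of silently pooling them.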