From Statistical Fidelity to Clinical Consistency: Scalable Generation and Auditing of Synthetic Patient Trajectories

📅 2026-03-06
📈 Citations: 0 (Influential: 0)
🤖 AI Summary
This study addresses the limited research credibility of existing synthetic electronic health records (EHRs), which can reproduce the overall statistics of real data yet still contain inconsistencies across clinical processes and observations. The authors propose a two-stage pipeline: first, a knowledge-grounded generative model trained on MIMIC-IV produces patient trajectories over nearly 32,000 distinct clinical events; second, an automated auditing module based on large language models filters out clinical inconsistencies such as contraindicated medications. Across 18,071 synthetic records, event probabilities agree closely with the real data (R² = 0.99), and auditing narrows the difference between real and synthetic data (Cohen's d of 0.59 to 1.60 before auditing, 0.18 to 0.67 after). Downstream models trained on audited data match or exceed real-data performance, and membership inference performs no better than random guessing (F1 = 0.51), so the authors found no evidence of privacy risk.

📝 Abstract
Access to electronic health records (EHRs) for digital health research is often limited by privacy regulations and institutional barriers. Synthetic EHRs have been proposed as a way to enable safe and sovereign data sharing; however, existing methods may produce records that capture overall statistical properties of real data but present inconsistencies across clinical processes and observations. We developed an integrated pipeline to make synthetic patient trajectories clinically consistent through two synergistic steps: high-fidelity generation and scalable auditing. Using the MIMIC-IV database, we trained a knowledge-grounded generative model that represents nearly 32,000 distinct clinical events, including demographics, laboratory measurements, medications, procedures, and diagnoses, while enforcing structural integrity. To support clinical consistency at scale, we incorporated an automated auditing module leveraging large language models to filter out clinical inconsistencies (e.g., contraindicated medications) that escape probabilistic generation. We generated 18,071 synthetic patient records derived from a source cohort of 180,712 real patients. While synthetic clinical event probabilities demonstrated robust agreement (mean bias effectively 0.00) and high correlation (R² = 0.99) with the real counterparts, review of a random sample of synthetic records (N = 20) by three clinicians identified inconsistencies in 45-60% of them. Automated auditing reduced the difference between real and synthetic data (Cohen's effect size d between 0.59 and 1.60 before auditing, and between 0.18 and 0.67 after auditing). Downstream models trained on audited data matched or even exceeded real-data performance. We found no evidence of privacy risks, with membership inference performance indistinguishable from random guessing (F1-score = 0.51).
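
The fidelity figures quoted in the abstract (mean bias, R², and Cohen's d) can be illustrated with a minimal sketch. This is not the authors' code: the function names and toy numbers below are hypothetical, R² is assumed here to be the squared Pearson correlation between real and synthetic event probabilities, and Cohen's d is assumed to use the pooled standard deviation. The paper may define either quantity differently.

```python
from math import sqrt
from statistics import mean, stdev

def mean_bias(real, synth):
    """Average signed difference (synthetic minus real) across paired events."""
    return mean(s - r for r, s in zip(real, synth))

def r_squared(real, synth):
    """Squared Pearson correlation between paired probabilities (assumed definition)."""
    mr, ms = mean(real), mean(synth)
    cov = sum((r - mr) * (s - ms) for r, s in zip(real, synth))
    var_r = sum((r - mr) ** 2 for r in real)
    var_s = sum((s - ms) ** 2 for s in synth)
    return cov ** 2 / (var_r * var_s)

def cohens_d(a, b):
    """Cohen's d for two independent samples, using the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled = sqrt(((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2))
    return (mean(a) - mean(b)) / pooled

if __name__ == "__main__":
    # Toy event probabilities, invented for illustration only (not from the paper).
    real_probs = [0.10, 0.25, 0.40, 0.05]
    synth_probs = [0.11, 0.24, 0.41, 0.05]
    print("mean bias:", mean_bias(real_probs, synth_probs))
    print("R^2:", r_squared(real_probs, synth_probs))
    # Toy per-record scores for a real and a synthetic cohort.
    print("Cohen's d:", cohens_d([1.0, 2.0, 3.0], [2.0, 3.0, 4.0]))
```

Under these assumptions, a mean bias near zero with R² near 1 indicates close agreement in event probabilities, while a smaller absolute Cohen's d after auditing indicates that synthetic records have moved closer to the real ones on the compared measure.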
Problem

Research questions and friction points this paper is trying to address.

synthetic EHR
clinical consistency
patient trajectories
data fidelity
healthcare data privacy
Innovation

Methods, ideas, or system contributions that make the work stand out.

synthetic patient trajectories
clinical consistency
knowledge-grounded generative model
LLM-based auditing
privacy-preserving EHR synthesis
Guanglin Zhou
University of Queensland
Causality, Distribution shifts, AI in healthcare
Armin Catic
The University of New South Wales, Sydney, NSW, Australia
Motahare Shabestari
Shahid Sadoughi University of Medical Sciences and Health Services, Yazd, Iran
Matthew Young
Rutgers University
Analytic number theory
Chaiquan Li
The University of Auckland, Auckland, New Zealand
Katrina Poppe
The University of Auckland, Auckland, New Zealand
Sebastiano Barbieri
The University of Queensland, Brisbane, QLD, Australia; The University of New South Wales, Sydney, NSW, Australia