🤖 AI Summary
This study addresses the limited clinical credibility of existing synthetic electronic health records, which often contain inconsistencies between clinical processes and observations. The authors propose a two-stage integrated pipeline: first, a knowledge-guided generative model simulates high-fidelity patient trajectories by modeling nearly 32,000 distinct clinical events; second, an automated auditing module based on large language models detects clinical contradictions, such as contraindicated medication prescriptions. Evaluated on 18,071 synthetic records, the method achieves high statistical fidelity (R² = 0.99), substantially reduces clinical inconsistencies, and yields downstream task performance comparable to or better than that achieved with real data, while membership inference performance remains indistinguishable from random guessing (F1 = 0.51), indicating no detectable privacy leakage. This work represents the first integration of knowledge-guided generation with LLM-driven clinical consistency auditing, substantially improving the clinical plausibility of synthetic medical records.
📝 Abstract
Access to electronic health records (EHRs) for digital health research is often limited by privacy regulations and institutional barriers. Synthetic EHRs have been proposed as a way to enable safe and sovereign data sharing; however, existing methods may produce records that capture the overall statistical properties of real data yet contain inconsistencies across clinical processes and observations. We developed an integrated pipeline that makes synthetic patient trajectories clinically consistent through two synergistic steps: high-fidelity generation and scalable auditing. Using the MIMIC-IV database, we trained a knowledge-grounded generative model that represents nearly 32,000 distinct clinical events, including demographics, laboratory measurements, medications, procedures, and diagnoses, while enforcing structural integrity. To support clinical consistency at scale, we incorporated an automated auditing module leveraging large language models to filter out clinical inconsistencies (e.g., contraindicated medications) that escape probabilistic generation. We generated 18,071 synthetic patient records derived from a source cohort of 180,712 real patients. While synthetic clinical event probabilities demonstrated robust agreement (mean bias effectively 0.00) and high correlation (R² = 0.99) with their real counterparts, review of a random sample of synthetic records (N = 20) by three clinicians identified inconsistencies in 45–60% of them. Automated auditing reduced the difference between real and synthetic data (Cohen's effect size d between 0.59 and 1.60 before auditing, and between 0.18 and 0.67 after auditing). Downstream models trained on audited data matched or even exceeded real-data performance. We found no evidence of privacy risks, with membership inference performance indistinguishable from random guessing (F1-score = 0.51).
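As a rough illustration of the fidelity metrics cited in the abstract (mean bias, R², and Cohen's d), the following sketch computes them with NumPy on two small probability vectors. The vectors and all numeric values here are invented for demonstration and are not taken from the study; the paper does not describe its exact computation, so this is only a plausible reading of those metrics.

```python
import numpy as np

def cohens_d(x, y):
    """Cohen's d: standardized mean difference using the pooled standard deviation."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * np.var(x, ddof=1) + (ny - 1) * np.var(y, ddof=1)) / (nx + ny - 2)
    return (np.mean(x) - np.mean(y)) / np.sqrt(pooled_var)

# Hypothetical per-event occurrence probabilities (real vs. synthetic cohort).
real = np.array([0.12, 0.40, 0.05, 0.33, 0.21])
synthetic = np.array([0.11, 0.42, 0.05, 0.31, 0.22])

bias = np.mean(synthetic - real)           # mean bias across events
r = np.corrcoef(real, synthetic)[0, 1]     # Pearson correlation
r_squared = r ** 2                         # R² between real and synthetic probabilities
d = cohens_d(real, synthetic)              # effect size of the remaining difference
```

A well-calibrated generator would show a mean bias near zero, R² close to 1, and a small |d|, which is the pattern the abstract reports after auditing.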