A Case Study Exploring the Current Landscape of Synthetic Medical Record Generation with Commercial LLMs

📅 2025-04-20
🤖 AI Summary
Commercial large language models (LLMs) exhibit uncharacterized distributional distortion and feature correlation collapse when generating synthetic electronic health records (EHRs), severely limiting cross-hospital generalization in high-dimensional clinical data. Method: We systematically evaluate GPT-4, Claude, and Gemini using a novel framework integrating structured prompt engineering with multi-center EHR pattern analysis, and quantify generation fidelity via statistical tests—including the Kolmogorov–Smirnov (KS) test. Contribution/Results: We find that while LLMs preserve distributional fidelity on low-dimensional EHR subsets (KS *p* > 0.05), they significantly deviate from real-data distributions in full-dimensional EHRs (KS *p* < 0.01). Crucially, cross-institutional correlation modeling degrades sharply with increasing dimensionality. Our study identifies feature dimensionality expansion as a critical bottleneck for cross-hospital generalization of synthetic EHRs—providing empirical evidence and actionable insights for developing trustworthy generative AI in healthcare.
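The fidelity test described above can be sketched with a two-sample Kolmogorov–Smirnov comparison of a single EHR feature. This is a minimal illustration of the kind of statistical check the summary describes, not the paper's actual pipeline; the feature name, sample sizes, and distributions are hypothetical.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)

# Hypothetical stand-ins for one low-dimensional EHR feature
# (e.g., heart rate in bpm): real-patient values vs. LLM-generated ones.
real = rng.normal(75, 12, size=500)
synthetic = rng.normal(75, 12, size=500)

stat, p = ks_2samp(real, synthetic)
# p > 0.05: no detectable distributional deviation for this feature
# p < 0.01: synthetic values significantly deviate from the real data
print(f"KS statistic={stat:.3f}, p-value={p:.3f}")
```

A per-feature KS test like this only probes marginal distributions; it cannot detect the cross-feature correlation collapse the paper reports in full-dimensional EHRs.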

📝 Abstract
Synthetic Electronic Health Records (EHRs) offer a valuable opportunity to create privacy-preserving and harmonized structured data, supporting numerous applications in healthcare. Key benefits of synthetic data include precise control over the data schema, improved fairness and representation of patient populations, and the ability to share datasets without compromising real individuals' privacy. Consequently, the AI community has increasingly turned to Large Language Models (LLMs) to generate synthetic data across various domains. However, a significant challenge in healthcare is ensuring that synthetic health records reliably generalize across different hospitals, a long-standing issue in the field. In this work, we evaluate the current state of commercial LLMs for generating synthetic data and investigate multiple aspects of the generation process to identify areas where these models excel and where they fall short. Our main finding is that while LLMs can reliably generate synthetic health records for smaller subsets of features, they struggle to preserve realistic distributions and correlations as the dimensionality of the data increases, ultimately limiting their ability to generalize across diverse hospital settings.
Problem

Research questions and friction points this paper is trying to address.

Evaluating commercial LLMs for synthetic EHR generation
Assessing generalization of synthetic data across hospitals
Identifying limitations in preserving data distributions and correlations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Using LLMs for synthetic EHR generation
Evaluating generalization across hospital settings
Assessing feature dimensionality impact
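The correlation degradation listed above can be probed with a simple gap metric: the Frobenius-norm distance between the Pearson correlation matrices of a real and a synthetic feature table. This is a hedged sketch, not the paper's evaluation code; the helper name `correlation_gap` and the simulated data are illustrative assumptions.

```python
import numpy as np

def correlation_gap(real: np.ndarray, synth: np.ndarray) -> float:
    """Frobenius-norm distance between the Pearson correlation matrices
    of two feature tables (rows = patients, columns = features).
    Hypothetical metric for illustration, not the paper's exact measure."""
    c_real = np.corrcoef(real, rowvar=False)
    c_synth = np.corrcoef(synth, rowvar=False)
    return float(np.linalg.norm(c_real - c_synth, ord="fro"))

rng = np.random.default_rng(1)

# Hypothetical 5-feature table with pairwise correlation 0.5.
cov = 0.5 * np.ones((5, 5)) + 0.5 * np.eye(5)
base = rng.multivariate_normal(np.zeros(5), cov, size=400)

# Shuffling each column independently preserves every marginal
# distribution but destroys the cross-feature correlations --
# mimicking the failure mode described for high-dimensional EHRs.
shuffled = np.column_stack([rng.permutation(base[:, j]) for j in range(5)])

print(correlation_gap(base, base))      # 0.0: identical correlation structure
print(correlation_gap(base, shuffled))  # large: correlations collapsed
```

A generator that passes per-feature KS tests can still score poorly on a gap like this, which is why marginal fidelity alone does not guarantee cross-hospital generalization.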
Yihan Lin
Assistant Professor, Xiamen University
Brain-inspired Vision · Deep learning · Neuromorphic engineering · Complex networks

Zhirong Bella Yu
Bioinformatics IDP, University of California, Los Angeles, USA

Simon Lee
Department of Computational Medicine, University of California, Los Angeles, USA