🤖 AI Summary
Commercial large language models (LLMs) exhibit uncharacterized distributional distortion and feature correlation collapse when generating synthetic electronic health records (EHRs), severely limiting cross-hospital generalization in high-dimensional clinical data.
Method: We systematically evaluate GPT-4, Claude, and Gemini using a novel framework integrating structured prompt engineering with multi-center EHR pattern analysis, and quantify generation fidelity via statistical tests—including the Kolmogorov–Smirnov (KS) test.
Contribution/Results: We find that while LLMs preserve distributional fidelity on low-dimensional EHR subsets (KS *p* > 0.05), they significantly deviate from real-data distributions in full-dimensional EHRs (KS *p* < 0.01). Crucially, cross-institutional correlation modeling degrades sharply with increasing dimensionality. Our study identifies feature dimensionality expansion as a critical bottleneck for cross-hospital generalization of synthetic EHRs—providing empirical evidence and actionable insights for developing trustworthy generative AI in healthcare.
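The two-sample KS comparison described above can be sketched as follows. This is an illustrative example with randomly generated stand-in data, not the study's actual EHR features or pipeline: it shows how a well-matched synthetic feature passes the KS test while a distorted one fails, mirroring the low- vs. full-dimensional pattern reported in the results.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)

# Hypothetical stand-ins for one real EHR feature and two synthetic versions.
real = rng.normal(loc=100.0, scale=15.0, size=2000)            # e.g. a lab value
synthetic_good = rng.normal(loc=100.0, scale=15.0, size=2000)  # faithful generation
synthetic_bad = rng.normal(loc=110.0, scale=10.0, size=2000)   # distorted generation

# Two-sample KS test: a large p-value means no evidence the
# synthetic and real distributions differ; a small one flags distortion.
stat_good, p_good = ks_2samp(real, synthetic_good)
stat_bad, p_bad = ks_2samp(real, synthetic_bad)

print(f"faithful:  KS statistic = {stat_good:.3f}, p = {p_good:.3f}")
print(f"distorted: KS statistic = {stat_bad:.3f}, p = {p_bad:.3g}")
```

In the study's terms, the faithful case corresponds to the low-dimensional regime (KS *p* > 0.05, distributions indistinguishable) and the distorted case to the full-dimensional regime (KS *p* < 0.01). Per-feature KS tests like this only check marginal distributions, which is why the study separately examines feature correlations.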
📝 Abstract
Synthetic Electronic Health Records (EHRs) offer a valuable opportunity to create privacy-preserving and harmonized structured data, supporting numerous applications in healthcare. Key benefits of synthetic data include precise control over the data schema, improved fairness and representation of patient populations, and the ability to share datasets without risking the privacy of real individuals. Consequently, the AI community has increasingly turned to Large Language Models (LLMs) to generate synthetic data across various domains. However, a significant challenge in healthcare is ensuring that synthetic health records reliably generalize across different hospitals, a long-standing issue in the field. In this work, we evaluate the current state of commercial LLMs for generating synthetic data and investigate multiple aspects of the generation process to identify areas where these models excel and where they fall short. Our main finding is that while LLMs can reliably generate synthetic health records for smaller subsets of features, they struggle to preserve realistic distributions and correlations as the dimensionality of the data increases, ultimately limiting their ability to generalize across diverse hospital settings.