High-Fidelity Synthetic ECG Generation via Mel-Spectrogram Informed Diffusion Training

📅 2025-10-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing ECG generation methods suffer from low morphological fidelity and insufficient patient-specific modeling, which hinders machine learning in low-data, privacy-sensitive clinical settings. To address these limitations, we build on SSSD-ECG, a conditional diffusion framework based on structured state space models, and propose MIDT-ECG (Mel-Spectrogram Informed Diffusion Training), which adds time-frequency domain supervision during training. We also incorporate demographic features as multimodal conditioning inputs to enable individualized ECG synthesis. Evaluated on the PTB-XL dataset, MIDT-ECG reduces inter-lead correlation error by 74% on average, improves morphological coherence, and improves privacy protection metrics by 4–8% over the baseline. In low-data regimes, a classifier trained on real data supplemented with the synthetic ECGs performs comparably to one trained solely on real data. This work positions time-frequency-regularized ECG synthesizers as trustworthy, privacy-preserving surrogates for cardiac AI when real data are scarce.
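The summary reports privacy metrics without naming them. One widely used check for synthetic data is distance to closest record (DCR). It is shown here as an illustrative sketch and is not necessarily one of the metrics used in this work. For each synthetic ECG, DCR is the Euclidean distance to its nearest real training record, and near-zero values hint at memorization. The function name and the (records, leads, samples) array layout are assumptions.

```python
import numpy as np


def distance_to_closest_record(synthetic: np.ndarray, real: np.ndarray) -> np.ndarray:
    """Illustrative DCR (not necessarily the paper's metric).

    synthetic, real: arrays of shape (n_records, n_leads, n_samples).
    Returns, for each synthetic record, the Euclidean distance to its
    nearest real record after flattening leads and samples.
    """
    syn = synthetic.reshape(len(synthetic), -1)
    ref = real.reshape(len(real), -1)
    # Pairwise squared distances via ||a||^2 + ||b||^2 - 2 a.b (avoids a 3-D temporary)
    sq = (syn ** 2).sum(axis=1)[:, None] + (ref ** 2).sum(axis=1)[None, :] - 2.0 * syn @ ref.T
    # Clip tiny negative values caused by floating-point cancellation
    return np.sqrt(np.maximum(sq, 0.0)).min(axis=1)
```

Exact copies of training records score roughly zero, so a DCR distribution shifted toward zero relative to a held-out real set is a warning sign.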

📝 Abstract
The development of machine learning for cardiac care is severely hampered by privacy restrictions on sharing real patient electrocardiogram (ECG) data. Although generative AI offers a promising solution, the real-world use of model-synthesized ECGs is limited by persistent gaps in trustworthiness and clinical utility. In this work, we target two major shortcomings of current generative ECG methods: insufficient morphological fidelity and the inability to generate personalized, patient-specific physiological signals. To close these gaps, we build on a conditional diffusion-based Structured State Space Model (SSSD-ECG) with two principled innovations: (1) MIDT-ECG (Mel-Spectrogram Informed Diffusion Training), a novel training paradigm with time-frequency domain supervision to enforce physiological structural realism, and (2) multimodal demographic conditioning to enable patient-specific synthesis. We comprehensively evaluate our approach on the PTB-XL dataset, assessing the synthesized ECG signals on fidelity, clinical coherence, privacy preservation, and downstream task utility. MIDT-ECG achieves substantial gains. It improves morphological coherence, preserves strong privacy guarantees with all evaluated metrics exceeding the baseline by 4–8%, and reduces the inter-lead correlation error by an average of 74%. Demographic conditioning further improves signal-to-noise ratio and personalization. In critical low-data regimes, a classifier trained on datasets supplemented with our synthetic ECGs achieves performance comparable to a classifier trained solely on real data. Together, these results show that ECG synthesizers trained with the proposed time-frequency structural regularization scheme can serve as personalized, high-fidelity, privacy-preserving surrogates when real data are scarce, advancing the responsible use of generative AI in healthcare.
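The abstract does not formally define inter-lead correlation error. The minimal sketch below assumes one definition: compute the average 12×12 Pearson correlation matrix across leads for the real set and for the synthetic set, then take the mean absolute difference over off-diagonal entries. The function names are hypothetical, and the paper's exact definition may differ.

```python
import numpy as np


def mean_interlead_correlation(ecgs: np.ndarray) -> np.ndarray:
    """Average per-record lead-by-lead Pearson correlation matrix.

    ecgs: shape (n_records, n_leads, n_samples). np.corrcoef treats each
    row (lead) as a variable; a constant lead would yield NaN entries.
    """
    return np.mean([np.corrcoef(record) for record in ecgs], axis=0)


def interlead_correlation_error(real: np.ndarray, synthetic: np.ndarray) -> float:
    """Hypothetical definition: mean |difference| of off-diagonal correlations."""
    diff = np.abs(mean_interlead_correlation(real) - mean_interlead_correlation(synthetic))
    off_diagonal = ~np.eye(diff.shape[0], dtype=bool)  # diagonal is always 1, so skip it
    return float(diff[off_diagonal].mean())


def relative_reduction(baseline_error: float, new_error: float) -> float:
    """Fractional reduction, e.g. 0.74 for a 74% lower error."""
    return (baseline_error - new_error) / baseline_error
```

Under this definition, a 74% reduction means the new error is about 26% of the baseline's. For example, `relative_reduction(0.5, 0.13)` evaluates to about 0.74.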
Problem

Research questions and friction points this paper is trying to address.

Generating trustworthy synthetic ECG data for medical research
Improving morphological fidelity in AI-generated electrocardiogram signals
Enabling patient-specific ECG synthesis with demographic conditioning
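How demographics enter the model is not specified here. The hypothetical sketch below shows one simple option: concatenate multi-hot diagnostic labels with a normalized age and a one-hot sex code into a single conditioning vector. The label count, the 0/1 sex encoding, and the 100-year age scale are assumptions for illustration.

```python
import numpy as np


def build_condition_vector(label_ids, n_labels, age_years, sex, age_scale=100.0):
    """Hypothetical conditioning vector: [multi-hot labels | age | one-hot sex].

    label_ids: indices of active diagnostic statements (assumed encoding).
    sex: 0 or 1 (encoding convention assumed, not taken from the paper).
    age_scale: divisor that maps age into roughly [0, 1] (assumed).
    """
    labels = np.zeros(n_labels, dtype=np.float32)
    labels[list(label_ids)] = 1.0
    sex_onehot = np.zeros(2, dtype=np.float32)
    sex_onehot[int(sex)] = 1.0
    age = np.array([age_years / age_scale], dtype=np.float32)
    return np.concatenate([labels, age, sex_onehot])
```

Keeping demographics in the same vector as the diagnostic labels lets an existing label-conditioned model accept them through one conditioning pathway. A separate embedding per modality is an equally plausible design.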
Innovation

Methods, ideas, or system contributions that make the work stand out.

Mel-Spectrogram Informed Diffusion Training for ECG generation
Multi-modal demographic conditioning enables patient-specific synthesis
Time-frequency domain supervision enforces physiological structural realism
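This summary does not give the exact form of MIDT's time-frequency supervision. The sketch below assumes one plausible instantiation:

1. Keep the standard DDPM noise-prediction loss.
2. Recover the implied clean signal x̂₀ from the predicted noise.
3. Add an L1 penalty between the log-mel spectrograms of x̂₀ and the real ECG.

The 100 Hz sampling rate, STFT sizes, mel count, weight `lam`, and all function names are assumptions. The HTK mel formula is used only for convenience.

```python
import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)  # HTK mel scale


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(fs, n_fft, n_mels):
    """Triangular filters of shape (n_mels, n_fft // 2 + 1) spanning 0..fs/2."""
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(fs / 2.0), n_mels + 2))
    lo, center, hi = edges[:-2, None], edges[1:-1, None], edges[2:, None]
    rising = (freqs[None, :] - lo) / (center - lo)
    falling = (hi - freqs[None, :]) / (hi - center)
    return np.maximum(0.0, np.minimum(rising, falling))


def log_mel_spectrogram(x, fs=100, n_fft=64, hop=16, n_mels=16, eps=1e-6):
    """x: (..., n_samples) -> log-mel power of shape (..., n_frames, n_mels)."""
    n_frames = 1 + (x.shape[-1] - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = x[..., idx] * np.hanning(n_fft)  # (..., n_frames, n_fft), Hann-windowed
    power = np.abs(np.fft.rfft(frames, axis=-1)) ** 2
    return np.log(power @ mel_filterbank(fs, n_fft, n_mels).T + eps)


def midt_style_loss(x0, x_t, eps, eps_hat, alpha_bar_t, lam=0.1):
    """Hypothetical objective: epsilon-MSE + lam * L1 on log-mel of the x0 estimate."""
    l_eps = np.mean((eps_hat - eps) ** 2)  # standard DDPM noise-prediction loss
    # Invert x_t = sqrt(abar) * x0 + sqrt(1 - abar) * eps using the predicted noise
    x0_hat = (x_t - np.sqrt(1.0 - alpha_bar_t) * eps_hat) / np.sqrt(alpha_bar_t)
    l_mel = np.mean(np.abs(log_mel_spectrogram(x0_hat) - log_mel_spectrogram(x0)))
    return l_eps + lam * l_mel
```

The spectral term is computed on x̂₀ rather than on the noisy input x_t, which ties the penalty to the signal the model is learning to denoise. In a real training loop, the same operations would be written in an autodiff framework so that gradients flow through x̂₀.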
Zhuoyi Huang
Microsoft
Nutan Sahoo
Microsoft
Anamika Kumari
Microsoft
Girish Kumar
Microsoft
Kexuan Cai
Microsoft
Shixing Cao
Microsoft
Yue Kang
Microsoft
Tian Xia
Microsoft
Somya Chatterjee
Microsoft
Nicholas Hausman
Microsoft
Aidan Jay
Microsoft
Eric S. Rosenthal
Massachusetts General Hospital, Harvard University
Soundar Srinivasan
Microsoft
Sadid Hasan
Microsoft
Alex Fedorov
Emory University
Representation Learning, Multimodal Learning, Self-Supervision, Neuroimaging
Sulaiman Vesal
Microsoft
Deep Learning, Machine Learning, LLM/SLM, VLM