🤖 AI Summary
Existing ECG generation methods suffer from low morphological fidelity and lack patient-specific modeling, which limits their use for low-data machine learning in privacy-sensitive clinical settings. To address these limitations, we build on SSSD-ECG, a conditional diffusion model based on structured state space models, and add MIDT-ECG (Mel-Spectrogram Informed Diffusion Training), a training scheme with time-frequency domain supervision, together with multimodal demographic conditioning for individualized ECG synthesis. Evaluated on the PTB-XL dataset, MIDT-ECG reduces inter-lead correlation error by 74% on average, improves morphological coherence, and exceeds the baseline on all evaluated privacy metrics by 4–8%, while demographic conditioning improves signal-to-noise ratio and personalization. In low-data regimes, a classifier trained on data supplemented with the synthetic ECGs performs comparably to a classifier trained solely on real data. The results suggest that ECG synthesizers trained this way can serve as personalized, high-fidelity, privacy-preserving surrogates when real data are scarce.
📝 Abstract
The development of machine learning for cardiac care is severely hampered by privacy restrictions on sharing real patient electrocardiogram (ECG) data. Although generative AI offers a promising solution, the real-world use of existing model-synthesized ECGs is limited by persistent gaps in trustworthiness and clinical utility. In this work, we address two major shortcomings of current generative ECG methods: insufficient morphological fidelity and the inability to generate personalized, patient-specific physiological signals. We build on a conditional diffusion-based Structured State Space Model (SSSD-ECG) with two principled innovations: (1) MIDT-ECG (Mel-Spectrogram Informed Diffusion Training), a novel training paradigm with time-frequency domain supervision to enforce physiological structural realism, and (2) multi-modal demographic conditioning to enable patient-specific synthesis. We comprehensively evaluate our approach on the PTB-XL dataset, assessing the synthesized ECG signals on fidelity, clinical coherence, privacy preservation, and downstream task utility. MIDT-ECG achieves substantial gains: it improves morphological coherence, preserves strong privacy guarantees with all evaluated metrics exceeding the baseline by 4–8%, and notably reduces the interlead correlation error by an average of 74%, while demographic conditioning enhances signal-to-noise ratio and personalization. In critical low-data regimes, a classifier trained on datasets supplemented with our synthetic ECGs achieves performance comparable to a classifier trained solely on real data. Together, these results demonstrate that ECG synthesizers, trained with the proposed time-frequency structural regularization scheme, can serve as personalized, high-fidelity, privacy-preserving surrogates when real data are scarce, advancing the responsible use of generative AI in healthcare.
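The abstract reports the interlead correlation error without defining it. As a rough illustration only, the sketch below assumes one plausible definition: the mean absolute difference between the average lead-by-lead Pearson correlation matrices of real and synthetic recordings. The function names, the array layout `(n_records, n_leads, n_samples)`, and the choice to ignore the diagonal are assumptions made here, not details taken from the paper.

```python
import numpy as np


def interlead_correlation(ecg: np.ndarray) -> np.ndarray:
    """Pearson correlation between leads of one recording.

    ecg has shape (n_leads, n_samples). A constant (flat) lead would
    produce NaN entries, so this sketch assumes every lead varies.
    """
    return np.corrcoef(ecg)


def interlead_correlation_error(real: np.ndarray, synth: np.ndarray) -> float:
    """Hypothetical metric: gap between average inter-lead correlations.

    real and synth have shape (n_records, n_leads, n_samples). The
    correlation matrix is averaged over records in each set, and the
    mean absolute difference is taken over off-diagonal entries only,
    since the diagonal is always 1 for both sets.
    """
    real_corr = np.mean([interlead_correlation(x) for x in real], axis=0)
    synth_corr = np.mean([interlead_correlation(x) for x in synth], axis=0)
    n_leads = real_corr.shape[0]
    off_diag = ~np.eye(n_leads, dtype=bool)
    return float(np.mean(np.abs(real_corr - synth_corr)[off_diag]))
```

Under this assumed definition, a value near zero means the synthetic set reproduces the average dependency structure between leads, which is one way to read the "clinical coherence" criterion mentioned above. Comparing a set against itself gives exactly zero.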