🤖 AI Summary
This work addresses the lack of a systematic evaluation benchmark for foundation models on long-duration, multimodal physiological signals such as the electrocardiogram (ECG) and photoplethysmogram (PPG), a gap that has hindered their validation and comparison in clinical settings. The authors propose SignalMC-MED, a standardized multimodal benchmark of synchronized single-lead ECG and PPG, comprising 10-minute recordings from 22,256 patient visits and spanning 20 clinical tasks. A systematic evaluation of representative time-series and biosignal foundation models yields five findings: domain-specific biosignal models consistently outperform general-purpose time-series models; fusing ECG and PPG gives robust improvements over either modality alone; full 10-minute signals consistently outperform shorter segments; larger model variants do not reliably outperform smaller ones; and hand-crafted ECG features form a strong baseline that adds complementary value when combined with learned representations.
📝 Abstract
Recent biosignal foundation models (FMs) have demonstrated promising performance across diverse clinical prediction tasks, yet systematic evaluation on long-duration multimodal data remains limited. We introduce SignalMC-MED, a benchmark for evaluating biosignal FMs on synchronized single-lead electrocardiogram (ECG) and photoplethysmogram (PPG) data. Derived from the MC-MED dataset, SignalMC-MED comprises 22,256 visits, each with 10 minutes of overlapping ECG and PPG signals, and includes 20 clinically relevant tasks spanning prediction of demographics, prediction of emergency department disposition, regression of laboratory values, and detection of prior ICD-10 diagnoses. Using this benchmark, we perform a systematic evaluation of representative time-series and biosignal FMs across ECG-only, PPG-only, and ECG + PPG settings. We find that domain-specific biosignal FMs consistently outperform general time-series models, and that multimodal ECG + PPG fusion yields robust improvements over unimodal inputs. Moreover, using the full 10-minute signal consistently outperforms shorter segments, and larger model variants do not reliably outperform smaller ones. Hand-crafted ECG domain features provide a strong baseline and offer complementary value when combined with learned FM representations. Together, these results establish SignalMC-MED as a standardized benchmark and provide practical guidance for evaluating and deploying biosignal FMs.
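The abstract does not say how a 10-minute recording is fed to a model or how ECG, PPG, and hand-crafted features are combined. As a minimal sketch of one common approach, assumed here rather than taken from the paper, the code below splits each signal into fixed windows, mean-pools the per-window embeddings, and concatenates the pooled vectors. The names `split_windows`, `pooled_embedding`, `fuse`, and `toy_encoder` are hypothetical, and `toy_encoder` is only a stand-in for a real foundation model.

```python
from statistics import fmean
from typing import Callable, List, Sequence

Embedding = List[float]
Encoder = Callable[[Sequence[float]], Embedding]


def split_windows(signal: Sequence[float], fs_hz: int, window_s: int) -> List[Sequence[float]]:
    """Split a signal into non-overlapping windows; a trailing partial window is dropped."""
    n = fs_hz * window_s
    if n <= 0:
        raise ValueError("window length must be positive")
    return [signal[i:i + n] for i in range(0, len(signal) - n + 1, n)]


def pooled_embedding(signal: Sequence[float], fs_hz: int, window_s: int,
                     encoder: Encoder) -> Embedding:
    """Embed every window and average the embeddings dimension by dimension."""
    windows = split_windows(signal, fs_hz, window_s)
    if not windows:
        raise ValueError("signal is shorter than one window")
    embeddings = [encoder(w) for w in windows]
    dim = len(embeddings[0])
    return [fmean(e[d] for e in embeddings) for d in range(dim)]


def fuse(*parts: Embedding) -> Embedding:
    """Feature-level fusion by plain concatenation (an assumption, not the paper's method)."""
    return [x for part in parts for x in part]


def toy_encoder(window: Sequence[float]) -> Embedding:
    """Stand-in for a foundation model: returns the window's mean and range."""
    return [fmean(window), max(window) - min(window)]


# Synthetic 10-minute signals at a toy 4 Hz sampling rate (real data is sampled far faster).
FS_HZ = 4
ecg = [float(i % 4) for i in range(FS_HZ * 600)]
ppg = [0.5 * (i % 2) for i in range(FS_HZ * 600)]
handcrafted = [72.0]  # hypothetical hand-crafted feature, e.g. a mean heart rate

ecg_emb = pooled_embedding(ecg, FS_HZ, 10, toy_encoder)
ppg_emb = pooled_embedding(ppg, FS_HZ, 10, toy_encoder)
fused = fuse(ecg_emb, ppg_emb, handcrafted)
```

The fused vector can then be passed to any downstream classifier or regressor. Comparing a model trained on `ecg_emb` alone against one trained on `fused` mirrors the unimodal versus multimodal comparison the abstract describes.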