🤖 AI Summary
This work addresses the modality gap between electrocardiography (ECG) and photoplethysmography (PPG) by proposing a scalable, cross-device, and task-agnostic framework for cardiovascular state monitoring. Using a Multi-modal Masked Autoencoder (M2AE), the method learns compact, modality-invariant representations, termed “biosignal fingerprints”, from over 3.4 million paired ECG-PPG segments. These fingerprints support cross-modal transfer, preserve privacy by not exposing raw waveforms, and perform well from a single input modality without task-specific fine-tuning. Evaluated on seven downstream tasks with the model frozen, the approach matches or exceeds leading domain-specialist foundation models, including an AUROC of 0.974 for five-class cardiovascular disease classification and 0.877 for hypertension detection, with a maximum AUROC improvement of 27.7% across the five classification tasks.
📝 Abstract
Cardiovascular disease remains the leading cause of global mortality, yet scalable cardiac monitoring is hindered by the gap between diagnostically rich ECG and ubiquitous wearable PPG. Bridging this gap requires representations that are compact, transferable across modalities and devices, and deployable without task-specific retraining. Here we introduce biosignal fingerprints: compact latent representations of cardiovascular state derived from a cross-modal foundation model, the Multi-modal Masked Autoencoder (M2AE), trained on over 3.4 million paired ECG and PPG signals. M2AE integrates modality-specific encoders with a shared bottleneck and dual decoders, jointly optimized using reconstruction and cross-modal contrastive objectives, yielding generalizable fingerprints that retain intra- and inter-modality features. Like a biometric fingerprint, these representations uniquely encode an individual's cardiovascular state in a modality-agnostic, privacy-preserving form that can be reused across clinical tasks without exposing raw waveform data or requiring model retraining. Across seven downstream tasks spanning cross-modal reconstruction, cardiovascular disease (CVD) classification, hypertension detection, mortality prediction, and demographic inference, biosignal fingerprints achieve competitive or superior performance compared to leading domain-specialist foundation models in frozen settings. Results include an AUROC of 0.974 for five-class CVD classification and 0.877 for hypertension detection, with a maximum AUROC improvement of 27.7% across five classification tasks. Critically, strong performance is maintained with only a single modality, enabling deployment in resource-constrained, single-sensor environments typical of real-world wearable monitoring. This has direct implications for continuous cardiovascular monitoring across clinical and consumer health settings.
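The abstract states that M2AE is jointly optimized with reconstruction and cross-modal contrastive objectives but does not give their exact form. The following is a minimal, dependency-free sketch of what such a joint objective could look like. It assumes a mean squared error over masked samples for reconstruction and a symmetric InfoNCE loss for the contrastive term. The function names, the temperature, and the weighting between terms are illustrative assumptions, not the authors' implementation.

```python
import math


def masked_mse(original, reconstructed, mask):
    """Mean squared error over masked positions only.

    mask[i] is True where the input sample was hidden from the encoder.
    Assumption: the abstract does not state which reconstruction loss is used.
    """
    errors = [(o - r) ** 2 for o, r, m in zip(original, reconstructed, mask) if m]
    return sum(errors) / len(errors) if errors else 0.0


def _normalize(vector):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector)) or 1.0
    return [x / norm for x in vector]


def _cross_entropy(logits, target):
    """Negative log-softmax of logits[target], computed stably."""
    peak = max(logits)
    log_z = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_z - logits[target]


def info_nce(ecg_embeddings, ppg_embeddings, temperature=0.1):
    """Symmetric InfoNCE loss over a batch of paired embeddings.

    The i-th ECG and i-th PPG embeddings are treated as the positive pair;
    every other pairing in the batch is a negative. The temperature value
    is a hypothetical choice.
    """
    ecg = [_normalize(v) for v in ecg_embeddings]
    ppg = [_normalize(v) for v in ppg_embeddings]
    n = len(ecg)
    # Cosine similarity matrix scaled by temperature: sim[i][j] = <ecg_i, ppg_j> / T
    sim = [
        [sum(a * b for a, b in zip(ecg[i], ppg[j])) / temperature for j in range(n)]
        for i in range(n)
    ]
    ecg_to_ppg = sum(_cross_entropy(sim[i], i) for i in range(n)) / n
    ppg_to_ecg = sum(
        _cross_entropy([sim[i][j] for i in range(n)], j) for j in range(n)
    ) / n
    return 0.5 * (ecg_to_ppg + ppg_to_ecg)


def joint_loss(ecg_recon_loss, ppg_recon_loss, contrastive_loss, contrastive_weight=1.0):
    """Weighted sum of both reconstruction terms and the contrastive term.

    contrastive_weight is a hypothetical hyperparameter.
    """
    return ecg_recon_loss + ppg_recon_loss + contrastive_weight * contrastive_loss


if __name__ == "__main__":
    # Toy batch: two paired segments with 2-dimensional embeddings.
    ecg_emb = [[1.0, 0.0], [0.0, 1.0]]
    ppg_emb = [[0.9, 0.1], [0.2, 0.8]]
    recon_ecg = masked_mse([1.0, 2.0, 3.0], [1.0, 1.5, 2.5], [False, True, True])
    recon_ppg = masked_mse([0.5, 0.4, 0.3], [0.5, 0.5, 0.3], [False, True, False])
    print(joint_loss(recon_ecg, recon_ppg, info_nce(ecg_emb, ppg_emb)))
```

In this sketch the contrastive term pulls paired ECG and PPG embeddings together in the shared space, which is one plausible route to the modality-agnostic behavior the abstract describes, while the reconstruction terms keep modality-specific detail in the representation.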