Sensing Cardiac Health Across Scenarios and Devices: A Multi-Modal Foundation Model Pretrained on Heterogeneous Data from 1.7 Million Individuals

📅 2025-06-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
Traditional cardiac signal analysis models suffer from poor generalizability, reliance on homogeneous data, and static bespoke architectures. To address these limitations, we propose the first large-scale multimodal foundation model for cardiac health sensing. Built on a Transformer architecture, it employs generative masked pretraining to jointly model heterogeneous ECG/PPG signals and clinical text reports from 1.7 million individuals, enabling unified representation learning across devices, lead configurations, and tasks. Compared with unimodal, single-task approaches, the model achieves significant performance gains across diverse downstream tasks, including diagnostic classification, vital sign estimation, and prognostic prediction, demonstrating superior generalizability and robustness. It establishes a transferable foundation-model paradigm for intelligent cardiac health monitoring in real-world, clinically heterogeneous environments.

📝 Abstract
Cardiac biosignals, such as electrocardiograms (ECG) and photoplethysmograms (PPG), are of paramount importance for the diagnosis, prevention, and management of cardiovascular diseases, and have been extensively used in a variety of clinical tasks. Conventional deep learning approaches for analyzing these signals typically rely on homogeneous datasets and static bespoke models, limiting their robustness and generalizability across diverse clinical settings and acquisition protocols. In this study, we present a cardiac sensing foundation model (CSFM) that leverages advanced transformer architectures and a generative, masked pretraining strategy to learn unified representations from vast, heterogeneous health records. Our model is pretrained on an innovative multi-modal integration of data from multiple large-scale datasets (including MIMIC-III-WDB, MIMIC-IV-ECG, and CODE), comprising cardiac signals and the corresponding clinical or machine-generated text reports from approximately 1.7 million individuals. We demonstrate that the embeddings derived from our CSFM not only serve as effective feature extractors across diverse cardiac sensing scenarios, but also enable seamless transfer learning across varying input configurations and sensor modalities. Extensive evaluations across diagnostic tasks, demographic information recognition, vital sign measurement, clinical outcome prediction, and ECG question answering reveal that CSFM consistently outperforms traditional one-modal-one-task approaches. Notably, CSFM exhibits robust performance across multiple ECG lead configurations, from standard 12-lead systems to single-lead setups, and in scenarios where only ECG, only PPG, or a combination thereof is available. These findings highlight the potential of CSFM as a versatile and scalable solution for comprehensive cardiac monitoring.
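To make the generative masked pretraining strategy concrete, the sketch below shows the patch-masking step for a 1-D biosignal: the signal is split into fixed-length patches, a fraction of patches is hidden, and the hidden patches become reconstruction targets. This is a minimal illustrative sketch only; the patch length, mask ratio, and function names are assumptions, not the paper's implementation, which also spans PPG channels and text tokens.

```python
import random

def patchify(signal, patch_len):
    """Split a 1-D signal into fixed-length, non-overlapping patches."""
    return [signal[i:i + patch_len]
            for i in range(0, len(signal) - patch_len + 1, patch_len)]

def mask_patches(patches, mask_ratio, seed=0):
    """Randomly hide a fraction of patches; the model must reconstruct them.

    Returns (visible, mask_idx, targets): the input with masked patches
    zeroed out, the masked indices, and the original masked patches.
    """
    rng = random.Random(seed)
    n_mask = int(len(patches) * mask_ratio)
    mask_idx = sorted(rng.sample(range(len(patches)), n_mask))
    masked = set(mask_idx)
    targets = [patches[i] for i in mask_idx]
    visible = [[0.0] * len(p) if i in masked else p
               for i, p in enumerate(patches)]
    return visible, mask_idx, targets

# Toy 12-sample "ECG" trace, patch length 3, 50% of patches masked.
signal = [0.1, 0.5, 1.2, 0.3, -0.2, 0.0, 0.4, 0.9, 0.2, -0.1, 0.3, 0.7]
patches = patchify(signal, 3)                     # 4 patches of length 3
visible, idx, targets = mask_patches(patches, 0.5)
# A pretraining loss (e.g., mean squared error) would compare the model's
# reconstruction of the masked positions against `targets`.
```

In the full model, the visible patches would be embedded and fed through the Transformer encoder, with reconstruction of the masked patches providing the self-supervised training signal across ECG, PPG, and text inputs.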
Problem

Research questions and friction points this paper is trying to address.

Conventional cardiac models rely on homogeneous datasets and static, bespoke architectures, limiting robustness across clinical settings
Single-modality, single-task training generalizes poorly across acquisition protocols and lead configurations
Learned representations do not transfer between ECG and PPG sensor modalities or device setups
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-modal foundation model for cardiac sensing
Transformer architecture with masked pretraining strategy
Heterogeneous data integration from 1.7M individuals
Xiao Gu
University of Oxford
AI for Healthcare, Biomedical Signal Processing, Wearable/Ambient Intelligence, Deep Learning
Wei Tang
Department of Engineering Science, University of Oxford, Oxford OX3 7DQ, UK; Department of Mathematics, City University of Hong Kong, Hong Kong; Hong Kong Center for Cerebro-Cardiovascular Health Engineering, Hong Kong
Jinpei Han
Brain and Behaviour Lab, Imperial College London, London SW7 2AZ, UK
Veer Sangha
Yale University
Fenglin Liu
University of Oxford
Clinical AI, AI for Health, Large Language Models, Multimodal AI
Shreyank N Gowda
Assistant Professor at the University of Nottingham
Computer Vision, Zero-shot Learning, Green AI
Antonio H. Ribeiro
Department of Information Technology, Uppsala University, Uppsala, Sweden
Patrick Schwab
GSK
Causal Machine Learning, AI in Drug Discovery, AI in Healthcare, AI in Medicine
Kim Branson
GlaxoSmithKline, London, UK
Lei Clifton
Nuffield Department of Primary Care Health Sciences, University of Oxford
AI & Machine learning, Medical statistics
Antonio Luiz P. Ribeiro
Universidade Federal de Minas Gerais, Brazil
Cardiology, Chagas disease, Electrocardiography, Cardiovascular epidemiology, Telemedicine
Zhangdaihong Liu
Department of Engineering Science, University of Oxford, Oxford OX3 7DQ, UK; Oxford Suzhou Centre for Advanced Research, University of Oxford, Suzhou 215123, China
David A. Clifton
Chair of Clinical Machine Learning, University of Oxford
Machine Learning, Clinical AI, Biomedical Signal Processing