sleep2vec: Unified Cross-Modal Alignment for Heterogeneous Nocturnal Biosignals

📅 2026-02-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the unified modeling of multimodal nocturnal physiological signals (such as EEG and ECG), which is hampered by device heterogeneity and sensor dropout. The authors propose a cross-modal alignment pretraining framework that incorporates demographic, age, recording-site, and medical-history metadata to learn robust shared representations. Central to this approach are a metadata-aware InfoNCE loss and a dynamic negative-sample weighting mechanism. The study further characterizes, for the first time, scaling laws relating performance to modality diversity and model capacity. Evaluated on sleep staging and clinical outcome prediction tasks, the method significantly outperforms strong baselines and remains robust across arbitrary modality subsets and missing-data scenarios.

📝 Abstract
Tasks ranging from sleep staging to clinical diagnosis traditionally rely on standard polysomnography (PSG) devices, bedside monitors, and wearable devices, which capture diverse nocturnal biosignals (e.g., EEG, EOG, ECG, SpO$_2$). However, heterogeneity across devices and frequent sensor dropout pose significant challenges for unified modelling of these multimodal signals. We present \texttt{sleep2vec}, a foundation model for diverse and incomplete nocturnal biosignals that learns a shared representation via cross-modal alignment. \texttt{sleep2vec} is contrastively pre-trained on 42,249 overnight recordings spanning nine modalities using a \textit{Demography, Age, Site \& History-aware InfoNCE} objective that incorporates physiological and acquisition metadata (\textit{e.g.}, age, gender, recording site) to dynamically weight negatives and mitigate cohort-specific shortcuts. On downstream sleep staging and clinical outcome assessment, \texttt{sleep2vec} consistently outperforms strong baselines and remains robust to any subset of available modalities and sensor dropout. We further characterize, to our knowledge for the first time, scaling laws for nocturnal biosignals with respect to modality diversity and model capacity. Together, these results show that unified cross-modal alignment, coupled with principled scaling, enables label-efficient, general-purpose modelling of real-world nocturnal biosignals.
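The abstract describes an InfoNCE objective that uses acquisition metadata to dynamically weight negatives and suppress cohort-specific shortcuts. The paper's code is not shown here, so the following is only a minimal NumPy sketch of that general idea: negatives sharing metadata (e.g., the same recording site) with the anchor are down-weighted by a hypothetical factor `alpha`. The function name, signature, and weighting scheme are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def metadata_weighted_infonce(z_a, z_b, metadata, tau=0.07, alpha=0.5):
    """Illustrative metadata-aware InfoNCE (a sketch, NOT the paper's code).

    z_a, z_b : (N, D) L2-normalised embeddings of two aligned modalities.
    metadata : (N,) per-recording metadata labels (e.g., site IDs).
    alpha    : hypothetical down-weight for negatives whose metadata
               matches the anchor's, discouraging cohort shortcuts.
    """
    sim = z_a @ z_b.T / tau                    # (N, N) similarity logits
    same = np.equal.outer(metadata, metadata)  # True where metadata matches
    w = np.where(same, alpha, 1.0)             # down-weight matching negatives
    np.fill_diagonal(w, 1.0)                   # positives keep full weight
    exp = w * np.exp(sim - sim.max(axis=1, keepdims=True))  # stable softmax
    pos = np.diag(exp)
    return float(np.mean(-np.log(pos / exp.sum(axis=1))))
```

With `alpha < 1`, same-cohort negatives contribute less to the contrastive denominator, so the model is not rewarded for merely separating recording sites; the positives on the diagonal are always restored to full weight.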
Problem

Research questions and friction points this paper is trying to address.

heterogeneous biosignals
sensor dropout
cross-modal alignment
nocturnal signals
multimodal modeling
Innovation

Methods, ideas, or system contributions that make the work stand out.

cross-modal alignment
foundation model
contrastive learning
sensor dropout robustness
scaling laws
Weixuan Yuan
Five Seasons Medical
Zengrui Jin
Tsinghua University
Speech Recognition
Yichen Wang
Five Seasons Medical
Donglin Xie
Peking University
Ziyi Ye
Fudan University
Chao Zhang
Tsinghua University
software and system security, AI for security, blockchain, data security
Xuesong Chen
Five Seasons Medical