Stanford Sleep Bench: Evaluating Polysomnography Pre-training Methods for Sleep Foundation Models

📅 2025-12-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
Sleep foundation models face dual bottlenecks—lack of a unified benchmark and systematic evaluation of self-supervised representation learning methods. Method: We introduce the first large-scale, multi-task polysomnography (PSG) benchmark comprising 17,467 studies (>163,000 hours), covering sleep staging, apnea diagnosis, age estimation, and prediction of 13 diseases and all-cause mortality. We establish the first standardized multi-task PSG evaluation framework to systematically compare diverse self-supervised paradigms—including contrastive, masked autoencoding, and predictive modeling approaches. Contribution/Results: While performance across conventional tasks (e.g., staging) remains comparable, contrastive learning achieves significant AUC improvements of 3.2–5.8% on disease/mortality prediction and accelerates pretraining convergence by 40%. This benchmark and empirical analysis advance reproducibility, clinical generalizability, and cross-task representation transfer in sleep foundation modeling.

📝 Abstract
Polysomnography (PSG), the gold standard test for sleep analysis, generates vast amounts of multimodal clinical data, presenting an opportunity to leverage self-supervised representation learning (SSRL) for pre-training foundation models to enhance sleep analysis. However, progress in sleep foundation models is hindered by two key limitations: (1) the lack of a shared dataset and benchmark with diverse tasks for training and evaluation, and (2) the absence of a systematic evaluation of SSRL approaches across sleep-related tasks. To address these gaps, we introduce Stanford Sleep Bench, a large-scale PSG dataset comprising 17,467 recordings totaling over 163,000 hours from a major sleep clinic, including 13 clinical disease prediction tasks alongside canonical sleep-related tasks such as sleep staging, apnea diagnosis, and age estimation. We systematically evaluate SSRL pre-training methods on Stanford Sleep Bench, assessing downstream performance across four tasks: sleep staging, apnea diagnosis, age estimation, and disease and mortality prediction. Our results show that multiple pretraining methods achieve comparable performance for sleep staging, apnea diagnosis, and age estimation. However, for mortality and disease prediction, contrastive learning significantly outperforms other approaches while also converging faster during pretraining. To facilitate reproducibility and advance sleep research, we will release Stanford Sleep Bench along with pretrained model weights, training pipelines, and evaluation code.
Problem

Research questions and friction points this paper is trying to address.

Lack of shared dataset and benchmark for sleep foundation model training and evaluation.
Absence of systematic evaluation of self-supervised learning methods on sleep tasks.
Need to enhance sleep analysis using pre-trained models on multimodal clinical data.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduced the Stanford Sleep Bench dataset with 13 clinical disease prediction tasks alongside canonical sleep tasks.
Systematically evaluated self-supervised pre-training methods across sleep-related downstream tasks.
Found that contrastive learning excels at mortality and disease prediction and converges faster during pretraining.
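The summary does not specify which contrastive objective the paper uses, so as a rough illustration only, here is a minimal InfoNCE-style loss in NumPy. All names (`info_nce_loss`, the toy embeddings) are hypothetical; the idea is that two augmented "views" of the same PSG epoch form a positive pair, while other epochs in the batch serve as negatives:

```python
import numpy as np

def info_nce_loss(z1, z2, temperature=0.1):
    """InfoNCE contrastive loss between two batches of embeddings.

    z1, z2: (batch, dim) embeddings of two augmented views of the
    same recordings; matching rows are positive pairs, all other
    rows in the batch act as negatives.
    """
    # L2-normalize so the dot product becomes cosine similarity.
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature  # (batch, batch) similarity matrix
    # Log-softmax over each row; diagonal entries are the positives.
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

rng = np.random.default_rng(0)
anchor = rng.normal(size=(8, 16))
noisy_view = anchor + 0.05 * rng.normal(size=(8, 16))  # mild "augmentation"
random_view = rng.normal(size=(8, 16))                 # unrelated signals
# Correctly paired views should incur a lower loss than random pairings.
assert info_nce_loss(anchor, noisy_view) < info_nce_loss(anchor, random_view)
```

Minimizing this loss pulls representations of the same epoch together and pushes different epochs apart, which is the mechanism the paper's contrastive pretraining results are built on.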
👥 Authors
Magnus Ruud Kjaer
Department of Psychiatry and Behavioral Sciences, Stanford University, Stanford, CA, USA.
Rahul Thapa
Graduate Student, Stanford University
Machine Learning · Healthcare AI · Data Science
Gauri Ganjoo
Department of Psychiatry and Behavioral Sciences, Stanford University, Stanford, CA, USA.
Hyatt Moore IV
Department of Psychiatry and Behavioral Sciences, Stanford University, Stanford, CA, USA.
Poul Jørgen Jennum
Department of Clinical Neurophysiology, Danish Center for Sleep Medicine, Copenhagen University Hospital – Rigshospitalet, Copenhagen, Denmark.
Brandon M. Westover
Department of Neurology, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA, USA.
James Zou
Stanford University
Machine learning · Computational biology · Computational health · Statistics · Biotech
Emmanuel Mignot
Department of Psychiatry and Behavioral Sciences, Stanford University, Stanford, CA, USA.
Bryan He
Stanford University
Machine learning · Optimization
Andreas Brink-Kjaer
Department of Health Technology, Technical University of Denmark, Kongens Lyngby, Denmark.