🤖 AI Summary
This work addresses the limited generalizability of existing learning-based sleep analysis systems, which are constrained by closed label spaces and struggle to adapt to novel sleep phenomena or support flexible querying. To overcome this, we propose the first sleep-language foundation model that aligns multimodal polysomnography signals with natural language, enabling linguistic representation and interactive exploration of sleep physiology. We construct a large-scale paired sleep-text dataset and introduce a unified pretraining framework integrating contrastive alignment, descriptive captioning, and signal reconstruction, supported by a multi-level text annotation pipeline. The resulting model significantly outperforms current approaches in zero-shot and few-shot learning, cross-modal retrieval, and sleep description generation, demonstrating strong language-guided analytical capabilities.
📝 Abstract
We present SleepLM, a family of sleep-language foundation models that enable alignment of human sleep data with natural language, as well as its interpretation and interactive exploration through language. Despite the critical role of sleep, existing learning-based sleep analysis systems operate in closed label spaces (e.g., predefined stages or events) and cannot describe, query, or generalize to novel sleep phenomena. SleepLM bridges natural language and multimodal polysomnography, enabling language-grounded representations of sleep physiology. To support this alignment, we introduce a multilevel sleep caption generation pipeline that enables the curation of the first large-scale sleep-text dataset, comprising over 100K hours of data from more than 10,000 individuals. Furthermore, we present a unified pretraining objective that combines contrastive alignment, caption generation, and signal reconstruction to better capture physiological fidelity and cross-modal interactions. Extensive experiments on real-world sleep understanding tasks show that SleepLM outperforms state-of-the-art methods in zero-shot and few-shot learning, cross-modal retrieval, and sleep captioning. SleepLM also exhibits additional capabilities, including language-guided event localization, targeted insight generation, and zero-shot generalization to unseen tasks. All code and data will be open-sourced.
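To make the idea of a unified pretraining objective concrete, the following is a minimal NumPy sketch of how a contrastive alignment term, a captioning term, and a reconstruction term could be combined into one loss. It is an illustration under stated assumptions, not the SleepLM implementation: the symmetric InfoNCE form, the token-level cross-entropy, the mean-squared-error reconstruction, the temperature of 0.07, the equal weights, and all function and parameter names are hypothetical choices made for this example; the abstract does not specify any of them.

```python
import numpy as np


def log_softmax(x, axis=-1):
    """Numerically stable log-softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))


def contrastive_loss(signal_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss (assumed form): row i of each matrix is a matched pair."""
    s = signal_emb / np.linalg.norm(signal_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = s @ t.T / temperature          # (batch, batch) cosine similarities
    idx = np.arange(len(s))
    signal_to_text = -log_softmax(logits, axis=1)[idx, idx].mean()
    text_to_signal = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (signal_to_text + text_to_signal)


def caption_loss(token_logits, token_ids):
    """Token-level cross-entropy (assumed form).

    token_logits: (batch, seq_len, vocab) decoder outputs.
    token_ids:    (batch, seq_len) integer targets.
    """
    logp = log_softmax(token_logits, axis=-1)
    picked = np.take_along_axis(logp, token_ids[..., None], axis=-1)
    return -picked.mean()


def reconstruction_loss(reconstructed, signal):
    """Mean squared error between reconstructed and original signal (assumed form)."""
    return np.mean((reconstructed - signal) ** 2)


def pretraining_loss(signal_emb, text_emb, token_logits, token_ids,
                     reconstructed, signal,
                     w_contrastive=1.0, w_caption=1.0, w_recon=1.0):
    """Weighted sum of the three terms; the weights are hypothetical."""
    return (w_contrastive * contrastive_loss(signal_emb, text_emb)
            + w_caption * caption_loss(token_logits, token_ids)
            + w_recon * reconstruction_loss(reconstructed, signal))
```

In a sketch like this, the contrastive term pulls matched signal and text embeddings together, the captioning term trains a decoder to describe the signal, and the reconstruction term encourages the encoder to retain signal detail. How the terms are actually weighted or scheduled in SleepLM is not stated in the abstract.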