WavesFM: Hierarchical Representation Learning for Longitudinal Wearable Sensor Waveforms

📅 2026-05-09

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Inferring health phenotypes from wearable sensor waveforms is difficult because the recordings are sampled at high frequency, span days to weeks, involve multiple interdependent modalities, and come with few ground-truth labels. This work proposes WavesFM, a foundation model that jointly models local morphological detail and long-range temporal dynamics in raw, high-resolution waveforms. WavesFM uses a two-stage self-supervised learning framework: a segment-level encoder is first pretrained to extract embeddings from short waveform segments, and a temporal encoder is then trained on multi-day sequences of those embeddings. The first stage is pretrained on over 6.8 million hours of recordings from 324,000 individuals, and the second on 5.3 million hours from 10,000 individuals. The authors report superior performance across 58 diverse tasks spanning demographics, lifestyle, health conditions, and medications, addressing the trade-off between preserving waveform detail and capturing long-term temporal patterns.

📝 Abstract

Wearable sensors enable the continuous acquisition of high-resolution physiological waveforms, such as photoplethysmography and accelerometry, under free-living conditions. However, inferring health-related phenotypes from these signals presents significant challenges due to high sampling frequencies, multimodal dependencies, and extreme sequence lengths (e.g., weeks of recordings), compounded by a scarcity of ground-truth labels. To address these challenges, existing self-supervised learning (SSL) methodologies typically follow two paradigms: (1) learning rich morphological representations from short waveform segments while collapsing longitudinal dynamics through simple aggregation, or (2) modeling behavioral patterns from coarse, hand-crafted features (e.g., heart rate, step counts) spanning longer horizons but forgoing subtle, predictive signatures in raw waveforms. To bridge this gap, we propose WavesFM, a foundation model utilizing a two-stage SSL framework for longitudinal physiological data. Specifically, we decompose the learning problem into two stages: first, a segment-level encoder is pretrained to extract local embeddings from short waveforms; subsequently, a temporal encoder is trained to model the sequence of these embeddings across a multi-day horizon. This hierarchical approach overcomes the computational complexity of high-resolution, long-sequence data, allowing the overall model to capture both local signal semantics and the complex circadian and inter-day variations governing physiological dynamics. Pretrained on over 6.8M hours (N=324k individuals) of recordings for the first stage and 5.3M hours (N=10k) for the second stage, WavesFM demonstrates superior performance across 58 diverse tasks spanning demographics, lifestyle, health conditions, and medications.
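
The two-stage decomposition described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the sampling rate, segment length, and both encoders below are hypothetical placeholders chosen only to show the data flow (raw waveform, then per-segment embeddings, then one recording-level representation) and why the temporal stage sees a far shorter sequence than the raw signal. In the actual model both stages would be learned neural encoders.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def split_into_segments(signal: Sequence[float], segment_len: int) -> List[Sequence[float]]:
    """Cut a 1-D waveform into non-overlapping segments, dropping any ragged tail."""
    n_segments = len(signal) // segment_len
    return [signal[i * segment_len:(i + 1) * segment_len] for i in range(n_segments)]


def toy_segment_encoder(segment: Sequence[float]) -> Vector:
    """Hypothetical stand-in for stage 1: one short segment -> small embedding.

    Uses mean and standard deviation only to keep the sketch runnable;
    a real segment encoder would be a pretrained network.
    """
    mean = sum(segment) / len(segment)
    var = sum((x - mean) ** 2 for x in segment) / len(segment)
    return [mean, math.sqrt(var)]


def toy_temporal_encoder(embeddings: List[Vector]) -> Vector:
    """Hypothetical stand-in for stage 2: ordered embedding sequence -> one vector.

    Concatenates the mean embedding with the average per-step drift, so the
    output depends on segment order rather than on a plain average alone.
    """
    n, dim = len(embeddings), len(embeddings[0])
    mean = [sum(e[d] for e in embeddings) / n for d in range(dim)]
    if n > 1:
        drift = [(embeddings[-1][d] - embeddings[0][d]) / (n - 1) for d in range(dim)]
    else:
        drift = [0.0] * dim
    return mean + drift


def encode_recording(
    signal: Sequence[float],
    segment_len: int,
    segment_encoder: Callable[[Sequence[float]], Vector] = toy_segment_encoder,
    temporal_encoder: Callable[[List[Vector]], Vector] = toy_temporal_encoder,
) -> Vector:
    """Stage 1 on every segment, then stage 2 on the resulting sequence."""
    segments = split_into_segments(signal, segment_len)
    embeddings = [segment_encoder(s) for s in segments]
    return temporal_encoder(embeddings)


# Sequence-length arithmetic under ASSUMED numbers (not taken from the paper).
SAMPLING_HZ = 64        # assumed sampling rate
SEGMENT_SECONDS = 60    # assumed segment duration
DAYS = 7                # one week of recording

raw_samples = SAMPLING_HZ * 86_400 * DAYS          # samples a flat model would see
n_segments = (86_400 * DAYS) // SEGMENT_SECONDS    # tokens the temporal encoder sees
reduction = raw_samples // n_segments              # equals SAMPLING_HZ * SEGMENT_SECONDS

# Tiny usage example on a synthetic waveform.
demo_signal = [math.sin(0.1 * i) for i in range(1000)]
demo_representation = encode_recording(demo_signal, segment_len=100)
```

Under these assumed numbers, a week of recording contains 38,707,200 raw samples but only 10,080 segment embeddings, a 3,840-fold shorter sequence for the temporal stage. That reduction is what makes modeling circadian and inter-day structure tractable without discarding waveform detail, since the detail is summarized by the segment encoder rather than by hand-crafted features.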

Problem

Research questions and friction points this paper is trying to address.

wearable sensors

physiological waveforms

longitudinal data

health-related phenotypes

self-supervised learning

Innovation

Methods, ideas, or system contributions that make the work stand out.

hierarchical representation learning

self-supervised learning

longitudinal physiological waveforms