🤖 AI Summary
This work addresses a limitation of existing PPG foundation models: their pretraining overlooks the multi-band spectral structure of photoplethysmographic signals, so they struggle to capture multiscale physiological features ranging from fine-grained waveform morphology to global rhythms. To overcome this, we propose Masked Multiscale Reconstruction (MMR), a framework that, for the first time, integrates wavelet-driven multi-resolution time–frequency representations into self-supervised PPG learning. Specifically, the input signal is decomposed via a wavelet transform, and a Transformer encoder is trained to reconstruct randomly masked wavelet coefficients, which forces it to fuse information across temporal and spectral scales. Across 19 health-related tasks, our method matches or surpasses current state-of-the-art open-source PPG and general-purpose time-series foundation models on 17, and embedding analyses and ablations indicate that the wavelet-based representations capture robust, physiologically grounded features.
📝 Abstract
Wearable foundation models have the potential to transform digital health by learning transferable representations from large-scale biosignals collected in everyday settings. Despite recent progress in large-scale pretraining, most approaches overlook the spectral structure of photoplethysmography (PPG) signals, in which physiological rhythms unfold across multiple frequency bands. Motivated by the insight that many downstream health-related tasks depend on multi-resolution features, spanning fine-grained waveform morphology to global rhythmic dynamics, we introduce Masked Multiscale Reconstruction (MMR) for PPG representation learning, a self-supervised pretraining framework that explicitly learns from hierarchical time-frequency scales of PPG data. The pretraining task is to reconstruct randomly masked-out coefficients obtained from a wavelet-based multiresolution decomposition of PPG signals, forcing the transformer encoder to integrate information across temporal and spectral scales. We pretrain our model with MMR using ~17 million unlabeled 10-second PPG segments from ~32,000 smartwatch users. On 17 of 19 diverse health-related tasks, MMR trained on large-scale wearable PPG data improves over or matches state-of-the-art open-source PPG foundation models, time-series foundation models, and other self-supervised baselines. Extensive analysis of our learned embeddings and systematic ablations underscore the value of wavelet-based representations, showing that they capture robust and physiologically grounded features. Together, these results highlight the potential of MMR as a step toward generalizable PPG foundation models.
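To make the pretraining objective described above concrete, the following is a minimal sketch, not the authors' implementation. It decomposes a toy signal into multiresolution wavelet bands, masks a random subset of coefficients in each band, and scores a reconstruction only on the masked positions. The abstract does not specify the wavelet family, number of levels, mask ratio, masking scheme, or sampling rate, so the Haar wavelet, three levels, 50% per-band masking by zeroing, the 32 Hz rate, and all function names here are illustrative assumptions. The transformer encoder is omitted; the masked input itself stands in for a trivial "prediction".

```python
import math
import random


def haar_decompose(signal, levels):
    """Multi-level Haar DWT (assumed wavelet; the paper's choice is not stated).

    Returns bands ordered coarse to fine: [approx_L, detail_L, ..., detail_1].
    """
    approx = list(signal)
    details = []
    for _ in range(levels):
        if len(approx) % 2:
            raise ValueError("signal length must be divisible by 2**levels")
        pairs = list(zip(approx[0::2], approx[1::2]))
        # Orthonormal Haar step: scaled pairwise differences and sums.
        details.append([(a - b) / math.sqrt(2) for a, b in pairs])
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]
    return [approx] + details[::-1]


def mask_coefficients(bands, mask_ratio, rng):
    """Zero a random subset of coefficients in every band (hypothetical scheme).

    Returns the masked bands and, per band, a boolean list marking masked slots.
    """
    masked, masks = [], []
    for band in bands:
        n_mask = round(mask_ratio * len(band))
        chosen = set(rng.sample(range(len(band)), n_mask))
        masks.append([i in chosen for i in range(len(band))])
        masked.append([0.0 if i in chosen else c for i, c in enumerate(band)])
    return masked, masks


def masked_mse(pred_bands, target_bands, masks):
    """Mean squared error computed over masked positions only."""
    squared_error, count = 0.0, 0
    for pred, target, mask in zip(pred_bands, target_bands, masks):
        for p, t, is_masked in zip(pred, target, mask):
            if is_masked:
                squared_error += (p - t) ** 2
                count += 1
    return squared_error / max(count, 1)


# Toy 10-second "PPG-like" segment: a 1.2 Hz rhythm plus its first harmonic.
fs = 32  # assumed sampling rate in Hz, for illustration only
signal = [
    math.sin(2 * math.pi * 1.2 * n / fs) + 0.3 * math.sin(2 * math.pi * 2.4 * n / fs)
    for n in range(10 * fs)
]

bands = haar_decompose(signal, levels=3)
masked_bands, masks = mask_coefficients(bands, mask_ratio=0.5, rng=random.Random(0))

# A real model would predict the masked coefficients from the visible ones;
# here the zero-filled input serves as a baseline "prediction".
baseline_loss = masked_mse(masked_bands, bands, masks)
```

In this sketch the coarse approximation band carries the slow rhythm and the detail bands carry progressively finer waveform structure, so a model that minimises the masked loss across all bands has to use information from more than one scale.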