🤖 AI Summary
This work addresses a limitation of existing waveform foundation models: they treat physiological signals as collections of local segments and therefore fail to capture clinically meaningful, long-range latent event structure. The authors propose a self-supervised learning framework that models physiological signals as realizations of an underlying latent event process. By enforcing consistency across stochastic segmentations and time-frequency projections of the same waveform, the model learns representations that are invariant to signal-level perturbations while preserving event-level organization. Key contributions include an event-centric modeling paradigm, a segmentation-aware encoder, and a latent interaction operator over inferred events, with a natural extension to multimodal alignment through shared event representations. Across arrhythmia classification, hemodynamic prediction, and waveform retrieval, the approach improves performance, robustness, and label efficiency relative to strong sequence-based baselines.
📝 Abstract
We propose a new class of waveform foundation models that departs from conventional sequence-based representations by modeling physiological time series as realizations of latent event processes. Rather than treating signals as collections of local tokens or patches, our approach assumes that clinically meaningful structure arises from temporally extended, interacting events whose boundaries and dynamics are not directly observed. To capture this structure, we introduce a self-supervised learning framework that enforces consistency across stochastic segmentations and time-frequency projections of the same waveform, encouraging representations that are invariant to signal-level perturbations while preserving event-level organization. The resulting model combines a segmentation-aware encoder with a latent interaction operator that captures dependencies among inferred events, and naturally extends to multimodal settings by aligning modalities through shared event representations. Across a range of physiological benchmarks, including arrhythmia classification, hemodynamic prediction, and waveform retrieval, the proposed method improves performance, robustness, and label efficiency relative to strong sequence-based baselines. These results suggest that shifting from signal-centric to event-centric representations provides a more appropriate inductive bias for modeling physiological dynamics and offers a complementary path to scaling foundation models in healthcare.
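The consistency idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract gives no architecture or loss details, so every name below (`random_segments`, `time_frequency_view`, `embed`, `consistency_loss`), the minimum segment length, the number of frequency bins, and the cosine-distance objective are assumptions chosen for illustration. A fixed spectral summary stands in for the learned segmentation-aware encoder, and the latent interaction operator is omitted entirely.

```python
import numpy as np

def random_segments(x, n_segments, rng, min_len=32):
    """Split a 1-D waveform into contiguous pieces at random boundaries (hypothetical helper)."""
    # Candidate cut points lie on a min_len grid, so every piece has at least min_len samples.
    candidates = np.arange(min_len, len(x) - min_len + 1, min_len)
    cuts = np.sort(rng.choice(candidates, size=n_segments - 1, replace=False))
    return np.split(x, cuts)

def time_frequency_view(segment, n_bins=16):
    """Log-magnitude spectrum of one segment, resampled to a fixed number of bins."""
    mag = np.abs(np.fft.rfft(segment))
    # Interpolate onto a common grid so segments of different lengths are comparable.
    src = np.linspace(0.0, 1.0, num=len(mag))
    dst = np.linspace(0.0, 1.0, num=n_bins)
    return np.log1p(np.interp(dst, src, mag))

def embed(x, n_segments, rng, n_bins=16):
    """Toy stand-in for an encoder: length-weighted mean of segment spectra, L2-normalised."""
    segs = random_segments(x, n_segments, rng)
    feats = np.stack([time_frequency_view(s, n_bins) for s in segs])
    weights = np.array([len(s) for s in segs], dtype=float)
    z = (weights[:, None] * feats).sum(axis=0) / weights.sum()
    return z / (np.linalg.norm(z) + 1e-12)

def consistency_loss(x, n_segments=4, seed=0):
    """Cosine distance between embeddings of two independent random segmentations of x."""
    rng = np.random.default_rng(seed)
    z1 = embed(x, n_segments, rng)
    z2 = embed(x, n_segments, rng)
    return 1.0 - float(z1 @ z2)

if __name__ == "__main__":
    t = np.linspace(0.0, 10.0, 1000)
    waveform = np.sin(2 * np.pi * 1.2 * t)  # synthetic periodic signal, not real physiological data
    print(consistency_loss(waveform))
```

In a trained model the embedding would be produced by a learned network, and a loss of this kind would be minimised so that the representation depends on the underlying events rather than on where the segment boundaries happen to fall.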