🤖 AI Summary
Accurate and generalizable estimation of vital signs—particularly blood pressure—from photoplethysmography (PPG) signals remains challenging due to the inherent non-stationarity and inter-subject variability of physiological time series.
Method: This work investigates the potential of Vision Foundation Models (VFMs) for PPG analysis by transforming 1D PPG signals into image-like 2D representations, namely the short-time Fourier transform (STFT), STFT phase, and recurrence plots, and feeding them into recent VFMs (e.g., DINOv3, SIGLIP-2) adapted with Parameter-Efficient Fine-Tuning (PEFT).
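The signal-to-image step can be illustrated with a minimal sketch. It is not the authors' pipeline: the window length, hop size, recurrence threshold, sampling rate, and the synthetic waveform are all assumptions chosen for illustration, and the helper names are hypothetical. It only shows how a 1D segment might become a 3-channel array of the kind a vision backbone accepts.

```python
import numpy as np

def stft_magnitude(signal, frame_len=128, hop=32):
    """Log-magnitude STFT of a 1D signal, shape (freq_bins, frames)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice overlapping frames and apply the window to each one.
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    spectrum = np.fft.rfft(frames, axis=1)  # (frames, freq_bins)
    return np.log1p(np.abs(spectrum)).T

def recurrence_plot(signal, threshold=0.1):
    """Binary recurrence plot: 1 where two samples lie within `threshold`."""
    dist = np.abs(signal[:, None] - signal[None, :])
    return (dist < threshold).astype(np.float32)

def to_rgb_image(array_2d):
    """Min-max scale to [0, 1] and repeat over 3 channels, shape (3, H, W)."""
    lo, hi = array_2d.min(), array_2d.max()
    scaled = (array_2d - lo) / (hi - lo + 1e-8)
    return np.repeat(scaled[None, :, :], 3, axis=0)

# Synthetic stand-in for a PPG segment (assumed: 10 s at 64 Hz, 1.2 Hz pulse).
fs = 64
t = np.arange(10 * fs) / fs
ppg = np.sin(2 * np.pi * 1.2 * t) + 0.3 * np.sin(2 * np.pi * 2.4 * t)

spec_img = to_rgb_image(stft_magnitude(ppg))      # time-frequency image
rp_img = to_rgb_image(recurrence_plot(ppg[::4]))  # recurrence image (downsampled)
```

In practice the resulting arrays would still need resizing and normalization to match the input resolution and statistics expected by the chosen VFM; those details depend on the model and are not specified here.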
Contribution/Results: A comprehensive comparison against state-of-the-art time-series foundation models shows that fine-tuned VFMs reach state-of-the-art (SOTA) performance on many tasks, notably blood pressure estimation, with promising results on other vital-sign and blood lab measurement tasks. The approach is computationally efficient thanks to PEFT and generalizes across different 2D input representations. The resulting “signal → image → VFM” approach, named Vision4PPG, extends vision models to physiological signal processing and improves upon prior investigations of vision models for PPG.
📝 Abstract
Photoplethysmography (PPG) sensors in wearable and clinical devices provide valuable physiological insights in a non-invasive, real-time fashion. Specialized Foundation Models (FMs) or repurposed time-series FMs are used to benchmark physiological tasks. Our fine-tuning experiments reveal that Vision FMs (VFMs) can also be used for this purpose and, surprisingly, lead to state-of-the-art (SOTA) performance on many tasks, notably blood pressure estimation. We leverage VFMs by simply transforming one-dimensional PPG signals into image-like two-dimensional representations, such as the short-time Fourier transform (STFT). Using the latest VFMs, such as DINOv3 and SIGLIP-2, we also achieve promising performance on other vital-sign and blood lab measurement tasks. Our proposal, Vision4PPG, unlocks a new class of FMs that achieve SOTA performance and generalize notably to other 2D input representations, including STFT phase and recurrence plots. Our work improves upon prior investigations of vision models for PPG by conducting a comprehensive study, comparing vision models to state-of-the-art time-series FMs, and demonstrating general PPG processing ability through results on six additional tasks. Thus, we provide clinician-scientists with a new set of powerful tools that is also computationally efficient, thanks to Parameter-Efficient Fine-Tuning (PEFT) techniques.
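To make the efficiency argument concrete, the sketch below shows one common PEFT technique, a LoRA-style low-rank adapter. The abstract does not state which PEFT method is used, so the choice of LoRA, the layer size, and the rank are all assumptions for illustration. The pretrained weight stays frozen and only two small matrices would be trained.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, rank = 768, 768, 8  # hypothetical layer size and adapter rank

W = rng.standard_normal((d_out, d_in)) * 0.02  # frozen pretrained weight
A = rng.standard_normal((rank, d_in)) * 0.01   # trainable down-projection
B = np.zeros((d_out, rank))                    # trainable up-projection, zero-initialized

def adapted_forward(x, scale=1.0):
    """Frozen linear layer plus a low-rank update: x W^T + scale * x A^T B^T."""
    return x @ W.T + scale * (x @ A.T) @ B.T

x = rng.standard_normal((4, d_in))
y = adapted_forward(x)

# Because B starts at zero, the adapter initially leaves the layer unchanged.
starts_as_identity = np.allclose(y, x @ W.T)

trainable = A.size + B.size    # parameters updated during fine-tuning
fraction = trainable / W.size  # share relative to the frozen weight
```

With these assumed sizes, the adapter adds 12,288 trainable parameters next to a frozen weight of 589,824, which is roughly 2% of that layer.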