Vision4PPG: Emergent PPG Analysis Capability of Vision Foundation Models for Vital Signs like Blood Pressure

📅 2025-10-11
🤖 AI Summary
Accurate and generalizable estimation of vital signs, particularly blood pressure, from photoplethysmography (PPG) signals remains challenging because physiological time series are non-stationary and vary between subjects. Method: This work investigates Vision Foundation Models (VFMs) for PPG analysis by transforming 1D PPG signals into image-like 2D representations, such as the Short-Time Fourier Transform (STFT), STFT phase, and recurrence plots, and feeding them to recent VFMs (e.g., DINOv3, SIGLIP-2) adapted with Parameter-Efficient Fine-Tuning (PEFT). Contribution/Results: In a comprehensive comparison against state-of-the-art time-series foundation models, the fine-tuned VFMs reach state-of-the-art (SOTA) performance on many tasks, notably blood pressure estimation, and show promising results on other vital signs and blood lab measurements, with results reported on six additional tasks. The approach is computationally efficient thanks to PEFT and generalizes across the 2D input representations. The resulting signal → image → VFM pipeline extends vision models to physiological signal processing.
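One of the 2D representations named in the summary is the recurrence plot. The following is a minimal sketch of how such a plot could be built from a 1D signal with NumPy. The embedding dimension, delay, threshold, sampling rate, and the synthetic sine-based "PPG" are all illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def recurrence_plot(signal, dim=3, delay=5, threshold=0.2):
    """Binary recurrence plot from a time-delay embedding of a 1D signal.

    dim, delay, and threshold are hypothetical defaults chosen for illustration.
    """
    x = np.asarray(signal, dtype=float)
    # Standardize so the threshold does not depend on the signal's scale.
    x = (x - x.mean()) / (x.std() + 1e-12)
    n = len(x) - (dim - 1) * delay
    if n <= 0:
        raise ValueError("signal too short for this embedding")
    # Row t is the embedded state [x[t], x[t + delay], ..., x[t + (dim - 1) * delay]].
    states = np.stack([x[i * delay : i * delay + n] for i in range(dim)], axis=1)
    # Pairwise Euclidean distances between all embedded states.
    dists = np.linalg.norm(states[:, None, :] - states[None, :, :], axis=-1)
    # A pixel is 1 when two states are closer than a fraction of the largest distance.
    return (dists <= threshold * dists.max()).astype(np.uint8)

# Synthetic stand-in for a PPG segment (assumption): 4 s sampled at 50 Hz.
fs = 50
t = np.arange(4 * fs) / fs
ppg = np.sin(2 * np.pi * 1.2 * t) + 0.3 * np.sin(2 * np.pi * 2.4 * t)

rp = recurrence_plot(ppg)
print(rp.shape)
```

The result is a square, symmetric binary image whose diagonal stripes reflect the periodicity of the pulse waveform. Resizing it to the input resolution a given VFM expects is left out here.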

📝 Abstract
Photoplethysmography (PPG) sensors in wearable and clinical devices provide valuable physiological insights in a non-invasive, real-time fashion. Specialized Foundation Models (FMs) or repurposed time-series FMs are used to benchmark physiological tasks. Our experiments with fine-tuning FMs reveal that Vision FMs (VFMs) can also be used for this purpose and, surprisingly, lead to state-of-the-art (SOTA) performance on many tasks, notably blood pressure estimation. We leverage VFMs by simply transforming one-dimensional PPG signals into image-like two-dimensional representations, such as the Short-Time Fourier Transform (STFT). Using the latest VFMs, such as DINOv3 and SIGLIP-2, we achieve promising performance on other vital signs and blood lab measurement tasks as well. Our proposal, Vision4PPG, unlocks a new class of FMs that achieves SOTA performance and generalizes notably well to other 2D input representations, including STFT phase and recurrence plots. Our work improves upon prior investigations of vision models for PPG by conducting a comprehensive study, comparing them to state-of-the-art time-series FMs, and demonstrating general PPG processing ability by reporting results on six additional tasks. Thus, we provide clinician-scientists with a new set of powerful tools that is also computationally efficient, thanks to Parameter-Efficient Fine-Tuning (PEFT) techniques.
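The abstract's central step, turning a 1D PPG signal into an image-like STFT representation, can be illustrated with a short NumPy sketch. Everything specific below is an assumption made for the example: the window length, hop size, Hann window, log scaling, three-channel replication, and the synthetic 1.5 Hz test signal. The paper's actual preprocessing may differ.

```python
import numpy as np

def stft_images(signal, n_fft=128, hop=16):
    """Return (log-magnitude, phase) arrays, each shaped (freq_bins, frames).

    n_fft and hop are hypothetical settings, not the paper's configuration.
    """
    x = np.asarray(signal, dtype=float)
    if len(x) < n_fft:
        raise ValueError("signal shorter than one analysis window")
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack([x[i * hop : i * hop + n_fft] * window for i in range(n_frames)])
    # One-sided FFT per frame, transposed so rows are frequency bins.
    spec = np.fft.rfft(frames, axis=1).T
    return np.log1p(np.abs(spec)), np.angle(spec)

def to_uint8_image(arr):
    """Min-max scale a 2D array to 0..255 and repeat it over 3 channels."""
    lo, hi = arr.min(), arr.max()
    scaled = (arr - lo) / (hi - lo + 1e-12)
    gray = np.round(scaled * 255).astype(np.uint8)
    return np.repeat(gray[:, :, None], 3, axis=2)

# Synthetic stand-in for PPG (assumption): 10 s at 64 Hz with a 1.5 Hz pulse rate.
fs = 64
t = np.arange(10 * fs) / fs
ppg = np.sin(2 * np.pi * 1.5 * t)

log_mag, phase = stft_images(ppg)
image = to_uint8_image(log_mag)
print(image.shape)
```

With these settings each frequency bin spans 0.5 Hz, so the 1.5 Hz pulse shows up as a bright horizontal band at bin 3. The `phase` array can be passed through `to_uint8_image` in the same way to obtain an STFT phase image.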
Problem

Research questions and friction points this paper is trying to address.

Using vision foundation models to analyze PPG signals for vital signs monitoring
Transforming 1D PPG signals into 2D representations for improved blood pressure estimation
Achieving state-of-the-art performance on multiple physiological measurement tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Vision Foundation Models analyze PPG signals for vital signs
Convert 1D PPG signals to 2D image representations like STFT
Use Parameter-Efficient Fine-Tuning for computational efficiency
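The source does not say which PEFT technique is used. As one common example, a LoRA-style low-rank adapter keeps the pretrained weight frozen and trains only two small matrices. The sketch below shows that idea for a single dense layer in plain NumPy; the class name, rank, scaling factor, and the 768-wide layer are hypothetical choices for illustration.

```python
import numpy as np

class LoRALinear:
    """Frozen dense layer plus a trainable low-rank update.

    Computes y = x @ W + scale * (x @ A) @ B, where only A and B would be trained.
    This is an illustrative sketch, not the paper's implementation.
    """

    def __init__(self, weight, rank=4, alpha=8.0, seed=0):
        rng = np.random.default_rng(seed)
        d_in, d_out = weight.shape
        self.weight = weight                               # frozen pretrained weight
        self.A = rng.normal(0.0, 0.01, size=(d_in, rank))  # trainable, small random init
        self.B = np.zeros((rank, d_out))                   # trainable, zero init
        self.scale = alpha / rank

    def forward(self, x):
        # Because B starts at zero, the adapter initially leaves the output unchanged.
        return x @ self.weight + self.scale * (x @ self.A) @ self.B

    def trainable_fraction(self):
        trainable = self.A.size + self.B.size
        return trainable / (trainable + self.weight.size)

rng = np.random.default_rng(1)
W = rng.normal(size=(768, 768))   # stand-in for one pretrained projection matrix
layer = LoRALinear(W, rank=4)
x = rng.normal(size=(2, 768))
y = layer.forward(x)
print(f"trainable fraction: {layer.trainable_fraction():.4f}")
```

In this example only about 1% of the layer's parameters would receive gradients, which is where the computational saving of this family of methods comes from.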
Authors
Saurabh Kataria
Nell Hodgson Woodruff School of Nursing, Emory University, Atlanta, USA, 30322
Ayca Ermis
Georgia Institute of Technology
Systems Control, Predictive Modeling
Lovely Yeswanth Panchumarthi
Nell Hodgson Woodruff School of Nursing, Emory University, Atlanta, USA, 30322
Minxiao Wang
Nell Hodgson Woodruff School of Nursing, Emory University, Atlanta, USA, 30322
Xiao Hu
Nell Hodgson Woodruff School of Nursing, Emory University, Atlanta, USA, 30322