Vision4PPG: Emergent PPG Analysis Capability of Vision Foundation Models for Vital Signs like Blood Pressure

📅 2025-10-11

🤖 AI Summary

Problem: Accurate and generalizable estimation of vital signs, particularly blood pressure, from photoplethysmography (PPG) signals remains challenging because physiological time series are non-stationary and vary between subjects. Method: This work investigates Vision Foundation Models (VFMs) for PPG analysis by transforming 1D PPG signals into 2D image-like representations (STFT, STFT phase, and recurrence plots) and feeding them into recent VFMs (e.g., DINOv3, SIGLIP-2) adapted with Parameter-Efficient Fine-Tuning (PEFT). Contribution/Results: The study compares VFMs against state-of-the-art time-series foundation models and reports state-of-the-art (SOTA) performance on many tasks, notably blood pressure estimation, with promising results on other vital signs and blood lab measurements. The approach is computationally efficient thanks to PEFT and generalizes across 2D input representations. By establishing a “signal → image → VFM” paradigm, this work extends vision models to physiological signal processing.

📝 Abstract

Photoplethysmography (PPG) sensors in wearable and clinical devices provide valuable physiological insights non-invasively and in real time. Specialized Foundation Models (FMs) or repurposed time-series FMs are used to benchmark physiological tasks. Our experiments with fine-tuning FMs reveal that Vision FMs (VFMs) can also be used for this purpose and, surprisingly, lead to state-of-the-art (SOTA) performance on many tasks, notably blood pressure estimation. We leverage VFMs by simply transforming one-dimensional PPG signals into image-like two-dimensional representations, such as the short-time Fourier transform (STFT). Using the latest VFMs, such as DINOv3 and SIGLIP-2, we achieve promising performance on other vital signs and blood lab measurement tasks as well. Our proposal, Vision4PPG, unlocks a new class of FMs that achieves SOTA performance and generalizes notably to other 2D input representations, including STFT phase and recurrence plots. Our work improves upon prior investigations of vision models for PPG by conducting a comprehensive study, comparing vision models to state-of-the-art time-series FMs, and demonstrating general PPG processing ability by reporting results on six additional tasks. Thus, we provide clinician-scientists with a new set of powerful tools that are also computationally efficient, thanks to Parameter-Efficient Fine-Tuning (PEFT) techniques.
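The 1D-to-2D step described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' pipeline: the sampling rate, window length, hop size, Hann window, dB scaling, and the helper names `stft_image` and `to_uint8_rgb` are all assumptions chosen for illustration, and the input is a synthetic sine wave standing in for a real PPG segment.

```python
import numpy as np

def stft_image(ppg, win_len=256, hop=64):
    """Return (magnitude_db, phase), each of shape (freq_bins, frames)."""
    ppg = np.asarray(ppg, dtype=float)
    ppg = ppg - ppg.mean()                       # remove the DC offset
    window = np.hanning(win_len)                 # assumed window choice
    n_frames = 1 + (len(ppg) - win_len) // hop
    frames = np.stack([ppg[i * hop : i * hop + win_len] * window
                       for i in range(n_frames)])
    spec = np.fft.rfft(frames, axis=1).T         # (win_len // 2 + 1, n_frames)
    mag_db = 20.0 * np.log10(np.abs(spec) + 1e-8)
    phase = np.angle(spec)                       # the "STFT phase" variant
    return mag_db, phase

def to_uint8_rgb(arr):
    """Min-max scale a 2D array to 0..255 and repeat it over 3 channels."""
    lo, hi = arr.min(), arr.max()
    gray = ((arr - lo) / (hi - lo + 1e-12) * 255).astype(np.uint8)
    return np.repeat(gray[:, :, None], 3, axis=2)

# Synthetic stand-in for a 10 s PPG segment: 1.2 Hz pulse plus one harmonic.
fs = 125.0                                       # assumed sampling rate (Hz)
t = np.arange(1250) / fs
ppg = np.sin(2 * np.pi * 1.2 * t) + 0.3 * np.sin(2 * np.pi * 2.4 * t)

mag_db, phase = stft_image(ppg)
image = to_uint8_rgb(mag_db)                     # shape (129, 16, 3)
```

The resulting array would still need resizing and normalization to match whatever input size the chosen vision model expects; those steps depend on the model and are left out here.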

Problem

Research questions and friction points this paper is trying to address.

Using vision foundation models to analyze PPG signals for vital signs monitoring

Transforming 1D PPG signals into 2D representations for improved blood pressure estimation

Achieving state-of-the-art performance on multiple physiological measurement tasks
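Besides the STFT, the abstract names recurrence plots as another 2D representation. The sketch below shows one common way to build a thresholded recurrence plot from a time-delay embedding. The embedding dimension, delay, quantile-based threshold, and the function name `recurrence_plot` are assumptions for illustration; the paper's exact construction is not stated in this summary.

```python
import numpy as np

def recurrence_plot(x, dim=3, delay=5, threshold_quantile=0.1):
    """Binary recurrence matrix of a 1D signal (hypothetical parameters)."""
    x = np.asarray(x, dtype=float)
    n = len(x) - (dim - 1) * delay
    # Time-delay embedding: row i is (x[i], x[i + delay], ..., x[i + (dim-1)*delay]).
    emb = np.stack([x[k * delay : k * delay + n] for k in range(dim)], axis=1)
    # Pairwise Euclidean distances between embedded states.
    dist = np.linalg.norm(emb[:, None, :] - emb[None, :, :], axis=2)
    # Mark pairs of states closer than a data-driven threshold as recurrent.
    eps = np.quantile(dist, threshold_quantile)
    return (dist <= eps).astype(np.uint8)

# Synthetic stand-in for a 2 s PPG segment sampled at an assumed 125 Hz.
fs = 125.0
t = np.arange(250) / fs
signal = np.sin(2 * np.pi * 1.2 * t)

rp = recurrence_plot(signal)                     # shape (240, 240)
```

Because the matrix is quadratic in segment length, short segments or downsampling keep the image at a size a vision model can take directly.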

Innovation

Methods, ideas, or system contributions that make the work stand out.

Vision Foundation Models analyze PPG signals for vital signs

Convert 1D PPG signals to 2D image representations like STFT

Use Parameter-Efficient Fine-Tuning for computational efficiency
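The summary does not say which PEFT technique is used. As one illustrative possibility, the sketch below shows the idea behind low-rank adaptation (LoRA): the pretrained weight stays frozen and only two small matrices are trained. The class name `LoRALinear`, the rank, the scaling factor, and the 768-dimensional layer are assumptions, and the code covers the forward pass only.

```python
import numpy as np

class LoRALinear:
    """Frozen linear layer plus a trainable low-rank update (forward pass only)."""

    def __init__(self, weight, rank=4, alpha=8.0, seed=0):
        rng = np.random.default_rng(seed)
        out_dim, in_dim = weight.shape
        self.weight = weight                                  # frozen pretrained weight
        self.A = rng.normal(0.0, 0.01, size=(rank, in_dim))   # trainable
        self.B = np.zeros((out_dim, rank))                    # trainable, zero at init
        self.scale = alpha / rank

    def __call__(self, x):
        # Base projection plus the scaled low-rank correction B @ A.
        return x @ self.weight.T + self.scale * (x @ self.A.T) @ self.B.T

    def trainable_fraction(self):
        trainable = self.A.size + self.B.size
        return trainable / (trainable + self.weight.size)

rng = np.random.default_rng(1)
W = rng.normal(size=(768, 768))                  # stand-in for one pretrained layer
layer = LoRALinear(W, rank=4)
x = rng.normal(size=(2, 768))

y = layer(x)                                     # equals x @ W.T while B is zero
frac = layer.trainable_fraction()                # about 1% of this layer's parameters
```

With the zero initialization of `B`, the adapted layer starts out identical to the pretrained one, and training only touches the small matrices, which is where the computational saving comes from.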

🔎 Similar Papers

TransfoRhythm: A Transformer Architecture Conductive to Blood Pressure Estimation via Solo PPG Signal Capturing