🤖 AI Summary
To address core challenges in remote photoplethysmography (rPPG), including high sensitivity to illumination variations, severe motion artifacts, and weak temporal modeling, this paper proposes the first large language model (LLM)-collaborative optimization framework tailored for physiological signal estimation. Methodologically, it introduces a novel Text Prototype Guidance (TPG) mechanism that enables cross-modal alignment between rPPG signals and semantic representations; designs a Dual-Domain Stationary (DDS) algorithm that adaptively re-weights time-frequency features for greater robustness; and systematically incorporates three types of domain-specific priors: physiological statistics, environmental context, and task descriptions. Evaluated on four benchmark datasets, the proposed method consistently outperforms existing state-of-the-art approaches, demonstrating superior generalization and measurement accuracy, particularly under challenging conditions involving complex illumination and dynamic subject motion.
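The TPG mechanism is only described at a high level here. The PyTorch sketch below illustrates one plausible reading of it, assuming learnable text prototypes in the LLM embedding space to which rPPG features are softly assigned via attention; the class name `TextPrototypeGuidance`, the prototype count, and the softmax temperature are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TextPrototypeGuidance(nn.Module):
    """Sketch of the TPG idea: align hemodynamic (rPPG) features with a
    small set of learnable text prototypes living in the LLM's embedding
    space, yielding LLM-interpretable pseudo-tokens. Hypothetical design."""

    def __init__(self, feat_dim: int, llm_dim: int, num_prototypes: int = 16):
        super().__init__()
        # Learnable prototypes in the LLM token-embedding space
        # (prototype count is an assumption; the summary gives no number).
        self.prototypes = nn.Parameter(torch.randn(num_prototypes, llm_dim))
        # Projection from hemodynamic features into that space.
        self.proj = nn.Linear(feat_dim, llm_dim)

    def forward(self, rppg_feats: torch.Tensor) -> torch.Tensor:
        # rppg_feats: (batch, time, feat_dim) frame-level rPPG features.
        q = F.normalize(self.proj(rppg_feats), dim=-1)   # (B, T, D)
        p = F.normalize(self.prototypes, dim=-1)         # (K, D)
        # Soft assignment of each time step to the prototypes
        # (0.07 is a common contrastive temperature, assumed here).
        attn = torch.softmax(q @ p.t() / 0.07, dim=-1)   # (B, T, K)
        # Each step becomes a convex combination of text prototypes,
        # i.e. a sequence the LLM can consume as semantic tokens.
        return attn @ p                                  # (B, T, D)
```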
📝 Abstract
Remote photoplethysmography (rPPG) enables non-contact physiological measurement but remains highly susceptible to illumination changes, motion artifacts, and limited temporal modeling. Large Language Models (LLMs) excel at capturing long-range dependencies and thus offer a potential solution, yet their text-centric design leaves them ill-suited to the continuous, noise-sensitive nature of rPPG signals. To bridge this gap, we introduce PhysLLM, a collaborative optimization framework that synergizes LLMs with domain-specific rPPG components. Specifically, a Text Prototype Guidance (TPG) strategy establishes cross-modal alignment by projecting hemodynamic features into an LLM-interpretable semantic space, effectively bridging the representational gap between physiological signals and linguistic tokens. In addition, a novel Dual-Domain Stationary (DDS) algorithm resolves signal instability through adaptive time-frequency feature re-weighting. Finally, rPPG task-specific cues systematically inject physiological priors through physiological statistics, environmental context answering, and task descriptions; by leveraging cross-modal learning to integrate visual and textual information, the framework adapts dynamically to challenging scenarios such as variable illumination and subject movement. Evaluated on four benchmark datasets, PhysLLM achieves state-of-the-art accuracy and robustness, demonstrating superior generalization across lighting variations and motion scenarios.
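The abstract characterizes DDS only as adaptive time-frequency feature re-weighting. The following minimal sketch shows one way such re-weighting could look, gating temporal features with both a per-step time-domain gate and a spectral gate computed from an FFT; the module name `DualDomainReweighting` and the sigmoid-gating form are assumptions for illustration, not the published algorithm.

```python
import torch
import torch.nn as nn

class DualDomainReweighting(nn.Module):
    """Sketch of the DDS idea: down-weight unstable (non-stationary)
    feature segments using gates derived jointly from time-domain and
    frequency-domain statistics. Gating form is hypothetical."""

    def __init__(self, dim: int):
        super().__init__()
        self.time_gate = nn.Sequential(nn.Linear(dim, dim), nn.Sigmoid())
        self.freq_gate = nn.Sequential(nn.Linear(dim, dim), nn.Sigmoid())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, dim) temporal feature sequence.
        # Frequency-domain magnitudes via a real FFT along the time axis.
        spec = torch.fft.rfft(x, dim=1).abs()                 # (B, T//2+1, dim)
        w_t = self.time_gate(x)                               # per-step gate
        w_f = self.freq_gate(spec.mean(dim=1, keepdim=True))  # per-channel gate
        # Features judged unstable in either domain are attenuated.
        return x * w_t * w_f
```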