🤖 AI Summary
This work addresses the longstanding challenge in remote photoplethysmography (rPPG) of simultaneously mitigating motion artifacts and illumination variations while preserving high-frequency physiological details, a trade-off that often compromises signal robustness. To this end, we propose PhysNeXt, a novel dual-branch deep learning framework that uniquely leverages both raw video frames and spatiotemporal maps (STMaps) as complementary multimodal inputs. By integrating spatiotemporal difference modeling, cross-modal interaction mechanisms, and a structured attention decoder, PhysNeXt enables joint optimization of noise suppression and fine-grained detail retention. Extensive experiments demonstrate that our approach significantly enhances the stability and fidelity of recovered rPPG signals under complex real-world conditions, thereby improving the accuracy and robustness of physiological measurements such as heart rate.
📝 Abstract
Remote photoplethysmography (rPPG) enables contactless measurement of heart rate and other vital signs by analyzing subtle color variations in facial skin induced by cardiac pulsation. Current rPPG methods are based mainly on either end-to-end modeling from raw videos or intermediate spatial-temporal map (STMap) representations. The former preserves complete spatiotemporal information and can capture subtle heartbeat-related signals, but it also admits substantial noise from motion artifacts and illumination variations. The latter stacks the temporal color changes of multiple facial regions of interest into a compact two-dimensional representation, significantly reducing data volume and computational complexity, although some high-frequency details may be lost. To integrate the complementary strengths of both representations, we propose PhysNeXt, a dual-input deep learning framework that jointly exploits raw video frames and STMaps. By incorporating a spatiotemporal difference modeling unit, a cross-modal interaction module, and a structured attention-based decoder, PhysNeXt enhances the robustness of pulse signal extraction. Experimental results demonstrate that PhysNeXt achieves more stable and fine-grained rPPG signal recovery under challenging conditions, validating the effectiveness of jointly modeling video and STMap representations. The code will be released.
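To make the STMap idea concrete, the following is a minimal sketch of how such a representation is typically built: each facial region of interest (ROI) is reduced to a per-frame mean color, and the resulting traces are stacked into a small 2-D "image" over time. The ROI layout, min-max normalization, and function name here are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

def build_stmap(frames, rois):
    """Illustrative STMap construction (details are assumptions).

    frames: array of shape (T, H, W, 3) -- a face video clip.
    rois:   list of (y0, y1, x0, x1) boxes -- facial regions of interest.
    Returns an array of shape (N_rois, T, 3): each row is one ROI's mean
    color trace over time, min-max normalized per ROI and channel.
    """
    traces = []
    for (y0, y1, x0, x1) in rois:
        # Mean RGB inside the ROI for every frame -> shape (T, 3)
        trace = frames[:, y0:y1, x0:x1, :].mean(axis=(1, 2))
        # Min-max normalize each channel so ROIs are comparable in scale
        lo, hi = trace.min(axis=0), trace.max(axis=0)
        traces.append((trace - lo) / (hi - lo + 1e-8))
    # Stacked traces form the compact 2-D map fed to the STMap branch
    return np.stack(traces)  # (N_rois, T, 3)

# Toy example: a random 100-frame 64x64 clip split into 4 quadrant ROIs
rng = np.random.default_rng(0)
video = rng.random((100, 64, 64, 3)).astype(np.float32)
rois = [(0, 32, 0, 32), (0, 32, 32, 64), (32, 64, 0, 32), (32, 64, 32, 64)]
stmap = build_stmap(video, rois)
print(stmap.shape)  # (4, 100, 3)
```

The spatial averaging is what discards high-frequency spatial detail while suppressing pixel-level noise, which is exactly the trade-off the abstract describes between the STMap branch and the raw-video branch.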