Efficient and Robust Multidimensional Attention in Remote Physiological Sensing through Target Signal Constrained Factorization

📅 2025-05-11

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing remote physiological sensing methods generalize poorly under variations in environment, hardware, subject pose, and physiological state. To address this, we propose MMRPhys, a novel end-to-end framework comprising a Target Signal Constrained Factorization Module (TSFM) and a lightweight dual-branch 3D-CNN architecture. For the first time, MMRPhys embeds physiological priors into a multidimensional attention mechanism and supports joint RGB and thermal video input to simultaneously estimate remote photoplethysmography (rPPG) and remote respiratory (rRSP) signals. The framework achieves high generalization, ultra-low latency (<30 ms on 1080p video), and multimodal, multitask capability. Evaluated on five cross-domain benchmark datasets, MMRPhys significantly outperforms state-of-the-art methods, reducing rPPG and rRSP estimation errors by 18.7% and 22.3%, respectively. The framework and a real-time web application are publicly released.

📝 Abstract

Remote physiological sensing using camera-based technologies offers transformative potential for non-invasive vital sign monitoring across healthcare and human-computer interaction domains. Although deep learning approaches have advanced the extraction of physiological signals from video data, existing methods have not been sufficiently assessed for their robustness to domain shifts. In remote physiological sensing, these shifts include variations in ambient conditions, camera specifications, head movements, facial poses, and physiological states, which often affect real-world performance significantly. Cross-dataset evaluation provides an objective measure of generalization across these domain shifts. We introduce the Target Signal Constrained Factorization module (TSFM), a novel multidimensional attention mechanism that explicitly incorporates physiological signal characteristics as factorization constraints, allowing more precise feature extraction. Building on this innovation, we present MMRPhys, an efficient dual-branch 3D-CNN architecture designed for simultaneous multitask estimation of remote photoplethysmography (rPPG) and remote respiratory (rRSP) signals from multimodal RGB and thermal video inputs. Through comprehensive cross-dataset evaluation on five benchmark datasets, we demonstrate that MMRPhys with TSFM significantly outperforms state-of-the-art methods in generalization across domain shifts for rPPG and rRSP estimation, while maintaining a minimal inference latency suitable for real-time applications. Our approach establishes new benchmarks for robust multitask and multimodal physiological sensing and offers a computationally efficient framework for practical deployment in unconstrained environments. A web browser-based application featuring on-device real-time inference of the MMRPhys model is available at https://physiologicailab.github.io/mmrphys-live
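The abstract does not say how TSFM imposes its target-signal constraint, so the sketch below does not attempt to reproduce it. It only illustrates the general idea of factorization-based attention: a nonnegative feature matrix (time by feature) is approximated by a rank-1 product of a temporal factor and a feature factor, and that approximation is rescaled into attention weights. The function names `rank1_factorize` and `apply_attention`, the rank-1 choice, and the alternating update rule are all assumptions made for illustration, not the authors' module.

```python
def rank1_factorize(V, n_iter=30, eps=1e-12):
    """Approximate a nonnegative T x N matrix V by outer(w, h).

    Hypothetical helper: alternating least-squares updates, which for
    rank 1 and nonnegative V keep both factors nonnegative.
    """
    T, N = len(V), len(V[0])
    w = [1.0] * T  # temporal factor, flat initial guess
    h = [1.0] * N  # feature (channel/spatial) factor
    for _ in range(n_iter):
        ww = sum(x * x for x in w)
        h = [sum(V[t][n] * w[t] for t in range(T)) / (ww + eps) for n in range(N)]
        hh = sum(x * x for x in h)
        w = [sum(V[t][n] * h[n] for n in range(N)) / (hh + eps) for t in range(T)]
    return w, h


def apply_attention(V, w, h):
    """Scale V by the rank-1 approximation, normalised so weights lie in [0, 1]."""
    peak = max(w) * max(h)  # largest entry of outer(w, h) for nonnegative factors
    return [[V[t][n] * (w[t] * h[n] / peak) for n in range(len(h))]
            for t in range(len(w))]
```

On a matrix that is exactly a periodic temporal signal times a per-feature gain, the recovered temporal factor `w` is proportional to that signal, which is the property an attention mechanism of this kind relies on. How a physiological target signal would constrain the factors is left open here because the source does not specify it.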

Problem

Research questions and friction points this paper is trying to address.

Assessing robustness of remote physiological sensing to domain shifts

Improving generalization in rPPG and rRSP signal estimation

Enabling efficient real-time multitask physiological monitoring

Innovation

Methods, ideas, or system contributions that make the work stand out.

Target Signal Constrained Factorization module for precise feature extraction

Dual-branch 3D-CNN architecture for multitask physiological signal estimation

Cross-dataset evaluation demonstrating robust generalization across domain shifts
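As a rough illustration of what a cross-dataset evaluation protocol looks like, the sketch below fits a model on one dataset and scores it on every other dataset using mean absolute error. The dataset names, the `fit` and `predict` callables, and the choice of MAE as the metric are placeholders assumed for this example; the source does not state the exact protocol or metrics used.

```python
def mean_absolute_error(pred, truth):
    """Average absolute difference between predictions and ground truth."""
    return sum(abs(p - t) for p, t in zip(pred, truth)) / len(truth)


def cross_dataset_eval(datasets, fit, predict):
    """Train on each dataset in turn and test on all the others.

    datasets: dict mapping a name to an (inputs, targets) pair.
    fit, predict: hypothetical callables supplied by the caller.
    Returns a dict {(train_name, test_name): MAE} with train != test.
    """
    results = {}
    for train_name, (x_train, y_train) in datasets.items():
        model = fit(x_train, y_train)
        for test_name, (x_test, y_test) in datasets.items():
            if test_name == train_name:
                continue  # within-dataset scores are not cross-dataset
            preds = predict(model, x_test)
            results[(train_name, test_name)] = mean_absolute_error(preds, y_test)
    return results


# Toy usage with a trivial baseline that always predicts the training mean.
toy = {
    "A": ([0, 1], [60.0, 70.0]),
    "B": ([0, 1], [80.0, 90.0]),
}
fit_mean = lambda x, y: sum(y) / len(y)
predict_mean = lambda model, x: [model] * len(x)
scores = cross_dataset_eval(toy, fit_mean, predict_mean)
```

Keeping training and test data from different sources is what makes the score reflect domain shift (lighting, camera, motion) rather than within-dataset fit.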

🔎 Similar Papers

Information Fusion in Multimodal IoT Systems for physical activity level monitoring