🤖 AI Summary
Existing remote physiological sensing methods generalize poorly under variations in environment, hardware, subject pose, and physiological state. To address this, we propose MMRPhys, a novel end-to-end framework comprising a Target Signal Constrained Factorization Module (TSFM) and a lightweight dual-branch 3D-CNN architecture. For the first time, MMRPhys embeds physiological priors into a multidimensional attention mechanism and supports joint RGB and thermal video input to simultaneously estimate remote photoplethysmography (rPPG) and remote respiratory (rRSP) signals. The framework achieves high generalization, ultra-low latency (<30 ms on 1080p video), and multimodal, multitask capability. In cross-dataset evaluation on five benchmark datasets, MMRPhys significantly outperforms state-of-the-art methods, reducing rPPG and rRSP estimation errors by 18.7% and 22.3%, respectively. The framework and a real-time web application are publicly released.
📝 Abstract
Remote physiological sensing using camera-based technologies offers transformative potential for non-invasive vital sign monitoring across healthcare and human-computer interaction domains. Although deep learning approaches have advanced the extraction of physiological signals from video data, existing methods have not been sufficiently assessed for their robustness to domain shifts. In remote physiological sensing, these shifts include variations in ambient conditions, camera specifications, head movements, facial poses, and physiological states, which often degrade real-world performance significantly. Cross-dataset evaluation provides an objective measure of generalization across these domain shifts. We introduce the Target Signal Constrained Factorization Module (TSFM), a novel multidimensional attention mechanism that explicitly incorporates physiological signal characteristics as factorization constraints, allowing more precise feature extraction. Building on this innovation, we present MMRPhys, an efficient dual-branch 3D-CNN architecture designed for simultaneous multitask estimation of remote photoplethysmography (rPPG) and remote respiratory (rRSP) signals from multimodal RGB and thermal video inputs. Through comprehensive cross-dataset evaluation on five benchmark datasets, we demonstrate that MMRPhys with TSFM significantly outperforms state-of-the-art methods in generalization across domain shifts for rPPG and rRSP estimation, while maintaining a minimal inference latency suitable for real-time applications. Our approach establishes new benchmarks for robust multitask and multimodal physiological sensing and offers a computationally efficient framework for practical deployment in unconstrained environments. A web browser-based application featuring on-device real-time inference of the MMRPhys model is available at https://physiologicailab.github.io/mmrphys-live