🤖 AI Summary
This study addresses the performance degradation of remote photoplethysmography (rPPG) under facial motion and occlusion, which stems from the reliance of existing methods on single-view video. To this end, the authors introduce MVRD, the first synchronized multi-view rPPG benchmark dataset, along with MVRD-rPPG, a unified end-to-end multi-view learning framework. The framework integrates physiological cues across views through an Adaptive Temporal Optical Compensation (ATOC) module, a Rhythm-Visual Dual-Stream Network, Multi-View Correlation-Aware Attention (MVCA), and a Correlation Frequency Adversarial (CFA) learning strategy. Experimental results show that, in the head-movement scenario of the MVRD dataset, the method achieves a mean absolute error (MAE) of 0.90 in heart rate estimation and a Pearson correlation coefficient of 0.99, significantly outperforming existing approaches.
📝 Abstract
Remote photoplethysmography (rPPG) is a non-contact technique that estimates physiological signals by analyzing subtle skin color changes in facial videos. Existing rPPG methods often suffer performance degradation under facial motion and occlusion due to their reliance on static, single-view facial videos. This work therefore tackles the motion-induced occlusion problem for rPPG measurement in unconstrained multi-view facial videos. Specifically, we introduce the Multi-View rPPG Dataset (MVRD), a high-quality benchmark featuring synchronized facial videos from three viewpoints under stationary, speaking, and head-movement scenarios to better match real-world conditions. We also propose MVRD-rPPG, a unified multi-view rPPG learning framework that fuses complementary visual cues to maintain robust facial skin coverage, especially under motion. Our method integrates an Adaptive Temporal Optical Compensation (ATOC) module for motion artifact suppression, a Rhythm-Visual Dual-Stream Network to disentangle rhythmic and appearance-related features, and a Multi-View Correlation-Aware Attention (MVCA) module for adaptive view-wise signal aggregation. Furthermore, we introduce a Correlation Frequency Adversarial (CFA) learning strategy that jointly enforces temporal accuracy, spectral consistency, and perceptual realism in the predicted signals. Extensive experiments and ablation studies on the MVRD dataset demonstrate the superiority of our approach: in the MVRD movement scenario, MVRD-rPPG achieves an MAE of 0.90 and a Pearson correlation coefficient (R) of 0.99. The source code and dataset will be made available.
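The abstract does not detail how MVCA weights the views, but the idea of correlation-aware, view-wise aggregation can be illustrated with a minimal sketch: score each view's rPPG signal by its average Pearson correlation with the other views (a view that disagrees is presumed corrupted by motion or occlusion) and fuse with softmax weights. The function name `mvca_fuse` and this particular scoring rule are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def mvca_fuse(view_signals):
    """Correlation-aware fusion of per-view rPPG signals (illustrative sketch).

    view_signals: array of shape (V, T) -- one temporal signal per camera view.
    Returns the fused (T,) signal and the per-view attention weights.
    """
    V, _ = view_signals.shape
    # Pairwise Pearson correlations between views; a view that agrees with
    # the others is presumed less corrupted by motion or occlusion.
    corr = np.corrcoef(view_signals)                    # (V, V)
    logits = (corr.sum(axis=1) - 1.0) / (V - 1)         # mean correlation with other views
    weights = np.exp(logits) / np.exp(logits).sum()     # softmax over views
    fused = weights @ view_signals                      # attention-weighted aggregation
    return fused, weights

# Example: two consistent views and one occlusion-corrupted (noisy) view.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 10.0, 400)
pulse = np.sin(2 * np.pi * 1.2 * t)                     # ~72 bpm rhythm
views = np.stack([
    pulse + 0.05 * rng.standard_normal(t.size),
    pulse + 0.05 * rng.standard_normal(t.size),
    rng.standard_normal(t.size),                        # corrupted view
])
fused, w = mvca_fuse(views)
```

In a learned version of this module, the attention logits would come from trainable projections of the view features rather than raw correlations, but the aggregation step is the same weighted sum.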