🤖 AI Summary
This work addresses the challenge that unsupervised remote photoplethysmography (rPPG) models trained on in-the-wild videos are highly susceptible to interference from low-quality data, a problem exacerbated by the absence of rPPG-specific video quality assessment mechanisms. To this end, we propose rPPG-VQA, a novel framework that assesses both physiological signal quality and scene-level disturbances through a dual-branch architecture. The signal-level branch estimates the signal-to-noise ratio (SNR) by consensus across multiple rPPG methods, and the scene-level branch draws on a multimodal large language model (MLLM) to flag interference such as motion and unstable lighting. Coupled with a two-stage adaptive sampling strategy, rPPG-VQA identifies high-quality training samples, so that unsupervised rPPG models trained on the filtered large-scale in-the-wild videos achieve substantially better accuracy on standard benchmarks.
📝 Abstract
Unsupervised remote photoplethysmography (rPPG) promises to leverage unlabeled video data, but its potential is hindered by a critical challenge: training on low-quality "in-the-wild" videos severely degrades model performance. A missing but essential step is to assess whether a video is suitable for rPPG model learning before training on it. Existing video quality assessment (VQA) methods are mainly designed for human perception and are not directly applicable to this purpose. In this work, we propose rPPG-VQA, a novel framework for assessing video suitability for rPPG. We integrate signal-level and scene-level analyses in a dual-branch assessment architecture. The signal-level branch evaluates the physiological signal quality of a video via robust signal-to-noise ratio (SNR) estimation with a multi-method consensus mechanism, and the scene-level branch uses a multimodal large language model (MLLM) to identify interference such as motion and unstable lighting. Furthermore, we propose a two-stage adaptive sampling (TAS) strategy that uses the quality score to curate optimal training datasets. Experiments show that by training on large-scale "in-the-wild" videos filtered by our framework, we can develop unsupervised rPPG models that achieve a substantial improvement in accuracy on standard benchmarks. Our code is available at https://github.com/Tianyang-Dai/rPPG-VQA.
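To make the signal-level idea concrete, the following is a minimal sketch of one way a multi-method consensus SNR could be computed. It is not the authors' implementation: the function names, the 0.7 to 4.0 Hz heart-rate band, the 0.1 Hz tolerance around the fundamental and first harmonic, and the use of the median for consensus are all illustrative assumptions. The input is assumed to be a list of pulse-signal estimates for the same clip, each produced by a different rPPG method.

```python
import numpy as np

def power_spectrum(signal, fs):
    """Return (freqs_hz, power) of a mean-removed, Hann-windowed signal."""
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()
    spec = np.fft.rfft(x * np.hanning(len(x)))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    return freqs, np.abs(spec) ** 2

def peak_frequency(signal, fs, band=(0.7, 4.0)):
    """Frequency (Hz) of the strongest spectral peak inside the assumed HR band."""
    freqs, power = power_spectrum(signal, fs)
    mask = (freqs >= band[0]) & (freqs <= band[1])
    return freqs[mask][np.argmax(power[mask])]

def snr_db(signal, fs, f_ref, band=(0.7, 4.0), half_width=0.1):
    """In-band power near f_ref and 2*f_ref versus the remaining in-band power, in dB."""
    freqs, power = power_spectrum(signal, fs)
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    near_f = np.abs(freqs - f_ref) <= half_width
    near_2f = np.abs(freqs - 2.0 * f_ref) <= half_width
    sig_mask = in_band & (near_f | near_2f)
    noise_mask = in_band & ~sig_mask
    eps = 1e-12  # guards against division by zero and log of zero
    return 10.0 * np.log10((power[sig_mask].sum() + eps) / (power[noise_mask].sum() + eps))

def consensus_snr(candidates, fs):
    """Hypothetical consensus rule: the median peak frequency across methods is
    the reference pulse rate; the median SNR against it is the clip's score."""
    f_ref = float(np.median([peak_frequency(s, fs) for s in candidates]))
    snrs = [snr_db(s, fs, f_ref) for s in candidates]
    return float(np.median(snrs)), f_ref

if __name__ == "__main__":
    # Synthetic stand-ins for the outputs of three rPPG methods on one clip.
    fs = 30.0
    t = np.arange(0, 20, 1.0 / fs)
    rng = np.random.default_rng(0)
    clip = [np.sin(2 * np.pi * 1.2 * t) + 0.1 * rng.standard_normal(t.size) for _ in range(3)]
    score, f_ref = consensus_snr(clip, fs)
    print(f"consensus SNR: {score:.1f} dB at {f_ref * 60:.0f} bpm")
```

Scoring against a shared reference frequency means that a method locking onto a spurious peak is penalized rather than rewarded, which is the intuition behind using agreement across methods as a robustness check. A score of this kind could then feed a downstream filtering or sampling step.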