🤖 AI Summary
rPPG-based non-contact physiological monitoring faces two key challenges in edge deployment: severe computational resource constraints and signal distortion induced by video compression and transmission. To address these, we propose a spatiotemporal state-space dual mechanism that jointly models subtle periodic physiological signals across video frames via transferable cardiac state representations and a lightweight temporal modeling architecture. We further introduce spatiotemporal feature disentanglement learning to balance long-sequence training capability with low-latency inference. Our method consumes only 3.6 MB of memory and achieves a per-frame latency of 9.46 ms—accelerating inference by 83–99% over SOTA methods—while reducing mean absolute error (MAE) by 49%. This breakthrough significantly alleviates the longstanding accuracy-efficiency-generalizability trade-off in rPPG, enabling real-time edge deployment and interactive online demonstration.
📝 Abstract
Vital sign measurement using cameras presents opportunities for comfortable, ubiquitous health monitoring. Remote photoplethysmography (rPPG), a foundational technology, enables cardiac measurement through minute changes in light reflected from the skin. However, practical deployment is limited by the computational constraints of performing analysis on front-end devices and by the accuracy degradation caused by transmitting video through compressive channels. We propose a memory-efficient rPPG algorithm, *FacePhys*, built on temporal-spatial state space duality, which resolves the trilemma of model scalability, cross-dataset generalization, and real-time operation. Leveraging a transferable heart state, FacePhys captures subtle periodic variations across video frames while maintaining minimal computational overhead, enabling training on extended video sequences and supporting low-latency inference. FacePhys establishes a new state of the art, with a substantial 49% reduction in error. Our solution enables real-time inference with a memory footprint of 3.6 MB and a per-frame latency of 9.46 ms -- surpassing existing methods by 83% to 99%. These results translate into reliable real-time performance in practical deployments, and a live demo is available at https://www.facephys.com/.
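To make the "transferable heart state" idea concrete: a state-space formulation lets inference proceed one frame at a time by carrying a fixed-size hidden state, which is what makes constant per-frame latency possible. The sketch below is a minimal, hypothetical illustration of such a streaming linear state-space recurrence; the matrices `A`, `B`, `C`, the dimensions, and the `step` function are placeholders of our own, not FacePhys's learned parameters or API.

```python
import numpy as np

rng = np.random.default_rng(0)
d_state, d_feat = 16, 8             # illustrative sizes, not from the paper

# Placeholder parameters; in a learned model these come from training.
A = np.eye(d_state) * 0.95          # slowly decaying state transition
B = rng.standard_normal((d_state, d_feat)) * 0.1
C = rng.standard_normal((1, d_state)) * 0.1

h = np.zeros(d_state)               # the carried "heart state"

def step(x):
    """Consume one per-frame feature vector; emit one pulse-signal sample."""
    global h
    h = A @ h + B @ x               # O(d_state^2) work per frame => constant latency
    return float(C @ h)             # scalar rPPG estimate for this frame

# Streaming over 30 frames of (random stand-in) per-frame features:
signal = [step(rng.standard_normal(d_feat)) for _ in range(30)]
print(len(signal))                  # one output sample per input frame
```

The point of the recurrence is that memory and per-frame compute depend only on the state size, not on the video length, which is consistent with the fixed 3.6 MB footprint and per-frame latency figures reported above.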