🤖 AI Summary
Existing remote photoplethysmography (rPPG) methods struggle to model long-range physiological periodicity while also processing highly redundant videos efficiently, facing an inherent trade-off between computational complexity and the capacity to model long-range dependencies. This paper proposes the first lightweight, purely sequential rPPG model capable of handling arbitrarily long videos: a multi-timescale Mamba architecture combined with a frequency-domain feed-forward mechanism, using no CNNs, RNNs, or Transformers. The design enforces short-term physiological trend constraints while also capturing long-range quasi-periodic signals. It reduces model parameters by 42% and FLOPs by 58%, achieves state-of-the-art performance across multiple benchmarks, and sustains real-time inference on mobile devices (>30 FPS), with no performance degradation regardless of video length.
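The complexity trade-off comes down to how a sequence model scales with the number of frames: self-attention cost grows quadratically with sequence length, whereas a Mamba-style selective state-space model processes frames through a recurrence whose state has a fixed size. The following is a minimal NumPy sketch of such a selective scan, assuming a diagonal state matrix `A` and hypothetical projection weights `W_B`, `W_C`, and `W_dt`. It illustrates the general mechanism only; it is not RhythmMamba's code or the optimized `mamba_ssm` kernel.

```python
import numpy as np


def selective_scan(x, A, W_B, W_C, W_dt):
    """Simplified Mamba-style selective state-space scan over one sequence.

    x:    (T, D) input sequence, e.g. per-frame features
    A:    (D, N) real diagonal state matrix per channel (negative for stability)
    W_B:  (D, N) hypothetical projection producing the input-dependent B_t
    W_C:  (D, N) hypothetical projection producing the input-dependent C_t
    W_dt: (D,)   hypothetical per-channel weights producing the step size delta_t
    Returns y: (T, D). Time cost is O(T * D * N); state memory is O(D * N).
    """
    T, D = x.shape
    N = A.shape[1]
    h = np.zeros((D, N))                        # hidden state, same size for any T
    y = np.empty((T, D))
    for t in range(T):
        xt = x[t]                               # (D,) current frame features
        delta = np.log1p(np.exp(xt * W_dt))     # softplus -> positive step size, (D,)
        B_t = xt @ W_B                          # (N,) depends on the current input
        C_t = xt @ W_C                          # (N,) depends on the current input
        A_bar = np.exp(delta[:, None] * A)      # (D, N) zero-order-hold discretization of A
        B_bar = delta[:, None] * B_t[None, :]   # (D, N) simplified (Euler) discretization of B
        h = A_bar * h + B_bar * xt[:, None]     # recurrent state update
        y[t] = h @ C_t                          # (D,) readout for this frame
    return y


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, D, N = 900, 8, 4                         # e.g. 30 s of video at 30 fps, 8 channels
    x = rng.standard_normal((T, D))
    A = -np.exp(rng.standard_normal((D, N)))    # negative entries keep the recurrence stable
    W_B, W_C = 0.1 * rng.standard_normal((2, D, N))
    W_dt = rng.standard_normal(D)
    y = selective_scan(x, A, W_B, W_C, W_dt)
    print(y.shape)
```

Because each step only updates `h`, cost grows linearly with the number of frames and the state size does not depend on clip length, which is what lets a Mamba-based model consume arbitrarily long videos. The step size `delta` and the matrices `B_t` and `C_t` depend on the current input; this input dependence is the "selective" part that lets the model decide what to retain in its state.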
📝 Abstract
Remote photoplethysmography (rPPG) is a non-contact method for detecting physiological signals from facial videos, with great potential in applications such as healthcare, affective computing, and anti-spoofing. Existing deep learning methods struggle to address two core issues of rPPG simultaneously: extracting weak rPPG signals from video segments with large spatiotemporal redundancy, and understanding the periodic patterns of rPPG over long contexts. This reflects a trade-off between computational complexity and the ability to capture long-range dependencies, which makes it difficult to build rPPG models suitable for deployment on mobile devices. Building on an in-depth exploration of how Mamba comprehends spatial and temporal information, this paper introduces RhythmMamba, an end-to-end Mamba-based method that employs multi-temporal Mamba to constrain both periodic patterns and short-term trends, coupled with a frequency-domain feed-forward module that enables Mamba to robustly understand the quasi-periodic patterns of rPPG. Extensive experiments show that RhythmMamba achieves state-of-the-art performance with fewer parameters and lower computational complexity. RhythmMamba can be applied to video segments of any length without performance degradation. The code is available at https://github.com/zizheng-guo/RhythmMamba.
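One way to realize a feed-forward step in the frequency domain is a global filter: transform the temporal features with an FFT, multiply each frequency bin by a weight, and transform back. The sketch below assumes that form; the function name `frequency_filter` and the fixed band-pass weights are hypothetical stand-ins for learned parameters, and RhythmMamba's actual module may be parameterized differently. The toy signal shows why working on the spectrum suits a quasi-periodic signal like rPPG: a 1.2 Hz pulse (72 bpm) is separated from slow drift and high-frequency noise simply by weighting bins.

```python
import numpy as np


def frequency_filter(x, W):
    """Apply a per-bin, per-channel filter along the time axis.

    x: (T, D) real-valued temporal features (T frames, D channels)
    W: (T // 2 + 1, D) complex weights, one per rFFT bin and channel
       (a trained layer would learn these; here they are fixed by hand)
    Returns a (T, D) real array.
    """
    T = x.shape[0]
    X = np.fft.rfft(x, axis=0)                  # spectrum along the time axis
    return np.fft.irfft(X * W, n=T, axis=0)     # weight each bin, return to time domain


if __name__ == "__main__":
    fps, T = 30.0, 300                          # 10 s clip at 30 frames per second
    t = np.arange(T) / fps
    pulse = np.sin(2 * np.pi * 1.2 * t)         # 1.2 Hz = 72 bpm "rPPG" component
    drift = 0.8 * np.sin(2 * np.pi * 0.1 * t)   # slow illumination/motion drift
    noise = 0.5 * np.sin(2 * np.pi * 8.0 * t)   # high-frequency noise
    x = (pulse + drift + noise)[:, None]        # shape (T, 1)

    freqs = np.fft.rfftfreq(T, d=1 / fps)       # bin centers in Hz
    band = (freqs >= 0.7) & (freqs <= 3.0)      # 42-180 bpm pass band
    W = band.astype(complex)[:, None]           # hypothetical fixed weights

    y = frequency_filter(x, W)
    peak_hz = freqs[np.argmax(np.abs(np.fft.rfft(y[:, 0])))]
    print(f"estimated heart rate: {peak_hz * 60:.0f} bpm")
```

Run as a script, this prints an estimate of 72 bpm, because only the pulse component falls inside the pass band. In a learned layer, `W` would be trainable complex parameters rather than a hard mask; the abstract does not specify which parameterization RhythmMamba uses.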