🤖 AI Summary
Robust authentication using photoplethysmography (PPG) signals extracted from low-frame-rate fingertip video remains challenging due to motion artifacts, illumination variations, and inter-subject physiological variability. To address this, we propose a lightweight spatiotemporal fusion deep learning framework. Our approach introduces a novel hybrid model, CVT-ConvMixer-LSTM, that jointly captures time-frequency features and long-term temporal dependencies. Signals are preprocessed with principal component analysis (PCA)-based denoising, bandpass filtering, Fourier-domain resampling, and amplitude normalization, and then converted into time-frequency representations via the continuous wavelet transform (CWT). The integrated architecture significantly enhances noise robustness and cross-subject generalization. Evaluated on the 46-subject CFIHSR dataset, our method achieves 98% authentication accuracy. These results demonstrate its efficacy and practicality for resource-constrained mobile and embedded security applications.
📝 Abstract
Photoplethysmography (PPG) signals, which measure changes in blood volume in the skin using light, have recently gained attention in biometric authentication because of their non-invasive acquisition, inherent liveness detection, and suitability for low-cost wearable devices. However, PPG signal quality is challenged by motion artifacts, illumination changes, and inter-subject physiological variability, making robust feature extraction and classification crucial. This study proposes a lightweight and cost-effective biometric authentication framework based on PPG signals extracted from low-frame-rate fingertip videos. The CFIHSR dataset, comprising PPG recordings from 46 subjects at a sampling rate of 14 Hz, is employed for evaluation. The raw PPG signals undergo a standard preprocessing pipeline involving baseline drift removal, motion artifact suppression using Principal Component Analysis (PCA), bandpass filtering, Fourier-based resampling, and amplitude normalization. To generate robust representations, each one-dimensional PPG segment is converted into a two-dimensional time-frequency scalogram via the Continuous Wavelet Transform (CWT), effectively capturing transient cardiovascular dynamics. We develop a hybrid deep learning model, termed CVT-ConvMixer-LSTM, by combining spatial features from the Convolutional Vision Transformer (CVT) and ConvMixer branches with temporal features from a Long Short-Term Memory network (LSTM). Experimental results on the 46 subjects demonstrate an authentication accuracy of 98%, validating the robustness of the model to noise and inter-subject variability. Due to its efficiency, scalability, and inherent liveness detection capability, the proposed system is well-suited for real-world mobile and embedded biometric security applications.
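The preprocessing pipeline described above (baseline removal, bandpass filtering, Fourier-based resampling, amplitude normalization) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name `preprocess_ppg`, the target rate `fs_out=30.0`, the 0.5–5 Hz passband, and the filter order are all assumptions, and the PCA-based motion artifact suppression step (which operates across segments rather than on a single trace) is omitted for brevity.

```python
import numpy as np
from scipy.signal import butter, filtfilt, resample

def preprocess_ppg(x, fs_in=14.0, fs_out=30.0, band=(0.5, 5.0)):
    """Illustrative PPG cleanup: baseline removal, zero-phase bandpass,
    Fourier-domain resampling, and zero-mean/unit-variance normalization.
    Parameter values are assumptions, not taken from the paper."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                                    # remove baseline offset
    nyq = fs_in / 2.0
    b, a = butter(3, [band[0] / nyq, band[1] / nyq], btype="band")
    x = filtfilt(b, a, x)                               # zero-phase bandpass filter
    n_out = int(round(len(x) * fs_out / fs_in))
    x = resample(x, n_out)                              # FFT-based resampling
    return (x - x.mean()) / (x.std() + 1e-8)            # amplitude normalization
```

`filtfilt` is used instead of a causal filter so the filtering introduces no phase distortion in the pulse waveform, which matters when downstream features depend on waveform shape.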
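The 1-D-to-2-D scalogram step can likewise be sketched with a small self-contained CWT. This is a hedged illustration using a simplified complex Morlet wavelet implemented directly in NumPy (the paper does not specify its mother wavelet or scale grid, so `w=5.0` and the integer scale range below are assumptions):

```python
import numpy as np

def morlet(t, w=5.0):
    """Simplified complex Morlet wavelet (admissibility correction omitted)."""
    return np.pi ** -0.25 * np.exp(1j * w * t - 0.5 * t ** 2)

def cwt_scalogram(x, scales):
    """|CWT| magnitude scalogram: one row per scale, one column per sample."""
    x = np.asarray(x, dtype=float)
    out = np.empty((len(scales), len(x)))
    for i, s in enumerate(scales):
        k = np.arange(-int(4 * s), int(4 * s) + 1)   # ~8 std-dev support
        psi = morlet(k / s) / np.sqrt(s)             # scaled, energy-normalized
        out[i] = np.abs(np.convolve(x, psi, mode="same"))
    return out
```

Each scale maps to a pseudo-frequency of roughly (w/2π)·fs/scale, so at the dataset's 14 Hz sampling rate the cardiac band around 1–2 Hz lands at scales near 6–11; the resulting 2-D magnitude map is what the CNN branches consume as an image.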
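The spatial-plus-temporal fusion idea behind CVT-ConvMixer-LSTM can be sketched in PyTorch. This is a structural sketch only, not the authors' architecture: the CVT branch is stood in for by a plain convolutional stem, depths and widths are invented, and the class name `ScalogramAuthNet` is hypothetical. It shows the key design choice: a ConvMixer-style stack extracts spatial features from the scalogram, which are then pooled into a sequence along the time axis and fed to an LSTM before classification.

```python
import torch
import torch.nn as nn

class ConvMixerBlock(nn.Module):
    """One ConvMixer block: residual depthwise conv + pointwise conv."""
    def __init__(self, dim, kernel_size=5):
        super().__init__()
        self.depthwise = nn.Conv2d(dim, dim, kernel_size,
                                   groups=dim, padding="same")
        self.pointwise = nn.Conv2d(dim, dim, 1)
        self.norm1 = nn.BatchNorm2d(dim)
        self.norm2 = nn.BatchNorm2d(dim)

    def forward(self, x):
        x = x + self.norm1(torch.relu(self.depthwise(x)))  # spatial mixing
        return self.norm2(torch.relu(self.pointwise(x)))   # channel mixing

class ScalogramAuthNet(nn.Module):
    """Hypothetical spatial(ConvMixer)+temporal(LSTM) fusion over scalograms.
    The paper's CVT branch is replaced by a conv stem for brevity."""
    def __init__(self, n_subjects=46, dim=32, steps=16):
        super().__init__()
        self.stem = nn.Conv2d(1, dim, 7, stride=2, padding=3)  # patch-like embed
        self.mixer = nn.Sequential(ConvMixerBlock(dim), ConvMixerBlock(dim))
        self.pool = nn.AdaptiveAvgPool2d((1, steps))           # collapse freq axis
        self.lstm = nn.LSTM(dim, 64, batch_first=True)         # temporal branch
        self.head = nn.Linear(64, n_subjects)

    def forward(self, x):                         # x: (B, 1, freq, time)
        f = self.pool(self.mixer(self.stem(x)))   # (B, dim, 1, steps)
        seq = f.squeeze(2).transpose(1, 2)        # (B, steps, dim) time sequence
        _, (h, _) = self.lstm(seq)
        return self.head(h[-1])                   # (B, n_subjects) logits
```

Collapsing the frequency axis while keeping time steps is one plausible way to hand CNN features to an LSTM; the pooled sequence length (`steps=16`) and hidden size are arbitrary choices here.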