🤖 AI Summary
This study addresses the vulnerability of mobile remote authentication to presentation attacks, deepfakes, and video injection. The authors propose using multi-sensor motion traces, passively recorded during selfie capture, as an auxiliary evidence channel, and they benchmark multivariate time-series classifiers (such as QUANT+3-NN and WEASEL+MUSE) and whole-series anomaly detectors for both spoof screening and user verification. The work evaluates these selfie-induced motion signals against stationary, handheld, and temporally shifted attack proxies, and highlights that closed-set classification accuracy does not by itself imply good verification performance. For spoof screening, accelerometer-only ROCKAD reaches a 0.00% false rejection rate (FRR) at a 43.8% false acceptance rate (FAR), while QUANT+3-NN obtains the lowest overall FAR of 32.0% at 2.37% FRR. For same-device, same-session user verification with 9 sensor channels, WEASEL+MUSE attains an equal error rate (EER) of 1.07%. Evaluation against real injection attacks, across devices, and across sessions remains open.
📝 Abstract
Mobile remote identity verification (RIdV) systems are exposed to attacks that manipulate or replace the facial video stream, including presentation attacks, real-time deepfakes, and video injection. Recent European requirements, including ETSI TS 119 461 and CEN/TS 18099, motivate complementary evidence channels beyond camera-based presentation-attack detection. This paper investigates whether passive motion traces recorded during selfie capture provide auxiliary evidence for spoof screening and user verification. We introduce CanSelfie, a dataset of 375 bona fide multi-sensor sequences collected at 50 Hz from 30 participants using a commercial mobile RIdV application, together with stationary, handheld, and temporally shifted attack-proxy scenarios. We benchmark 7 multivariate time-series classifiers and 8 whole-series anomaly detectors across sensor configurations and temporal windows. For spoof screening, accelerometer-only ROCKAD obtains 0.00% false rejection rate (FRR) and 43.8% false acceptance rate (FAR), while QUANT+3-NN obtains the lowest overall FAR of 32.0% at 2.37% FRR; both reject all stationary attack proxies. For same-device and same-session user verification, WEASEL+MUSE reaches 1.07% equal error rate (EER) using 9 sensor channels. The analysis shows that raw accelerometer data, preserving gravity and orientation cues, is the most informative modality, and that closed-set classification accuracy alone does not imply good verification performance because threshold calibration depends on score distributions. The findings suggest that short selfie-capture motion traces contain measurable spoof-related and identity-related information, supporting their use as a low-friction auxiliary signal while also identifying the need for cross-device, cross-session, and real injection-attack evaluation.
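The FRR, FAR, and EER figures quoted above all depend on where the decision threshold sits on the score distributions. The following minimal Python sketch illustrates that dependence. It is not the paper's evaluation code: the function names `far_frr` and `equal_error_rate` are hypothetical, the scores are made-up toy values, and the sketch assumes that a higher score means "more likely genuine" and that a sample is accepted when its score is at or above the threshold.

```python
from typing import List, Tuple


def far_frr(genuine: List[float], impostor: List[float],
            threshold: float) -> Tuple[float, float]:
    """Return (FAR, FRR) at a given threshold.

    Assumption: scores are similarity scores, accept when score >= threshold.
    FRR = fraction of genuine samples rejected.
    FAR = fraction of impostor/attack samples accepted.
    """
    frr = sum(s < threshold for s in genuine) / len(genuine)
    far = sum(s >= threshold for s in impostor) / len(impostor)
    return far, frr


def equal_error_rate(genuine: List[float],
                     impostor: List[float]) -> Tuple[float, float]:
    """Sweep every observed score as a candidate threshold and return
    (EER estimate, threshold) where |FAR - FRR| is smallest."""
    best = None  # (gap, eer_estimate, threshold)
    for t in sorted(set(genuine) | set(impostor)):
        far, frr = far_frr(genuine, impostor, t)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Toy scores, invented purely for illustration.
genuine = [0.9, 0.8, 0.75, 0.6, 0.4]
impostor = [0.7, 0.5, 0.3, 0.2, 0.1]

# A lenient threshold rejects no genuine sample but accepts more impostors.
print(far_frr(genuine, impostor, 0.4))

# The EER operating point balances the two error types.
eer, eer_threshold = equal_error_rate(genuine, impostor)
print(eer, eer_threshold)
```

On this toy data, lowering the threshold to 0.4 drives FRR to zero while FAR rises, the same kind of trade-off as the 0.00% FRR / 43.8% FAR operating point reported for accelerometer-only ROCKAD. The sweep over observed scores is a simple approximation; with few samples the point where FAR equals FRR may not exist exactly, which is why the sketch averages the two rates at the closest threshold.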