🤖 AI Summary
This work addresses the computational cost and low energy efficiency of existing deepfake video detection methods, which hinder large-scale deployment. The authors propose a hybrid digital-optical architecture: a lightweight digital front-end extracts key features, and an optical back-end uses a programmable spatial light modulator to perform spatially multiplexed, parallel inference. A single optical propagation pass processes 15 or more video streams simultaneously. In experiments with 15 videos tested in parallel, the system reaches an average accuracy of 97.79% on Celeb-DF (99.86% sensitivity, 95.72% specificity). The approach improves throughput and energy efficiency while remaining robust to compression, noise, misalignment, and black-box adversarial attacks, a combination of efficiency and robustness that is difficult to achieve in purely digital systems.
📝 Abstract
The rapid proliferation of AI-generated visual media has created an urgent need for efficient, trustworthy deepfake detection systems. However, existing deep learning-based detection methods rely on computationally intensive and energy-demanding inference algorithms, limiting their scalability. Here, we present a hybrid digital-analog deepfake video detection framework that combines a lightweight digital front-end with a spatially multiplexed optical decoding back-end for massively parallel analog inference through a programmable spatial light modulator. By simultaneously processing 15 or more video streams within a single optical propagation pass, the system enables high-throughput and accurate video-level authenticity prediction at reduced computational cost compared with purely digital methods. We validated this hybrid deepfake video processor on datasets spanning classical face-swapping, real-world deepfake recordings, and fully AI-generated videos. Using a spatially multiplexed experimental set-up operating in the visible spectrum, we achieved average deepfake detection accuracy, sensitivity and specificity of 97.79%, 99.86% and 95.72%, respectively, on the Celeb-DF video dataset, with 15 videos tested in parallel per optical pass. The multiplexed optical decoder is also resilient to various types of video degradation, noise, compression, experimental misalignments and black-box adversarial attacks. Our results show that integrating optical computation into AI inference enables simultaneous gains in throughput, energy efficiency, and adversarial robustness: three properties that are difficult to achieve together in purely digital systems.
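For readers less familiar with the three reported metrics, the following minimal Python sketch shows how accuracy, sensitivity and specificity relate to one another. It rests on assumptions the abstract does not state: that "deepfake" is the positive class, and that the function names and the `positive_fraction` parameter (the share of test videos that are deepfakes) are purely illustrative. It is not the authors' evaluation code.

```python
def rates_from_counts(tp, fn, tn, fp):
    """Return (accuracy, sensitivity, specificity) from confusion-matrix counts.

    Assumption: the positive class is "deepfake", so tp/fn count fake videos
    and tn/fp count real videos.
    """
    sensitivity = tp / (tp + fn)          # fraction of fakes flagged as fake
    specificity = tn / (tn + fp)          # fraction of real videos passed as real
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return accuracy, sensitivity, specificity


def accuracy_from_rates(sensitivity, specificity, positive_fraction):
    """Accuracy as a prevalence-weighted mix of sensitivity and specificity."""
    return positive_fraction * sensitivity + (1 - positive_fraction) * specificity


# Assumption: half of the test videos are deepfakes (positive_fraction = 0.5).
acc = accuracy_from_rates(0.9986, 0.9572, positive_fraction=0.5)
print(f"{acc:.4f}")
```

Under the balanced-class assumption, the accuracy is simply the mean of the other two rates, and (99.86% + 95.72%) / 2 = 97.79%, which matches the reported figure. This is consistent with an evenly split test set or with a balanced-accuracy style average; the abstract does not say which applies.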