🤖 AI Summary
This work addresses the challenge of detecting high-fidelity AI-generated faces, particularly those synthesized by diffusion models, which often evade detectors built on conventional spatial or frequency-domain features. The authors propose the physical inconsistency of specular reflections as a forensic cue. Using the Phong illumination model and Retinex theory, they estimate face texture quickly, separate the specular component, and then model its inconsistency with respect to both texture and direct illumination. To this end, they propose SRI-Net, an architecture with a two-stage cross-attention mechanism that captures the relationships among specular reflection, texture, and lighting and fuses them with image features. The reported experiments show superior performance on both traditional deepfake datasets and generative deepfake datasets, including those containing diffusion-generated faces.
📝 Abstract
Detecting deepfakes has become increasingly challenging as forged faces synthesized by AI generation methods, particularly diffusion models, achieve unprecedented quality and resolution. Existing forgery detection approaches that rely on spatial and frequency features show limited efficacy against high-quality, entirely synthesized forgeries. In this paper, we propose a novel detection method grounded in the observation that facial attributes governed by complex physical laws and multiple parameters are inherently difficult to replicate. Specifically, we focus on illumination, and in particular on the specular reflection component of the Phong illumination model, which poses the greatest replication challenge because of its parametric complexity and nonlinear formulation. We introduce a fast and accurate face texture estimation method based on Retinex theory to enable precise specular reflection separation. Furthermore, drawing on the mathematical formulation of specular reflection, we posit that forgery evidence manifests not only in the specular reflection itself but also in its relationship with the corresponding face texture and direct light. To capture these relationships, we design the Specular-Reflection-Inconsistency-Network (SRI-Net), which incorporates a two-stage cross-attention mechanism to model these correlations and integrate specular-reflection-related features with image features for robust forgery detection. Experimental results demonstrate that our method achieves superior performance on both traditional deepfake datasets and generative deepfake datasets, particularly those containing diffusion-generated forged faces.
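To make the "nonlinear formulation" of the specular term concrete, the following is a minimal sketch of the textbook Phong model for a single surface point. It is not the paper's implementation: the function names, the coefficient values (`k_a`, `k_d`, `k_s`, `shininess`), and the single white light are all assumptions chosen for illustration. The specular term depends on the normal, the light direction, the view direction, and an exponent, whereas the diffuse term depends only on the normal and the light direction.

```python
import math


def normalize(v):
    """Scale a 3-vector to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)


def dot(a, b):
    """Dot product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))


def phong_terms(normal, light_dir, view_dir,
                k_a=0.1, k_d=0.6, k_s=0.3, shininess=16.0,
                ambient_light=1.0, direct_light=1.0):
    """Return (ambient, diffuse, specular) for one surface point.

    All coefficients are illustrative assumptions, not values from the paper.
    light_dir and view_dir point away from the surface, toward the light
    and toward the camera respectively.
    """
    n = normalize(normal)
    l = normalize(light_dir)
    v = normalize(view_dir)

    ambient = k_a * ambient_light
    n_dot_l = dot(n, l)
    if n_dot_l <= 0.0:
        # The light is behind the surface: no diffuse or specular contribution.
        return ambient, 0.0, 0.0

    # Diffuse term: linear in the cosine between normal and light direction.
    diffuse = k_d * n_dot_l * direct_light

    # Mirror reflection of the light direction about the normal: r = 2(n.l)n - l
    r = tuple(2.0 * n_dot_l * nc - lc for nc, lc in zip(n, l))

    # Specular term: a cosine raised to the shininess exponent (the nonlinearity).
    specular = k_s * max(dot(r, v), 0.0) ** shininess * direct_light
    return ambient, diffuse, specular


# Same surface and light; only the camera moves by 30 degrees.
head_on = phong_terms((0, 0, 1), (0, 0, 1), (0, 0, 1))
tilted = phong_terms((0, 0, 1), (0, 0, 1),
                     (math.sin(math.radians(30)), 0, math.cos(math.radians(30))))
print("head-on:", head_on)
print("tilted: ", tilted)
```

In this toy setup, moving the camera by 30 degrees leaves the ambient and diffuse terms unchanged while the specular term falls to roughly a tenth of its head-on value. That sensitivity to several coupled quantities is the kind of behavior the abstract argues is hard for a generator to reproduce consistently.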