Lightweight Complementary-Cue Fusion for Robust Video Face Forgery Detection

📅 2026-05-27
🤖 AI Summary
This work addresses the high computational cost of existing video face forgery detectors, which often rely on dual-stream architectures or wide backbone networks. The authors propose a lightweight single-stream fusion strategy that pairs a wavelet-denoised feature (WDF) with either a phase-spectrum channel (adapted from SPSL) or local binary patterns (LBP) through a 1×1 convolution, adding only 292 parameters to the Xception backbone. The fused models raise average AUC over the Xception baseline by 3.8 percentage points on FaceForensics++ and 4.4 percentage points on DFDC-Preview, and they consistently outperform F3Net, SRM, and SPSL across eight public benchmarks. The results suggest that carefully paired handcrafted features can deliver competitive accuracy and robustness with minimal computational overhead.
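For readers unfamiliar with local binary patterns, the sketch below shows the classic 3×3, 8-neighbour LBP code in plain Python. It is only an illustration of the general technique: the function name `lbp_image`, the clockwise bit ordering, and the `>=` comparison are assumptions made here, and the paper may use a different LBP variant or neighbourhood.

```python
def lbp_image(gray):
    """Return basic 3x3 LBP codes for the interior pixels of a 2D grayscale grid.

    Hypothetical helper for illustration; not taken from the paper.
    """
    h, w = len(gray), len(gray[0])
    # Neighbour offsets (dy, dx), clockwise from the top-left pixel.
    # The ordering is a convention chosen for this sketch.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    out = [[0] * (w - 2) for _ in range(h - 2)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            center = gray[y][x]
            code = 0
            for bit, (dy, dx) in enumerate(offsets):
                # Set the bit when the neighbour is at least as bright as the centre.
                if gray[y + dy][x + dx] >= center:
                    code |= 1 << bit
            out[y - 1][x - 1] = code
    return out


patch = [
    [5, 9, 1],
    [4, 6, 7],
    [2, 3, 8],
]
codes = lbp_image(patch)  # one interior pixel, so a 1x1 result
```

Each output value is an 8-bit code describing the local texture around one pixel, which is the kind of handcrafted cue the summary refers to.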
📝 Abstract
Current face video forgery detectors typically rely on wide or dual-stream backbones. We show that a single, lightweight fusion of two handcrafted cues can achieve higher accuracy with a much smaller model. Based on the Xception baseline model (21.9 million parameters), we build two detectors: LFWS, which adds a 1×1 convolution to combine a low-frequency Wavelet-Denoised Feature (WDF) with a phase-spectrum channel derived from Spatial-Phase Shallow Learning (SPSL), and LFWL, which merges WDF with Local Binary Patterns (LBP) in the same way. This extra module adds only 292 parameters, keeping the total at 21.9 million, smaller than F3Net (22.5 million) and less than half the size of SRM (55.3 million). Even with this minimal overhead, the fused models increase the average area under the curve (AUC) from 74.8% to 78.6% on FaceForensics++ and from 70.5% to 74.9% on DFDC-Preview, gains of 3.8 and 4.4 percentage points over the Xception baseline. They also consistently outperform F3Net, SRM, and SPSL on eight public benchmarks, without extra data or test-time augmentation. These results show that carefully paired handcrafted features, combined through the lightweight fusion block, can provide competitive robustness at a significantly lower cost than comparable frequency-based detectors. Our findings suggest a need to reevaluate scale-driven design choices in face video forgery detection.
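To make the fusion step concrete, here is a minimal plain-Python sketch of a 1×1 convolution that blends two cue maps pixel by pixel, together with the usual parameter-count formula for such a layer. The helper names, the toy values, and the choice of two single-channel inputs with one output channel are assumptions for illustration only; the abstract does not state the channel layout that yields the reported 292 parameters, so this sketch does not attempt to reproduce that figure.

```python
def conv1x1_param_count(in_channels, out_channels, bias=True):
    """Parameters of a 1x1 convolution: one weight per (in, out) pair, plus optional biases."""
    return in_channels * out_channels + (out_channels if bias else 0)


def fuse_1x1(channels, weights, bias=0.0):
    """Fuse equally sized 2D maps into one map with a per-pixel weighted sum.

    This is what a 1x1 convolution with a single output channel computes.
    """
    h, w = len(channels[0]), len(channels[0][0])
    return [
        [sum(wt * ch[y][x] for wt, ch in zip(weights, channels)) + bias
         for x in range(w)]
        for y in range(h)
    ]


# Hypothetical toy cue maps standing in for WDF and a phase-spectrum channel.
wdf = [[0.25, 0.5], [0.75, 1.0]]
phase = [[1.0, 0.0], [0.5, 0.5]]

fused = fuse_1x1([wdf, phase], weights=[0.5, 2.0], bias=0.25)
toy_params = conv1x1_param_count(in_channels=2, out_channels=1)

# Relative overhead of the reported 292 extra parameters on a 21.9M-parameter backbone.
overhead_pct = 292 / 21_900_000 * 100
```

In this toy setting the fusion layer has three learnable parameters (two weights and one bias), and the reported 292 extra parameters amount to roughly 0.0013% of the 21.9 million parameter backbone, which is why the total model size is effectively unchanged.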
Problem

Research questions and friction points this paper is trying to address.

face forgery detection
lightweight model
complementary cues
video forensics
model efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

lightweight fusion
handcrafted features
face forgery detection
complementary cues
Xception backbone