SPARK-IL: Spectral Retrieval-Augmented RAG for Knowledge-driven Deepfake Detection via Incremental Learning

📅 2026-04-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the limited generalization of existing AI-generated image detection methods to unseen generative models by proposing a novel incremental learning framework that integrates dual-path spectral analysis with retrieval-augmented generation (RAG). The approach employs four-band Fourier decomposition to extract frequency-domain features, combines a partially frozen ViT-L/14 encoder with Kolmogorov–Arnold Network (KAN)-based mixture-of-experts to model band-specific characteristics, and incorporates elastic weight consolidation for continual learning. Notably, it introduces, for the first time, a synergy between spectral consistency priors and RAG, leveraging a Milvus vector database for knowledge retrieval to enhance discriminative robustness. Evaluated on the UniversalFakeDetect benchmark encompassing 19 generative models, the method achieves an average accuracy of 94.6%, substantially outperforming current state-of-the-art techniques.
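The four-band Fourier decomposition mentioned above can be illustrated with a minimal NumPy sketch. The summary does not say how the bands are defined, so the equal-width radial bands, the function name `four_band_decompose`, and the single-channel 2D input are all assumptions made for illustration; this is not the authors' implementation.

```python
import numpy as np

def four_band_decompose(image, num_bands=4):
    """Split a 2D array into radial frequency bands (illustrative sketch).

    Assumption: bands are equal-width rings in normalized frequency radius.
    The paper's actual band boundaries are not given in the source text.
    """
    h, w = image.shape
    # Centered 2D spectrum: zero frequency sits in the middle of the array.
    spectrum = np.fft.fftshift(np.fft.fft2(image))

    # Radial frequency (cycles per pixel) of every FFT bin.
    fy = np.fft.fftshift(np.fft.fftfreq(h))[:, None]
    fx = np.fft.fftshift(np.fft.fftfreq(w))[None, :]
    radius = np.sqrt(fx ** 2 + fy ** 2)

    # Interior edges of the rings; digitize assigns each bin to one band.
    edges = np.linspace(0.0, radius.max(), num_bands + 1)
    band_index = np.digitize(radius, edges[1:-1])  # values in 0..num_bands-1

    bands = []
    for b in range(num_bands):
        # Keep only the bins of band b, then return to the spatial domain.
        masked = np.where(band_index == b, spectrum, 0)
        bands.append(np.fft.ifft2(np.fft.ifftshift(masked)).real)
    return np.stack(bands)  # shape: (num_bands, h, w)
```

Because the masks partition the spectrum, the bands sum back to the input, which makes the decomposition easy to sanity-check before feeding each band to a band-specific model.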
📝 Abstract
Detecting AI-generated images remains a significant challenge because detectors trained on specific generators often fail to generalize to unseen models. While pixel-level artifacts vary across models, frequency-domain signatures are more consistent, which makes them a promising foundation for cross-generator detection. We propose SPARK-IL, a retrieval-augmented framework that combines dual-path spectral analysis with incremental learning. One path uses a partially frozen ViT-L/14 encoder for semantic representations; a parallel path operates on raw RGB pixel embeddings. Both paths undergo multi-band Fourier decomposition into four frequency bands, and each band is processed by Kolmogorov-Arnold Networks (KAN) with mixture-of-experts for band-specific transformations. The resulting spectral embeddings are fused via cross-attention with residual connections. During inference, the fused embedding retrieves the $k$ nearest labeled signatures from a Milvus database using cosine similarity, and the prediction is made by majority voting. An incremental learning strategy expands the database and applies elastic weight consolidation to preserve previously learned transformations. Evaluated on the UniversalFakeDetect benchmark across 19 generative models, including GANs, face-swapping, and diffusion methods, SPARK-IL achieves 94.6% mean accuracy. The code will be publicly released at https://github.com/HessenUPHF/SPARK-IL.
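The retrieval step and the consolidation penalty described in the abstract can be sketched in a few lines of NumPy. This is a hedged illustration only: an in-memory array stands in for the Milvus database, the function names `knn_majority_vote` and `ewc_penalty` are hypothetical, and the penalty is the standard elastic weight consolidation form, which the abstract names but does not spell out.

```python
import numpy as np
from collections import Counter

def knn_majority_vote(query, signatures, labels, k=5):
    """Label a query by majority vote over its k most cosine-similar signatures.

    `signatures` is an (n, d) array standing in for the vector database;
    `labels` is a length-n sequence such as ["real", "fake", ...].
    """
    q = query / np.linalg.norm(query)
    s = signatures / np.linalg.norm(signatures, axis=1, keepdims=True)
    sims = s @ q                      # cosine similarity to every signature
    top = np.argsort(-sims)[:k]       # indices of the k nearest signatures
    votes = Counter(labels[i] for i in top)
    return votes.most_common(1)[0][0], sims[top]

def ewc_penalty(params, old_params, fisher, lam=1.0):
    """Standard EWC regularizer: (lam / 2) * sum_i F_i * (theta_i - theta*_i)^2."""
    return 0.5 * lam * np.sum(fisher * (params - old_params) ** 2)

# Toy database of 2-D signatures (made-up values, for illustration only).
signatures = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [0.8, 0.2]])
labels = ["fake", "fake", "real", "real", "fake"]
prediction, _ = knn_majority_vote(np.array([1.0, 0.05]), signatures, labels, k=3)
```

With two classes, an odd `k` avoids tied votes. In the incremental setting the abstract describes, new signatures would be appended to the database while a penalty of this kind is added to the training loss so that the learned transformations stay close to their earlier values.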
Problem

Research questions and friction points this paper is trying to address.

deepfake detection
cross-generator generalization
AI-generated images
frequency-domain signatures
incremental learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Spectral Retrieval-Augmented RAG
Incremental Learning
Kolmogorov-Arnold Networks
Cross-generator Deepfake Detection
Multi-band Fourier Decomposition
Hessen Bougueffa Eutamene
Univ. Polytechnique Hauts-de-France, Valenciennes, France
Abdellah Zakaria Sellam
Institute of Applied Sciences and Intelligent Systems – Lecce, Italy; Department of Innovation Engineering, University of Salento, Italy
Abdelmalik Taleb-Ahmed
Univ. Polytechnique Hauts-de-France, Valenciennes, France
Abdenour Hadid
Professor, Sorbonne Center for Artificial Intelligence (SCAI)
Artificial Intelligence, Computer Vision, LLMs, Healthcare, Autonomous Driving