SPARK-IL: Spectral Retrieval-Augmented RAG for Knowledge-driven Deepfake Detection via Incremental Learning

2026-04-04

AI Summary
This work addresses the limited generalization of existing AI-generated image detection methods to unseen generative models by proposing a novel incremental learning framework that integrates dual-path spectral analysis with retrieval-augmented generation (RAG). The approach employs four-band Fourier decomposition to extract frequency-domain features, combines a partially frozen ViT-L/14 encoder with Kolmogorov–Arnold Network (KAN)-based mixture-of-experts to model band-specific characteristics, and incorporates elastic weight consolidation for continual learning. Notably, it introduces, for the first time, a synergy between spectral consistency priors and RAG, leveraging a Milvus vector database for knowledge retrieval to enhance discriminative robustness. Evaluated on the UniversalFakeDetect benchmark encompassing 19 generative models, the method achieves an average accuracy of 94.6%, substantially outperforming current state-of-the-art techniques.
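
As a rough illustration of the four-band Fourier decomposition mentioned in the summary, the sketch below splits a 1-D embedding vector into four band-limited components with NumPy. This is not the authors' implementation: the function name `split_into_bands`, the equal-width band boundaries, the choice of a 1-D FFT over the embedding dimension, and the 768-dimensional example vector are all assumptions made for illustration.

```python
import numpy as np


def split_into_bands(x: np.ndarray, n_bands: int = 4) -> np.ndarray:
    """Split a 1-D real signal into n_bands band-limited components.

    Returns an array of shape (n_bands, len(x)) whose rows sum back to x.
    Equal-width bands over the rFFT bins are an assumption of this sketch.
    """
    x = np.asarray(x, dtype=np.float64)
    spectrum = np.fft.rfft(x)
    # Bin indices marking the band boundaries, from low to high frequency.
    edges = np.linspace(0, spectrum.size, n_bands + 1).astype(int)
    bands = np.empty((n_bands, x.size))
    for b in range(n_bands):
        # Keep only the bins of band b, zero the rest, and invert.
        masked = np.zeros_like(spectrum)
        masked[edges[b]:edges[b + 1]] = spectrum[edges[b]:edges[b + 1]]
        bands[b] = np.fft.irfft(masked, n=x.size)
    return bands


# Hypothetical 768-dimensional embedding standing in for an encoder output.
embedding = np.random.default_rng(0).standard_normal(768)
bands = split_into_bands(embedding)
print(bands.shape)  # (4, 768)
```

Because the four masks partition the spectrum, the band components add up to the original vector, so no information is lost before the band-specific processing stage.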

Abstract
Detecting AI-generated images remains a significant challenge because detectors trained on specific generators often fail to generalize to unseen models. Pixel-level artifacts vary across models, but frequency-domain signatures are more consistent, which makes them a promising foundation for cross-generator detection. We propose SPARK-IL, a retrieval-augmented framework that combines dual-path spectral analysis with incremental learning. One path uses a partially frozen ViT-L/14 encoder to produce semantic representations; a parallel path produces raw RGB pixel embeddings. Both paths undergo multi-band Fourier decomposition into four frequency bands. Each band is processed by Kolmogorov-Arnold Networks (KAN) with mixture-of-experts for band-specific transformations, and the resulting spectral embeddings are fused via cross-attention with residual connections. During inference, the fused embedding retrieves the $k$ nearest labeled signatures from a Milvus database using cosine similarity, and the prediction is made by majority voting over their labels. An incremental learning strategy expands the database and employs elastic weight consolidation to preserve previously learned transformations. Evaluated on the UniversalFakeDetect benchmark across 19 generative models, including GANs, face-swapping, and diffusion methods, SPARK-IL achieves a mean accuracy of 94.6%. The code will be publicly released at https://github.com/HessenUPHF/SPARK-IL.
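
The retrieval step in the abstract (cosine-similarity search for the nearest labeled signatures, followed by majority voting) can be sketched in a few lines. The example below is a minimal in-memory stand-in written with NumPy, not the authors' code and not the Milvus API: the function name `knn_majority_vote`, the toy 2-D signatures, and the default `k=5` are hypothetical.

```python
from collections import Counter

import numpy as np


def knn_majority_vote(query, signatures, labels, k=5):
    """Label a query by majority vote over its k most cosine-similar signatures."""
    signatures = np.asarray(signatures, dtype=np.float64)
    query = np.asarray(query, dtype=np.float64)
    # Cosine similarity equals the dot product of L2-normalised vectors.
    sig_unit = signatures / np.linalg.norm(signatures, axis=1, keepdims=True)
    q_unit = query / np.linalg.norm(query)
    sims = sig_unit @ q_unit
    # Indices of the k most similar signatures, most similar first.
    top_k = np.argsort(sims)[::-1][:k]
    votes = Counter(labels[i] for i in top_k)
    # On a tied count, Counter keeps first-seen order, so the label of the
    # more similar neighbour wins; an odd k avoids ties for two classes.
    return votes.most_common(1)[0][0]


# Toy database of labeled signatures (hypothetical values).
signatures = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [0.8, 0.2]])
labels = ["real", "real", "fake", "fake", "real"]
print(knn_majority_vote([1.0, 0.05], signatures, labels, k=3))  # real
print(knn_majority_vote([0.05, 1.0], signatures, labels, k=3))  # fake
```

In a deployment along the lines the abstract describes, the brute-force similarity computation would be replaced by a query against the vector database, and the incremental step would amount to inserting new labeled signatures into that database.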
Problem

Research questions and friction points this paper is trying to address.

deepfake detection
cross-generator generalization
AI-generated images
frequency-domain signatures
incremental learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Spectral Retrieval-Augmented RAG
Incremental Learning
Kolmogorov-Arnold Networks
Cross-generator Deepfake Detection
Multi-band Fourier Decomposition
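
For the incremental learning item, the abstract names elastic weight consolidation (EWC) as the mechanism that preserves previously learned transformations. The sketch below shows the standard quadratic EWC penalty with a diagonal Fisher estimate; it is a generic illustration rather than the paper's training code, and the names `ewc_penalty`, `ewc_loss`, `theta_star`, `fisher_diag`, and `lam` are assumptions.

```python
import numpy as np


def ewc_penalty(theta, theta_star, fisher_diag, lam=1.0):
    """Quadratic EWC penalty: (lam / 2) * sum_i F_i * (theta_i - theta_star_i)**2.

    theta       -- current parameters (flattened)
    theta_star  -- parameters saved after the previous task
    fisher_diag -- diagonal Fisher information estimated on the previous task
    """
    theta = np.asarray(theta, dtype=np.float64)
    theta_star = np.asarray(theta_star, dtype=np.float64)
    fisher_diag = np.asarray(fisher_diag, dtype=np.float64)
    return 0.5 * lam * float(np.sum(fisher_diag * (theta - theta_star) ** 2))


def ewc_loss(new_task_loss, theta, theta_star, fisher_diag, lam=1.0):
    """Total objective: loss on the new data plus the EWC penalty."""
    return new_task_loss + ewc_penalty(theta, theta_star, fisher_diag, lam)


# Moving a parameter with high Fisher weight costs more than moving one with low weight.
print(ewc_penalty([1.0, 2.0], [0.0, 2.0], [4.0, 10.0], lam=2.0))  # 4.0
```

Parameters with large Fisher values are treated as important to earlier generators and are pulled back toward their saved values, while the rest remain free to adapt to new ones.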