FOCA: Frequency-Oriented Cross-Domain Forgery Detection, Localization and Explanation via Multi-Modal Large Language Model

📅 2026-02-21

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses two limitations of existing image forgery detection methods: they rely too heavily on semantic content while neglecting low-level textural cues, and they offer little interpretability. The authors propose FOCA, a multimodal large language model framework that integrates features from the RGB spatial domain and the frequency domain, using a cross-attention mechanism to model the two representations jointly. This enables accurate forgery detection and localization together with human-interpretable, cross-domain explanations. To support the research, the authors introduce FSE-Set, a large-scale dataset annotated with pixel-level masks and dual-domain labels. Experiments show that the method outperforms state-of-the-art approaches in both detection performance and interpretability, supporting the spatial-frequency dual-domain fusion strategy.

📝 Abstract

Advances in image tampering techniques, particularly generative models, pose significant challenges to media verification, digital forensics, and public trust. Existing image forgery detection and localization (IFDL) methods suffer from two key limitations: over-reliance on semantic content while neglecting textural cues, and limited interpretability of subtle low-level tampering traces. To address these issues, we propose FOCA, a multimodal large language model-based framework that integrates discriminative features from both the RGB spatial and frequency domains via a cross-attention fusion module. This design enables accurate forgery detection and localization while providing explicit, human-interpretable cross-domain explanations. We further introduce FSE-Set, a large-scale dataset with diverse authentic and tampered images, pixel-level masks, and dual-domain annotations. Extensive experiments show that FOCA outperforms state-of-the-art methods in detection performance and interpretability across both spatial and frequency domains.
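
The abstract names a cross-attention fusion of spatial and frequency features but gives no implementation details, so the following is only a minimal NumPy sketch of the general idea, not FOCA's architecture. Everything in it is an assumption made for illustration: a single-channel image stands in for RGB, a hard FFT high-pass filter stands in for the frequency branch, and the function names (`high_frequency_residual`, `patch_tokens`, `cross_attention`), the patch size, and the random projection weights are hypothetical.

```python
import numpy as np


def high_frequency_residual(gray, radius=4):
    """Suppress low frequencies (including the DC term) and return to pixel space."""
    h, w = gray.shape
    spectrum = np.fft.fftshift(np.fft.fft2(gray))  # DC term moves to (h//2, w//2)
    yy, xx = np.ogrid[:h, :w]
    low = (yy - h // 2) ** 2 + (xx - w // 2) ** 2 <= radius ** 2
    spectrum[low] = 0  # hypothetical hard high-pass mask
    return np.real(np.fft.ifft2(np.fft.ifftshift(spectrum)))


def patch_tokens(img, patch=8):
    """Split a 2-D array into non-overlapping patches, one flattened token each."""
    h, w = img.shape
    rows, cols = h // patch, w // patch
    t = img[: rows * patch, : cols * patch].reshape(rows, patch, cols, patch)
    return t.transpose(0, 2, 1, 3).reshape(rows * cols, patch * patch)


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(query_tokens, context_tokens, w_q, w_k, w_v):
    """Scaled dot-product attention: spatial queries attend to frequency keys/values."""
    q = query_tokens @ w_q
    k = context_tokens @ w_k
    v = context_tokens @ w_v
    weights = softmax(q @ k.T / np.sqrt(q.shape[-1]))
    return weights @ v, weights


rng = np.random.default_rng(0)
image = rng.random((32, 32))            # stand-in for a real image
freq = high_frequency_residual(image)   # frequency-domain view of the same image

spatial_tokens = patch_tokens(image)    # shape (16, 64)
freq_tokens = patch_tokens(freq)        # shape (16, 64)

d = 16  # hypothetical projection width
w_q, w_k, w_v = (rng.normal(scale=0.1, size=(64, d)) for _ in range(3))
fused, weights = cross_attention(spatial_tokens, freq_tokens, w_q, w_k, w_v)
```

In this sketch each row of `weights` shows how strongly one spatial patch draws on each frequency patch. A trained model would learn the projections rather than sample them at random, and the fused tokens would feed downstream detection, localization, and explanation heads that are not shown here.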

Problem

Research questions and friction points this paper is trying to address.

image forgery detection

tampering localization

frequency domain

interpretability

multimodal analysis

Innovation

Methods, ideas, or system contributions that make the work stand out.

multimodal large language model

frequency domain

cross-attention fusion