Redefining Generalization in Visual Domains: A Two-Axis Framework for Fake Image Detection with FusionDetect

📅 2025-10-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address insufficient cross-generator and cross-visual-domain generalization in synthetic image detection, this paper formally defines the dual generalization challenge within visual domains for the first time. We introduce OmniGen—a novel, comprehensive benchmark covering multiple generative models, diverse visual scenes, and common image perturbations—constituting the first evaluation platform of its kind. Furthermore, we propose FusionDetect, a lightweight yet effective detector that freezes CLIP and DINOv2 to extract complementary semantic and textural features, fuses them into a compact joint embedding space, and leverages multi-source synthetic data alongside perturbation-robust training. Extensive experiments demonstrate consistent improvements: +3.87% accuracy and +6.13% precision on mainstream benchmarks, and +4.48% accuracy on OmniGen, significantly enhancing both robustness and generalizability across generators and visual domains.

📝 Abstract
The rapid development of generative models has made it increasingly crucial to build detectors that can reliably identify synthetic images. Although most prior work has focused on cross-generator generalization, we argue that this viewpoint is too limited. Detecting synthetic images involves another equally important challenge: generalization across visual domains. To bridge this gap, we present the OmniGen Benchmark. This comprehensive evaluation dataset incorporates 12 state-of-the-art generators, providing a more realistic way of assessing detector performance under real-world conditions. In addition, we introduce a new method, FusionDetect, aimed at addressing both vectors of generalization. FusionDetect draws on the benefits of two frozen foundation models: CLIP and DINOv2. By deriving features from these complementary models, we develop a cohesive feature space that naturally adapts to changes in both the content and the design of the generator. Our extensive experiments demonstrate that FusionDetect not only sets a new state of the art, with 3.87% higher accuracy than its closest competitor and 6.13% higher precision on average on established benchmarks, but also achieves a 4.48% increase in accuracy on OmniGen, along with exceptional robustness to common image perturbations. We introduce not only a top-performing detector but also a new benchmark and framework for furthering universal AI image detection. The code and dataset are available at http://github.com/amir-aman/FusionDetect
Problem

Research questions and friction points this paper is trying to address.

Detecting synthetic images across diverse visual domains
Addressing cross-generator and cross-domain generalization challenges
Developing robust fake image detection with complementary foundation models
Innovation

Methods, ideas, or system contributions that make the work stand out.

FusionDetect uses CLIP and DINOv2 foundation models
It creates a cohesive feature space for generalization
Method addresses both generator and domain generalization
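The fusion idea above can be illustrated with a minimal sketch: features from two frozen encoders are concatenated into one joint embedding that a lightweight classifier consumes. This is an assumption-laden toy, not the paper's implementation — the real system would call pretrained CLIP and DINOv2 models, and the 768-dimensional feature sizes here are illustrative stand-ins.

```python
import numpy as np

# Illustrative embedding sizes (assumptions; the paper's exact dims may differ).
CLIP_DIM, DINO_DIM = 768, 768

def clip_features(images: np.ndarray) -> np.ndarray:
    """Stand-in for a frozen CLIP image encoder: a fixed random projection
    simulates semantic features (real code would run the pretrained model)."""
    W = np.random.default_rng(1).standard_normal((images.shape[1], CLIP_DIM))
    return images @ W

def dino_features(images: np.ndarray) -> np.ndarray:
    """Stand-in for a frozen DINOv2 encoder producing complementary
    texture-oriented features."""
    W = np.random.default_rng(2).standard_normal((images.shape[1], DINO_DIM))
    return images @ W

def fuse(images: np.ndarray) -> np.ndarray:
    """Concatenate the two embeddings into one joint feature vector;
    only a small head on top of this space would be trained."""
    return np.concatenate([clip_features(images), dino_features(images)], axis=1)

# Toy batch: 4 flattened "images" of 32 values each.
batch = np.random.default_rng(0).standard_normal((4, 32))
joint = fuse(batch)
print(joint.shape)  # (4, 1536): joint embedding fed to a lightweight classifier
```

Because both encoders stay frozen, only the classifier over the concatenated space is optimized, which is what keeps the detector lightweight.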