🤖 AI Summary
Cross-generator generalization in image forgery detection remains poor due to the diversity of generative models. Method: We observe that the frozen vision foundation model DINOv3 intrinsically relies on low-frequency, global structural cues, rather than high-frequency, generator-specific artifacts, for authenticity assessment. Leveraging this property, we propose a training-free token-selection strategy that jointly analyzes frequency, spatial, and token-level characteristics to automatically identify the most discriminative global, low-frequency feature tokens; these are fed into a lightweight linear probe for detection, with no fine-tuning or adaptation of the backbone. Contribution/Results: Our approach achieves significant improvements in detection accuracy and cross-generator generalization across multiple benchmarks, without fine-tuning the foundation model. It is the first work to uncover and exploit the interpretable, low-frequency preference of vision foundation models for forgery detection, establishing an efficient, generalizable paradigm in which the token selection itself requires no training.
📝 Abstract
As generative models become increasingly diverse and powerful, cross-generator detection has emerged as a new challenge. Existing detection methods often memorize artifacts of specific generative models rather than learning transferable cues, leading to substantial failures on unseen generators. Surprisingly, this work finds that frozen visual foundation models, especially DINOv3, already exhibit strong cross-generator detection capability without any fine-tuning. Through systematic studies from frequency, spatial, and token-level perspectives, we observe that DINOv3 tends to rely on global, low-frequency structures as weak but transferable authenticity cues rather than on high-frequency, generator-specific artifacts. Motivated by this insight, we introduce a simple, training-free token-ranking strategy followed by a lightweight linear probe to select a small subset of authenticity-relevant tokens. This token subset consistently improves detection accuracy across all evaluated datasets. Our study provides empirical evidence and a feasible hypothesis for understanding why foundation models generalize across diverse generators, offering a universal, efficient, and interpretable baseline for image forgery detection.
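To make the pipeline concrete, the sketch below illustrates the general idea of ranking patch tokens by a low-frequency criterion and feeding the selected subset to a linear probe. This is not the paper's implementation: the exact ranking score, the DINOv3 feature extraction, and all function names here (`low_pass`, `select_low_freq_tokens`, the synthetic features standing in for frozen DINOv3 tokens, the least-squares probe in place of a learned linear head) are illustrative assumptions.

```python
import numpy as np
from numpy.fft import fft2, ifft2, fftshift, ifftshift

def low_pass(grid, radius=3):
    """Keep only the low-frequency components of an (H, W) map."""
    H, W = grid.shape
    yy, xx = np.mgrid[:H, :W]
    mask = (yy - H // 2) ** 2 + (xx - W // 2) ** 2 <= radius ** 2
    return np.real(ifft2(ifftshift(fftshift(fft2(grid)) * mask)))

def select_low_freq_tokens(tokens, grid_hw, k=16, radius=3):
    """tokens: (T, D) patch-token features of one image, T = H*W.
    Rank token positions by how strongly their norm map survives
    low-pass filtering (a stand-in for the paper's joint frequency/
    spatial/token criterion) and mean-pool the top-k features."""
    H, W = grid_hw
    norm_map = np.linalg.norm(tokens, axis=1).reshape(H, W)
    smooth = low_pass(norm_map, radius).ravel()
    top = np.argsort(smooth)[-k:]
    return tokens[top].mean(axis=0)

# Synthetic demo: random features stand in for frozen DINOv3 tokens,
# with a weak global (low-frequency) cue injected into "fake" images.
rng = np.random.default_rng(0)
H = W = 14; D = 64; N = 200
feats = rng.normal(size=(N, H * W, D))
labels = rng.integers(0, 2, size=N)
feats[labels == 1] += 0.3

X = np.stack([select_low_freq_tokens(f, (H, W)) for f in feats])

# Lightweight linear probe: least squares on +/-1 targets
# (a numpy-only stand-in for a trained linear classifier).
y = labels * 2.0 - 1.0
Xb = np.hstack([X, np.ones((N, 1))])
w, *_ = np.linalg.lstsq(Xb, y, rcond=None)
acc = ((Xb @ w > 0).astype(int) == labels).mean()
print(f"probe train accuracy: {acc:.2f}")
```

The backbone is never updated: only the tiny linear head sees labels, which is what makes the selection step itself training-free and the whole detector cheap to adapt.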