🤖 AI Summary
Existing image manipulation detection methods focus primarily on pixel-level anomalies — manipulation operations and their pixel-based masks — while neglecting the semantics of tampered content (scene type, object classes) and viewers' attention to scene content. These semantics matter because they shape how manipulated images spread misinformation. To encourage semantics-aware forensic approaches to visual misinformation, the paper proposes a framework for analyzing the trends of visual and semantic saliency in popular image manipulation datasets and their impact on detection performance.
📝 Abstract
The social media-fuelled explosion of fake news and misinformation supported by tampered images has spurred the development of models and datasets for image manipulation detection. However, existing detection methods mostly treat media objects in isolation, without considering the impact of specific manipulations on viewer perception. Forensic datasets are usually analyzed in terms of the manipulation operations and corresponding pixel-based masks, but not the semantics of the manipulation, i.e., the type of scene, the objects involved, and viewers' attention to scene content. These semantics play an important role in how manipulated images spread misinformation. To encourage further development of semantics-aware forensic approaches to understanding visual misinformation, we propose a framework to analyze the trends of visual and semantic saliency in popular image manipulation datasets and their impact on detection.
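The abstract does not specify how saliency and manipulation regions are compared. As a minimal illustrative sketch only (the function name, threshold, and toy data below are all hypothetical, not the paper's method), one simple way to quantify the relationship between visual saliency and tampered regions is to measure what fraction of the manipulated pixels fall inside salient areas:

```python
import numpy as np

def saliency_mask_overlap(saliency, mask, threshold=0.5):
    """Fraction of manipulated pixels lying in visually salient regions.

    saliency : 2-D float array in [0, 1], e.g. from any saliency predictor
    mask     : 2-D boolean array, True where pixels were tampered
    """
    salient = saliency >= threshold      # binarize the saliency map
    tampered = mask.astype(bool)
    if tampered.sum() == 0:              # untampered image: no overlap defined
        return 0.0
    # Proportion of the tampered region that viewers are likely to attend to
    return float((salient & tampered).sum() / tampered.sum())

# Toy example: a 4x4 image whose tampered patch sits in a salient corner
sal = np.zeros((4, 4)); sal[:2, :2] = 0.9
msk = np.zeros((4, 4), dtype=bool); msk[:2, :2] = True
print(saliency_mask_overlap(sal, msk))  # 1.0 — the tampering is fully salient
```

Aggregating such a score over a dataset would give one per-dataset "trend" of how often manipulations coincide with salient content; this is one plausible building block, not the framework itself.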