Revisiting Tampered Scene Text Detection in the Era of Generative AI

📅 2024-07-31

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Detecting tampered scene text (text manipulated within images) is a significant challenge in the generative AI era, largely because existing detectors generalize poorly to unseen tampering types. Method: This work introduces open-set tampered scene text detection, a task that evaluates models on both seen and previously unseen forgery types. The authors construct a high-quality benchmark dataset covering eight text editing models; propose a texture perturbation training paradigm to enhance fine-grained texture perception; and design the Decoupled Adversarial Framework (DAF), which learns tampering-invariant representations via feature disentanglement and joint real/forged discrimination. Contribution/Results: Under zero-shot evaluation, the method outperforms the previous fully supervised state-of-the-art model by a large margin and remains robust to unknown forgery types in open-set settings. The code and dataset are publicly released.
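The open-set setting described above can be illustrated with a small evaluation sketch: predictions are grouped by whether the forgery type was present in training, and a score is reported for each group. This is only an illustration under stated assumptions. The F1 metric, the `open_set_report` helper, and the `editor_a` / `editor_b` type names are hypothetical and are not taken from the paper or its released code.

```python
from collections import defaultdict


def f1_score(labels, preds):
    """Binary F1 where 1 = tampered text and 0 = authentic text."""
    tp = sum(1 for l, p in zip(labels, preds) if l == 1 and p == 1)
    fp = sum(1 for l, p in zip(labels, preds) if l == 0 and p == 1)
    fn = sum(1 for l, p in zip(labels, preds) if l == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def open_set_report(samples, seen_types):
    """Score seen and unseen forgery types separately.

    samples: iterable of (forgery_type, label, pred), where forgery_type
    names the editing method behind the test image (hypothetical field).
    seen_types: set of forgery types that were available during training.
    """
    groups = defaultdict(lambda: ([], []))
    for forgery_type, label, pred in samples:
        split = "seen" if forgery_type in seen_types else "unseen"
        groups[split][0].append(label)
        groups[split][1].append(pred)
    return {split: f1_score(l, p) for split, (l, p) in groups.items()}


# Toy predictions; "editor_a" and "editor_b" are placeholder names.
samples = [
    ("editor_a", 1, 1), ("editor_a", 0, 0), ("editor_a", 1, 1), ("editor_a", 0, 1),
    ("editor_b", 1, 1), ("editor_b", 1, 0), ("editor_b", 0, 0),
]
report = open_set_report(samples, seen_types={"editor_a"})
print(report)
```

A gap between the two scores is the kind of generalization failure that the open-set task is meant to expose.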

📝 Abstract

The rapid advancement of generative AI has fueled the potential of generative text image editing while also escalating the threat of misinformation. However, existing forensics methods struggle to detect forgery types that they have not been trained on, underscoring the need for a model capable of generalized detection of tampered scene text. To tackle this, we propose a novel task: open-set tampered scene text detection, which evaluates forensics models on their ability to identify both seen and previously unseen forgery types. We have curated a comprehensive, high-quality dataset, featuring text tampered by eight text editing models, to thoroughly assess open-set generalization capabilities. Further, we introduce a novel and effective training paradigm that subtly alters the texture of selected texts within an image and trains the model to identify these regions. This approach not only mitigates the scarcity of high-quality training data but also enhances models' fine-grained perception and open-set generalization abilities. Additionally, we present DAF, a novel framework that improves open-set generalization by distinguishing between the features of authentic and tampered text, rather than focusing solely on the tampered text's features. Our extensive experiments validate the efficacy of our methods. For example, our zero-shot performance beats the previous state-of-the-art full-shot model by a large margin. Our dataset and code are available at https://github.com/qcf-568/OSTF.
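The training paradigm in the abstract (subtly alter the texture of selected texts, then train the model to find those regions) can be sketched as a data-generation step. The version below is a minimal illustration, not the authors' implementation: the 3x3 box blur, the `strength` blending factor, and the `perturb_text_regions` helper are all assumptions chosen for simplicity, and the abstract does not specify which perturbation is used.

```python
import random


def perturb_text_regions(image, boxes, num_selected=1, strength=0.5, seed=0):
    """Blend a 3x3 box blur into randomly selected text boxes.

    image: 2D list of grayscale floats in [0, 1].
    boxes: list of (top, left, bottom, right); bottom/right are exclusive.
    Returns (perturbed_image, mask), where mask is 1 on altered pixels and
    can serve as the training target for the detector.
    """
    rng = random.Random(seed)
    height, width = len(image), len(image[0])
    out = [row[:] for row in image]
    mask = [[0] * width for _ in range(height)]
    chosen = rng.sample(boxes, min(num_selected, len(boxes)))
    for top, left, bottom, right in chosen:
        for y in range(top, bottom):
            for x in range(left, right):
                # Mean over the 3x3 neighbourhood, clipped at image borders.
                neighbours = [
                    image[ny][nx]
                    for ny in range(max(0, y - 1), min(height, y + 2))
                    for nx in range(max(0, x - 1), min(width, x + 2))
                ]
                blurred = sum(neighbours) / len(neighbours)
                # A small strength keeps the texture change subtle.
                out[y][x] = (1 - strength) * image[y][x] + strength * blurred
                mask[y][x] = 1
    return out, mask


# Toy 8x8 checkerboard standing in for an image with two text boxes.
image = [[float((x + y) % 2) for x in range(8)] for y in range(8)]
boxes = [(1, 1, 3, 4), (5, 2, 7, 6)]
perturbed, mask = perturb_text_regions(image, boxes, num_selected=1)
```

Because the perturbed regions are generated from authentic images, each call yields a training pair (image and pixel mask) without needing manually tampered data, which matches the abstract's point about data scarcity.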

Problem

Research questions and friction points this paper is trying to address.

Image Text Manipulation

Fake News Detection

Innovation

Methods, ideas, or system contributions that make the work stand out.

Open-Set Tampered Scene Text Detection

Texture Perturbation Training Paradigm

DAF Framework for Open-Set Generalization

🔎 Similar Papers

FakeShield: Explainable Image Forgery Detection and Localization via Multi-modal Large Language Models

2024-10-03 · arXiv.org · Citations: 14