TextSleuth: Towards Explainable Tampered Text Detection

📅 2024-12-19

🤖 AI Summary

Existing tampered text detection methods can localize tampered regions but do not explain the basis for their predictions, which makes the results hard to trust. This paper proposes explaining tampered text detection in natural language using large multimodal models. It introduces ETTD, a large-scale dataset with both pixel-level annotations of tampered text regions and natural language descriptions of their anomalies. Annotation quality is improved with elaborate GPT-4o queries, a fused mask prompt that reduces confusion during querying, and an automatic filter that asks GPT-4o to recognize the tampered text first and discards responses with low OCR accuracy. The proposed model, TextSleuth, focuses on the suspected region through a two-stage analysis paradigm and an auxiliary grounding prompt, improving fine-grained perception and cross-domain generalization. Experiments on ETTD and a public dataset support the effectiveness of these methods. The authors state that the dataset and code will be open-sourced.

📝 Abstract

Recently, tampered text detection has attracted increasing attention due to its essential role in information security. Although existing methods can detect the tampered text region, the interpretation of such detection remains unclear, making the prediction unreliable. To address this problem, we propose to explain the basis of tampered text detection with natural language via large multimodal models. To fill the data gap for this task, we propose a large-scale, comprehensive dataset, ETTD, which contains both pixel-level annotations for tampered text region and natural language annotations describing the anomaly of the tampered text. Multiple methods are employed to improve the quality of the proposed data. For example, elaborate queries are introduced to generate high-quality anomaly descriptions with GPT4o. A fused mask prompt is proposed to reduce confusion when querying GPT4o to generate anomaly descriptions. To automatically filter out low-quality annotations, we also propose to prompt GPT4o to recognize tampered texts before describing the anomaly, and to filter out the responses with low OCR accuracy. To further improve explainable tampered text detection, we propose a simple yet effective model called TextSleuth, which achieves improved fine-grained perception and cross-domain generalization by focusing on the suspected region, with a two-stage analysis paradigm and an auxiliary grounding prompt. Extensive experiments on both the ETTD dataset and the public dataset have verified the effectiveness of the proposed methods. In-depth analysis is also provided to inspire further research. Our dataset and code will be open-source.
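The automatic filtering step described in the abstract (recognize the tampered text first, then drop responses with low OCR accuracy) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the `Annotation` record and its field names, the use of normalized edit distance as the accuracy measure, and the `min_accuracy` threshold of 0.8.

```python
from dataclasses import dataclass


@dataclass
class Annotation:
    """One hypothetical annotation record (field names are assumptions)."""
    true_text: str        # ground-truth tampered text for the region
    recognized_text: str  # text the model reported reading in that region
    description: str      # anomaly description generated afterwards


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed with a rolling one-row table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def ocr_accuracy(true_text: str, recognized_text: str) -> float:
    """Assumed accuracy measure: 1 minus the normalized edit distance."""
    longest = max(len(true_text), len(recognized_text))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(true_text, recognized_text) / longest


def filter_annotations(annotations, min_accuracy=0.8):
    """Keep annotations whose recognized text is close to the ground truth.

    The 0.8 default is an illustrative threshold, not a value from the paper.
    """
    return [
        a for a in annotations
        if ocr_accuracy(a.true_text, a.recognized_text) >= min_accuracy
    ]
```

The intuition behind this kind of filter is that a response which misreads the tampered text is unlikely to describe its anomalies reliably, so recognition accuracy serves as a cheap proxy for description quality.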

Problem

Research questions and friction points this paper is trying to address.

Text Modification Detection

Interpretability

Credibility

Innovation

Methods, ideas, or system contributions that make the work stand out.

TextSleuth

ETTD Dataset

GPT4o for Explanations

🔎 Similar Papers

FakeShield: Explainable Image Forgery Detection and Localization via Multi-modal Large Language Models