Toward Real-world Text Image Forgery Localization: Structured and Interpretable Data Synthesis

📅 2025-11-16

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Text image forgery localization suffers from scarce annotated real-world data and a distribution gap between synthetic and authentic manipulations, which leads to poor generalization. Method: The paper proposes FSTS (Fourier Series-based Tampering Synthesis), a structured, interpretable, hierarchical framework for modeling tampering behavior. From 16,750 real-world tampering instances recorded as videos, PSD files, and editing logs, it identifies primitive editing operations via parameter analysis and clustering, then builds a two-level (individual and population) probabilistic model. Inspired by the Fourier series, FSTS represents complex edits as weighted combinations of basis configurations with learned weights, enabling controllable and diverse tampering synthesis. Contribution/Results: Across four evaluation protocols, models trained on FSTS-synthesized data generalize significantly better to real-world datasets, narrowing the gap between synthetic and authentic forgeries.

📝 Abstract

Existing Text Image Forgery Localization (T-IFL) methods often suffer from poor generalization due to the limited scale of real-world datasets and the distribution gap caused by synthetic data that fails to capture the complexity of real-world tampering. To tackle this issue, we propose Fourier Series-based Tampering Synthesis (FSTS), a structured and interpretable framework for synthesizing tampered text images. FSTS first collects 16,750 real-world tampering instances from five representative tampering types, using a structured pipeline that records human-performed editing traces via multi-format logs (e.g., video, PSD, and editing logs). By analyzing these collected parameters and identifying recurring behavioral patterns at both individual and population levels, we formulate a hierarchical modeling framework. Specifically, each individual tampering parameter is represented as a compact combination of basis operation-parameter configurations, while the population-level distribution is constructed by aggregating these behaviors. Since this formulation draws inspiration from the Fourier series, it enables an interpretable approximation using basis functions and their learned weights. By sampling from this modeled distribution, FSTS synthesizes diverse and realistic training data that better reflect real-world forgery traces. Extensive experiments across four evaluation protocols demonstrate that models trained with FSTS data achieve significantly improved generalization on real-world datasets. The dataset is available at the project page: https://github.com/ZeqinYu/FSTS.
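One way to read the two-level formulation in the abstract is as a mixture over a fixed set of basis configurations: each individual editor has a weight vector, the population distribution aggregates those vectors, and synthesis samples from the result. The sketch below illustrates that reading only. The basis entries, operation names, parameter values, and the averaging rule are all assumptions made for illustration and are not taken from the paper or its released code.

```python
import random

# Hypothetical basis configurations: each pairs an editing operation with
# illustrative parameters. These names and values are NOT from the paper.
BASIS = [
    {"op": "copy_move", "blur_sigma": 0.5},
    {"op": "splice", "blur_sigma": 1.0},
    {"op": "text_insert", "blur_sigma": 0.0},
    {"op": "inpaint", "blur_sigma": 1.5},
]


def normalize(weights):
    """Scale non-negative weights so they sum to 1."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return [w / total for w in weights]


def population_weights(individual_weights):
    """Aggregate per-individual weight vectors into one population vector.

    Assumption: aggregation is a plain average of normalized vectors.
    """
    normed = [normalize(w) for w in individual_weights]
    n = len(normed)
    return [sum(column) / n for column in zip(*normed)]


def sample_config(weights, rng):
    """Draw one basis configuration with probability proportional to its weight."""
    return rng.choices(BASIS, weights=weights, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    # Two hypothetical editors with different habits.
    editors = [[3, 1, 0, 0], [0, 1, 1, 2]]
    pop = population_weights(editors)
    print(pop)
    print(sample_config(pop, rng)["op"])
```

Keeping the weights explicit is what makes this kind of model interpretable: each weight says how much a given basis configuration contributes to an individual's or the population's behavior.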

Problem

Research questions and friction points this paper is trying to address.

Address poor generalization in text image forgery localization methods

Synthesize realistic training data capturing real-world tampering complexity

Bridge distribution gap between synthetic and real-world forgery data

Innovation

Methods, ideas, or system contributions that make the work stand out.

Synthesizes tampered text images with a Fourier series-inspired model that combines basis operation-parameter configurations using learned weights

Models tampering parameters with hierarchical behavioral patterns

Generates realistic training data from real-world editing traces
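For a localization model, each synthesized sample is a tampered image paired with a pixel-level mask of the edited region. As a minimal sketch of that pairing, the hypothetical `copy_move` helper below applies one simple operation to a grayscale image stored as nested lists and returns both outputs. It is an illustration of the data format only, not the FSTS pipeline, and the function name and arguments are assumptions.

```python
def copy_move(image, src, dst, size):
    """Copy a size x size patch from src to dst.

    Returns (tampered_image, mask), where mask is 1 on overwritten pixels.
    Hypothetical helper for illustration; not part of the FSTS code.
    """
    height, width = len(image), len(image[0])
    (src_row, src_col), (dst_row, dst_col) = src, dst
    for row, col in (src, dst):
        if row < 0 or col < 0 or row + size > height or col + size > width:
            raise ValueError("patch does not fit inside the image")

    tampered = [row[:] for row in image]          # leave the input untouched
    mask = [[0] * width for _ in range(height)]   # ground-truth localization map
    for i in range(size):
        for j in range(size):
            tampered[dst_row + i][dst_col + j] = image[src_row + i][src_col + j]
            mask[dst_row + i][dst_col + j] = 1
    return tampered, mask


if __name__ == "__main__":
    image = [[r * 4 + c for c in range(4)] for r in range(4)]
    tampered, mask = copy_move(image, src=(0, 0), dst=(2, 2), size=2)
    for row in mask:
        print(row)
```

In a fuller pipeline, the operation and its parameters would be drawn from the modeled distribution rather than fixed by hand.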

🔎 Similar Papers

FakeShield: Explainable Image Forgery Detection and Localization via Multi-modal Large Language Models