🤖 AI Summary
Existing text removal datasets (e.g., SCUT-EnsText) suffer from three critical limitations: ground-truth artifacts introduced by manual editing, overly simplistic backgrounds, and narrow evaluation metrics, which together hinder cross-domain generalization and objective model assessment. To address these issues, we propose a synthesis paradigm tailored for complex scenes that integrates object-aware layout planning with vision-language model (VLM)-driven content generation, jointly producing photorealistic text overlays and artifact-free clean ground truth. Based on this framework, we construct and publicly release OTR, a large-scale, highly diverse benchmark featuring text superimposed over complex, realistic backgrounds across multi-scale, multi-occlusion, and multi-semantic-context scenarios. Experiments demonstrate that models trained on OTR achieve PSNR/SSIM improvements of over 2.1 dB / 0.03 on real-world images, significantly enhancing generalization capability and reconstruction fidelity, and establishing a robust data foundation for privacy-preserving text removal and intelligent image editing.
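The core of the synthesis paradigm above is that text is rendered onto a pristine background, so the untouched background itself serves as artifact-free ground truth (unlike manually inpainted datasets). A minimal sketch of this pairing, with a random stamp standing in for the actual object-aware placement and VLM-generated content:

```python
import numpy as np

def synthesize_pair(background, text_mask, text_color=255):
    """Overlay synthetic 'text' pixels on a clean background.

    Returns (input, ground_truth): the clean background is kept as-is
    as ground truth, so no manual-editing artifacts are introduced.
    Illustrative only; the real pipeline uses object-aware layout
    planning and VLM-generated content, not a fixed rectangle.
    """
    gt = background.copy()        # clean ground truth, never edited
    inp = background.copy()
    inp[text_mask] = text_color   # superimpose text-shaped pixels
    return inp, gt

# Toy example: a 32x32 grayscale background with one "glyph" block.
rng = np.random.default_rng(0)
bg = rng.integers(0, 200, size=(32, 32), dtype=np.uint8)
mask = np.zeros((32, 32), dtype=bool)
mask[10:14, 5:25] = True          # pretend these pixels are glyphs
inp, gt = synthesize_pair(bg, mask)
```

By construction, `gt` is pixel-identical to the original background, and `inp` differs from it only inside the text mask, which is exactly the property manual editing cannot guarantee.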
📝 Abstract
Text removal is a crucial task in computer vision with applications such as privacy preservation, image editing, and media reuse. While existing research has primarily focused on scene text removal in natural images, limitations in current datasets hinder both out-of-domain generalization and accurate evaluation. In particular, widely used benchmarks such as SCUT-EnsText suffer from ground-truth artifacts due to manual editing, overly simplistic text backgrounds, and evaluation metrics that do not capture the quality of generated results. To address these issues, we introduce an approach to synthesizing a text removal benchmark applicable to domains beyond scene text. Our dataset features text rendered on complex backgrounds using object-aware placement and vision-language model-generated content, ensuring clean ground truth and challenging text removal scenarios. The dataset is available at https://huggingface.co/datasets/cyberagent/OTR .
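For reference, the reconstruction-fidelity numbers quoted in the summary above (PSNR in dB) follow the standard definition: peak signal power over mean squared error between the model output and the clean ground truth. A self-contained sketch:

```python
import numpy as np

def psnr(pred, gt, max_val=255.0):
    """Peak signal-to-noise ratio (dB) of a prediction vs. clean GT."""
    mse = np.mean((pred.astype(np.float64) - gt.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")       # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

# Toy check: one wrong pixel out of 64.
gt = np.zeros((8, 8))
pred = gt.copy()
pred[0, 0] = 16.0                 # MSE = 16**2 / 64 = 4.0
value = psnr(pred, gt)            # 10 * log10(255**2 / 4) ≈ 42.11 dB
```

Because clean, artifact-free ground truth is available by construction in a synthesized benchmark, such pixel-level metrics compare against the true background rather than a manually edited approximation of it.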