🤖 AI Summary
To address the high annotation cost and poor generalization in document table detection, this paper proposes a controllable, LaTeX-driven synthesis pipeline that generates realistic two-column document images with adjustable layout, styling, and resolution, together with aligned pixel-level table masks. The pipeline combines layout randomization and LaTeX rendering with the TableNet segmentation model, so that training and a systematic resolution study can be run on automatically labeled data. Trained on the synthetic data, TableNet reaches a pixel-wise XOR error of 4.04% at 256×256 and 4.33% at 1024×1024 on the synthetic test set; on the real-world Marmot benchmark, the best result is 9.18% (at 256×256). The work pairs controllable synthesis with pixel-level evaluation, and the generated corpus augments Marmot while reducing manual annotation effort.
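As a rough illustration of what "LaTeX-driven synthesis with layout randomization" could look like, the sketch below emits the source of a two-column page containing one randomly styled table. It is not the paper's pipeline: the function name `make_page_source`, its parameters, the filler text, and the particular styling choices (column alignment, optional vertical rules) are all assumptions made for this example. Rendering the page and deriving the table mask are not shown, since the summary does not describe how those steps are done.

```python
import random


def make_page_source(seed: int, n_rows: int = 4, n_cols: int = 3) -> str:
    """Return LaTeX source for a two-column page with one random table.

    Hypothetical helper for illustration only; not the authors' generator.
    """
    rng = random.Random(seed)  # seeded so every page is reproducible

    # Randomize table styling: per-column alignment and optional vertical rules.
    aligns = [rng.choice("lcr") for _ in range(n_cols)]
    sep = "|" if rng.random() < 0.5 else ""
    col_spec = sep + sep.join(aligns) + sep

    # Random numeric cell contents, one LaTeX row per line.
    rows = [
        " & ".join(str(rng.randint(0, 999)) for _ in range(n_cols)) + r" \\"
        for _ in range(n_rows)
    ]
    table = "\n".join(
        [
            r"\begin{table}[h]",
            r"\centering",
            r"\begin{tabular}{" + col_spec + "}",
            r"\hline",
            *rows,
            r"\hline",
            r"\end{tabular}",
            r"\end{table}",
        ]
    )

    # Randomize how much body text surrounds the table, which moves the
    # table's position on the page.
    before = "Filler text. " * rng.randint(20, 60)
    after = "Filler text. " * rng.randint(20, 60)

    return "\n".join(
        [
            r"\documentclass[twocolumn]{article}",
            r"\begin{document}",
            before,
            table,
            after,
            r"\end{document}",
        ]
    )
```

Calling `make_page_source(seed)` with different seeds yields differently styled pages, while the same seed always reproduces the same page, which is convenient for building fixed train and test splits.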
📝 Abstract
Document pages captured by smartphones or scanners often contain tables, yet manual extraction is slow and error-prone. We introduce an automated LaTeX-based pipeline that synthesizes realistic two-column pages with visually diverse table layouts and aligned ground-truth masks. The generated corpus augments the real-world Marmot benchmark and enables a systematic resolution study of TableNet. Training TableNet on our synthetic data achieves a pixel-wise XOR error of 4.04% on our synthetic test set at an input resolution of 256x256, and 4.33% at 1024x1024. On the Marmot benchmark, the best pixel-wise XOR error is 9.18% (at 256x256). Because pages and masks are generated automatically, the pipeline also reduces manual annotation effort.
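The abstract reports results as a pixel-wise XOR error but does not spell out the formula. A minimal sketch, assuming the metric is the percentage of pixels at which the predicted binary table mask and the ground-truth mask disagree, is given below. The function name `xor_error` and the nested-list mask representation are choices made for this example, not taken from the paper.

```python
def xor_error(pred, truth):
    """Percentage of pixels where two binary masks disagree.

    Assumed definition: 100 * (number of mismatched pixels) / (total pixels).
    Masks are nested lists of 0/1 (or False/True) values of equal shape.
    """
    if len(pred) != len(truth) or any(
        len(p_row) != len(t_row) for p_row, t_row in zip(pred, truth)
    ):
        raise ValueError("masks must have the same shape")

    total = sum(len(row) for row in truth)
    if total == 0:
        raise ValueError("masks must contain at least one pixel")

    # A pixel counts as an error when exactly one of the two masks marks it
    # as table, which is the logical XOR of the two values.
    mismatched = sum(
        bool(p) != bool(t)
        for p_row, t_row in zip(pred, truth)
        for p, t in zip(p_row, t_row)
    )
    return 100.0 * mismatched / total


pred = [[1, 1, 0, 0],
        [0, 0, 0, 0]]
truth = [[1, 0, 0, 0],
         [0, 0, 0, 1]]
print(xor_error(pred, truth))  # 2 of 8 pixels differ -> 25.0
```

Under this definition a lower value is better, and an error of 4.04% would mean that roughly 4 in every 100 pixels are labeled differently from the ground truth.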