InstructTable: Improving Table Structure Recognition Through Instructions

📅 2026-04-03
🤖 AI Summary
This work addresses the limited accuracy of structure recognition on complex tables (for example, those with merged or empty cells), which stems from insufficient semantic guidance in vision-only models and weak visual structure modeling in vision-language models. It proposes InstructTable, an instruction-guided multi-stage training framework that introduces the first instruction-based pretraining mechanism tailored to table structure recognition. The authors also develop Table Mix Expand (TME), a template-free table synthesis approach, and use it to construct BCDSTab, a new benchmark dataset. Combining vision-language modeling, instruction fine-tuning, and synthetic data augmentation, the method achieves state-of-the-art performance on FinTabNet, PubTabNet, MUSTARD, and BCDSTab, indicating that instruction guidance and data synthesis both help with complex table parsing.
📝 Abstract
Table structure recognition (TSR) holds widespread practical importance by parsing tabular images into structured representations, yet encounters significant challenges when processing complex layouts involving merged or empty cells. Traditional visual-centric models rely exclusively on visual information while lacking crucial semantic support, thereby impeding accurate structural recognition in complex scenarios. Vision-language models leverage contextual semantics to enhance comprehension; however, these approaches underemphasize the modeling of visual structural information. To address these limitations, this paper introduces InstructTable, an instruction-guided multi-stage training TSR framework. Meticulously designed table instruction pre-training directs attention toward fine-grained structural patterns, enhancing comprehension of complex tables. Complementary TSR fine-tuning preserves robust visual information modeling, maintaining high-precision table parsing across diverse scenarios. Furthermore, we introduce Table Mix Expand (TME), an innovative template-free method for synthesizing large-scale authentic tabular data. Leveraging TME, we construct the Balanced Complex Dense Synthetic Tables (BCDSTab) benchmark, comprising 900 complex table images synthesized through our method to serve as a rigorous benchmark. Extensive experiments on multiple public datasets (FinTabNet, PubTabNet, MUSTARD) and BCDSTab demonstrate that InstructTable achieves state-of-the-art performance in TSR tasks. Ablation studies further confirm the positive impact of the proposed tabular-data-specific instructions and synthetic data.
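To make "merged or empty cells" concrete, the sketch below shows one common way a recognized table structure can be represented and checked: each cell carries a row span and a column span (as in HTML tables), and the cells are expanded onto a rectangular grid. This is not InstructTable's method and is not taken from the paper; the `Cell` tuple layout and the function name `expand_to_grid` are hypothetical, chosen only for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical cell encoding: (text, rowspan, colspan), listed row by row.
Cell = Tuple[str, int, int]


def expand_to_grid(rows: List[List[Cell]]) -> List[List[Optional[str]]]:
    """Place spanning cells onto a rectangular grid, HTML-table style."""
    grid: Dict[Tuple[int, int], str] = {}
    n_cols = 0
    for r, row in enumerate(rows):
        c = 0
        for text, rowspan, colspan in row:
            # Skip slots already occupied by a cell spanning down from above.
            while (r, c) in grid:
                c += 1
            # A merged cell fills every grid slot it covers.
            for dr in range(rowspan):
                for dc in range(colspan):
                    grid[(r + dr, c + dc)] = text
            c += colspan
            n_cols = max(n_cols, c)
    n_rows = max(r for r, _ in grid) + 1 if grid else 0
    # Slots that no cell covers come back as None; empty cells stay "".
    return [[grid.get((r, c)) for c in range(n_cols)] for r in range(n_rows)]


# "Region" spans two rows, "Sales" spans two columns, one cell is empty.
table = [
    [("Region", 2, 1), ("Sales", 1, 2)],
    [("Q1", 1, 1), ("Q2", 1, 1)],
    [("North", 1, 1), ("", 1, 1), ("7", 1, 1)],
]
grid = expand_to_grid(table)
```

An empty cell is kept as an explicit `""` entry, while `None` marks a slot that no cell covers. Dropping an empty cell or misjudging a span shifts every later cell in the row, which is one reason these layouts are hard to parse.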
Problem

Research questions and friction points this paper is trying to address.

Table Structure Recognition
Complex Table Layouts
Merged Cells
Visual-Structural Modeling
Semantic Understanding
Innovation

Methods, ideas, or system contributions that make the work stand out.

instruction-guided learning
table structure recognition
vision-language model
synthetic data generation
multi-stage training
Boming Chen
Meituan
Zining Wang
Beihang University
Zhentao Guo
Beijing Institute of Technology
Point cloud registration
Jianqiang Liu
Meituan
Chen Duan
Meituan
Yu Gu
Meituan
Kai Zhou
Meituan
Pengfei Yan
Meituan