🤖 AI Summary
This work addresses a core challenge in table structure recognition: achieving global structural consistency and precise separator localization at the same time. The authors propose FastTab, an efficient grid-centric approach that replaces autoregressive HTML decoding with a lightweight recursive module for global reasoning. Axial one-dimensional Transformers model long-range row- and column-wise dependencies, and the model directly predicts the number of rows and columns, the header rows, and the separator locations. Cross-row and cross-column spans (rowspan/colspan) are then inferred from ROI-aligned cell features. The authors also study robustness to pixel-level anonymization and extend the method to curved separators in camera-captured documents. Evaluated on four benchmarks (PubTabNet, FinTabNet, PubTables-1M, and SciTSR), the method achieves competitive structural accuracy with low inference latency.
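The axial idea can be illustrated with a minimal pure-Python sketch. It is not FastTab's encoder, and it rests on assumptions of our own. The 2D feature map is collapsed into a 1D token sequence by mean pooling along one axis, giving one token per image row (for row separators) or per image column (for column separators). A single attention head with identity query/key/value projections stands in for a full Transformer encoder layer, so there are no learned weights, feed-forward block, or positional encodings. The names `axial_sequence` and `self_attention_1d` are hypothetical.

```python
import math
from typing import List

Vec = List[float]
FeatureMap = List[List[Vec]]  # indexed [y][x] -> C-dim feature vector


def axial_sequence(feat: FeatureMap, axis: str) -> List[Vec]:
    """Mean-pool an H x W x C map into a 1D token sequence (assumed pooling).

    axis="row": one token per image row y (mean over x), length H.
    axis="col": one token per image column x (mean over y), length W.
    """
    h, w, c = len(feat), len(feat[0]), len(feat[0][0])
    if axis == "row":
        return [[sum(feat[y][x][k] for x in range(w)) / w for k in range(c)]
                for y in range(h)]
    if axis == "col":
        return [[sum(feat[y][x][k] for y in range(h)) / h for k in range(c)]
                for x in range(w)]
    raise ValueError("axis must be 'row' or 'col'")


def self_attention_1d(tokens: List[Vec]) -> List[Vec]:
    """Single-head scaled dot-product self-attention with identity projections.

    Each output token is a softmax-weighted average of all tokens, so a position
    near the top of a table can draw on evidence from one near the bottom.
    """
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        m = max(scores)  # subtract the max before exp for numerical stability
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        out.append([sum(wt * v[j] for wt, v in zip(weights, tokens)) for j in range(d)])
    return out


# Toy 4 x 3 feature map with 2 channels: feat[y][x] = [y, x]
feat = [[[float(y), float(x)] for x in range(3)] for y in range(4)]
rows = self_attention_1d(axial_sequence(feat, "row"))  # 4 tokens, one per y
cols = self_attention_1d(axial_sequence(feat, "col"))  # 3 tokens, one per x
```

In this sketch, attending over two pooled 1D sequences costs on the order of H² + W² score computations, compared with (H·W)² for full attention over every pixel of the map. A per-token classifier on `rows` and `cols` would then score each position as separator or not.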
📝 Abstract
Table structure recognition (TSR) requires both table-level coherence (row/column counts, headers, spanning cells) and precise separator localization. We introduce FastTab, a grid-centric TSR model that avoids autoregressive HTML decoding by combining (i) a lightweight Tiny Recursive Module (TRM) for global reasoning and (ii) axial 1D Transformer encoders that capture long-range dependencies along rows and columns. The model predicts row/column counts, header rows, and separators to construct a grid, then infers rowspan/colspan using ROI-aligned cell features. Across four benchmarks (PubTabNet, FinTabNet, PubTables-1M, and SciTSR), FastTab achieves competitive structure recovery performance with low inference latency. We further study robustness under pixel-level anonymisation and show an extension to curved separators for camera-captured documents. The source code will be made publicly available at https://github.com/hamdilaziz/FastTab.
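The grid-then-spans pipeline described above can be sketched as follows. This is a minimal illustration under stated assumptions, not FastTab's implementation. Separators are straight and axis-aligned, given as pixel coordinates (the curved-separator extension is not covered). The rowspan/colspan decisions that FastTab infers from ROI-aligned cell features are supplied here as a ready-made dictionary. The output is an empty HTML structure skeleton with no cell text. `build_grid` and `grid_to_html` are hypothetical names.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in page pixels


def build_grid(table: Box, row_seps: List[float], col_seps: List[float]) -> List[List[Box]]:
    """Split a table box into a rows x cols grid of cell boxes.

    k horizontal separators give k + 1 rows; m vertical ones give m + 1 columns.
    """
    x0, y0, x1, y1 = table
    ys = [y0] + sorted(row_seps) + [y1]
    xs = [x0] + sorted(col_seps) + [x1]
    return [[(xs[c], ys[r], xs[c + 1], ys[r + 1]) for c in range(len(xs) - 1)]
            for r in range(len(ys) - 1)]


def grid_to_html(n_rows: int, n_cols: int, n_header: int,
                 spans: Dict[Tuple[int, int], Tuple[int, int]]) -> str:
    """Emit an empty HTML structure from grid size, header rows and merged cells.

    `spans` maps an anchor cell (row, col) to (rowspan, colspan). Grid positions
    covered by another cell's span are skipped, so each position appears once.
    """
    if not 0 <= n_header <= n_rows:
        raise ValueError("n_header must lie in [0, n_rows]")
    covered = set()
    for (r, c), (rs, cs) in spans.items():
        for dr in range(rs):
            for dc in range(cs):
                if (dr, dc) != (0, 0):
                    covered.add((r + dr, c + dc))

    html = ["<table>"]
    for r in range(n_rows):
        if r == 0 and n_header > 0:
            html.append("<thead>")
        if r == n_header:  # first body row: close the header, open the body
            if n_header > 0:
                html.append("</thead>")
            html.append("<tbody>")
        html.append("<tr>")
        for c in range(n_cols):
            if (r, c) in covered:
                continue
            rs, cs = spans.get((r, c), (1, 1))
            attrs = (f' rowspan="{rs}"' if rs > 1 else "") + (f' colspan="{cs}"' if cs > 1 else "")
            html.append(f"<td{attrs}></td>")
        html.append("</tr>")
    if n_header < n_rows:
        html.append("</tbody>")
    elif n_header > 0:
        html.append("</thead>")
    html.append("</table>")
    return "".join(html)


# Two row separators and two column separators -> a 3 x 3 grid.
grid = build_grid((0.0, 0.0, 300.0, 90.0), row_seps=[30.0, 60.0], col_seps=[100.0, 200.0])
# One header row; a header cell spanning two columns; a body cell spanning two rows.
html = grid_to_html(n_rows=len(grid), n_cols=len(grid[0]), n_header=1,
                    spans={(0, 1): (1, 2), (1, 0): (2, 1)})
print(html)
```

Because the grid is fixed before any spans are resolved, the row and column counts in this sketch are consistent by construction, and a span can only merge cells that already exist in the grid.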