TableSeq: Unified Generation of Structure, Content, and Layout

📅 2026-04-17

🤖 AI Summary

This work proposes an end-to-end, single-stream autoregressive framework for image-only table recognition that unifies structure recognition, content extraction, and cell localization within a single sequence-generation process. Unlike existing approaches that decouple these tasks, rely on multi-stage pipelines, or depend on external OCR systems, the model uses one decoder to emit an interleaved stream of HTML tags, cell text, and discrete coordinate tokens. It combines a lightweight high-resolution FCN-H16 encoder, a structure-prior head, and a single-layer Transformer encoder, and achieves state-of-the-art or competitive results on benchmarks including PubTabNet (95.23 TEDS), FinTabNet, and SciTSR. The architecture is considerably simpler than multi-head or pipeline-based methods, and the same sequence interface supports downstream tasks such as index-based table querying.
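To make the interleaved output concrete, here is a minimal Python sketch of how a table could be flattened into one token stream. Everything specific in it is an assumption for illustration, not TableSeq's actual vocabulary: the hypothetical `Cell` type, the `serialize` and `quantize` helpers, the `<loc_N>` token spelling, the 500-bin resolution, character-level text tokens, placing the four box tokens right after `<td>`, and the absence of row/column spans.

```python
from dataclasses import dataclass

NUM_BINS = 500  # hypothetical quantization resolution, not taken from the paper


@dataclass
class Cell:
    text: str
    box: tuple[float, float, float, float]  # (x0, y0, x1, y1) in pixels


def quantize(value: float, size: float, num_bins: int = NUM_BINS) -> int:
    """Map a pixel coordinate to an integer bin clamped to [0, num_bins - 1]."""
    bin_id = int(value / size * num_bins)
    return min(max(bin_id, 0), num_bins - 1)


def serialize(rows: list[list[Cell]], width: float, height: float) -> list[str]:
    """Flatten a table into one interleaved stream of tags, location tokens and characters."""
    tokens = ["<table>"]
    for row in rows:
        tokens.append("<tr>")
        for cell in row:
            x0, y0, x1, y1 = cell.box
            tokens.append("<td>")
            # Four discrete location tokens in the order x0, y0, x1, y1.
            tokens += [
                f"<loc_{quantize(v, s)}>"
                for v, s in ((x0, width), (y0, height), (x1, width), (y1, height))
            ]
            tokens += list(cell.text)  # one token per character of cell content
            tokens.append("</td>")
        tokens.append("</tr>")
    tokens.append("</table>")
    return tokens


if __name__ == "__main__":
    table = [[Cell("Id", (0, 0, 50, 20)), Cell("Val", (50, 0, 100, 20))]]
    print(serialize(table, width=100, height=40))
```

Because each cell's tags, box and text sit next to one another in the same stream, a single decoder trained on such sequences has to keep structure, content and geometry aligned, which is the property the single-stream design relies on.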

📝 Abstract

We present TableSeq, an image-only, end-to-end framework for joint table structure recognition, content recognition, and cell localization. The model formulates these tasks as a single sequence-generation problem: one decoder produces an interleaved stream of HTML tags, cell text, and discretized coordinate tokens, thereby aligning logical structure, textual content, and cell geometry within a unified autoregressive sequence. This design avoids external OCR, auxiliary decoders, and complex multi-stage post-processing. TableSeq combines a lightweight high-resolution FCN-H16 encoder with a minimal structure-prior head and a single-layer transformer encoder, yielding a compact architecture that remains effective on challenging layouts. Across standard benchmarks, TableSeq achieves competitive or state-of-the-art results while preserving architectural simplicity. It reaches 95.23 TEDS / 96.83 S-TEDS on PubTabNet, 97.45 TEDS / 98.69 S-TEDS on FinTabNet, and 99.79 / 99.54 / 99.66 precision / recall / F1 on SciTSR under the CAR protocol, while remaining competitive on PubTables-1M under GriTS. Beyond table structure and content recognition (TSR/TCR), the same sequence interface generalizes to index-based table querying without task-specific heads, achieving the best IRDR score and competitive ICDR/ICR performance. We also study multi-token prediction for faster blockwise decoding and show that it reduces inference latency with only limited accuracy degradation. Overall, TableSeq provides a practical and reproducible single-stream baseline for unified table recognition, and the source code will be made publicly available at https://github.com/hamdilaziz/TableSeq.
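The reverse direction suggests why an aligned stream is convenient for index-based lookups: once decoded, the row and cell tags give each cell a (row, column) index, and the text and location tokens between `<td>` and `</td>` attach directly to that index. The Python sketch below assumes a hypothetical vocabulary of `<table>`/`<tr>`/`<td>` tags, `<loc_N>` coordinate tokens over 500 bins, character-level text and no row/column spans; `parse` and `dequantize` are illustrative helpers, and the lookup is not the paper's IRDR/ICDR/ICR query format.

```python
import re

# Hypothetical coordinate-token spelling, e.g. "<loc_250>".
LOC = re.compile(r"<loc_(\d+)>")


def parse(tokens: list[str]) -> dict[tuple[int, int], dict]:
    """Rebuild a (row, col) -> {"text", "bins"} map from an interleaved token stream."""
    cells: dict[tuple[int, int], dict] = {}
    row, col = -1, 0
    current = None  # the cell being filled between <td> and </td>
    for tok in tokens:
        if tok == "<tr>":
            row, col = row + 1, 0
        elif tok == "<td>":
            current = {"text": "", "bins": []}
        elif tok == "</td>":
            cells[(row, col)] = current
            current, col = None, col + 1
        elif current is not None:
            match = LOC.fullmatch(tok)
            if match:
                current["bins"].append(int(match.group(1)))
            else:
                current["text"] += tok  # any non-location token is cell content
    return cells


def dequantize(bins, width, height, num_bins=500):
    """Map (x0, y0, x1, y1) bins back to approximate pixel coordinates at bin centres."""
    sizes = (width, height, width, height)
    return tuple((b + 0.5) * s / num_bins for b, s in zip(bins, sizes))


if __name__ == "__main__":
    stream = ["<table>", "<tr>", "<td>", "<loc_0>", "<loc_0>", "<loc_250>",
              "<loc_250>", "I", "d", "</td>", "</tr>", "</table>"]
    cells = parse(stream)
    print(cells[(0, 0)]["text"])  # Id
    print(dequantize(cells[(0, 0)]["bins"], width=100, height=40))
```

Answering "what is in row 0, column 0, and where is it?" then reduces to a dictionary lookup, with no separate OCR or matching step between structure and content.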

Problem

Research questions and friction points this paper is trying to address.

table structure recognition

content recognition

cell localization

unified generation

table understanding

Innovation

Methods, ideas, or system contributions that make the work stand out.

table recognition

sequence generation

unified framework