DenTab: A Dataset for Table Recognition and Visual QA on Real-World Dental Estimates

📅 2026-04-17
📈 Citations: 0
Influential: 0

🤖 AI Summary
This study addresses a limitation of existing table recognition and visual question answering (VQA) resources: they are built predominantly from clean digital documents and only partially reflect noisy, structurally complex real-world administrative forms such as dental estimate sheets. To bridge this gap, the authors introduce DenTab, a dataset of 2,000 cropped table images from real-world dental estimates annotated with high-quality HTML markup, along with 2,208 VQA questions in eleven categories that jointly evaluate structural parsing and semantic reasoning through retrieval, aggregation, and logical consistency checks. The work further proposes the Table Router Pipeline, a training-free framework that routes arithmetic questions to a deterministic executor to make arithmetic answers more reliable. Experiments on 16 systems show that current models remain weak on multi-step arithmetic and consistency tasks under realistic table layouts, whereas the proposed pipeline is reported to improve accuracy on arithmetic VQA.
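
To make the "HTML markup" annotations concrete, here is a minimal sketch of how an HTML table with merged cells could be expanded into a rectangular grid before structure is evaluated or questions are answered. It is an illustration only, not the authors' tooling: the names `TableGridParser` and `html_to_grid` and the sample table are hypothetical, and the real DenTab annotation schema may differ.

```python
from html.parser import HTMLParser


class TableGridParser(HTMLParser):
    """Collects the cells of one table as (text, rowspan, colspan) tuples."""

    def __init__(self):
        super().__init__()
        self.rows = []     # one list of cell tuples per <tr>
        self._cell = None  # [text_parts, rowspan, colspan] while inside a cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            a = dict(attrs)
            self._cell = [[], int(a.get("rowspan") or 1), int(a.get("colspan") or 1)]

    def handle_data(self, data):
        if self._cell is not None:
            self._cell[0].append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            parts, rowspan, colspan = self._cell
            if not self.rows:  # tolerate a cell that appears before any <tr>
                self.rows.append([])
            self.rows[-1].append(("".join(parts).strip(), rowspan, colspan))
            self._cell = None


def html_to_grid(html):
    """Expand rowspan/colspan so that every grid position holds its cell text."""
    parser = TableGridParser()
    parser.feed(html)
    grid = {}  # (row, col) -> text
    for r, row in enumerate(parser.rows):
        c = 0
        for text, rowspan, colspan in row:
            while (r, c) in grid:  # skip positions filled by an earlier rowspan
                c += 1
            for dr in range(rowspan):
                for dc in range(colspan):
                    grid[(r + dr, c + dc)] = text
            c += colspan
    n_rows = max((k[0] for k in grid), default=-1) + 1
    n_cols = max((k[1] for k in grid), default=-1) + 1
    return [[grid.get((r, c), "") for c in range(n_cols)] for r in range(n_rows)]


# Hypothetical estimate table with a merged header and a merged "Total" label.
SAMPLE = (
    "<table>"
    "<tr><th colspan='2'>Treatment</th><th>Fee</th></tr>"
    "<tr><td>Crown</td><td>Tooth 14</td><td>450.00</td></tr>"
    "<tr><td colspan='2'>Total</td><td>450.00</td></tr>"
    "</table>"
)
grid = html_to_grid(SAMPLE)
```

Repeating the text of a merged cell in every position it covers is one design choice among several; it keeps column lookups simple at the cost of losing the original span boundaries.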

📝 Abstract
Tables condense key transactional and administrative information into compact layouts, but practical extraction requires more than text recognition: systems must also recover structure (rows, columns, merged cells, headers) and interpret roles such as line items, subtotals, and totals under common capture artifacts. Many existing resources for table structure recognition and TableVQA are built from clean digital-born sources or rendered tables, and therefore only partially reflect noisy administrative conditions. We introduce DenTab, a dataset of 2,000 cropped table images from dental estimates with high-quality HTML annotations, enabling evaluation of table recognition (TR) and table visual question answering (TableVQA) on the same inputs. DenTab includes 2,208 questions across eleven categories spanning retrieval, aggregation, and logic/consistency checks. We benchmark 16 systems, including 14 vision-language models (VLMs) and two OCR baselines. Across models, strong structure recovery does not consistently translate into reliable performance on multi-step arithmetic and consistency questions, and these reasoning failures persist even when using ground-truth HTML table inputs. To improve arithmetic reliability without training, we propose the Table Router Pipeline, which routes arithmetic questions to deterministic execution. The pipeline combines (i) a VLM that produces a baseline answer, a structured table representation, and a constrained table program with (ii) a rule-based executor that performs exact computation over the parsed table. The source code and dataset will be made publicly available at https://github.com/hamdilaziz/DenTab.
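
The routing idea described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the program format (`op`, `column`, `label_column`, `exclude_labels`), the function names `to_decimal`, `execute`, and `route`, the fallback to the baseline answer when execution fails, and the sample values are all hypothetical. The VLM is not called here; its three outputs are supplied as a plain dictionary.

```python
from decimal import Decimal, InvalidOperation

ALLOWED_OPS = {"sum", "max", "min", "count"}  # assumed "constrained" operation set


def to_decimal(text):
    """Parse a cell such as '1,250.50' (dot as decimal separator is assumed).

    Returns None when the cell holds no parsable number.
    """
    cleaned = "".join(ch for ch in text if ch.isdigit() or ch in ".-")
    try:
        return Decimal(cleaned)
    except InvalidOperation:
        return None


def execute(program, table):
    """Rule-based executor: exact arithmetic over a table given as row dicts."""
    op, column = program.get("op"), program.get("column")
    if op not in ALLOWED_OPS or column is None:
        raise ValueError("unsupported program")
    label_column = program.get("label_column")
    excluded = set(program.get("exclude_labels", []))
    values = []
    for row in table:
        if label_column and row.get(label_column) in excluded:
            continue  # e.g. skip the printed "Total" row when summing line items
        value = to_decimal(row.get(column, ""))
        if value is not None:
            values.append(value)
    if op == "count":
        return Decimal(len(values))
    if not values:
        raise ValueError("no numeric cells in column")
    return {"sum": sum, "max": max, "min": min}[op](values)


def route(vlm_output, is_arithmetic):
    """Send arithmetic questions to the executor; otherwise keep the VLM answer."""
    if is_arithmetic:
        try:
            return str(execute(vlm_output["program"], vlm_output["table"]))
        except (KeyError, ValueError, TypeError, AttributeError):
            pass  # assumed fallback: an unusable program yields the baseline answer
    return vlm_output["baseline_answer"]


# Hypothetical VLM output for "What is the sum of all line-item fees?"
vlm_output = {
    "baseline_answer": "690.00",  # a plausible but wrong free-form answer
    "table": [
        {"Treatment": "Crown", "Fee": "450.00"},
        {"Treatment": "Scaling", "Fee": "80.50"},
        {"Treatment": "X-ray", "Fee": "35.00"},
        {"Treatment": "Total", "Fee": "565.50"},
    ],
    "program": {
        "op": "sum",
        "column": "Fee",
        "label_column": "Treatment",
        "exclude_labels": ["Total"],
    },
}

print(route(vlm_output, is_arithmetic=True))   # 565.50
print(route(vlm_output, is_arithmetic=False))  # 690.00
```

Using `Decimal` rather than floating point keeps currency sums exact, which matches the abstract's emphasis on exact computation. Restricting programs to a small whitelist of operations means the executor never evaluates arbitrary model-generated code.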
Problem

Research questions and friction points this paper is trying to address.

table recognition
visual question answering
real-world tables
administrative documents
structure recovery
Innovation

Methods, ideas, or system contributions that make the work stand out.

Table Recognition
Table Visual Question Answering
Real-World Tables
Arithmetic Reasoning
Structure Recovery