DenTab: A Dataset for Table Recognition and Visual QA on Real-World Dental Estimates

📅 2026-04-17

🤖 AI Summary

This study addresses a gap in table recognition and visual question answering (VQA): existing methods and benchmarks are built predominantly from clean digital documents and struggle with noisy, structurally complex real-world administrative forms such as dental estimate sheets. To bridge this gap, the authors introduce DenTab, a dataset of 2,000 cropped table images from real-world dental estimates annotated with high-quality HTML markup, together with 2,208 VQA questions across eleven categories that jointly evaluate structural parsing and semantic reasoning through retrieval, aggregation, and logic/consistency questions. The work also proposes the Table Router Pipeline, a training-free framework that routes arithmetic questions to a deterministic executor, so that the final computation is performed exactly over the parsed table instead of being generated by the model. Experiments show that current models remain weak on multi-step arithmetic and consistency questions under realistic table layouts, even when given ground-truth HTML tables, whereas the proposed pipeline substantially improves accuracy on arithmetic VQA.
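Because the tables are annotated in HTML, structure such as merged cells is carried by the markup rather than by explicit grid positions. The following is a minimal sketch, not the authors' tooling, of expanding such markup into a rectangular grid with Python's standard html.parser. It assumes plain <table>/<tr>/<td>/<th> markup with merged cells encoded by the standard rowspan/colspan attributes; the exact DenTab annotation schema is not described here, and the names TableGridParser and html_to_grid, as well as the sample table, are hypothetical.

```python
from html.parser import HTMLParser
from typing import Optional


class TableGridParser(HTMLParser):
    """Collects <td>/<th> cells row by row, keeping their rowspan/colspan values."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: list = []  # one list per <tr>; each cell is (text, rowspan, colspan)
        self._cell: Optional[list] = None  # [text_parts, rowspan, colspan] of the open cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            a = dict(attrs)
            # Missing or valueless span attributes default to 1.
            self._cell = [[], int(a.get("rowspan") or 1), int(a.get("colspan") or 1)]

    def handle_data(self, data):
        if self._cell is not None:  # ignore whitespace between tags
            self._cell[0].append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            text = " ".join("".join(self._cell[0]).split())  # collapse whitespace
            self.rows[-1].append((text, self._cell[1], self._cell[2]))
            self._cell = None


def html_to_grid(html: str) -> list:
    """Expands merged cells so every (row, col) slot holds the covering cell's text."""
    parser = TableGridParser()
    parser.feed(html)
    slots = {}  # (row, col) -> text
    for r, row in enumerate(parser.rows):
        c = 0
        for text, rowspan, colspan in row:
            while (r, c) in slots:  # skip slots filled by a rowspan from an earlier row
                c += 1
            for dr in range(rowspan):
                for dc in range(colspan):
                    slots[(r + dr, c + dc)] = text
            c += colspan
    n_rows = max((r for r, _ in slots), default=-1) + 1
    n_cols = max((c for _, c in slots), default=-1) + 1
    return [[slots.get((r, c), "") for c in range(n_cols)] for r in range(n_rows)]


# Hypothetical estimate fragment: one tooth spans two procedures, "Total" spans two columns.
sample = """
<table>
  <tr><th>Tooth</th><th>Procedure</th><th>Fee</th></tr>
  <tr><td rowspan="2">16</td><td>Crown</td><td>650.00</td></tr>
  <tr><td>Post</td><td>120.00</td></tr>
  <tr><td colspan="2">Total</td><td>770.00</td></tr>
</table>
"""
for row in html_to_grid(sample):
    print(row)
# ['Tooth', 'Procedure', 'Fee']
# ['16', 'Crown', '650.00']
# ['16', 'Post', '120.00']
# ['Total', 'Total', '770.00']
```

Copying a merged cell's text into every slot it covers is one possible design choice: it keeps row and column lookups trivial for downstream question answering, at the cost of duplicating values such as the "Total" label.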

📝 Abstract

Tables condense key transactional and administrative information into compact layouts, but practical extraction requires more than text recognition: systems must also recover structure (rows, columns, merged cells, headers) and interpret roles such as line items, subtotals, and totals under common capture artifacts. Many existing resources for table structure recognition and TableVQA are built from clean digital-born sources or rendered tables, and therefore only partially reflect noisy administrative conditions. We introduce DenTab, a dataset of 2,000 cropped table images from dental estimates with high-quality HTML annotations, enabling evaluation of table recognition (TR) and table visual question answering (TableVQA) on the same inputs. DenTab includes 2,208 questions across eleven categories spanning retrieval, aggregation, and logic/consistency checks. We benchmark 16 systems, including 14 vision-language models (VLMs) and two OCR baselines. Across models, strong structure recovery does not consistently translate into reliable performance on multi-step arithmetic and consistency questions, and these reasoning failures persist even when using ground-truth HTML table inputs. To improve arithmetic reliability without training, we propose the Table Router Pipeline, which routes arithmetic questions to deterministic execution. The pipeline combines (i) a VLM that produces a baseline answer, a structured table representation, and a constrained table program with (ii) a rule-based executor that performs exact computation over the parsed table. The source code and dataset will be made publicly available at https://github.com/hamdilaziz/DenTab.
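The routing idea can be illustrated with a small sketch. Everything below is an assumption for illustration: the VLM is replaced by a pre-filled VLMOutput record, and the program format (a dict with an op, a column, and an optional where filter), the REDUCERS whitelist, and the helpers to_number, execute, and route are hypothetical names, not the paper's actual program grammar or API. Falling back to the baseline answer when a program cannot be executed is likewise an assumed policy. What the sketch shows is the division of labor described above: the VLM proposes a table and a constrained program, and a rule-based executor computes the number exactly.

```python
from dataclasses import dataclass
from decimal import Decimal, InvalidOperation
from typing import Optional


@dataclass
class VLMOutput:
    """Hypothetical stand-in for what the VLM returns for one question."""
    baseline_answer: str      # the VLM's own free-form answer
    table: list               # parsed table as rows of strings, header row first
    program: Optional[dict]   # constrained program, or None for non-arithmetic questions


def to_number(cell: str) -> Decimal:
    """Parses cells such as '1,250.00' or '$ 80' as a Decimal.

    Assumption: ',' is a thousands separator and '.' the decimal point; estimates
    that use decimal commas would need a locale-aware parser instead.
    """
    return Decimal("".join(ch for ch in cell if ch.isdigit() or ch in ".-"))


# Hypothetical whitelist of operations the constrained program may use.
REDUCERS: dict = {
    "sum": lambda xs: sum(xs, Decimal(0)),
    "count": lambda xs: Decimal(len(xs)),
    "max": max,
    "min": min,
}


def execute(program: dict, table: list) -> Decimal:
    """Deterministically evaluates {'op', 'column', optional 'where'} over the table."""
    header, rows = table[0], table[1:]
    col = header.index(program["column"])  # ValueError if the column does not exist
    where = program.get("where")           # optional (column_name, required_value)
    if where is not None:
        w = header.index(where[0])
        rows = [r for r in rows if r[w] == where[1]]
    return REDUCERS[program["op"]]([to_number(r[col]) for r in rows])


def route(out: VLMOutput) -> str:
    """Sends valid arithmetic programs to the executor; otherwise keeps the VLM answer."""
    if out.program is None or out.program.get("op") not in REDUCERS:
        return out.baseline_answer
    try:
        return str(execute(out.program, out.table))
    except (ValueError, KeyError, IndexError, InvalidOperation):
        # Assumed fallback: if the program or a cell cannot be executed, keep the baseline.
        return out.baseline_answer


# Hypothetical usage: the VLM's free-form answer "800" is overridden by exact execution.
estimate = [
    ["Procedure", "Tooth", "Fee"],
    ["Crown", "16", "650.00"],
    ["Filling", "24", "80.00"],
    ["Filling", "25", "80.00"],
]
q = VLMOutput("800", estimate, {"op": "sum", "column": "Fee", "where": ("Procedure", "Filling")})
print(route(q))  # 160.00
bad = VLMOutput("800", estimate, {"op": "sum", "column": "Total"})
print(route(bad))  # 800 (no "Total" column, so the baseline answer is kept)
```

Decimal is used instead of float so that sums of currency amounts stay exact, which matches the goal of deterministic computation; restricting programs to a small whitelist of operations is what keeps the executor rule-based and auditable.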

Problem

Research questions and friction points this paper is trying to address.

table recognition

visual question answering

real-world tables

administrative documents

structure recovery

Innovation

Methods, ideas, or system contributions that make the work stand out.

Table Recognition

Table Visual Question Answering

Real-World Tables