HCT-QA: A Benchmark for Question Answering on Human-Centric Tables

📅 2025-03-09

🏛️ arXiv.org

📈 Citations: 1

✨ Influential: 0

🤖 AI Summary

Human-centric tables (HCTs) exhibit complex layouts and heterogeneous formats, rendering existing data extraction and querying methods inadequate for effective question answering. To address this, we introduce HCT-QA, the first dedicated QA benchmark for HCTs, comprising over 6,800 real-world and synthetically generated tables and 77K natural-language question-answer pairs. We formally define the HCT QA task and propose a hybrid construction methodology integrating real-document parsing (from PDF/HTML), human verification, and controllable synthetic generation, moving beyond SQL-centric paradigms to enable LLM-native table understanding. We conduct zero-shot and few-shot evaluations across leading open- and closed-source LLMs, revealing an average F1 score below 40% and exposing critical limitations in layout awareness and cross-cell reasoning. HCT-QA establishes a new standard for evaluating and advancing HCT comprehension models.
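The summary reports F1 but this listing does not say how it is computed. As a rough illustration only, the sketch below assumes a multiset F1 over normalized answer values; the function name `answer_f1` and the lowercase/strip normalization are hypothetical choices, not the benchmark's documented scoring.

```python
from collections import Counter


def answer_f1(predicted, gold):
    """Multiset F1 between predicted and gold answer values.

    Assumption: answers are lists of strings compared after lowercasing
    and stripping whitespace. The benchmark's real metric may differ.
    """
    pred = Counter(v.strip().lower() for v in predicted)
    ref = Counter(v.strip().lower() for v in gold)
    overlap = sum((pred & ref).values())  # values present in both lists
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# A model that returns two of three expected cell values:
# precision = 1.0, recall = 2/3
score = answer_f1(["12", "15"], ["12", "15", "19"])
print(round(score, 2))
```

Under this assumed definition, a model is penalized both for missing cells and for returning extra ones, which is one plausible way to score answers that span several table cells.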

📝 Abstract

Tabular data embedded within PDF files, web pages, and other document formats are prevalent across numerous sectors such as government, engineering, science, and business. These human-centric tables (HCTs) combine high business value, intricate layouts, and limited operational power at scale, and they sometimes serve as the only data source for critical insights. However, their complexity poses significant challenges to traditional data extraction, processing, and querying methods. While current solutions focus on transforming these tables into relational formats for SQL queries, they fall short in handling the diverse and complex layouts of HCTs and hence in making them amenable to querying. This paper describes HCT-QA, an extensive benchmark of HCTs, natural language queries, and related answers on thousands of tables. Our dataset includes 2,188 real-world HCTs with 9,835 QA pairs and 4,679 synthetic tables with 67.5K QA pairs. While HCTs can potentially be processed by different types of query engines, in this paper we focus on Large Language Models as potential engines and assess their ability to process and query such tables.
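The counts in the abstract can be tallied to see the overall size and balance of the benchmark. This is a small arithmetic sketch using only the figures quoted above; 67.5K is a rounded number, so the totals are approximate, and the variable names are illustrative.

```python
# Figures quoted in the abstract; the synthetic QA count is rounded (67.5K).
real = {"tables": 2_188, "qa_pairs": 9_835}
synthetic = {"tables": 4_679, "qa_pairs": 67_500}

total_tables = real["tables"] + synthetic["tables"]
total_qa = real["qa_pairs"] + synthetic["qa_pairs"]
synthetic_qa_share = synthetic["qa_pairs"] / total_qa

print(total_tables)                 # 6867
print(total_qa)                     # 77335 (approximate)
print(f"{synthetic_qa_share:.1%}")  # 87.3%
```

The totals match the "over 6,800 tables" and "77K question-answer pairs" quoted in the summary, and they show that most QA pairs come from the synthetic portion.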

Problem

Research questions and friction points this paper is trying to address.

Addressing question answering challenges on human-centric tabular data

Overcoming limitations of traditional methods for complex table layouts

Evaluating Large Language Models for querying diverse real-world tables

Innovation

Methods, ideas, or system contributions that make the work stand out.

Proposes benchmark for human-centric table question answering

Evaluates large language models as query engines

Includes real-world and synthetic tables with QA pairs

🔎 Similar Papers

TableBench: A Comprehensive and Complex Benchmark for Table Question Answering