TableZoomer: A Collaborative Agent Framework for Large-scale Table Question Answering

📅 2025-09-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
Industrial table question answering (TQA) faces three key challenges: structural heterogeneity, difficulty in target data localization, and bottlenecks in complex reasoning. To address these, this paper proposes a large language model (LLM)-based programmable agent framework. The method replaces raw textual tables with structured schema representations and introduces a query-aware dynamic subtable scaling mechanism. It integrates Program-of-Thoughts (PoT) with the ReAct paradigm to construct an executable, iterative reasoning pipeline. Column selection and entity linking are incorporated to jointly enhance semantic understanding and code generation, thereby improving localization accuracy and reasoning controllability. Evaluated on DataBench and TableBench, the framework achieves absolute accuracy gains of 19.34% and 25%, respectively, over conventional PoT baselines, demonstrating its effectiveness and strong scalability across multi-scale tabular data.

📝 Abstract
While large language models (LLMs) have shown promise in the table question answering (TQA) task through prompt engineering, they face challenges in industrial applications, including structural heterogeneity, difficulties in target data localization, and bottlenecks in complex reasoning. To address these limitations, this paper presents TableZoomer, a novel LLM-powered, programming-based agent framework. It introduces three key innovations: (1) replacing the original fully verbalized table with structured table schema to bridge the semantic gap and reduce computational complexity; (2) a query-aware table zooming mechanism that dynamically generates sub-table schema through column selection and entity linking, significantly improving target localization efficiency; and (3) a Program-of-Thoughts (PoT) strategy that transforms queries into executable code to mitigate numerical hallucination. Additionally, we integrate the reasoning workflow with the ReAct paradigm to enable iterative reasoning. Extensive experiments demonstrate that our framework maintains the usability advantages while substantially enhancing performance and scalability across tables of varying scales. When implemented with the Qwen3-8B-Instruct LLM, TableZoomer achieves accuracy improvements of 19.34% and 25% over conventional PoT methods on the large-scale DataBench dataset and the small-scale Fact Checking task of TableBench dataset, respectively.
Problem

Research questions and friction points this paper is trying to address.

Addresses structural heterogeneity in large-scale table question answering
Improves target data localization efficiency through dynamic zooming
Mitigates numerical hallucination via Program-of-Thoughts strategy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Structured table schema replaces verbalized tables
Query-aware zooming mechanism for dynamic sub-tables
Program-of-Thoughts strategy converts queries to code
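The three innovations above can be sketched end to end. This is a minimal, hypothetical illustration, not the paper's implementation: `table_schema`, `zoom`, and the keyword-overlap column selector are assumptions standing in for the framework's LLM-driven column selection and entity linking, and the final lines mimic the kind of executable code a PoT step would emit instead of asking the model to do arithmetic in text.

```python
import pandas as pd

def table_schema(df: pd.DataFrame, n_samples: int = 3) -> str:
    """Structured schema in place of a fully verbalized table:
    column name, dtype, and a few sample values per column."""
    lines = []
    for col in df.columns:
        samples = df[col].dropna().unique()[:n_samples].tolist()
        lines.append(f"{col} ({df[col].dtype}): e.g. {samples}")
    return "\n".join(lines)

def zoom(df: pd.DataFrame, query: str) -> pd.DataFrame:
    """Query-aware zooming stub: keep columns whose names occur in the
    query. The paper uses LLM-based column selection and entity linking;
    this keyword overlap is only a stand-in for that mechanism."""
    keep = [c for c in df.columns if c.lower() in query.lower()]
    return df[keep] if keep else df

# PoT-style step: the generated program is executed, so the numeric
# answer comes from pandas rather than from token-level arithmetic.
df = pd.DataFrame({"city": ["Berlin", "Paris"],
                   "population": [3_645_000, 2_161_000]})
sub = zoom(df, "What is the total population across city entries?")
answer = int(sub["population"].sum())
```

A ReAct-style loop would wrap these calls: observe the schema, act by zooming or emitting code, observe the execution result, and iterate until the answer is grounded.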
Authors
Sishi Xiong, Ziyang He, Zhongjiang He, Yu Zhao, Changzai Pan, Jie Zhang, Zhenhe Wu, Shuangyong Song: Institute of Artificial Intelligence (TeleAI), China Telecom Corp Ltd, Beijing, China.
Yongxiang Li: Professor, RMIT University (Electronic Materials and Devices).