RealHiTBench: A Comprehensive Realistic Hierarchical Table Benchmark for Evaluating LLM-Based Table Analysis

📅 2025-06-16
🤖 AI Summary
Existing tabular evaluation benchmarks are largely confined to simple flat structures or outdated datasets, and they lack systematic assessment of real-world complex hierarchical tables, particularly those with nested headers and multimodal formats. To address this gap, we propose RealHiTBench, a comprehensive benchmark designed for complex hierarchical tables that supports multiple input formats (LaTeX, HTML, PNG) and diverse reasoning tasks. We further introduce TreeThinker, a tree-based pipeline that explicitly organizes hierarchical headers into a tree to enhance structural awareness. The benchmark evaluates both LLMs and multimodal LLMs (MLLMs). Evaluation across 25 state-of-the-art models demonstrates that RealHiTBench is difficult and practically relevant. TreeThinker achieves substantial gains in hierarchical table question answering, improving accuracy by 12.3% on average, which supports the importance of structured header modeling in tabular reasoning.

📝 Abstract
With the rapid advancement of Large Language Models (LLMs), there is an increasing need for challenging benchmarks to evaluate their capabilities in handling complex tabular data. However, existing benchmarks are either based on outdated data setups or focus solely on simple, flat table structures. In this paper, we introduce RealHiTBench, a comprehensive benchmark designed to evaluate the performance of both LLMs and Multimodal LLMs (MLLMs) across a variety of input formats for complex tabular data, including LaTeX, HTML, and PNG. RealHiTBench also includes a diverse collection of tables with intricate structures, spanning a wide range of task types. Our experimental results, using 25 state-of-the-art LLMs, demonstrate that RealHiTBench is indeed a challenging benchmark. Moreover, we also develop TreeThinker, a tree-based pipeline that organizes hierarchical headers into a tree structure for enhanced tabular reasoning, validating the importance of improving LLMs' perception of table hierarchies. We hope that our work will inspire further research on tabular data reasoning and the development of more robust models. The code and data are available at https://github.com/cspzyy/RealHiTBench.
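
To make the idea of organizing hierarchical headers into a tree concrete, here is a minimal Python sketch. It is not the TreeThinker implementation from the paper; the `HeaderNode` class, the `build_header_tree` and `leaf_columns` helpers, and the sample header paths are all hypothetical names and data chosen for illustration. It assumes each column of a table is described by its path of nested header labels, from the top-level header down to the leaf.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Sequence


@dataclass
class HeaderNode:
    """One header cell; children are keyed by their label (hypothetical structure)."""
    label: str
    children: Dict[str, "HeaderNode"] = field(default_factory=dict)
    column: Optional[int] = None  # set only on leaf headers


def build_header_tree(header_paths: Sequence[Sequence[str]]) -> HeaderNode:
    """Merge per-column header paths into a single tree under a synthetic root."""
    root = HeaderNode("ROOT")
    for col, path in enumerate(header_paths):
        node = root
        for label in path:
            # Reuse an existing child with this label, or create a new one.
            node = node.children.setdefault(label, HeaderNode(label))
        node.column = col
    return root


def leaf_columns(node: HeaderNode) -> List[int]:
    """Return the column indices covered by a header node, in insertion order."""
    cols = [] if node.column is None else [node.column]
    for child in node.children.values():
        cols.extend(leaf_columns(child))
    return cols


# Illustrative data only: a three-level header with four columns.
paths = [
    ("Revenue", "2023", "Q1"),
    ("Revenue", "2023", "Q2"),
    ("Revenue", "2024", "Q1"),
    ("Cost", "2024", "Q1"),
]
tree = build_header_tree(paths)
revenue_2023 = tree.children["Revenue"].children["2023"]
print(leaf_columns(revenue_2023))
```

With a tree like this, a question about "Revenue in 2023" can be resolved to the columns under that header node before any cell values are read, which is the kind of hierarchy awareness the abstract describes.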

Problem

Research questions and friction points this paper is trying to address.

Evaluating LLMs on complex hierarchical table analysis
Addressing lack of realistic benchmarks for tabular data
Improving table hierarchy perception in language models

Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces RealHiTBench for complex tabular data evaluation
Develops TreeThinker for hierarchical table reasoning
Supports multiple input formats like LaTeX, HTML, PNG

Pengzuo Wu
Zhejiang University
Yuhang Yang
Zhejiang University
Guangcheng Zhu
Zhejiang University
Chao Ye
Zhejiang University
Hong Gu
National Institute on Drug Abuse, NIH
functional MRI, functional connectivity, drug addiction
Xu Lu
Zhejiang University
Ruixuan Xiao
Zhejiang University
Machine Learning, Natural Language Processing, LLM
Bowen Bao
Zhejiang University
Yijing He
Zhejiang University
Liangyu Zha
Institute of Computing Innovation, Zhejiang University
Wentao Ye
Zhejiang University, Ant Research
LLMs, Machine Learning, Multimodality
Junbo Zhao
Zhejiang University
Haobo Wang
Zhejiang University
Machine Learning