AI Summary
Existing long-context benchmarks focus on unstructured text, overlooking the challenges that ultra-long structured tables pose for large language models (LLMs). Method: We introduce NIAT, the first fine-grained benchmark for long structured tables; it treats individual table cells as "needles" and requires precise localization and reasoning within tables containing tens of thousands of rows and columns. NIAT systematically exposes LLMs' overreliance on superficial formatting cues and their neglect of deep semantic relationships in long-table tasks. We propose a structure-aware table data synthesis method, coupled with long-context prompt engineering and a fine-grained question-answering evaluation framework. Contribution/Results: On NIAT, our approach significantly outperforms state-of-the-art long-context LLMs and table-specialized agents, demonstrating that structured synthetic data improves both cell-level localization accuracy and semantic modeling of complex, lengthy tables.
Abstract
Processing structured tabular data, particularly lengthy tables, is a fundamental yet challenging task for large language models (LLMs). However, existing long-context benchmarks focus primarily on unstructured text, neglecting the challenges posed by long and complex structured tables. To address this gap, we introduce NeedleInATable (NIAT), a novel task that treats each table cell as a "needle" and requires the model to extract the target cell under different queries. Evaluations of mainstream LLMs on this benchmark show that they lack robust long-table comprehension, often relying on superficial correlations or shortcuts for complex table understanding tasks, revealing significant limitations in processing intricate tabular data. To this end, we propose a data synthesis method that enhances models' long-table comprehension. Experimental results show that our synthesized training data significantly improves LLMs' performance on the NIAT task, outperforming both long-context LLMs and long-table agent methods. This work advances the evaluation of LLMs' genuine comprehension of long structured tables and paves the way for progress in long-context and table understanding applications.
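The needle-in-a-table setup described above can be sketched as a toy evaluation harness: generate a long table with distinctive cell values, serialize it into a prompt, and check whether a model's answer exactly matches the target cell. This is a minimal illustration only; the function names, the Markdown serialization, and the exact-match metric are assumptions, not the authors' released data format or evaluation code.

```python
import random


def make_table(n_rows, n_cols, seed=0):
    """Build a toy table: a header row plus unique cell values (the 'needles')."""
    rng = random.Random(seed)
    header = [f"col_{j}" for j in range(n_cols)]
    rows = [[f"v_{i}_{j}_{rng.randint(0, 9999)}" for j in range(n_cols)]
            for i in range(n_rows)]
    return header, rows


def table_to_prompt(header, rows):
    """Serialize the table as Markdown, one common way to feed tables to LLMs."""
    lines = ["| " + " | ".join(header) + " |",
             "| " + " | ".join("---" for _ in header) + " |"]
    lines += ["| " + " | ".join(r) + " |" for r in rows]
    return "\n".join(lines)


def make_needle_query(header, rows, i, j):
    """A cell-lookup question and its gold answer (the target 'needle')."""
    question = f"What is the value in row {i} under column '{header[j]}'?"
    return question, rows[i][j]


def exact_match(prediction, gold):
    """Score a model response: the needle must be reproduced exactly."""
    return prediction.strip() == gold.strip()
```

Scaling `n_rows` and `n_cols` toward tens of thousands of cells turns this into a stress test of whether a long-context model can locate one cell without shortcuts such as copying nearby values.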