NeedleInATable: Exploring Long-Context Capability of Large Language Models towards Long-Structured Tables

πŸ“… 2025-04-09
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
Existing long-context benchmarks focus on unstructured text, overlooking fundamental challenges in ultra-long structured table understanding by large language models (LLMs). Method: We introduce NIAT, the first fine-grained benchmark for long structured tables, treating individual table cells as β€œneedles” and requiring precise localization and reasoning within tables containing tens of thousands of rows and columns. NIAT systematically exposes LLMs’ overreliance on superficial formatting cues and neglect of deep semantic relationships in long-table tasks. We propose a structure-aware table data synthesis method, coupled with long-context prompt engineering and a fine-grained question-answering evaluation framework. Contribution/Results: On NIAT, our approach significantly outperforms state-of-the-art long-context LLMs and table-specialized agents, demonstrating that structured synthetic data effectively enhances both cell-level localization accuracy and semantic modeling capability for complex, lengthy tables.
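The cell-as-needle setup described above can be sketched in a few lines: build a long synthetic table, pick one cell as the "needle", serialize the table into a prompt, and score a model's answer by exact match on that cell. This is an illustrative sketch of the task format only, not the authors' released benchmark code; all function and column names here are hypothetical.

```python
import random

def make_niat_example(n_rows=1000, n_cols=8, seed=0):
    """Build a synthetic long table and one cell-level 'needle' query.

    Hypothetical sketch of a NIAT-style instance: every cell value is
    unique, so the query has exactly one correct answer.
    """
    rng = random.Random(seed)
    header = [f"col_{j}" for j in range(n_cols)]
    table = [[f"v_{i}_{j}" for j in range(n_cols)] for i in range(n_rows)]
    # Pick a random cell as the needle.
    i = rng.randrange(n_rows)
    j = rng.randrange(n_cols)
    question = (
        f"In the table below, what is the value in row {i} "
        f"of column '{header[j]}'?"
    )
    # Serialize as a markdown table, one common long-context format.
    lines = ["| " + " | ".join(header) + " |", "|" + "---|" * n_cols]
    lines += ["| " + " | ".join(row) + " |" for row in table]
    prompt = question + "\n\n" + "\n".join(lines)
    return prompt, table[i][j]

def exact_match(prediction, answer):
    """Cell-level scoring: strict string match after trimming."""
    return prediction.strip() == answer.strip()
```

Scaling `n_rows` and `n_cols` stretches the context length while keeping the answer a single cell, which is what makes the task fine-grained: superficial formatting cues do not help locate an arbitrary unique value.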

πŸ“ Abstract
Processing structured tabular data, particularly lengthy tables, constitutes a fundamental yet challenging task for large language models (LLMs). However, existing long-context benchmarks primarily focus on unstructured text, neglecting the challenges of long and complex structured tables. To address this gap, we introduce NeedleInATable (NIAT), a novel task that treats each table cell as a "needle" and requires the model to extract the target cell under different queries. Evaluation results of mainstream LLMs on this benchmark show they lack robust long-table comprehension, often relying on superficial correlations or shortcuts for complex table understanding tasks, revealing significant limitations in processing intricate tabular data. To this end, we propose a data synthesis method to enhance models' long-table comprehension capabilities. Experimental results show that our synthesized training data significantly enhances LLMs' performance on the NIAT task, outperforming both long-context LLMs and long-table agent methods. This work advances the evaluation of LLMs' genuine long-structured table comprehension capabilities and paves the way for progress in long-context and table understanding applications.
Problem

Research questions and friction points this paper is trying to address.

Evaluating LLMs' ability to process long structured tables
Addressing gaps in existing long-context benchmarks for tables
Enhancing models' comprehension of complex tabular data
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces NeedleInATable task for long-table evaluation
Proposes data synthesis to enhance table comprehension
Outperforms both long-context LLMs and long-table agent methods on NIAT
πŸ”Ž Similar Papers
No similar papers found.