Probing How Scalable Table Data Enhances General Long-Context Reasoning

📅 2026-03-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study investigates which data types effectively enhance the long-context reasoning capabilities of large language models. It reveals, for the first time, that periodic and non-decaying dependencies inherent in structured tabular data play a critical role in long-range reasoning. Building on this insight, the work proposes a verifiable, diverse, and scalable method for synthesizing tabular data, integrated with a mutual information–based dependency analysis and a reinforcement learning training framework. The proposed approach achieves an average improvement of 8.24% across multiple long-context benchmarks and demonstrates a consistent gain of 8.06% on out-of-domain tasks, significantly boosting the model’s long-context reasoning performance.

📝 Abstract
As real-world tasks grow increasingly complex, long-context reasoning has become a core capability for Large Language Models (LLMs). However, few studies explore which data types are effective for long-context reasoning and why. We find that structured table data with periodic structures shows strong potential for long-context reasoning. Motivated by this observation, we mathematically analyze tabular dependency structures using mutual information, revealing periodic non-vanishing dependencies in table data. Furthermore, we systematically analyze the capabilities of structured table data, conduct relevant scaling experiments, and validate its underlying mechanisms for enhancing long-context reasoning, yielding several meaningful insights. Leveraging these insights, we propose a simple yet scalable pipeline (TableLong) for synthesizing high-quality, diverse, and verifiable structured table data to boost long-context reasoning via RL. Extensive experimental results demonstrate that table data significantly enhances the long-context reasoning capability of LLMs across multiple long-context benchmarks (+8.24% on average), and even improves performance on out-of-domain benchmarks (+8.06% on average). We hope that our insights provide practical guidance for effective post-training data to enhance long-context reasoning in LLMs.
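The "periodic non-vanishing dependencies" claim can be illustrated with a toy experiment (this is a hedged sketch of the intuition, not the paper's actual analysis): when a table is serialized row by row, cells that sit a whole number of row-widths apart belong to the same column and share column-level statistics, so their mutual information stays high at arbitrarily long distances, while off-period pairs are nearly independent. The generator below is entirely hypothetical (column-bias latent, alphabet size, peak probability are all made-up parameters), using a plug-in mutual information estimate:

```python
import math
import random
from collections import Counter

def plugin_mi(pairs):
    """Plug-in estimate of mutual information I(A; B) in bits from symbol pairs."""
    n = len(pairs)
    pj = Counter(pairs)
    px = Counter(a for a, _ in pairs)
    py = Counter(b for _, b in pairs)
    return sum(c / n * math.log2((c / n) / ((px[a] / n) * (py[b] / n)))
               for (a, b), c in pj.items())

random.seed(0)
COLS, ROWS, TABLES, ALPHABET = 5, 40, 300, list(range(6))

def synth_table():
    # Table-level latent: each column gets its own "typical" symbol, so two
    # cells in the same column stay dependent no matter how many rows apart.
    bias = [random.choice(ALPHABET) for _ in range(COLS)]
    cells = []
    for _ in range(ROWS):
        for j in range(COLS):
            cells.append(bias[j] if random.random() < 0.8 else random.choice(ALPHABET))
    return cells  # row-major serialization of the table

tables = [synth_table() for _ in range(TABLES)]

# MI spikes at lags that are multiples of the row width and stays near zero
# elsewhere: periodic, and it does not decay as the lag grows.
for lag in range(1, 2 * COLS + 1):
    pairs = [(t[i], t[i + lag]) for t in tables for i in range(len(t) - lag)]
    tag = "  <- same column (lag % row width == 0)" if lag % COLS == 0 else ""
    print(f"lag={lag:2d}  MI~{plugin_mi(pairs):.3f}{tag}")
```

Plain running text has no such structure: token dependencies typically decay with distance, which is one way to read the paper's argument for why tabular data is a useful signal for long-range reasoning.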
Problem

Research questions and friction points this paper is trying to address.

long-context reasoning
structured table data
Large Language Models
data types
reasoning capability
Innovation

Methods, ideas, or system contributions that make the work stand out.

structured table data
long-context reasoning
mutual information
scaling law
reinforcement learning
Huaibing Xie
Large Language Model Department, Tencent
Guoliang Zhao
Large Language Model Department, Tencent; Xi’an Jiaotong University, Xi’an, China
Yang Liu
Microsoft
natural language processing, text summarization, text generation
Shihan Dou
Fudan University
LLMs, Code LMs, RL, Alignment
Siming Huang
Fudan University, Shanghai, China
Yanling Xiao
Large Language Model Department, Tencent
Shaolei Wang
Unknown affiliation
NLP, machine learning
Yiting Liu
University of California San Diego
EDA, VLSI Physical Design, Machine Learning, Data Privacy Protection
Cheng Zhang
Large Language Model Department, Tencent
Shaofan Liu
Large Language Model Department, Tencent
Pluto Zhou
Large Language Model Department, Tencent