JT-DA: Enhancing Data Analysis with Tool-Integrated Table Reasoning Large Language Models

📅 2025-12-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
High-quality supervised data for complex table reasoning in real-world scenarios remains scarce. Method: This paper proposes the JT-DA framework, which (1) constructs a high-quality, multi-step table reasoning corpus covering 34 task categories; (2) designs a four-stage workflow integrating tool invocation, prompt engineering, and process alignment to improve interpretability and execution accuracy; and (3) introduces a data-centric generation and workflow-driven optimization paradigm, combining LLM-based scoring and filtering, supervised fine-tuning, and reinforcement learning to train JT-DA-8B atop the open-source JT-Coder-8B model. Contribution/Results: Experiments show that JT-DA-8B significantly outperforms baseline models across diverse table question-answering benchmarks, validating both the efficacy of high-quality data curation and the structural advantages of the proposed reasoning workflow.

📝 Abstract
In this work, we present JT-DA-8B (JiuTian Data Analyst 8B), a specialized large language model designed for complex table reasoning tasks across diverse real-world scenarios. To address the lack of high-quality supervision in tabular reasoning scenarios, we construct a comprehensive and diverse training corpus with 34 well-defined table reasoning tasks by aggregating 29 public table QA datasets and 3 million tables. An automatic pipeline is proposed to generate realistic multi-step analytical tasks involving diverse reasoning patterns. The model is trained upon the open-source JT-Coder-8B model, an 8B-parameter decoder-only foundation model trained from scratch. In the training stage, we leverage LLM-based scoring and workflow-aligned filtering to distill high-quality, table-centric data. Both supervised fine-tuning (SFT) and reinforcement learning (RL) are adopted to optimize our model. Afterwards, a four-stage table reasoning workflow is proposed, including table preprocessing, table sensing, tool-integrated reasoning, and prompt engineering, to improve model interpretability and execution accuracy. Experimental results show that JT-DA-8B achieves strong performance in various table reasoning tasks, demonstrating the effectiveness of data-centric generation and workflow-driven optimization.
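The score-and-filter distillation the abstract describes can be sketched in miniature. This is an illustrative sketch only: `llm_score` here is a stand-in heuristic (a real system would prompt an LLM judge), and `distill` and the threshold value are hypothetical names, not from the paper.

```python
# Sketch of LLM-based scoring plus filtering for data distillation:
# each candidate training sample receives a quality score, and only
# samples clearing a threshold are kept for SFT/RL.

def llm_score(sample):
    """Stand-in for an LLM judge; scores longer multi-step
    reasoning traces higher, capped at 1.0."""
    return min(len(sample["reasoning_steps"]) / 4.0, 1.0)

def distill(samples, threshold=0.6):
    """Keep only samples whose quality score clears the threshold."""
    return [s for s in samples if llm_score(s) >= threshold]

samples = [
    {"question": "total sales?", "reasoning_steps": ["load", "sum"]},
    {"question": "trend?", "reasoning_steps": ["load", "group", "diff", "rank"]},
]
kept = distill(samples)
print(len(kept))  # → 1
```

In practice the judge's score would come from a scoring prompt rather than a length heuristic, and workflow-aligned filtering would additionally check that the sample's trace follows the expected stage structure.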
Problem

Research questions and friction points this paper is trying to address.

Develops a specialized LLM for complex table reasoning tasks
Addresses lack of high-quality supervision in tabular reasoning scenarios
Proposes a four-stage workflow to improve interpretability and accuracy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Constructing diverse training corpus with 34 table reasoning tasks
Using LLM-based scoring and workflow-aligned filtering for data distillation
Proposing a four-stage tool-integrated reasoning workflow for accuracy
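The four-stage workflow named above (table preprocessing, table sensing, tool-integrated reasoning, prompt engineering) can be sketched as a pipeline. All function names below are illustrative assumptions, not the paper's API; the key idea shown is that stage 3 computes answers by executing code over the table rather than having the model produce values in free text.

```python
# Hypothetical sketch of a four-stage table reasoning workflow.

def preprocess_table(rows):
    """Stage 1: normalize cells (strip stray whitespace)."""
    return [[(c.strip() if isinstance(c, str) else c) for c in row]
            for row in rows]

def sense_table(rows):
    """Stage 2: extract a lightweight schema the model can condition on."""
    header, *body = rows
    return {"columns": header, "n_rows": len(body)}

def reason_with_tool(rows, column):
    """Stage 3: tool-integrated reasoning — execute code over the table
    instead of estimating the value in natural language."""
    header, *body = rows
    idx = header.index(column)
    return sum(float(r[idx]) for r in body)

def build_answer(schema, value):
    """Stage 4: assemble the tool result into an interpretable answer."""
    return f"Computed over {schema['n_rows']} rows: {value}"

rows = preprocess_table([["city", "sales "], ["A", " 10"], ["B", "32"]])
schema = sense_table(rows)
total = reason_with_tool(rows, "sales")
print(build_answer(schema, total))  # → Computed over 2 rows: 42.0
```

The real workflow would have an LLM generate the stage-3 code per question; the fixed `reason_with_tool` here just stands in for that executed program.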
Ce Chi
Jiutian Research, China Mobile, Beijing, China
Xing Wang
Jiutian Research, China Mobile, Beijing, China
Zhendong Wang
Jiutian Research, China Mobile, Beijing, China
Xiaofan Liu
Jiutian Research, China Mobile, Beijing, China
Ce Li
CUMTB
Video Understanding · Behavior Analysis · Event Detection
Zhiyan Song
Jiutian Research, China Mobile, Beijing, China
Chen Zhao
Jiutian Research, China Mobile, Beijing, China
Kexin Yang
Jiutian Research, China Mobile, Beijing, China
Boshen Shi
Jiutian Artificial Intelligence Research Institute, China Mobile
Graph Neural Networks · Transfer Learning · Table Mining
Jingjing Yang
Jiutian Research, China Mobile, Beijing, China
Chao Deng
Jiutian Research, China Mobile, Beijing, China
Junlan Feng
Chief Scientist at China Mobile Research
Natural Language · Machine Learning · Speech Processing · Data Mining