JT-DA: Enhancing Data Analysis with Tool-Integrated Table Reasoning Large Language Models

📅 2025-12-07

🤖 AI Summary

Problem: High-quality supervised data for complex table reasoning tasks in real-world scenarios remains scarce. Method: This paper proposes the JT-DA framework, which (1) constructs a high-quality, multi-step table reasoning corpus covering 34 task categories; (2) designs a four-stage workflow (table preprocessing, table sensing, tool-integrated reasoning, and prompt engineering) to improve interpretability and execution accuracy; and (3) adopts a data-centric generation and workflow-driven optimization paradigm, combining LLM-based scoring and workflow-aligned filtering, supervised fine-tuning, and reinforcement learning to train JT-DA-8B on top of the open-source JT-Coder-8B model. Contribution/Results: Experiments show that JT-DA-8B significantly outperforms baseline models across diverse table question-answering benchmarks, supporting both the value of high-quality data curation and the structural advantages of the proposed reasoning workflow.
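The summary names LLM-based scoring and filtering as the data-curation step but gives no details. The sketch below is one hedged reading of it, assuming each generated sample carries a trace of stage labels and that "workflow-aligned" means those labels follow the workflow order. `Sample`, `REQUIRED_STAGES`, `is_workflow_aligned`, `filter_samples`, the 1-5 score scale, and the 4.0 threshold are all hypothetical, and `score_fn` merely stands in for an LLM judge.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List

# Hypothetical stage labels for a reasoning trace; the paper's trace format is not given.
REQUIRED_STAGES = ("preprocess", "sense", "tool_call", "answer")


@dataclass
class Sample:
    """One generated table-reasoning example (illustrative fields only)."""
    question: str
    steps: List[str]  # stage label of each step in the reasoning trace
    answer: str


def is_workflow_aligned(steps: Iterable[str], required=REQUIRED_STAGES) -> bool:
    """True if every required stage appears, in order, within the trace.

    Extra steps may be interleaved; this is a subsequence check.
    """
    it = iter(steps)
    # `stage in it` advances the shared iterator until it finds `stage`,
    # so a later stage can never match before an earlier one.
    return all(stage in it for stage in required)


def filter_samples(
    samples: Iterable[Sample],
    score_fn: Callable[[Sample], float],
    threshold: float = 4.0,
) -> List[Sample]:
    """Keep samples that pass the structural check and score >= threshold.

    `score_fn` is a stand-in for an LLM judge (assumed 1-5 rating),
    not the paper's actual scoring prompt.
    """
    kept = []
    for s in samples:
        # Run the cheap structural check first so the (expensive) judge is called less often.
        if is_workflow_aligned(s.steps) and score_fn(s) >= threshold:
            kept.append(s)
    return kept


if __name__ == "__main__":
    def toy_judge(s: Sample) -> float:
        # Placeholder scorer: rewards non-empty answers. A real pipeline would call an LLM.
        return 5.0 if s.answer.strip() else 1.0

    data = [
        Sample("Total sales in Q1?", ["preprocess", "sense", "tool_call", "answer"], "1200"),
        Sample("Top region?", ["sense", "preprocess", "tool_call", "answer"], "North"),  # wrong order
        Sample("Row count?", ["preprocess", "sense", "tool_call", "tool_call", "answer"], ""),
    ]
    print([s.question for s in filter_samples(data, toy_judge)])  # ['Total sales in Q1?']
```

Ordering the structural check before the judge call is a design choice for cost: malformed traces are rejected without spending an LLM call on them.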

📝 Abstract

In this work, we present JT-DA-8B (JiuTian Data Analyst 8B), a specialized large language model designed for complex table reasoning tasks across diverse real-world scenarios. To address the lack of high-quality supervision in tabular reasoning, we construct a comprehensive and diverse training corpus covering 34 well-defined table reasoning tasks by aggregating 29 public table QA datasets and 3 million tables. We also propose an automatic pipeline that generates realistic multi-step analytical tasks involving reasoning patterns. The model is built on the open-source JT-Coder-8B model, an 8B-parameter decoder-only foundation model trained from scratch. In the training stage, we use LLM-based scoring and workflow-aligned filtering to distill high-quality, table-centric data, and we optimize the model with both supervised fine-tuning (SFT) and reinforcement learning (RL). We further propose a four-stage table reasoning workflow, comprising table preprocessing, table sensing, tool-integrated reasoning, and prompt engineering, to improve model interpretability and execution accuracy. Experimental results show that JT-DA-8B achieves strong performance on various table reasoning tasks, demonstrating the effectiveness of data-centric generation and workflow-driven optimization.
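To make the early stages of the workflow concrete, here is a minimal, stdlib-only sketch of table preprocessing, table sensing, and prompt assembly (the tool-execution stage is omitted). It assumes a table arrives as a header row plus string cells and is held as a list of row dicts. The normalization rules (snake_case headers, commas treated as thousands separators), the schema-summary format, and the prompt template in `build_prompt` are illustrative assumptions, not the paper's implementation.

```python
import re
from typing import Dict, List, Sequence

# Assumed in-memory representation: one dict per row, keyed by normalized column name.
Table = List[Dict[str, object]]

# Plain integers or decimals, optionally negative (after thousands separators are removed).
NUMERIC = re.compile(r"-?\d+(?:\.\d+)?")


def normalize_header(name: str) -> str:
    """Preprocessing helper: trim, lower-case, collapse non-alphanumeric runs to '_'."""
    return re.sub(r"[^0-9a-z]+", "_", name.strip().lower()).strip("_")


def coerce(cell: str) -> object:
    """Preprocessing helper: strip whitespace and return int/float when the cell is numeric."""
    text = cell.strip()
    candidate = text.replace(",", "")  # assumption: ',' is a thousands separator
    if NUMERIC.fullmatch(candidate):
        return float(candidate) if "." in candidate else int(candidate)
    return text


def preprocess(header: Sequence[str], rows: Sequence[Sequence[str]]) -> Table:
    """Table preprocessing: normalize headers and cell values."""
    cols = [normalize_header(h) for h in header]
    return [{c: coerce(v) for c, v in zip(cols, row)} for row in rows]


def sense(table: Table, n_samples: int = 2) -> str:
    """Table sensing: compact summary with row count, per-column types, sample rows."""
    cols = list(table[0]) if table else []
    lines = [f"rows: {len(table)}"]
    for c in cols:
        # A column may mix types (e.g. int and float); report every type seen.
        types = sorted({type(r.get(c)).__name__ for r in table})
        lines.append(f"- {c}: {'/'.join(types)}")
    lines.append("sample rows:")
    lines.extend(str(r) for r in table[:n_samples])
    return "\n".join(lines)


def build_prompt(question: str, table: Table) -> str:
    """Prompt assembly with a hypothetical template; the paper's prompts are not given."""
    return (
        "You are a data analyst. Answer the question by writing Python code.\n"
        f"Table summary:\n{sense(table)}\n\n"
        f"Question: {question}\n"
    )


if __name__ == "__main__":
    table = preprocess(["  Region ", "Q1 Sales ($)"], [["North", "1,200"], ["South ", "950.5"]])
    print(table)  # [{'region': 'North', 'q1_sales': 1200}, {'region': 'South', 'q1_sales': 950.5}]
    print(build_prompt("Which region sold more in Q1?", table))
```

Passing a compact summary instead of the full table is one common way to keep the context short for large tables; the abstract does not say whether JT-DA's table sensing works this way.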

Problem

Research questions and friction points this paper is trying to address.

Develops a specialized LLM for complex table reasoning tasks

Addresses lack of high-quality supervision in tabular reasoning scenarios

Proposes a four-stage workflow to improve interpretability and accuracy

Innovation

Methods, ideas, or system contributions that make the work stand out.

Constructing diverse training corpus with 34 table reasoning tasks

Using LLM-based scoring and workflow-aligned filtering for data distillation

Proposing a four-stage tool-integrated reasoning workflow to improve interpretability and execution accuracy
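Tool-integrated reasoning generally means the model writes code, a runtime executes it against the table, and the output is fed back as an observation for the next step. The sketch below shows that loop under stated assumptions: `model` is a hypothetical callable that returns either `{"code": ...}` or `{"final": ...}`, `scripted_model` is a fixed stand-in for an LLM, and `run_tool` uses plain `exec`, which is not a security sandbox. None of these names come from the paper.

```python
import contextlib
import io
from typing import Callable, Dict, List, Optional

Table = List[Dict[str, object]]


def run_tool(code: str, table: Table) -> str:
    """Execute model-written Python with `table` in scope; return stdout or the error text.

    NOTE: exec() is NOT a sandbox. A real system would isolate this step
    (separate process, timeouts, restricted imports).
    """
    buf = io.StringIO()
    scope = {"table": table}
    try:
        with contextlib.redirect_stdout(buf):
            exec(code, scope)
    except Exception as exc:
        # Report the failure as an observation so the model can try to fix its code.
        return f"Error: {type(exc).__name__}: {exc}"
    return buf.getvalue().strip()


def tool_integrated_answer(
    question: str,
    table: Table,
    model: Callable[[List[str]], Dict[str, str]],
    max_turns: int = 4,
) -> Optional[str]:
    """Alternate model steps and tool executions until the model returns a final answer."""
    transcript = [f"Question: {question}"]
    for _ in range(max_turns):
        step = model(transcript)
        if "final" in step:
            return step["final"]
        observation = run_tool(step["code"], table)
        transcript.append(f"Code:\n{step['code']}\nObservation: {observation}")
    return None  # no final answer within the turn budget


def scripted_model(transcript: List[str]) -> Dict[str, str]:
    """Stand-in for an LLM: first writes code, then echoes the last observation as the answer."""
    if len(transcript) == 1:
        return {"code": "print(max(table, key=lambda r: r['sales'])['region'])"}
    return {"final": transcript[-1].rsplit("Observation: ", 1)[1]}


if __name__ == "__main__":
    sales = [{"region": "North", "sales": 1200}, {"region": "South", "sales": 950}]
    print(tool_integrated_answer("Which region sold most?", sales, scripted_model))  # North
```

Returning errors as observations rather than raising keeps the loop running, so a model can read the traceback summary and revise its code on the next turn; `max_turns` bounds the cost when it never converges.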
