🤖 AI Summary
Problem: High-quality supervised data for complex table reasoning in real-world scenarios remains scarce. Method: This paper proposes the JT-DA framework, which (1) constructs a high-quality, multi-step table reasoning corpus covering 34 task categories; (2) designs a four-stage workflow (table preprocessing, table sensing, tool-integrated reasoning, and prompt engineering) to improve interpretability and execution accuracy; and (3) introduces a data-centric generation and workflow-driven optimization paradigm that combines LLM-based scoring and filtering, supervised fine-tuning, and reinforcement learning to train JT-DA-8B on top of the open-source JT-Coder-8B model. Contribution/Results: Experiments show that JT-DA-8B achieves strong performance across a range of table reasoning tasks, supporting both the value of high-quality data curation and the benefit of the proposed reasoning workflow.
📝 Abstract
In this work, we present JT-DA-8B (JiuTian Data Analyst 8B), a specialized large language model designed for complex table reasoning tasks across diverse real-world scenarios. To address the lack of high-quality supervision for tabular reasoning, we construct a comprehensive and diverse training corpus covering 34 well-defined table reasoning tasks by aggregating 29 public table QA datasets and 3 million tables. We propose an automatic pipeline to generate realistic multi-step analytical tasks involving reasoning patterns. The model is built on the open-source JT-Coder-8B model, an 8B-parameter decoder-only foundation model trained from scratch. During training, we use LLM-based scoring and workflow-aligned filtering to distill high-quality, table-centric data, and we optimize the model with both supervised fine-tuning (SFT) and reinforcement learning (RL). We further propose a four-stage table reasoning workflow, comprising table preprocessing, table sensing, tool-integrated reasoning, and prompt engineering, to improve model interpretability and execution accuracy. Experimental results show that JT-DA-8B achieves strong performance in various table reasoning tasks, demonstrating the effectiveness of data-centric generation and workflow-driven optimization.
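To make the four workflow stages named above concrete, the following is a minimal Python sketch of how such a pipeline might be wired together. It is an illustration under stated assumptions, not the paper's implementation: every function name (`preprocess`, `sense`, `build_prompt`, `run_workflow`), the single `column_sum` tool, the ordering of the prompt step before the tool call, and the `plan` callable that stands in for the model are all hypothetical.

```python
import csv
import io
from typing import Callable

Row = dict[str, str]


def preprocess(raw_csv: str) -> list[Row]:
    """Stage 1 (table preprocessing): parse CSV text and trim whitespace."""
    reader = csv.DictReader(io.StringIO(raw_csv.strip()))
    return [{k.strip(): v.strip() for k, v in row.items()} for row in reader]


def _is_number(text: str) -> bool:
    try:
        float(text)
        return True
    except ValueError:
        return False


def sense(rows: list[Row]) -> dict:
    """Stage 2 (table sensing): summarize shape and guess column types."""
    columns = list(rows[0].keys()) if rows else []
    types = {
        c: "number" if all(_is_number(r[c]) for r in rows) else "text"
        for c in columns
    }
    return {"n_rows": len(rows), "columns": types}


def column_sum(rows: list[Row], column: str) -> float:
    """A hypothetical tool the model could invoke during reasoning."""
    return sum(float(r[column]) for r in rows)


TOOLS: dict[str, Callable[[list[Row], str], float]] = {"column_sum": column_sum}


def build_prompt(question: str, schema: dict) -> str:
    """Prompt engineering: expose the sensed schema and tool names."""
    return (
        f"Table schema: {schema}\n"
        f"Available tools: {sorted(TOOLS)}\n"
        f"Question: {question}"
    )


def run_workflow(
    raw_csv: str, question: str, plan: Callable[[str], tuple[str, str]]
) -> float:
    """Run all stages; `plan` is a stand-in for the model's tool choice."""
    rows = preprocess(raw_csv)
    schema = sense(rows)
    prompt = build_prompt(question, schema)
    tool_name, column = plan(prompt)  # tool-integrated reasoning step
    return TOOLS[tool_name](rows, column)


# Usage with a stub planner in place of a real model call.
table = "region,sales\nnorth, 10\nsouth,5.5\n"
total = run_workflow(table, "What are total sales?", lambda _: ("column_sum", "sales"))
print(total)
```

In this sketch the planner is a plain callable so that the control flow can be exercised without any model; a real system would replace it with a call to the trained model and a richer tool registry.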