TableGPT-R1: Advancing Tabular Reasoning Through Reinforcement Learning

📅 2025-12-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
Problem: Supervised fine-tuning alone generalizes poorly and lacks robustness on multi-step tabular reasoning and code execution, while reinforcement learning (RL) in tabular domains faces three key challenges: scarcity of high-quality agent trajectories, heterogeneous reward signals, and catastrophic forgetting. Method: We propose the first systematic RL framework tailored for structured tabular reasoning. It introduces difficulty-stratified synthetic trajectory generation, a hybrid reward mechanism that integrates domain-specific rules and criteria with process-level step-wise reward shaping, and behavior regularization combined with progressive multi-stage training to mitigate forgetting. The framework unifies supervised alignment, PPO optimization, custom reward modeling, SQL/Python closed-loop execution feedback, and structure-aware data engineering. Contribution/Results: Our approach achieves state-of-the-art performance on authoritative tabular reasoning benchmarks, significantly outperforming strong baselines, while preserving strong general language capabilities and cross-task generalization.

📝 Abstract
Tabular data serves as the backbone of modern data analysis and scientific research. While Large Language Models (LLMs) fine-tuned via Supervised Fine-Tuning (SFT) have significantly improved natural language interaction with such structured data, they often fall short in handling the complex, multi-step reasoning and robust code execution required for real-world table tasks. Reinforcement Learning (RL) offers a promising avenue to enhance these capabilities, yet its application in the tabular domain faces three critical hurdles: the scarcity of high-quality agentic trajectories with closed-loop code execution and environment feedback on diverse table structures, the extreme heterogeneity of feedback signals ranging from rigid SQL execution to open-ended data interpretation, and the risk of catastrophic forgetting of general knowledge during vertical specialization. To overcome these challenges and unlock advanced reasoning on complex tables, we introduce TableGPT-R1, a specialized tabular model built on a systematic RL framework. Our approach integrates a comprehensive data engineering pipeline that synthesizes difficulty-stratified agentic trajectories for both supervised alignment and RL rollouts, a task-adaptive reward system that combines rule-based verification with a criteria-injected reward model and incorporates process-level step reward shaping with behavioral regularization, and a multi-stage training framework that progressively stabilizes reasoning before specializing in table-specific tasks. Extensive evaluations demonstrate that TableGPT-R1 achieves state-of-the-art performance on authoritative benchmarks, significantly outperforming baseline models while retaining robust general capabilities. Our model is available at https://huggingface.co/tablegpt/TableGPT-R1.
Problem

Research questions and friction points this paper is trying to address.

Enhances multi-step reasoning in tabular data using reinforcement learning
Addresses scarcity of high-quality agentic trajectories for diverse table structures
Mitigates catastrophic forgetting during vertical specialization of tabular models
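To make "closed-loop code execution and environment feedback" concrete, here is a minimal sketch of one feedback step: a model-written snippet is run against a table, and either its printed output or the final error line is returned as the observation for the next turn. The function name `run_step`, the toy table, and the observation format are all illustrative assumptions, not details taken from the paper.

```python
import contextlib
import io
import traceback


def run_step(code: str, env: dict) -> dict:
    """Run one model-written snippet and return the observation fed back to the model.

    Hypothetical helper: the paper does not describe its execution harness.
    """
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(code, env)  # NOTE: unsandboxed; a real system would isolate this call
        return {"ok": True, "observation": buffer.getvalue()}
    except Exception:
        # Keep only the last traceback line (exception type and message) as feedback.
        last_line = traceback.format_exc().strip().splitlines()[-1]
        return {"ok": False, "observation": last_line}


# Toy table as a list of row dicts; stands in for a real DataFrame or SQL table.
table = [{"city": "A", "sales": 10}, {"city": "B", "sales": 32}]
env = {"table": table}

# First attempt uses a column that does not exist; the error becomes feedback.
first = run_step("print(sum(r['price'] for r in table))", env)
# Second attempt is the corrected snippet a model might produce after reading it.
second = run_step("print(sum(r['sales'] for r in table))", env)
```

A trajectory in this sense is the sequence of snippets and observations; the failed first step followed by the corrected second step is the kind of recovery behavior such trajectories are meant to capture.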
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reinforcement learning framework for tabular reasoning enhancement
Task-adaptive reward system combining rule verification and reward modeling
Multi-stage training stabilizing reasoning before table specialization
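The task-adaptive reward idea listed above can be sketched as follows: tasks with a checkable answer are scored by a rule, open-ended tasks are scored by a reward model, and a small process-level term rewards steps that executed cleanly. Everything here is an assumption made for illustration (the `Rollout` fields, the exact-match rule, the `judge` callable standing in for a reward model, and the `step_weight` coefficient); the paper's actual reward design may differ.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Rollout:
    """One hypothetical agent trajectory on a table task."""
    task_type: str             # "verifiable" (e.g. SQL result) or "open_ended"
    final_answer: str
    reference: Optional[str]   # gold answer, present only for verifiable tasks
    step_ok: Sequence[bool]    # whether each code-execution step ran without error


def rule_reward(answer: str, reference: str) -> float:
    """Rule-based verification: exact match after light normalization."""
    return 1.0 if answer.strip().lower() == reference.strip().lower() else 0.0


def total_reward(
    rollout: Rollout,
    judge: Callable[[str], float],  # stand-in for a learned reward model in [0, 1]
    step_weight: float = 0.1,       # assumed shaping coefficient
) -> float:
    # Outcome term: pick the scorer that fits the task type.
    if rollout.task_type == "verifiable" and rollout.reference is not None:
        outcome = rule_reward(rollout.final_answer, rollout.reference)
    else:
        outcome = judge(rollout.final_answer)
    # Process term: fraction of steps that executed cleanly (0 if there were none).
    process = sum(rollout.step_ok) / len(rollout.step_ok) if rollout.step_ok else 0.0
    return outcome + step_weight * process
```

For example, a verifiable rollout with a correct answer and one failed step out of two would score 1.0 from the rule plus 0.1 × 0.5 from the process term.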
Saisai Yang
Zhejiang University Institute of Computing Innovation, Zhejiang University
Qingyi Huang
Zhejiang University Institute of Computing Innovation, Zhejiang University
Jing Yuan
Zhejiang University Institute of Computing Innovation, Zhejiang University
Liangyu Zha
Zhejiang University Institute of Computing Innovation, Zhejiang University
Kai Tang
Zhejiang University Institute of Computing Innovation, Zhejiang University
Yuhang Yang
Zhejiang University Institute of Computing Innovation, Zhejiang University
Ning Wang
Zhejiang University Institute of Computing Innovation, Zhejiang University
Yucheng Wei
Zhejiang University Institute of Computing Innovation, Zhejiang University
Liyao Li
PhD Candidate, Zhejiang University
Table Reasoning, Large Tabular Language Model, Machine Learning
Wentao Ye
Zhejiang University, Ant Research
LLMs, Machine Learning, Multimodality
Hao Chen
Zhejiang University Institute of Computing Innovation, Zhejiang University
Tao Zhang
Zhejiang University Institute of Computing Innovation, Zhejiang University
Junlin Zhou
Associate Professor of Computer Science, University of Electronic Science and Technology of China
Recommender System, Data Mining, Big Data Analyze
Haobo Wang
Zhejiang University
Machine Learning
Gang Chen
Zhejiang University Institute of Computing Innovation, Zhejiang University
Junbo Zhao
Zhejiang University Institute of Computing Innovation, Zhejiang University