TableGPT-R1: Advancing Tabular Reasoning Through Reinforcement Learning

📅 2025-12-23

🤖 AI Summary

Existing supervised fine-tuning (SFT) methods for multi-step tabular reasoning and robust code execution generalize poorly and lack robustness, while reinforcement learning (RL) in tabular domains faces three key challenges: scarcity of high-quality agentic trajectories, heterogeneous reward signals, and catastrophic forgetting.

Method: We propose the first systematic RL framework tailored to structured tabular reasoning. It introduces difficulty-stratified synthetic trajectory generation; a hybrid reward mechanism that integrates domain-specific rules and criteria with process-level, step-wise reward shaping; and behavioral regularization combined with progressive multi-stage training to mitigate forgetting. The framework unifies supervised alignment, PPO optimization, custom reward modeling, closed-loop SQL/Python execution feedback, and structure-aware data engineering.

Contribution/Results: Our approach achieves state-of-the-art performance on authoritative tabular reasoning benchmarks, significantly outperforming strong baselines while preserving strong general language capabilities and cross-task generalization.
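
The difficulty-stratified trajectory generation mentioned above could be approximated as follows. This is a minimal sketch, assuming difficulty is estimated from the pass rate of several sampled rollouts per synthesized task; the TableTask schema, the stratify_by_difficulty helper, and the 0.8/0.2 thresholds are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class TableTask:
    """A synthesized table task plus outcomes of k sampled rollouts (hypothetical schema)."""
    task_id: str
    rollout_successes: list[bool] = field(default_factory=list)


def pass_rate(task: TableTask) -> float:
    """Fraction of sampled rollouts that reached a verified-correct answer."""
    if not task.rollout_successes:
        raise ValueError(f"task {task.task_id} has no rollouts")
    return sum(task.rollout_successes) / len(task.rollout_successes)


def stratify_by_difficulty(
    tasks: list[TableTask],
    easy_threshold: float = 0.8,  # hypothetical cut-off, not from the paper
    hard_threshold: float = 0.2,  # hypothetical cut-off, not from the paper
) -> dict[str, list[str]]:
    """Bucket task ids into easy / medium / hard by empirical pass rate."""
    if not 0.0 <= hard_threshold < easy_threshold <= 1.0:
        raise ValueError("need 0 <= hard_threshold < easy_threshold <= 1")
    strata: dict[str, list[str]] = {"easy": [], "medium": [], "hard": []}
    for task in tasks:
        rate = pass_rate(task)
        if rate >= easy_threshold:
            strata["easy"].append(task.task_id)
        elif rate <= hard_threshold:
            strata["hard"].append(task.task_id)
        else:
            strata["medium"].append(task.task_id)
    return strata


if __name__ == "__main__":
    tasks = [
        TableTask("sum_by_region", [True, True, True, True]),
        TableTask("pivot_then_filter", [True, False, True, False]),
        TableTask("multi_table_join", [False, False, False, False]),
    ]
    print(stratify_by_difficulty(tasks))
```

Each bucket could then feed a different pool, for example easy and medium trajectories for supervised alignment and harder tasks for RL rollouts; how the paper actually allocates strata is not stated here.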

📝 Abstract

Tabular data serves as the backbone of modern data analysis and scientific research. While Large Language Models (LLMs) fine-tuned via Supervised Fine-Tuning (SFT) have significantly improved natural language interaction with such structured data, they often fall short in handling the complex, multi-step reasoning and robust code execution required for real-world table tasks. Reinforcement Learning (RL) offers a promising avenue to enhance these capabilities, yet its application in the tabular domain faces three critical hurdles: the scarcity of high-quality agentic trajectories with closed-loop code execution and environment feedback on diverse table structures, the extreme heterogeneity of feedback signals ranging from rigid SQL execution to open-ended data interpretation, and the risk of catastrophic forgetting of general knowledge during vertical specialization. To overcome these challenges and unlock advanced reasoning on complex tables, we introduce TableGPT-R1, a specialized tabular model built on a systematic RL framework. Our approach integrates a comprehensive data engineering pipeline that synthesizes difficulty-stratified agentic trajectories for both supervised alignment and RL rollouts, a task-adaptive reward system that combines rule-based verification with a criteria-injected reward model and incorporates process-level step reward shaping with behavioral regularization, and a multi-stage training framework that progressively stabilizes reasoning before specializing in table-specific tasks. Extensive evaluations demonstrate that TableGPT-R1 achieves state-of-the-art performance on authoritative benchmarks, significantly outperforming baseline models while retaining robust general capabilities. Our model is available at https://huggingface.co/tablegpt/TableGPT-R1.
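
A task-adaptive reward of the kind described in the abstract might route verifiable tasks to a rule-based check and open-ended ones to a reward model, then add a bounded step-level shaping term. The sketch below assumes a reference answer is available for verifiable tasks and models the criteria-injected reward model as any callable taking (answer, criteria); the function names and shaping constants are illustrative assumptions, not the paper's implementation.

```python
from typing import Callable, Optional

# Hypothetical interface for a criteria-injected reward model: (answer, criteria) -> score.
RewardModel = Callable[[str, str], float]


def normalize(answer: str) -> str:
    """Case-fold and collapse whitespace so trivially different strings compare equal."""
    return " ".join(answer.strip().lower().split())


def rule_based_reward(answer: str, reference: str) -> float:
    """Binary verifiable reward: 1.0 on a normalized exact match, else 0.0."""
    return 1.0 if normalize(answer) == normalize(reference) else 0.0


def step_shaping(step_succeeded: list[bool], bonus: float = 0.05,
                 penalty: float = 0.05, cap: float = 0.2) -> float:
    """Process-level shaping: credit each successful execution step, debit each
    failed one, and clip to [-cap, cap]. All constants are illustrative."""
    raw = sum(bonus if ok else -penalty for ok in step_succeeded)
    return max(-cap, min(cap, raw))


def task_adaptive_reward(answer: str, step_succeeded: list[bool],
                         reference: Optional[str] = None,
                         criteria: Optional[str] = None,
                         reward_model: Optional[RewardModel] = None) -> float:
    """Use rule-based verification when a reference exists, otherwise the
    criteria-conditioned reward model; then add the step-shaping term."""
    if reference is not None:
        outcome = rule_based_reward(answer, reference)
    elif reward_model is not None and criteria is not None:
        outcome = reward_model(answer, criteria)
    else:
        raise ValueError("need a reference answer, or a reward model plus criteria")
    return outcome + step_shaping(step_succeeded)


if __name__ == "__main__":
    # Verifiable task, e.g. a SQL query whose result can be checked exactly.
    print(task_adaptive_reward(" 42 ", [True, True], reference="42"))
    # Open-ended interpretation task scored by a stub reward model (hypothetical).
    stub_rm = lambda answer, criteria: 0.5
    print(task_adaptive_reward("Sales rose in Q3.", [False],
                               criteria="Mentions the quarterly trend", reward_model=stub_rm))
```

Clipping the shaping term is one simple way to keep process-level credit from outweighing the final-answer reward when a trajectory contains many cheap successful steps.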

Problem

Enhances multi-step reasoning in tabular data using reinforcement learning

Addresses scarcity of high-quality agentic trajectories for diverse table structures

Mitigates catastrophic forgetting during vertical specialization of tabular models
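
For the forgetting problem, one common form of behavioral regularization in RL fine-tuning is a KL penalty that keeps the policy close to a frozen reference model. The paper's exact regularizer is not specified here; the sketch below swaps in the widely used per-token "k3" KL estimator, and the beta value and function names are assumptions.

```python
import math


def kl_k3(policy_logprobs: list[float], ref_logprobs: list[float]) -> float:
    """Sum of per-token 'k3' estimates of KL(policy || reference) on tokens
    sampled from the policy. Each term exp(r) - r - 1, with
    r = log p_ref - log p_policy, is non-negative."""
    if len(policy_logprobs) != len(ref_logprobs):
        raise ValueError("log-prob sequences must have equal length")
    total = 0.0
    for lp, lr in zip(policy_logprobs, ref_logprobs):
        r = lr - lp
        total += math.exp(r) - r - 1.0
    return total


def regularized_reward(task_reward: float, policy_logprobs: list[float],
                       ref_logprobs: list[float], beta: float = 0.05) -> float:
    """Task reward minus a KL penalty toward the general-purpose reference
    model; beta = 0.05 is an illustrative value, not the paper's setting."""
    return task_reward - beta * kl_k3(policy_logprobs, ref_logprobs)


if __name__ == "__main__":
    ref = [-1.2, -0.7, -2.0]
    print(regularized_reward(1.0, ref, ref))                 # identical policies: no penalty
    print(regularized_reward(1.0, [-0.1, -0.2, -0.3], ref))  # drifted policy: penalized
```

Raising beta trades table-specific gains for closer adherence to the reference model's general behavior.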

Innovation

Reinforcement learning framework for tabular reasoning enhancement

Task-adaptive reward system combining rule verification and reward modeling

Multi-stage training stabilizing reasoning before table specialization
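
The multi-stage idea can be pictured as a step-indexed schedule. In this minimal sketch the three stage names, their lengths, and the per-stage share of table-specific versus general prompts are hypothetical; mixing general data into every stage is an assumed anti-forgetting measure, not a detail confirmed by the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str              # hypothetical stage label
    steps: int             # optimizer steps spent in this stage
    table_fraction: float  # share of table-specific prompts per batch; rest is general data


# Illustrative schedule: align, stabilize general reasoning, then specialize on tables.
# Names, lengths, and mixtures are assumptions, not the paper's settings.
SCHEDULE = (
    Stage("sft_alignment", 1_000, 0.5),
    Stage("rl_reasoning_stabilization", 2_000, 0.3),
    Stage("rl_table_specialization", 3_000, 0.8),
)


def stage_at(step: int, schedule: tuple[Stage, ...] = SCHEDULE) -> Stage:
    """Return the stage active at a global step; steps past the end stay in the last stage."""
    if step < 0:
        raise ValueError("step must be non-negative")
    boundary = 0
    for stage in schedule:
        boundary += stage.steps
        if step < boundary:
            return stage
    return schedule[-1]


def batch_mix(stage: Stage, batch_size: int) -> tuple[int, int]:
    """Split a batch into (table-specific, general) prompt counts for a stage."""
    n_table = round(batch_size * stage.table_fraction)
    return n_table, batch_size - n_table


if __name__ == "__main__":
    for step in (0, 1_500, 4_000):
        stage = stage_at(step)
        print(step, stage.name, batch_mix(stage, 32))
```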