🤖 AI Summary
To address the weak generalization and limited robustness of supervised fine-tuning (SFT) in table reasoning tasks (table-based question answering, fact verification, and text-to-SQL), we propose the first unified reinforcement learning (RL) framework for table reasoning, bringing Proximal Policy Optimization (PPO) to this domain. Methodologically, we design a lightweight, rule-driven, structure-aware reward mechanism that combines table-structure preprocessing with multi-task prompt engineering, enabling joint training across tasks and emergent capability transfer. Evaluated on multiple benchmarks, including BIRD and WikiSQL, our approach achieves state-of-the-art results: a 7B-parameter model attains 68.3% text-to-SQL accuracy on the BIRD dev set, and the unified model outperforms Claude-3.7-Sonnet by 4.0% overall. The framework significantly improves generalization, robustness to distributional shifts, and cross-task transferability.
📝 Abstract
Table reasoning, encompassing tasks such as table question answering, fact verification, and text-to-SQL, requires precise understanding of structured tabular data, coupled with numerical computation and code manipulation for effective inference. Supervised fine-tuning (SFT) approaches have achieved notable success but often struggle with generalization and robustness due to biases inherent in imitative learning. We introduce Reasoning-Table, the first application of reinforcement learning (RL) to table reasoning, achieving state-of-the-art performance. Through rigorous data preprocessing, reward design, and tailored training strategies, our method leverages simple rule-based outcome rewards to outperform SFT across multiple benchmarks. Unified training across diverse tasks enables Reasoning-Table to emerge as a robust table-reasoning large language model, surpassing larger proprietary models such as Claude-3.7-Sonnet by 4.0% on table reasoning benchmarks. The approach also performs strongly on text-to-SQL, reaching 68.3% accuracy on the BIRD dev set with a 7B model. Further experiments show that Reasoning-Table improves the model's generalization and robustness.
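To make the "simple rule-based outcome reward" concrete, the sketch below shows one plausible form such a reward could take: extract the model's final answer from a delimited span and score it 1.0 on a normalized exact match with the gold answer, else 0.0. This is a minimal illustration under our own assumptions (the `<answer>...</answer>` tag convention and the normalization rules are hypothetical), not the paper's exact implementation.

```python
# Hypothetical sketch of a rule-based outcome reward for table QA / fact
# verification; the tag convention and normalization are illustrative
# assumptions, not the paper's exact reward.
import re


def extract_answer(completion: str) -> str:
    """Pull the final answer from a delimited span, e.g. <answer>...</answer>."""
    match = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    return match.group(1).strip() if match else ""


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so formatting differences don't affect the score."""
    return re.sub(r"\s+", " ", text.strip().lower())


def outcome_reward(completion: str, gold: str) -> float:
    """Binary outcome reward: 1.0 on a normalized exact match, else 0.0."""
    pred = extract_answer(completion)
    if not pred:
        return 0.0  # malformed or missing answer span gets no reward
    return 1.0 if normalize(pred) == normalize(gold) else 0.0


print(outcome_reward("reasoning steps ... <answer>42 </answer>", "42"))  # 1.0
print(outcome_reward("no answer tags here", "42"))                        # 0.0
```

For text-to-SQL, the analogous outcome check would typically execute the predicted and gold queries against the database and compare result sets rather than matching strings, but the rule-based, binary structure of the reward stays the same.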