🤖 AI Summary
Large language models struggle with multi-step numerical and symbolic reasoning over complex tabular data. This paper proposes a reinforcement learning (RL) framework grounded in executable spreadsheet formulas, in which the model derives formulas on its own, guided by a binary answer-correctness reward, to perform end-to-end table question answering. Methodologically, it introduces a formula-driven RL training paradigm that largely dispenses with supervised formula annotations, incorporating syntax-constrained formula decoding, reward shaping, structured table encoding, and an executable formula validator, together with a theoretical analysis of the approach's advantages. Evaluated on seven mainstream table reasoning benchmarks, the method achieves substantial gains, most notably on multi-step numerical and symbolic reasoning tasks, and enables a 7B-parameter model to outperform OpenAI o1 on table understanding.
📝 Abstract
Tables are a fundamental structure for organizing and analyzing data, making effective table understanding a critical capability for intelligent systems. While language models (LMs) demonstrate strong general reasoning abilities, they continue to struggle with accurate numerical or symbolic reasoning over tabular data, especially in complex scenarios. Spreadsheet formulas provide a powerful and expressive medium for representing executable symbolic operations, encoding rich reasoning patterns that remain largely underutilized. In this paper, we propose Formula Tuning (Fortune), a reinforcement learning (RL) framework that trains LMs to generate executable spreadsheet formulas for question answering over general tabular data. Formula Tuning reduces the reliance on supervised formula annotations by using binary answer correctness as a reward signal, guiding the model to learn formula derivation through reasoning. We provide a theoretical analysis of its advantages and demonstrate its effectiveness through extensive experiments on seven table reasoning benchmarks. Formula Tuning substantially enhances LM performance, particularly on multi-step numerical and symbolic reasoning tasks, enabling a 7B model to outperform OpenAI o1 on table understanding. This highlights the potential of formula-driven RL to advance symbolic table reasoning in LMs.
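To make the reward signal concrete, here is a minimal sketch of the binary answer-correctness reward the abstract describes: a model-generated formula is executed against the table, and the reward is 1.0 exactly when the result matches the gold answer (with unexecutable formulas earning 0.0). The `execute_formula` helper below is a hypothetical stand-in that handles only a toy `SUM`/`AVERAGE`/`COUNT` subset over named columns; the paper's actual system would rely on a full spreadsheet formula engine.

```python
# Toy sketch of a binary answer-correctness reward for formula-based
# table QA. `execute_formula` is a hypothetical stand-in supporting
# only SUM/AVERAGE/COUNT over a named column, not the real evaluator.
from typing import Dict, List, Union

Number = Union[int, float]
Table = Dict[str, List[Number]]

def execute_formula(formula: str, table: Table) -> Number:
    """Evaluate a tiny subset of spreadsheet formulas, e.g. '=SUM(Sales)'."""
    func, _, arg = formula.lstrip("=").partition("(")
    col = table[arg.rstrip(")")]  # KeyError on unknown columns -> no reward
    if func == "SUM":
        return sum(col)
    if func == "AVERAGE":
        return sum(col) / len(col)
    if func == "COUNT":
        return len(col)
    raise ValueError(f"unsupported function: {func}")

def reward(formula: str, table: Table, gold: Number) -> float:
    """Binary reward: 1.0 iff the executed formula matches the gold answer."""
    try:
        return 1.0 if execute_formula(formula, table) == gold else 0.0
    except Exception:  # malformed or unexecutable formulas earn nothing
        return 0.0

table = {"Sales": [120, 80, 200]}
print(reward("=SUM(Sales)", table, 400))    # correct formula -> 1.0
print(reward("=COUNT(Sales)", table, 400))  # wrong answer -> 0.0
```

Because the reward depends only on the final answer, not on matching a gold formula, training needs no formula annotations; the model is free to discover any executable derivation that yields the correct result.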