🤖 AI Summary
Small language models (SLMs) struggle to generalize across diverse table layouts and to generate consistent programs for program-based table reasoning (P-TR). Method: We propose (1) a novel self-supervised layout-transformation inference task to strengthen structural awareness of tables; (2) a mix-paradigm variant of Group Relative Policy Optimization (GRPO), a reinforcement learning algorithm that enables dynamic coordination and fallback between program generation and natural-language reasoning; and (3) tight integration of program synthesis with fine-grained table structure modeling. Results: Evaluated on four mainstream table reasoning benchmarks, our approach improves accuracy by at least 15% over the LLaMA-8B base model across all datasets, consistently outperforming existing SLM-based methods and matching the performance of leading large language models. It markedly improves robustness and interpretability, particularly for numerically intensive table question answering, while preserving computational efficiency.
📝 Abstract
Table reasoning (TR) requires structured reasoning over semi-structured tabular data and remains challenging, particularly for small language models (SLMs, e.g., LLaMA-8B) due to their limited capacity compared to large LMs (LLMs, e.g., GPT-4o). To narrow this gap, we explore program-based TR (P-TR), which circumvents key limitations of text-based TR (T-TR), notably in numerical reasoning, by generating executable programs. However, applying P-TR to SLMs introduces two challenges: (i) vulnerability to heterogeneity in table layouts, and (ii) inconsistency in reasoning due to limited code generation capability. We propose Table-r1, a two-stage P-TR method designed for SLMs. Stage 1 introduces an innovative self-supervised learning task, Layout Transformation Inference, to improve tabular layout generalization from a programmatic view. Stage 2 adopts a mix-paradigm variant of Group Relative Policy Optimization, enhancing P-TR consistency while allowing dynamic fallback to T-TR when needed. Experiments on four TR benchmarks demonstrate that Table-r1 outperforms all SLM-based methods, achieving at least a 15% accuracy improvement over the base model (LLaMA-8B) across all datasets and reaching performance competitive with LLMs.
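The Stage 2 training described above builds on Group Relative Policy Optimization, which scores each sampled response relative to its own group of samples instead of against a learned value baseline. A minimal sketch of that group-relative advantage computation, with an assumed exact-match reward; the function name and the mix-paradigm coordination details are illustrative, not the paper's implementation:

```python
def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each sampled response's reward against its group.

    GRPO samples a group of responses per prompt and replaces a learned
    value baseline with group statistics:
        advantage_i = (r_i - mean(r)) / (std(r) + eps)
    """
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = var ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]

# Example: four sampled answers for one table question, rewarded 1.0 when
# the executed program (or a natural-language fallback) matches the gold
# answer, 0.0 otherwise (an assumed reward scheme).
rewards = [1.0, 0.0, 1.0, 0.0]
advs = group_relative_advantages(rewards)
# Correct samples receive positive advantage, incorrect ones negative,
# so the policy gradient pushes toward consistent program generation.
print(advs)
```

Because the baseline is computed per group, no separate value network is needed, which keeps the method lightweight enough for SLM training.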