Table-r1: Self-supervised and Reinforcement Learning for Program-based Table Reasoning in Small Language Models

📅 2025-06-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
Small language models (SLMs) struggle to generalize across diverse table layouts and to generate consistent programs for program-based table reasoning (P-TR). Method: (1) a novel self-supervised layout-transformation inference task that strengthens the model's structural awareness of tables; (2) a mix-paradigm variant of Group Relative Policy Optimization (GRPO) that coordinates program generation with natural-language reasoning and falls back dynamically between the two; and (3) tight integration of program synthesis with fine-grained table structure modeling. Results: On four mainstream table reasoning benchmarks, the approach improves average accuracy by over 15% relative to LLaMA-8B, consistently outperforms existing SLM-based methods, and matches the performance of leading large language models. It notably improves robustness and interpretability, particularly for numerically intensive table question answering, while preserving computational efficiency.

📝 Abstract
Table reasoning (TR) requires structured reasoning over semi-structured tabular data and remains challenging, particularly for small language models (SLMs, e.g., LLaMA-8B) due to their limited capacity compared to large LMs (LLMs, e.g., GPT-4o). To narrow this gap, we explore program-based TR (P-TR), which circumvents key limitations of text-based TR (T-TR), notably in numerical reasoning, by generating executable programs. However, applying P-TR to SLMs introduces two challenges: (i) vulnerability to heterogeneity in table layouts, and (ii) inconsistency in reasoning due to limited code generation capability. We propose Table-r1, a two-stage P-TR method designed for SLMs. Stage 1 introduces an innovative self-supervised learning task, Layout Transformation Inference, to improve tabular layout generalization from a programmatic view. Stage 2 adopts a mix-paradigm variant of Group Relative Policy Optimization, enhancing P-TR consistency while allowing dynamic fallback to T-TR when needed. Experiments on four TR benchmarks demonstrate that Table-r1 outperforms all SLM-based methods, achieving at least a 15% accuracy improvement over the base model (LLaMA-8B) across all datasets and reaching performance competitive with LLMs.
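Stage 1's Layout Transformation Inference task can be pictured as follows: apply a known transformation to a table and train the model to infer which one was applied. The sketch below is a minimal, hypothetical illustration of how such self-supervised examples might be constructed; the specific transformations and the example format are assumptions, not the paper's actual task definition.

```python
import random

# Illustrative layout transformations over a table stored as a list of rows.
def transpose(rows):
    return [list(col) for col in zip(*rows)]

def reverse_rows(rows):
    return rows[::-1]

def swap_first_two_columns(rows):
    return [[r[1], r[0]] + r[2:] for r in rows]

TRANSFORMS = {
    "transpose": transpose,
    "reverse_rows": reverse_rows,
    "swap_first_two_columns": swap_first_two_columns,
}

def make_lti_example(rows, rng=random):
    """Build one self-supervised example: the model is shown the original
    and transformed tables and must infer which transformation relates them,
    encouraging layout generalization from a programmatic view."""
    name = rng.choice(sorted(TRANSFORMS))
    return {"input": rows, "output": TRANSFORMS[name](rows), "label": name}
```

Because the labels come for free from the transformation itself, such examples can be generated at scale from unlabeled tables.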
Problem

Research questions and friction points this paper is trying to address.

Improving table reasoning in small language models
Addressing layout heterogeneity in program-based reasoning
Enhancing consistency in code generation for tables
Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-supervised Layout Transformation Inference task
Mix-paradigm Group Relative Policy Optimization
Dynamic fallback to text-based reasoning
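The mix-paradigm GRPO idea above rests on a group-relative advantage: rewards for a group of sampled rollouts are normalized against the group's own mean and standard deviation. The sketch below shows that computation in the standard GRPO style; the mixed reward shaping (a small bonus for programs that execute successfully, keeping text-based reasoning viable as a fallback) is an illustrative assumption, not the paper's exact formulation.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards):
    """Normalize each reward against its group's mean and std (GRPO-style)."""
    mu = mean(rewards)
    sigma = pstdev(rewards) or 1.0  # guard against all-equal reward groups
    return [(r - mu) / sigma for r in rewards]

def reward(answer, gold, paradigm, executed_ok):
    """Hypothetical mixed reward: answer correctness, plus a small bonus
    when a program-paradigm rollout actually executed, nudging the policy
    toward programs without penalizing text-based fallback."""
    r = 1.0 if answer == gold else 0.0
    if paradigm == "program" and executed_ok:
        r += 0.1
    return r

# One sampled group for a single table question: two program rollouts
# and two natural-language rollouts.
group = [
    ("42", "program", True),
    ("41", "program", False),
    ("42", "text", False),
    ("40", "text", False),
]
rewards = [reward(ans, "42", paradigm, ok) for ans, paradigm, ok in group]
advantages = group_relative_advantages(rewards)
```

Rollouts scoring above the group mean receive positive advantages and are reinforced, so whichever paradigm succeeds more often for a given question naturally gains probability mass.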
👥 Authors
Rihui Jin — Southeast University
Zheyu Xin — Southeast University, Nanjing, China
Xing Xie — Southeast University, Nanjing, China
Guilin Qi — Southeast University
Yongrui Chen — Southeast University, Nanjing, China
Xinbang Dai — Southeast University
Tongtong Wu — Monash University, Australia
Gholamreza Haffari — Monash University, Australia