🤖 AI Summary
Small language models (SLMs) struggle to generalize across diverse table layouts and to generate consistent programs for program-based table reasoning (P-TR). Method: We propose (1) a novel self-supervised layout-transformation inference task to strengthen structural awareness of tables; (2) a mix-paradigm variant of Group Relative Policy Optimization (GRPO), a reinforcement learning algorithm that enables dynamic coordination and fallback between program generation and natural-language reasoning; and (3) tight integration of program synthesis with fine-grained table structure modeling. Results: Evaluated on four mainstream table reasoning benchmarks, our approach improves accuracy by at least 15% over the LLaMA-8B base model across all datasets, consistently outperforming existing SLM-based methods and matching the performance of leading large language models. It markedly improves robustness and interpretability, particularly for numerically intensive table question answering, while preserving computational efficiency.
📝 Abstract
Table reasoning (TR) requires structured reasoning over semi-structured tabular data and remains challenging, particularly for small language models (SLMs, e.g., LLaMA-8B) due to their limited capacity compared to large LMs (LLMs, e.g., GPT-4o). To narrow this gap, we explore program-based TR (P-TR), which circumvents key limitations of text-based TR (T-TR), notably in numerical reasoning, by generating executable programs. However, applying P-TR to SLMs introduces two challenges: (i) vulnerability to heterogeneity in table layouts, and (ii) inconsistency in reasoning due to limited code generation capability. We propose Table-r1, a two-stage P-TR method designed for SLMs. Stage 1 introduces an innovative self-supervised learning task, Layout Transformation Inference, to improve tabular layout generalization from a programmatic view. Stage 2 adopts a mix-paradigm variant of Group Relative Policy Optimization, enhancing P-TR consistency while allowing dynamic fallback to T-TR when needed. Experiments on four TR benchmarks demonstrate that Table-r1 outperforms all SLM-based methods, achieving at least a 15% accuracy improvement over the base model (LLaMA-8B) across all datasets and reaching performance competitive with LLMs.
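The Stage 2 training described above builds on Group Relative Policy Optimization, which scores each sampled response relative to its own group of samples instead of against a learned value baseline. A minimal sketch of that group-relative advantage computation, with an assumed exact-match reward; the function name and the mix-paradigm coordination details are illustrative, not the paper's implementation:

```python
def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each sampled response's reward against its group.

    GRPO samples a group of responses per prompt and replaces a learned
    value baseline with group statistics:
        advantage_i = (r_i - mean(r)) / (std(r) + eps)
    """
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = var ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]

# Example: four sampled answers for one table question, rewarded 1.0 when
# the executed program (or a natural-language fallback) matches the gold
# answer, 0.0 otherwise (an assumed reward scheme).
rewards = [1.0, 0.0, 1.0, 0.0]
advs = group_relative_advantages(rewards)
# Correct samples receive positive advantage, incorrect ones negative,
# so the policy gradient pushes toward consistent program generation.
print(advs)
```

Because the baseline is computed per group, no separate value network is needed, which keeps the method lightweight enough for SLM training.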