Can GRPO Boost Complex Multimodal Table Understanding?

📅 2025-09-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing table understanding methods face bottlenecks in parsing complex structures and performing multi-step logical reasoning, and reinforcement learning approaches such as GRPO suffer from low initial policy accuracy and sparse, coarse-grained reward signals. This paper proposes Table-R1, a three-stage reinforcement learning framework: (1) a warm-up stage of supervised fine-tuning establishes a robust initialization, while (2) Perception Alignment GRPO and (3) Hint-Completion GRPO, two novel GRPO sub-stages, leverage continuous Tree-Edit-Distance Similarity (TEDS) rewards and fine-grained residual-step rewards, respectively, to mitigate initialization bias and reward sparsity. Experiments demonstrate that Qwen2-VL-7B enhanced with Table-R1 significantly outperforms SFT and GRPO baselines on both held-in and held-out benchmarks, matching GPT-4o's performance on held-in datasets while surpassing the larger, table-specialized Table-LLaVA 13B.
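As background on the GRPO family the summary refers to: GRPO estimates advantages relative to a group of sampled responses for the same prompt, with no learned value function. A minimal sketch of that group normalization (an illustration of the general GRPO idea, not the paper's code):

```python
import statistics

def group_relative_advantages(rewards):
    """GRPO-style advantages: normalize each sampled response's reward
    by the mean and standard deviation of its group."""
    mean = statistics.mean(rewards)
    # Guard against a zero std when all group rewards are identical.
    std = statistics.pstdev(rewards) or 1.0
    return [(r - mean) / std for r in rewards]
```

Sparse binary rewards make many groups uniform (all 0 or all 1), which yields zero advantages and no learning signal; this is the reward-sparsity problem that PA-GRPO's continuous rewards are designed to avoid.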

📝 Abstract
Existing table understanding methods face challenges due to complex table structures and intricate logical reasoning. While supervised fine-tuning (SFT) dominates existing research, reinforcement learning (RL) approaches such as Group Relative Policy Optimization (GRPO) have shown promise but struggle with low initial policy accuracy and coarse rewards in tabular contexts. In this paper, we introduce Table-R1, a three-stage RL framework that enhances multimodal table understanding through: (1) a Warm-up stage that instills initial perception and reasoning capabilities, (2) Perception Alignment GRPO (PA-GRPO), which employs continuous Tree-Edit-Distance Similarity (TEDS) rewards for recognizing table structures and contents, and (3) Hint-Completion GRPO (HC-GRPO), which applies fine-grained rewards to the residual steps of hint-guided questions. Extensive experiments demonstrate that Table-R1 substantially boosts table reasoning performance on both held-in and held-out datasets, largely outperforming SFT and GRPO. Notably, Qwen2-VL-7B with Table-R1 surpasses larger table-specialized models (e.g., Table-LLaVA 13B) and even achieves performance comparable to the closed-source GPT-4o on held-in datasets, demonstrating the efficacy of each stage of Table-R1 in overcoming initialization bottlenecks and reward sparsity, thereby advancing robust multimodal table understanding.
Problem

Research questions and friction points this paper is trying to address.

Addressing low initial policy accuracy in table understanding reinforcement learning
Overcoming coarse reward sparsity for complex multimodal table reasoning
Enhancing table structure perception and logical reasoning capabilities
Innovation

Methods, ideas, or system contributions that make the work stand out.

Three-stage RL framework with warm-up initialization
PA-GRPO using continuous TEDS rewards
HC-GRPO employing fine-grained hint-guided rewards
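The PA-GRPO reward scores a predicted table against the ground truth with a continuous TEDS similarity over HTML table trees. As a simplified, hypothetical stand-in (flattened cell sequences instead of tree edit distance; `table_similarity_reward` is not from the paper), a continuous reward in [0, 1] might look like:

```python
from difflib import SequenceMatcher

def table_similarity_reward(pred_cells, gold_cells):
    """Continuous reward in [0, 1]: 1.0 for an exact match, graded
    partial credit otherwise (a crude proxy for a TEDS-style reward)."""
    # Flatten each table into a row-major sequence of cell strings.
    pred = [cell for row in pred_cells for cell in row]
    gold = [cell for row in gold_cells for cell in row]
    if not pred and not gold:
        return 1.0
    # Normalized similarity between the two cell sequences.
    return SequenceMatcher(None, pred, gold).ratio()
```

Unlike a binary exact-match reward, a near-correct prediction (e.g., one wrong cell) still earns substantial credit, giving GRPO a gradient to follow even when the initial policy rarely produces a perfect table.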
Xiaoqiang Kang
School of Advanced Technology, Xi’an Jiaotong-Liverpool University
Shengen Wu
Information Hub, Hong Kong University of Science and Technology (Guangzhou)
Zimu Wang
Tsinghua University
Yilin Liu
Google
Xiaobo Jin
School of Advanced Technology, Xi’an Jiaotong-Liverpool University
Kaizhu Huang
Professor, Duke Kunshan University
Wei Wang
School of Advanced Technology, Xi’an Jiaotong-Liverpool University
Yutao Yue
Information Hub, Hong Kong University of Science and Technology (Guangzhou)
Xiaowei Huang
Professor of Computer Science, University of Liverpool
Qiufeng Wang
School of Advanced Technology, Xi’an Jiaotong-Liverpool University