🤖 AI Summary
Existing tabular prediction models struggle to leverage relevant in-table rows as few-shot evidence and are highly susceptible to noisy context, leading to unstable performance. This work proposes TabSieve, a framework that, for the first time, decouples evidence selection from prediction while jointly optimizing both: it explicitly selects a small set of informative table rows as auditable evidence and then predicts conditioned on that curated evidence. The approach introduces TAB-GRPO, a reinforcement learning strategy with a dynamic task-advantage balancing mechanism, and is trained on TabSieve-SFT-40K, a synthetically generated high-quality supervised fine-tuning dataset covering both classification and regression tasks. Evaluated across 75 classification and 52 regression benchmarks, TabSieve achieves average gains of 2.92% in classification accuracy and 4.45% in regression performance, significantly enhancing both robustness and interpretability.
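The select-then-predict flow described above can be sketched in miniature. This is a hypothetical illustration only: the actual TabSieve uses an LLM for both stages, whereas here a toy feature-overlap selector and a majority-vote predictor stand in, so the function names and scoring rule are assumptions, not the paper's method.

```python
# Hypothetical sketch of a select-then-predict pipeline: pick a few
# informative rows as auditable evidence, then predict from them alone.

def select_evidence(rows, query, k=3):
    """Score each candidate row by feature overlap with the query and
    return the k most similar rows as evidence (toy stand-in for the
    LLM-based selector)."""
    def overlap(row):
        return sum(1 for key, val in query.items() if row.get(key) == val)
    return sorted(rows, key=overlap, reverse=True)[:k]

def predict(evidence, target_col):
    """Predict the query's missing target by majority vote over the
    selected evidence rows (classification case)."""
    votes = {}
    for row in evidence:
        label = row[target_col]
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)

rows = [
    {"color": "red", "size": "s", "label": "A"},
    {"color": "red", "size": "m", "label": "A"},
    {"color": "blue", "size": "s", "label": "B"},
    {"color": "blue", "size": "l", "label": "B"},
]
query = {"color": "red", "size": "s"}
evidence = select_evidence(rows, query, k=2)
print(predict(evidence, "label"))  # prints "A"
```

Because prediction is conditioned only on `evidence`, the rows that drove the answer are explicit and can be audited, which is the property the framework makes central.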
📝 Abstract
Tabular prediction can benefit from in-table rows as few-shot evidence, yet existing tabular models typically perform instance-wise inference, and LLM-based prompting is often brittle: models do not consistently leverage relevant rows, and noisy context can degrade performance. To address this challenge, we propose TabSieve, a select-then-predict framework that makes evidence usage explicit and auditable. Given a table and a query row, TabSieve first selects a small set of informative rows as evidence, then predicts the missing target conditioned on that evidence. To enable this capability, we construct TabSieve-SFT-40K by synthesizing high-quality reasoning trajectories from 331 real tables with a strong teacher model and strict filtering. We further introduce TAB-GRPO, a reinforcement learning recipe that jointly optimizes evidence selection and prediction correctness with separate rewards and stabilizes mixed regression-and-classification training via dynamic task-advantage balancing. Experiments on a held-out benchmark of 75 classification and 52 regression tables show that TabSieve consistently improves performance across shot budgets, with average gains of 2.92% on classification and 4.45% on regression over the second-best baseline. Further analysis indicates that TabSieve concentrates more attention on the selected evidence, which improves robustness to noisy context.
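The dynamic task-advantage balancing idea can be illustrated with a small numeric sketch. The abstract does not specify the exact TAB-GRPO formulation, so the following is an assumption: advantages are group-normalized in GRPO style, then each task's advantages are rescaled by their mean magnitude so that classification and regression contribute comparably to a mixed batch.

```python
import statistics

def group_advantages(rewards):
    """GRPO-style advantages: reward minus the group mean, divided by
    the group standard deviation (guarded against zero std)."""
    mu = statistics.mean(rewards)
    sigma = statistics.pstdev(rewards) or 1.0
    return [(r - mu) / sigma for r in rewards]

def balance_tasks(adv_by_task):
    """Rescale each task's advantages by its mean |advantage| so no
    task dominates the mixed batch (one possible balancing scheme,
    assumed here for illustration)."""
    balanced = {}
    for task, advs in adv_by_task.items():
        scale = sum(abs(a) for a in advs) / len(advs) or 1.0
        balanced[task] = [a / scale for a in advs]
    return balanced

cls_rewards = [1.0, 0.0, 1.0, 1.0]  # binary correctness rewards
reg_rewards = [0.9, 0.2, 0.4, 0.7]  # continuous closeness rewards
adv = {
    "classification": group_advantages(cls_rewards),
    "regression": group_advantages(reg_rewards),
}
balanced = balance_tasks(adv)  # both tasks now have mean |advantage| of 1
```

The motivation is that binary classification rewards and continuous regression rewards induce advantage distributions of different shapes and scales; without some rebalancing, gradient signal from one task can swamp the other during joint training.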