TabSieve: Explicit In-Table Evidence Selection for Tabular Prediction

📅 2026-02-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing tabular prediction models struggle to leverage relevant in-table rows as few-shot evidence and are highly susceptible to noise, leading to unstable performance. This work proposes TabSieve, a framework that is the first to decouple evidence selection from prediction while optimizing the two jointly: it explicitly selects a small set of informative table rows as auditable evidence and then predicts conditioned on that curated evidence. The approach introduces TAB-GRPO, a reinforcement learning strategy with a dynamic task-advantage balancing mechanism, and is trained on TabSieve-SFT-40K, a synthetically generated high-quality supervised fine-tuning dataset covering both classification and regression. Evaluated across 75 classification and 52 regression benchmarks, TabSieve achieves average gains of 2.92% in classification accuracy and 4.45% in regression performance, improving both robustness and interpretability.
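The select-then-predict idea in the summary above can be illustrated with a deliberately minimal sketch. This is not TabSieve's learned selector (the paper uses an LLM trained with SFT and TAB-GRPO); it is a toy stand-in where "selection" is nearest-neighbor distance and "prediction" is a majority vote over the selected evidence, just to show why curating evidence shields the predictor from noisy rows. All names and the distance heuristic here are assumptions for illustration.

```python
import math
from collections import Counter

def select_evidence(table, query, k=3):
    """Pick the k rows most similar to the query on its numeric features.
    Toy stand-in for TabSieve's learned evidence selector (assumption)."""
    def dist(row):
        return math.sqrt(sum((row[f] - query[f]) ** 2 for f in query))
    return sorted(table, key=dist)[:k]

def predict(evidence, target):
    """Majority vote over the selected evidence rows (classification case)."""
    return Counter(r[target] for r in evidence).most_common(1)[0][0]

table = [
    {"x1": 1.0, "x2": 0.9, "label": "A"},
    {"x1": 1.1, "x2": 1.0, "label": "A"},
    {"x1": 5.0, "x2": 4.8, "label": "B"},
    {"x1": 9.0, "x2": 0.1, "label": "B"},  # noisy / irrelevant row
]
query = {"x1": 1.05, "x2": 0.95}
evidence = select_evidence(table, query, k=2)
print(predict(evidence, "label"))  # prints A; the distant noisy row never enters the evidence
```

Because the predictor only sees the selected rows, the evidence set is also auditable: one can inspect exactly which rows drove the prediction, which is the interpretability benefit the paper claims.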

📝 Abstract
Tabular prediction can benefit from in-table rows as few-shot evidence, yet existing tabular models typically perform instance-wise inference and LLM-based prompting is often brittle. Models do not consistently leverage relevant rows, and noisy context can degrade performance. To address this challenge, we propose TabSieve, a select-then-predict framework that makes evidence usage explicit and auditable. Given a table and a query row, TabSieve first selects a small set of informative rows as evidence and then predicts the missing target conditioned on the selected evidence. To enable this capability, we construct TabSieve-SFT-40K by synthesizing high-quality reasoning trajectories from 331 real tables using a strong teacher model with strict filtering. Furthermore, we introduce TAB-GRPO, a reinforcement learning recipe that jointly optimizes evidence selection and prediction correctness with separate rewards, and stabilizes mixed regression and classification training via dynamic task-advantage balancing. Experiments on a held-out benchmark of 75 classification and 52 regression tables show that TabSieve consistently improves performance across shot budgets, with average gains of 2.92% on classification and 4.45% on regression over the second-best baseline. Further analysis indicates that TabSieve concentrates more attention on the selected evidence, which improves robustness to noisy context.
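The abstract says TAB-GRPO "stabilizes mixed regression and classification training via dynamic task-advantage balancing" but does not spell out the mechanism. Below is one plausible sketch, under stated assumptions: GRPO-style group-relative advantages are computed per task, then each task's advantages are rescaled so both tasks contribute with equal average magnitude in a mixed batch. The exact balancing rule and all function names are guesses, not the paper's recipe.

```python
import statistics

def group_advantages(rewards):
    """GRPO-style group-relative advantage: center by the group mean and
    scale by the group standard deviation (epsilon guards degenerate groups)."""
    mu = statistics.fmean(rewards)
    sd = statistics.pstdev(rewards) or 1e-8
    return [(r - mu) / sd for r in rewards]

def balance_tasks(batch):
    """Rescale each task's advantages so classification and regression
    contribute with equal mean magnitude (one plausible balancing scheme)."""
    adv = {task: group_advantages(rs) for task, rs in batch.items()}
    mags = {t: statistics.fmean(abs(a) for a in v) or 1e-8 for t, v in adv.items()}
    target = statistics.fmean(mags.values())
    return {t: [a * target / mags[t] for a in v] for t, v in adv.items()}

# Classification rewards are near-binary; regression rewards are small and
# continuous, so their raw advantage scales differ before balancing.
batch = {"classification": [1.0, 0.0, 1.0, 1.0],
         "regression": [0.12, 0.30, 0.05, 0.22]}
balanced = balance_tasks(batch)
```

The point of the rescaling is that without it, whichever task happens to have larger reward variance dominates the policy gradient, which is the instability the abstract attributes to naive mixed training.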
Problem

Research questions and friction points this paper is trying to address.

tabular prediction
in-table evidence
few-shot learning
noisy context
evidence selection
Innovation

Methods, ideas, or system contributions that make the work stand out.

TabSieve
evidence selection
tabular prediction
reinforcement learning
few-shot reasoning
Yongyao Wang
Gaoling School of Artificial Intelligence, Renmin University of China
Ziqi Miao
Gaoling School of Artificial Intelligence, Renmin University of China
Lu Yang
Shanghai AI Laboratory
Haonan Jia
The Hong Kong Polytechnic University
Wenting Yan
Zhejiang University
Chen Qian
Renmin University of China
Large Language Models · Safety · Interpretability · Graph Neural Networks
Lijun Li
Gaoling School of Artificial Intelligence, Renmin University of China