PPT: A Process-based Preference Learning Framework for Self-Improving Table Question Answering Models

📅 2025-05-23
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses the scarcity of high-quality human-annotated data for Table Question Answering (TQA). We propose the first LLM self-improvement framework for TQA based on synthetically generated data. Methodologically, we model chain-of-thought reasoning as a discrete state sequence, introduce a state-level scoring mechanism and process-aware contrastive sampling, and apply lightweight preference learning via a PPO variant for reinforcement fine-tuning. Using only 8,000 self-generated preference pairs, our approach achieves up to +5.0% accuracy gain on in-domain test sets and +2.4% improvement in out-of-domain generalization. It attains 5× faster inference than current SOTA models while matching the performance of significantly larger systems. Our core contribution is the first efficient, low-overhead, process-aware self-improvement paradigm for TQA, uniquely balancing generalizability, inference efficiency, and scalability.

๐Ÿ“ Abstract
Improving large language models (LLMs) with self-generated data has demonstrated success in tasks such as mathematical reasoning and code generation. Yet, no exploration has been made on table question answering (TQA), where a system answers questions based on tabular data. Addressing this gap is crucial for TQA, as effective self-improvement can boost performance without requiring costly or manually annotated data. In this work, we propose PPT, a Process-based Preference learning framework for TQA. It decomposes reasoning chains into discrete states, assigns scores to each state, and samples contrastive steps for preference learning. Experimental results show that PPT effectively improves TQA models by up to 5% on in-domain datasets and 2.4% on out-of-domain datasets, with only 8,000 preference pairs. Furthermore, the resulting models achieve competitive results compared to more complex and larger state-of-the-art TQA systems, while being five times more efficient during inference.
Problem

Research questions and friction points this paper is trying to address.

Closing the self-improvement gap for table question answering models
Proposing process-based preference learning to enhance TQA
Boosting performance without costly manual data annotation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Process-based preference learning framework
Decomposes reasoning chains into states
Samples contrastive steps for learning
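The pipeline sketched in the bullets above can be illustrated with a toy example. The sketch below is an assumption-laden simplification, not the paper's implementation: the state decomposition (one state per line) and the step scorer `score_step` (a keyword heuristic) are hypothetical stand-ins for the paper's state-level scoring mechanism.

```python
# Minimal sketch of process-aware contrastive pair construction for TQA.
# NOTE: split_into_states and score_step are hypothetical stand-ins;
# the paper's actual state decomposition and scoring are more sophisticated.

def split_into_states(chain_of_thought: str) -> list[str]:
    """Decompose a chain-of-thought into discrete reasoning states (here: one per line)."""
    return [s.strip() for s in chain_of_thought.splitlines() if s.strip()]

def score_step(step: str) -> float:
    """Toy state-level score: reward steps that ground themselves in the table."""
    return 1.0 if "table" in step.lower() else 0.0

def sample_contrastive_pairs(chains: list[str]) -> list[tuple[str, str]]:
    """From each chain, pair its highest-scoring step (preferred)
    with its lowest-scoring step (dispreferred) for preference learning."""
    pairs = []
    for chain in chains:
        scored = [(score_step(s), s) for s in split_into_states(chain)]
        scored.sort(key=lambda t: t[0], reverse=True)  # stable sort: ties keep order
        if len(scored) >= 2 and scored[0][0] > scored[-1][0]:
            pairs.append((scored[0][1], scored[-1][1]))
    return pairs

chains = ["Read the table header.\nGuess the answer.\nLook up the table cell for 2019."]
print(sample_contrastive_pairs(chains))
```

The resulting (preferred, dispreferred) pairs would then feed a preference-learning objective such as the PPO-variant fine-tuning described in the summary.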