🤖 AI Summary
Existing large language models (LLMs) suffer from insufficient reasoning depth and unstable inference processes in tabular reasoning. To address these issues, we propose STaR, a cognition-inspired tabular reasoning framework that implements human-like iterative reasoning via a “slow-thinking” mechanism. First, we design a two-stage difficulty-aware reinforcement learning strategy to guide stepwise decomposition of complex tables. Second, we introduce trajectory-level uncertainty quantification—integrating token-level confidence estimation with answer consistency analysis—to enable reliable selection of reasoning paths. STaR significantly improves both reasoning depth and stability, achieving state-of-the-art performance across multiple tabular question-answering benchmarks and demonstrating strong cross-domain generalization. Our core contribution lies in deeply embedding uncertainty modeling into the reasoning process, thereby enabling, for the first time in tabular settings, interpretable and verifiable progressive cognitive reasoning.
📝 Abstract
Table reasoning with large language models (LLMs) is a fundamental path toward building intelligent systems that can understand and reason over structured data. While recent progress has shown promising results, existing methods still suffer from two key limitations: (i) the reasoning processes lack the depth and iterative refinement characteristic of human cognition; and (ii) the reasoning processes exhibit instability, which compromises their reliability in downstream applications. In this work, we present STaR (slow-thinking for table reasoning), a new framework for cognitive table reasoning, in which LLMs are equipped with slow-thinking capabilities by explicitly modeling step-by-step thinking and uncertainty-aware inference. During training, STaR employs two-stage difficulty-aware reinforcement learning (DRL), progressively learning from simple to complex queries under a composite reward. During inference, STaR performs trajectory-level uncertainty quantification by integrating token-level confidence and answer consistency, enabling selection of more credible reasoning paths. Extensive experiments on benchmarks demonstrate that STaR achieves superior performance and enhanced reasoning stability. Moreover, strong generalization over out-of-domain datasets further demonstrates STaR's potential as a reliable and cognitively inspired solution for table reasoning with LLMs.
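The inference-time path selection described above can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: the geometric-mean confidence score, the linear weighting `alpha`, and the trajectory data layout are all assumptions made for the example.

```python
import math
from collections import Counter

def select_trajectory(trajectories, alpha=0.5):
    """Pick the most credible reasoning path.

    Each trajectory is a dict with:
      "answer":   the final answer string
      "logprobs": per-token log-probabilities from the LLM
    alpha balances token-level confidence against answer consistency
    (an illustrative choice, not STaR's published formulation).
    """
    # Token-level confidence: geometric mean of token probabilities.
    def confidence(t):
        return math.exp(sum(t["logprobs"]) / len(t["logprobs"]))

    # Answer consistency: fraction of sampled trajectories that
    # agree on this trajectory's final answer.
    counts = Counter(t["answer"] for t in trajectories)
    n = len(trajectories)

    def score(t):
        return alpha * confidence(t) + (1 - alpha) * counts[t["answer"]] / n

    return max(trajectories, key=score)

# Toy example: three sampled trajectories for one table question.
paths = [
    {"answer": "42", "logprobs": [-0.1, -0.2, -0.1]},
    {"answer": "42", "logprobs": [-0.3, -0.4, -0.2]},
    {"answer": "17", "logprobs": [-0.05, -0.05, -0.1]},
]
best = select_trajectory(paths)
```

In this toy run, the "17" path has the highest raw token confidence, but the two "42" paths agree with each other, so consistency tips the combined score toward "42" — the kind of trade-off the trajectory-level quantification is meant to capture.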