🤖 AI Summary
This work addresses key limitations of existing single-turn table reasoning methods: context overflow, weak numerical sensitivity, and hallucinations in large language models (LLMs). The authors propose an uncertainty-aware programmatic agent that integrates both epistemic and aleatoric uncertainty quantification into table reasoning. Memory-guided plan pruning, confidence-based action refinement, and dual-weighted multi-path trajectory aggregation are used to suppress hallucinations and improve reasoning accuracy. Built on a lightweight LLM and combining supervised fine-tuning, RAPO-based reinforcement learning, memory retrieval, and token-level probability monitoring, the method consistently outperforms prior baselines and proprietary models across multiple table reasoning benchmarks, supporting the value of integrating autonomous training with uncertainty-aware mechanisms.
📝 Abstract
Table reasoning requires models to jointly perform semantic understanding and precise numerical operations. Most existing methods rely on a single-turn reasoning paradigm over tables, which suffers from context overflow and weak numerical sensitivity. To address these limitations, we previously proposed TableMind, a tuning-based autonomous programmatic agent that simulates human-like interaction within a lightweight large language model (LLM). TableMind internalizes planning, action, and reflection through a two-stage training strategy: supervised fine-tuning (SFT) on filtered high-quality data, followed by reinforcement learning (RL) with a multi-perspective reward and the Rank-Aware Policy Optimization (RAPO) algorithm. While TableMind establishes a solid foundation for programmatic agents, the inherent stochasticity of LLMs remains a critical challenge and leads to hallucinations. In this paper, we extend this foundation to TableMind++ by introducing a novel uncertainty-aware inference framework that mitigates hallucinations. Specifically, to address epistemic uncertainty, we propose memory-guided plan pruning, which retrieves historical trajectories to validate candidate plans and filter out logically flawed ones. To ensure execution precision and mitigate aleatoric uncertainty, we introduce confidence-based action refinement, which monitors token-level probabilities to detect and self-correct syntactic noise. Finally, we employ dual-weighted trajectory aggregation to synthesize a robust consensus from multiple reasoning paths. Extensive experiments on diverse benchmarks demonstrate that TableMind++ consistently outperforms previous baselines and proprietary models, validating the effectiveness of integrating autonomous training with uncertainty quantification. Our code is available.
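The abstract does not specify how token-level confidence is computed or which two weights enter the trajectory aggregation, so the following is only an illustrative sketch under stated assumptions, not the TableMind++ implementation. It assumes that an action's confidence is the geometric mean of its token probabilities, that an action is flagged for refinement when any token probability falls below a threshold, and that each trajectory votes for its answer with the product of a plan-level score and that action confidence. The names `Trajectory`, `plan_score`, `needs_refinement`, and the threshold value are hypothetical.

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from typing import List


@dataclass
class Trajectory:
    """One sampled reasoning path (hypothetical structure, for illustration)."""
    answer: str                  # final answer produced by this path
    plan_score: float            # assumed plan-level weight in [0, 1]
    token_logprobs: List[float]  # log-probabilities of the generated action tokens


def action_confidence(token_logprobs: List[float]) -> float:
    """Geometric mean of token probabilities, i.e. exp(mean log-probability)."""
    if not token_logprobs:
        return 0.0
    return math.exp(sum(token_logprobs) / len(token_logprobs))


def needs_refinement(token_logprobs: List[float], threshold: float = 0.5) -> bool:
    """Flag an action when any token probability is below the (assumed) threshold."""
    return any(math.exp(lp) < threshold for lp in token_logprobs)


def aggregate(trajectories: List[Trajectory]) -> str:
    """Weighted vote: each path contributes plan_score * action_confidence."""
    if not trajectories:
        raise ValueError("at least one trajectory is required")
    votes = defaultdict(float)
    for t in trajectories:
        votes[t.answer] += t.plan_score * action_confidence(t.token_logprobs)
    # Return the answer with the largest accumulated weight.
    return max(votes, key=votes.get)


# Toy example with made-up probabilities.
paths = [
    Trajectory("42", 0.9, [math.log(0.9), math.log(0.8)]),
    Trajectory("17", 0.4, [math.log(0.95), math.log(0.9)]),
    Trajectory("42", 0.5, [math.log(0.6), math.log(0.3)]),
]
consensus = aggregate(paths)
```

In this toy setting, the third path contains a low-probability token and would be flagged by `needs_refinement`, yet it still contributes a small weight to the consensus. Whether low-confidence paths are down-weighted, regenerated, or discarded in the actual method is not stated in the abstract.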