🤖 AI Summary
To address severe content hallucination, high training cost, and the trade-off between reliability and efficiency in table reasoning, this paper proposes Row-of-Thought (RoT), a training-free inference paradigm. RoT enables controllable expansion of reasoning paths through structured row-by-row traversal and reflection-based refinement, achieving substantial robustness gains with zero training overhead. By leveraging large language models' intrinsic reflective capabilities and encouraging sequential attention to the table, RoT avoids excessive generation over tabular text, effectively mitigating content hallucination. Evaluated on WikiTableQuestions and TableBench, RoT achieves state-of-the-art results among comparable models, improving average accuracy by 4.3% over reasoning large language models (RLLMs) while consuming fewer inference tokens than Long Chain-of-Thought. The method thus delivers both high reliability and computational efficiency without fine-tuning or additional training data.
📝 Abstract
The table reasoning task, crucial for efficient data acquisition, aims to answer questions based on a given table. Recently, reasoning large language models (RLLMs) with Long Chain-of-Thought (Long CoT) have significantly enhanced reasoning capabilities, leading to strong performance on table reasoning. However, Long CoT suffers from high training cost and exhibits low reliability due to table content hallucination. Therefore, we propose Row-of-Thought (RoT), which performs iterative row-wise table traversal, allowing reasoning extension and reflection-based refinement at each traversal step. By scaling reasoning length through row-wise traversal and leveraging the reflection capabilities of LLMs, RoT is training-free. The sequential traversal encourages greater attention to the table, thereby reducing hallucination. Experiments show that RoT, using non-reasoning models, outperforms RLLMs by an average of 4.3% and achieves state-of-the-art results on WikiTableQuestions and TableBench with comparable models, demonstrating its effectiveness. Moreover, RoT outperforms Long CoT with fewer reasoning tokens, indicating higher efficiency.
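The abstract describes a control loop — traverse the table row by row, extending the reasoning state at each row and refining it via reflection — rather than a single long generation. A minimal sketch of that loop is below; `call_llm`, the prompt wording, and `max_reflections` are illustrative assumptions (the paper's actual prompts and stopping criteria are not given here), and the LLM is stubbed so the control flow can run end to end.

```python
# Illustrative sketch of a Row-of-Thought-style loop, per the abstract's
# description. `call_llm` is a hypothetical stand-in for a non-reasoning
# LLM API, stubbed here to record prompts so the flow is runnable.

calls = []

def call_llm(prompt: str) -> str:
    # Hypothetical LLM call; stubbed for illustration.
    calls.append(prompt)
    return f"reasoning state after step {len(calls)}"

def row_of_thought(question: str, table_rows: list, max_reflections: int = 1) -> str:
    """Traverse the table row by row, extending and then reflecting on
    the reasoning state at each step (training-free: no fine-tuning)."""
    state = f"Question: {question}"
    for i, row in enumerate(table_rows):
        # Reasoning extension: fold the current row into the chain of thought.
        state = call_llm(f"{state}\nRow {i}: {row}\nExtend the reasoning.")
        # Reflection-based refinement: let the model revise its own state.
        for _ in range(max_reflections):
            state = call_llm(f"{state}\nReflect on and refine the reasoning.")
    # Derive the final answer from the accumulated reasoning state.
    return call_llm(f"{state}\nAnswer the question.")
```

With two rows and one reflection pass, this issues five LLM calls (extend + reflect per row, plus one final answer call); the sequential prompts are what force per-row attention to the table content.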