🤖 AI Summary
Complex table-based question answering (QA) requires multi-step reasoning, yet existing methods lack long-term planning and fail to model logical dependencies among steps, leading to redundant information accumulation and insufficient global reasoning.
Method: This paper proposes a long-horizon reasoning planning mechanism for table understanding, decomposing questions into synergistic short- and long-term goals. A constraint-aware reasoning planner explicitly models inter-step dependencies, mitigating redundancy and overcoming the limited global planning capability of conventional chain-of-thought approaches. An end-to-end framework built upon large language models unifies table QA and fact verification.
Contribution/Results: The method achieves state-of-the-art performance on WikiTableQuestions and TabFact, significantly outperforming strong baselines. Results empirically validate that structured, long-horizon planning substantially enhances complex reasoning over tabular data.
📝 Abstract
Table understanding is key to addressing challenging downstream tasks such as table-based question answering and fact verification. Recent works have focused on leveraging Chain-of-Thought and question decomposition to solve complex questions requiring multiple operations on tables. However, these methods often suffer from a lack of explicit long-term planning and weak inter-step connections, causing them to miss constraints stated in the questions. In this paper, we propose leveraging the long-term planning capabilities of large language models (LLMs) to enhance table understanding. Our approach enables the execution of a long-term plan in which the steps are tightly interconnected and each serves the ultimate goal, an aspect that methods based on Chain-of-Thought and question decomposition lack. In addition, our method effectively minimizes the inclusion of unnecessary details when solving each short-term goal, a limitation of Chain-of-Thought-based methods. Extensive experiments demonstrate that our method outperforms strong baselines and achieves state-of-the-art performance on the WikiTableQuestions and TabFact datasets.
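The planning behavior the abstract describes — steps with explicit dependencies, executed toward a final goal while passing each step only the intermediate results it needs — can be sketched roughly as follows. This is a minimal illustration, not the paper's actual implementation; the `PlanStep` structure, step names, and toy table are all assumptions made for the example.

```python
from dataclasses import dataclass, field

@dataclass
class PlanStep:
    # Hypothetical plan-step structure; names are illustrative only.
    name: str
    operation: callable              # (table, dependency_outputs) -> result
    depends_on: list = field(default_factory=list)

def execute_plan(steps, table):
    """Run steps in dependency order, giving each step only the outputs
    of its declared prerequisites (mirroring the idea of avoiding
    redundant information accumulation across steps)."""
    results, resolved, pending = {}, set(), list(steps)
    while pending:
        progressed = False
        for step in list(pending):
            if all(d in resolved for d in step.depends_on):
                inputs = {d: results[d] for d in step.depends_on}
                results[step.name] = step.operation(table, inputs)
                resolved.add(step.name)
                pending.remove(step)
                progressed = True
        if not progressed:
            raise ValueError("cyclic or unsatisfiable dependencies")
    return results

# Toy question: "Which country won the most medals after 2000?"
table = [
    {"country": "A", "year": 1996, "medals": 5},
    {"country": "A", "year": 2004, "medals": 7},
    {"country": "B", "year": 2008, "medals": 6},
]

plan = [
    PlanStep("filter_recent",
             lambda t, _: [r for r in t if r["year"] > 2000]),
    PlanStep("aggregate",
             lambda t, inp: {c: sum(r["medals"]
                                    for r in inp["filter_recent"]
                                    if r["country"] == c)
                             for c in {r["country"]
                                       for r in inp["filter_recent"]}},
             depends_on=["filter_recent"]),
    PlanStep("answer",
             lambda t, inp: max(inp["aggregate"], key=inp["aggregate"].get),
             depends_on=["aggregate"]),
]

print(execute_plan(plan, table)["answer"])  # -> "A" (7 medals after 2000)
```

In this sketch, the `answer` step never sees the raw filtered rows, only the aggregated counts it declared as a dependency — a simple analogue of constraining each short-term goal to the information it actually needs.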