🤖 AI Summary
This work addresses numerical multi-table question answering (MTQA) over large-scale table collections, where existing methods struggle to model complex inter-table relationships, retrieve relevant tables effectively, and generate accurate answers. To overcome these limitations, the authors propose the DMRAL framework, which combines table relationship graph construction, question decomposition, and coverage-aware retrieval with a sub-question-guided strategy for generating reasoning programs. Experiments on two MTQA datasets show that DMRAL outperforms existing state-of-the-art methods, with an average improvement of 24% in table retrieval and 55% in answer accuracy.
📝 Abstract
In this paper, we study the problem of numerical multi-table question answering (MTQA) over large-scale table collections (e.g., online data repositories). This task is essential in many analytical applications. Existing MTQA solutions, such as text-to-SQL or open-domain MTQA methods, are designed for databases and struggle when applied to large-scale table collections. The key limitations are: (1) limited support for complex table relationships; (2) ineffective retrieval of relevant tables at scale; and (3) inaccurate answer generation. To overcome these limitations, we propose DMRAL, a Decomposition-driven Multi-table Retrieval and Answering framework for MTQA over large-scale table collections, which consists of: (1) a table relationship graph that captures complex relationships among tables; (2) a Table-Aligned Question Decomposer and a Coverage-Aware Retriever, which jointly identify relevant tables in large-scale corpora by improving the quality of question decomposition and maximizing the question coverage of the retrieved tables; and (3) a Sub-question Guided Reasoner, which produces correct answers by progressively generating and refining a reasoning program based on the sub-questions. Experiments on two MTQA datasets demonstrate that DMRAL significantly outperforms existing state-of-the-art MTQA methods, with an average improvement of 24% in table retrieval and 55% in answer accuracy.
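The idea of maximizing the question coverage of retrieved tables can be illustrated with a small sketch. This is not DMRAL's retriever, which the abstract does not specify: it is a generic greedy set-cover heuristic. The function name `greedy_coverage_retrieval`, the sub-question ids, the table names, and the assumption that a mapping from each candidate table to the sub-questions it can help answer is already available are all hypothetical.

```python
from typing import Dict, List, Set


def greedy_coverage_retrieval(
    sub_questions: List[str],
    candidates: Dict[str, Set[str]],
    max_tables: int,
) -> List[str]:
    """Greedily pick tables so that each pick covers the most uncovered sub-questions.

    `candidates` maps a table name to the set of sub-question ids that table
    is assumed to help answer (hypothetical input; how this mapping is
    obtained is outside the scope of this sketch).
    """
    uncovered = set(sub_questions)
    selected: List[str] = []
    while uncovered and len(selected) < max_tables:
        # Iterate in sorted order so ties resolve to the alphabetically first table.
        remaining = (t for t in sorted(candidates) if t not in selected)
        best = max(
            remaining,
            key=lambda t: len(candidates[t] & uncovered),
            default=None,
        )
        # Stop when no remaining table covers any uncovered sub-question.
        if best is None or not (candidates[best] & uncovered):
            break
        selected.append(best)
        uncovered -= candidates[best]
    return selected


# Hypothetical example: three sub-questions and four candidate tables.
example_candidates = {
    "sales": {"sq1"},
    "regions": {"sq1", "sq2"},
    "population": {"sq2", "sq3"},
    "weather": set(),
}
picked = greedy_coverage_retrieval(["sq1", "sq2", "sq3"], example_candidates, 3)
print(picked)
```

In this toy input, two tables are enough to cover all three sub-questions, so the loop stops before reaching the budget of three. A real retriever would also need to score relevance and account for whether the selected tables can be joined, which this sketch deliberately omits.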