AI Summary
To address the challenges of multi-step, multi-type reasoning in complex table question answering (TQA), this paper proposes a multi-agent collaborative framework that requires neither fine-tuning nor closed-source models. Methodologically, it decouples planning from code generation, incorporating tool-augmented online planning, table-aware code generation, execution-based verification, and an asynchronous inter-agent coordination protocol. The key contributions are: (1) the first dynamic tool invocation mechanism tailored for TQA, and (2) a lightweight, open-LLM-driven multi-agent architecture that improves reproducibility and accessibility. Across four TQA benchmarks, the approach outperforms previous state-of-the-art systems on three and performs comparably to GPT-4 on two, while relying exclusively on unmodified open-weight LLMs and requiring no task-specific training data.
Abstract
Complex table question answering (TQA) aims to answer questions that require complex reasoning, such as multi-step or multi-category reasoning, over data represented in tabular form. Previous approaches demonstrated notable performance by leveraging either closed-source large language models (LLMs) or fine-tuned open-weight LLMs. However, fine-tuning LLMs requires high-quality training data, which is costly to obtain, and utilizing closed-source LLMs poses accessibility challenges and leads to reproducibility issues. In this paper, we propose Multi-Agent Collaboration with Tool use (MACT), a framework that requires neither closed-source models nor fine-tuning. In MACT, a planning agent and a coding agent that also make use of tools collaborate to answer questions. Our experiments on four TQA benchmarks show that MACT outperforms previous SoTA systems on three out of four benchmarks and that it performs comparably to the larger and more expensive closed-source model GPT-4 on two benchmarks, even when using only open-weight models without any fine-tuning. We conduct extensive analyses to prove the effectiveness of MACT's multi-agent collaboration in TQA.
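The collaboration between a planning agent and a coding agent can be illustrated with a minimal sketch. This is not MACT's implementation: the names `stub_planner`, `stub_coder`, `run_code`, and `answer` are hypothetical, the two agents are hard-coded stand-ins for LLM calls, and the toy table is invented for illustration. The sketch only shows the general pattern of a planner requesting code, the code being executed against the table, and the execution result being fed back as an observation.

```python
from dataclasses import dataclass

Table = list[dict]  # each row is a dict mapping column name to value


@dataclass
class Step:
    action: str    # "code" to request code from the coder, "finish" to stop
    argument: str  # instruction for the coder, or the final answer


def stub_planner(question: str, observations: list[str]) -> Step:
    """Hypothetical stand-in for an LLM planning agent."""
    if not observations:
        return Step("code", "sum the 'sales' column")
    # Once an observation is available, treat it as the answer.
    return Step("finish", observations[-1])


def stub_coder(instruction: str, table: Table) -> str:
    """Hypothetical stand-in for an LLM coding agent; returns Python source."""
    return "result = sum(row['sales'] for row in table)"


def run_code(source: str, table: Table) -> str:
    """Execute generated code and return its result or an error as text."""
    scope = {"table": table}
    try:
        exec(source, scope)  # assumption: the code stores its output in `result`
        return str(scope["result"])
    except Exception as exc:
        return f"error: {exc!r}"


def answer(question: str, table: Table, planner=stub_planner,
           coder=stub_coder, max_steps: int = 5) -> str:
    """Alternate between planning and coding until the planner finishes."""
    observations: list[str] = []
    for _ in range(max_steps):
        step = planner(question, observations)
        if step.action == "finish":
            return step.argument
        source = coder(step.argument, table)
        observations.append(run_code(source, table))
    # Step budget exhausted: fall back to the last observation, if any.
    return observations[-1] if observations else ""


table = [{"region": "north", "sales": 120}, {"region": "south", "sales": 80}]
final = answer("What are the total sales?", table)
print(final)
```

Returning execution errors as observations, instead of raising them, lets a planner see that a step failed and request a different piece of code. In a real system the generated code would need to run in a sandbox, since `exec` on model output is unsafe.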