🤖 AI Summary
This work addresses the challenge of achieving high-accuracy table question answering (TQA) using only locally executable, small-scale open-source large language models (LLMs). To this end, the authors propose Orchestra, the first multi-agent collaborative framework tailored for lightweight LLMs. Orchestra decomposes complex TQA tasks into structured subtasks through hierarchical task decomposition and coordinates multiple lightweight LLM agents to solve them collaboratively. Built upon the AgentScope framework and integrating open-source models such as Qwen, Llama, and DeepSeek, the approach achieves 72.1% accuracy on the WikiTQ benchmark using only Qwen2.5-14B, approaching the performance of GPT-4 (75.3%), and establishes a new state of the art with larger open-source models, thereby significantly unlocking the potential of compact LLMs in TQA scenarios.
📝 Abstract
Given a table T in a database and a question Q in natural language, the table question answering (TQA) task aims to return an accurate answer to Q based on the content of T. Recent state-of-the-art solutions leverage large language models (LLMs) to obtain high-quality answers. However, most rely on proprietary, large-scale LLMs with costly API access, posing a significant financial barrier. This paper instead focuses on TQA with smaller, open-weight LLMs that can run on a desktop or laptop. This setting is challenging, as such LLMs typically have weaker capabilities than large proprietary models, leading to substantial performance degradation with existing methods. We observe that a key reason for this degradation is that prior approaches often require the LLM to solve a highly sophisticated task using long, complex prompts, which exceed the capabilities of small open-weight LLMs. Motivated by this observation, we present Orchestra, a multi-agent approach that unlocks the potential of accessible LLMs for high-quality, cost-effective TQA. Orchestra coordinates a group of LLM agents, each responsible for a relatively simple task, through a structured, layered workflow to solve complex TQA problems, akin to an orchestra. By reducing the prompt complexity faced by each agent, Orchestra significantly improves output reliability. We implement Orchestra on top of AgentScope, an open-source multi-agent framework, and evaluate it on multiple TQA benchmarks using a wide range of open-weight LLMs. Experimental results show that Orchestra achieves strong performance even with small- to medium-sized models. For example, with Qwen2.5-14B, Orchestra reaches 72.1% accuracy on WikiTQ, approaching the best prior result of 75.3% achieved with GPT-4; with larger Qwen, Llama, or DeepSeek models, Orchestra outperforms all prior methods and establishes new state-of-the-art results across all benchmarks.
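To make the "layered workflow of simple agents" idea concrete, the sketch below illustrates the general pattern the abstract describes: each agent handles one narrow subtask over the table, and a coordinator chains them so that no single step requires a long, complex prompt. All names and the three-stage decomposition here are hypothetical stand-ins (the paper's actual agents and prompts are not specified in this excerpt), and the LLM calls are replaced by simple heuristics for illustration.

```python
from dataclasses import dataclass

@dataclass
class Table:
    header: list
    rows: list

def column_selector(table: Table, question: str) -> list:
    # Agent 1 (stub): pick columns relevant to the question.
    # In a real system this would be one short LLM prompt per call.
    return [c for c in table.header if c.lower() in question.lower()]

def row_filter(table: Table, question: str) -> list:
    # Agent 2 (stub): keep rows whose cell values appear in the question.
    q = question.lower()
    return [r for r in table.rows if any(str(v).lower() in q for v in r)]

def answer_composer(table: Table, columns: list, rows: list):
    # Agent 3 (stub): read the answer off the reduced sub-table.
    if not columns or not rows:
        return None
    idx = table.header.index(columns[0])
    return rows[0][idx]

def orchestra(table: Table, question: str):
    # Coordinator: a fixed, layered workflow over the three simple agents,
    # so each agent only ever sees a small, focused subtask.
    cols = column_selector(table, question)
    rows = row_filter(table, question)
    return answer_composer(table, cols, rows)

t = Table(header=["city", "population"],
          rows=[["Oslo", 709000], ["Bergen", 286000]])
print(orchestra(t, "What is the population of Bergen?"))  # → 286000
```

The design point is that decomposition, not model scale, carries the difficulty: each stub could be swapped for a lightweight LLM call with a short prompt, which is the property the paper argues makes small open-weight models viable.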