🤖 AI Summary
Existing large language models (LLMs) struggle to understand table structure and to perform precise numerical reasoning, which limits their performance on table-based tasks such as table question answering (TQA) and table-based fact verification (TFV). To address this, we propose TART, a tool-augmented reasoning framework with three components: a table formatter that ensures accurate data representation, a tool maker that develops task-specific computational tools, and an explanation generator that keeps the reasoning explainable. We also introduce TOOLTAB, a new benchmark designed for training LLMs in table-tool integration. TART integrates LLMs such as CodeLlama with specialized computational tools. Experiments show that TART substantially outperforms existing methods such as Chain-of-Thought, and that CodeLlama paired with TART reaches 90.0% of the accuracy of GPT-3.5-turbo. All code and data are publicly released.
📝 Abstract
Current Large Language Models (LLMs) exhibit limited ability to understand table structures and to apply precise numerical reasoning, both of which are crucial for tasks such as table question answering (TQA) and table-based fact verification (TFV). To address these challenges, we introduce our Tool-Augmented Reasoning framework for Tables (TART), which integrates LLMs with specialized tools. TART contains three key components: a table formatter to ensure accurate data representation, a tool maker to develop specific computational tools, and an explanation generator to maintain explainability. We also present the TOOLTAB dataset, a new benchmark designed specifically for training LLMs in table-tool integration. Our experiments indicate that TART achieves substantial improvements over existing methods (e.g., Chain-of-Thought) by improving both the precision of data processing and the clarity of the reasoning process. Notably, TART paired with CodeLlama achieves 90.0% of the accuracy of the closed-source LLM GPT-3.5-turbo, highlighting its robustness in diverse real-world scenarios. All the code and data are available at https://github.com/XinyuanLu00/TART.
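The three-component design described in the abstract can be pictured with a toy sketch. This is not the authors' implementation: every function name here (`format_table`, `make_tools`, `explain`) is hypothetical, the table is made-up data, and the LLM calls that TART would use for each stage are replaced by hand-written stand-ins, so the sketch only illustrates how a formatter, a set of computational tools, and an explanation step might hand results to one another.

```python
from typing import Callable, Dict, List, Union

Row = Dict[str, Union[float, str]]


def format_table(raw: str) -> List[Row]:
    """Stand-in for the table formatter: parse pipe-delimited text into typed rows."""
    lines = [ln.strip() for ln in raw.strip().splitlines() if ln.strip()]
    header = [h.strip() for h in lines[0].split("|")]
    rows: List[Row] = []
    for ln in lines[1:]:
        cells = [c.strip() for c in ln.split("|")]
        row: Row = {}
        for key, cell in zip(header, cells):
            try:
                # Numeric cells become floats so tools can compute on them exactly.
                row[key] = float(cell.replace(",", ""))
            except ValueError:
                row[key] = cell
        rows.append(row)
    return rows


def make_tools() -> Dict[str, Callable]:
    """Stand-in for the tool maker: hand-written tools instead of LLM-generated ones."""

    def column_sum(rows: List[Row], column: str) -> float:
        return sum(r[column] for r in rows)

    def column_max_row(rows: List[Row], column: str) -> Row:
        return max(rows, key=lambda r: r[column])

    return {"column_sum": column_sum, "column_max_row": column_max_row}


def explain(question: str, tool_name: str, column: str, result: object) -> str:
    """Stand-in for the explanation generator: describe which tool produced the answer."""
    return (
        f"Question: {question} | Step: applied {tool_name} to column "
        f"'{column}' | Answer: {result}"
    )


# Made-up toy table, used only for illustration.
RAW_TABLE = """
item | price
apple | 1.50
pear | 2.25
"""

rows = format_table(RAW_TABLE)
tools = make_tools()
total = tools["column_sum"](rows, "price")
explanation = explain("What is the total price?", "column_sum", "price", total)
print(explanation)
```

The point of the split is that the arithmetic is done by executable code on typed values rather than by free-text generation, while the explanation step keeps the answer traceable to a specific tool call.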