🤖 AI Summary
Large language models (LLMs) underperform on complex tabular numerical reasoning due to intricate queries, data noise, and limited numerical capabilities. To address this, we propose TabDSR, a stepwise reasoning framework: (1) a query decomposer disentangles multi-hop logical dependencies; (2) a table sanitizer filters noise and inconsistencies; and (3) a program-of-thoughts (PoT)-based reasoner generates executable Python code to ensure precise arithmetic computation. The approach integrates problem decomposition, noise-robust preprocessing, and programmatic reasoning. We further introduce CalTab151, a bias-free, leakage-resistant benchmark of 151 challenging tabular reasoning instances requiring advanced numerical operations. Experimental results demonstrate state-of-the-art accuracy improvements of +8.79% on TAT-QA, +6.08% on TableBench, and +19.87% on CalTab151, significantly enhancing LLMs' robustness and precision in noisy tabular reasoning scenarios.
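The three-stage pipeline above can be sketched in Python. This is an illustrative mock, not the authors' implementation: the function names are assumptions, and `mock_llm` stands in for real LLM calls with canned outputs so the flow is runnable end to end.

```python
def mock_llm(prompt: str) -> str:
    """Stand-in for a real LLM call; returns canned outputs for this demo."""
    if prompt.startswith("Decompose"):
        # (1) Query decomposer: split a multi-hop question into sub-steps.
        return ("1. Find revenue for 2021\n"
                "2. Find revenue for 2020\n"
                "3. Compute the difference")
    if prompt.startswith("Sanitize"):
        # (2) Table sanitizer: drop noisy columns and aggregate rows.
        return "year,revenue\n2020,100\n2021,150"
    # (3) PoT reasoner: emit executable Python over the cleaned table.
    return ("rows = {int(y): float(r) for y, r in "
            "(line.split(',') for line in table.splitlines()[1:])}\n"
            "answer = rows[2021] - rows[2020]")

def tabdsr_pipeline(question: str, raw_table: str) -> float:
    steps = mock_llm(f"Decompose: {question}")                       # stage 1
    table = mock_llm(f"Sanitize for steps:\n{steps}\n{raw_table}")   # stage 2
    code = mock_llm(f"Write Python over:\n{table}\nSteps:\n{steps}") # stage 3
    scope = {"table": table}
    exec(code, scope)  # precise arithmetic happens in code, not in the LLM
    return scope["answer"]

noisy = "year,revenue,notes\n2020,100,approx.\n2021,150,--\nTOTAL,250,"
print(tabdsr_pipeline("How much did revenue grow from 2020 to 2021?", noisy))  # 50.0
```

The key design point is the last stage: arithmetic is delegated to generated, executed code rather than token-level prediction, which is what makes the final computation exact once decomposition and sanitization have isolated the relevant cells.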
📝 Abstract
Complex reasoning over tabular data is crucial in real-world data analysis, yet large language models (LLMs) often underperform due to complex queries, noisy data, and limited numerical capabilities. To address these issues, we propose TabDSR, a framework consisting of: (1) a query decomposer that breaks down complex questions, (2) a table sanitizer that cleans and filters noisy tables, and (3) a program-of-thoughts (PoT)-based reasoner that generates executable code to derive the final answer from the sanitized table. To ensure unbiased evaluation and mitigate data leakage, we introduce a new dataset, CalTab151, specifically designed for complex numerical reasoning over tables. Experimental results demonstrate that TabDSR consistently outperforms existing methods, achieving state-of-the-art (SOTA) performance with 8.79%, 6.08%, and 19.87% accuracy improvements on TAT-QA, TableBench, and CalTab151, respectively. Moreover, our framework integrates seamlessly with mainstream LLMs, providing a robust solution for complex tabular numerical reasoning. Data and code are available upon request.