🤖 AI Summary
Large language models (LLMs) underperform on complex tabular numerical reasoning due to intricate queries, data noise, and limited numerical capabilities. To address this, we propose TabDSR, a stepwise reasoning framework: (1) a query decomposer disentangles multi-hop logical dependencies; (2) a table sanitizer filters noise and inconsistencies; and (3) a program-of-thoughts (PoT)-based reasoner generates executable Python code to ensure precise arithmetic computation. The approach integrates problem decomposition, noise-robust preprocessing, and programmatic reasoning. We further introduce CalTab151, a bias-free, leakage-resistant benchmark of 151 challenging tabular reasoning instances requiring advanced numerical operations. Experimental results demonstrate state-of-the-art accuracy improvements of +8.79% on TAT-QA, +6.08% on TableBench, and +19.87% on CalTab151, significantly enhancing LLMs' robustness and precision in noisy tabular reasoning scenarios.
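The three-stage pipeline above can be sketched in Python. This is an illustrative mock, not the authors' implementation: the function names are assumptions, and `mock_llm` stands in for real LLM calls with canned outputs so the flow is runnable end to end.

```python
def mock_llm(prompt: str) -> str:
    """Stand-in for a real LLM call; returns canned outputs for this demo."""
    if prompt.startswith("Decompose"):
        # (1) Query decomposer: split a multi-hop question into sub-steps.
        return ("1. Find revenue for 2021\n"
                "2. Find revenue for 2020\n"
                "3. Compute the difference")
    if prompt.startswith("Sanitize"):
        # (2) Table sanitizer: drop noisy columns and aggregate rows.
        return "year,revenue\n2020,100\n2021,150"
    # (3) PoT reasoner: emit executable Python over the cleaned table.
    return ("rows = {int(y): float(r) for y, r in "
            "(line.split(',') for line in table.splitlines()[1:])}\n"
            "answer = rows[2021] - rows[2020]")

def tabdsr_pipeline(question: str, raw_table: str) -> float:
    steps = mock_llm(f"Decompose: {question}")                       # stage 1
    table = mock_llm(f"Sanitize for steps:\n{steps}\n{raw_table}")   # stage 2
    code = mock_llm(f"Write Python over:\n{table}\nSteps:\n{steps}") # stage 3
    scope = {"table": table}
    exec(code, scope)  # precise arithmetic happens in code, not in the LLM
    return scope["answer"]

noisy = "year,revenue,notes\n2020,100,approx.\n2021,150,--\nTOTAL,250,"
print(tabdsr_pipeline("How much did revenue grow from 2020 to 2021?", noisy))  # 50.0
```

The key design point is the last stage: arithmetic is delegated to generated, executed code rather than token-level prediction, which is what makes the final computation exact once decomposition and sanitization have isolated the relevant cells.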
📝 Abstract
Complex reasoning over tabular data is crucial in real-world data analysis, yet large language models (LLMs) often underperform due to complex queries, noisy data, and limited numerical capabilities. To address these issues, we propose TabDSR, a framework consisting of: (1) a query decomposer that breaks down complex questions, (2) a table sanitizer that cleans and filters noisy tables, and (3) a program-of-thoughts (PoT)-based reasoner that generates executable code to derive the final answer from the sanitized table. To ensure unbiased evaluation and mitigate data leakage, we introduce a new dataset, CalTab151, specifically designed for complex numerical reasoning over tables. Experimental results demonstrate that TabDSR consistently outperforms existing methods, achieving state-of-the-art (SOTA) performance with 8.79%, 6.08%, and 19.87% accuracy improvements on TAT-QA, TableBench, and CalTab151, respectively. Moreover, our framework integrates seamlessly with mainstream LLMs, providing a robust solution for complex tabular numerical reasoning. Data and code are available upon request.