NormTab: Improving Symbolic Reasoning in LLMs Through Tabular Data Normalization

📅 2024-06-25

🏛️ Conference on Empirical Methods in Natural Language Processing

📈 Citations: 2

✨ Influential: 0

career value

158K/year

🤖 AI Summary

To address the performance limitations of large language models (LLMs) in symbolic reasoning over web tables—caused by structural heterogeneity and semantic inconsistency—this paper proposes modeling table normalization as a lightweight, one-time, LLM-driven preprocessing task, decoupled from downstream reasoning. Methodologically, it integrates end-to-end table value normalization, context-aware semantic alignment, and structural standardization, augmented by prompt engineering to enhance semantic coherence and logical readability. The core contribution is the first explicit formulation of normalization as a前置, reusable, and generalizable modular step. Evaluated on WikiTableQuestions and TabFact, the approach improves symbolic reasoning accuracy by 12.3% and 9.7%, respectively, demonstrating that systematic normalization delivers substantial gains in LLMs’ logical comprehension of tabular data.

Technology Category

Application Category

📝 Abstract

In recent years, Large Language Models (LLMs) have demonstrated remarkable capabilities in parsing textual data and generating code. However, their performance in tasks involving tabular data, especially those requiring symbolic reasoning, faces challenges due to the structural variance and inconsistency in table cell values often found in web tables. In this paper, we introduce NormTab, a novel framework aimed at enhancing the symbolic reasoning performance of LLMs by normalizing web tables. We study table normalization as a stand-alone, one-time preprocessing step using LLMs to support symbolic reasoning on tabular data. Our experimental evaluation, conducted on challenging web table datasets such as WikiTableQuestion and TabFact, demonstrates that leveraging NormTab significantly improves symbolic reasoning performance, showcasing the importance and effectiveness of web table normalization for enhancing LLM-based symbolic reasoning tasks.

Problem

Research questions and friction points this paper is trying to address.

Enhancing symbolic reasoning in LLMs using tabular data normalization

Addressing structural variance in web tables for better LLM performance

Improving symbolic reasoning tasks via one-time table preprocessing

Innovation

Methods, ideas, or system contributions that make the work stand out.

Normalizes web tables for symbolic reasoning

Uses LLMs for table preprocessing step

Improves performance on WikiTableQuestion and TabFact

🔎 Similar Papers

TableRAG: Million-Token Table Understanding with Language Models