AI Summary
This work addresses the challenge that linearizing tables for large language models often disrupts their two-dimensional structure, leading to inaccurate numerical reasoning and limited interpretability. To mitigate this, the authors propose a multi-step code-generation framework that, for the first time, incorporates explicit natural language annotations into program synthesis. By decomposing table-based question answering into annotated, executable Python programs, the approach strengthens the model's grasp of tabular structure, improves numerical accuracy, and makes the reasoning more transparent. Built on Qwen2.5-Coder-7B-Instruct and combining multi-line annotated program synthesis with a lightweight answer-selection mechanism, the method achieves 70.9% accuracy on WikiTableQuestions; when further combined with an end-to-end model, accuracy rises to 84.3%, substantially outperforming the Repanda baseline (67.6%).
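To make the idea of an annotated, executable program concrete, here is a minimal sketch in the spirit of what such a framework might generate. The table, question, and step comments are illustrative assumptions, not examples from the paper; the point is that each reasoning step becomes one commented line of pandas code whose execution yields the answer.

```python
import pandas as pd

# Hypothetical table for illustration: Olympic Games participation.
table = pd.DataFrame({
    "Year": [2000, 2004, 2008, 2012],
    "Host City": ["Sydney", "Athens", "Beijing", "London"],
    "Nations": [199, 201, 204, 204],
})

# Question: "How many more nations participated in 2012 than in 2000?"

# Step 1: select the row for the year 2000 and read its nation count.
nations_2000 = table.loc[table["Year"] == 2000, "Nations"].iloc[0]

# Step 2: select the row for the year 2012 and read its nation count.
nations_2012 = table.loc[table["Year"] == 2012, "Nations"].iloc[0]

# Step 3: compute the difference to answer the comparison question.
answer = int(nations_2012 - nations_2000)
print(answer)  # → 5
```

Because the program operates on the table's actual rows and columns rather than a linearized string, the numeric computation is exact, and the step comments make each intermediate decision inspectable.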
Abstract
Table Question Answering (TableQA) poses a significant challenge for large language models (LLMs) because conventional linearization of tables often disrupts the two-dimensional relationships intrinsic to structured data. Existing methods, which depend on end-to-end answer generation or single-line program queries, typically exhibit limited numerical accuracy and reduced interpretability. This work introduces a commented, step-by-step code-generation framework that incorporates explicit reasoning into the Python program-generation process. The approach decomposes TableQA reasoning into multi-line executable programs with concise natural language comments, thereby promoting clearer reasoning and increasing the likelihood of generating correct code. On the WikiTableQuestions benchmark, the proposed method achieves 70.9% accuracy using Qwen2.5-Coder-7B-Instruct, surpassing the Repanda baseline (67.6%). Integrating the proposed framework with a robust end-to-end TableQA model via a lightweight answer-selection mechanism yields further improvements. This combined approach achieves up to 84.3% accuracy on the WikiTableQuestions benchmark.
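The abstract does not spell out how the answer-selection mechanism arbitrates between the program-generation pipeline and the end-to-end model, so the following is only one plausible, deliberately lightweight rule, sketched under the assumption that a failed or empty program execution should fall back to the end-to-end answer. The function name and signature are hypothetical.

```python
from typing import Optional


def select_answer(program_answer: Optional[str], end_to_end_answer: str) -> str:
    """Arbitrate between two candidate TableQA answers.

    Hypothetical rule for illustration only: trust the executed program's
    result when it produced a usable value; otherwise fall back to the
    end-to-end model's answer.
    """
    if program_answer is not None and program_answer.strip():
        return program_answer
    return end_to_end_answer


# Program crashed or returned nothing: fall back to the end-to-end model.
print(select_answer(None, "London"))  # → London
# Program produced a value: prefer the exact executed result.
print(select_answer("5", "six"))      # → 5
```

Even a simple fallback rule like this can explain why combining the two systems outperforms either alone: the program path contributes exact arithmetic when it succeeds, while the end-to-end model covers questions where code generation fails.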