AI Summary
This work addresses the challenge that linearizing tables for large language models often disrupts their two-dimensional structure, leading to inaccurate numerical reasoning and limited interpretability. To mitigate this, the authors propose a multi-step code-generation framework that, for the first time, incorporates explicit natural language annotations into program synthesis. By decomposing table-based question answering into annotated, executable Python programs, the approach strengthens the model's grasp of tabular structure, improves numerical accuracy, and makes the reasoning more transparent. Built on Qwen2.5-Coder-7B-Instruct and combining multi-line annotated program synthesis with a lightweight answer-selection mechanism, the method achieves 70.9% accuracy on WikiTableQuestions; when further combined with an end-to-end model, accuracy rises to 84.3%, substantially outperforming the Repanda baseline (67.6%).
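To make the idea of an annotated, executable program concrete, here is a minimal sketch in the spirit of what such a framework might generate. The table, question, and step comments are illustrative assumptions, not examples from the paper; the point is that each reasoning step becomes one commented line of pandas code whose execution yields the answer.

```python
import pandas as pd

# Hypothetical table for illustration: Olympic Games participation.
table = pd.DataFrame({
    "Year": [2000, 2004, 2008, 2012],
    "Host City": ["Sydney", "Athens", "Beijing", "London"],
    "Nations": [199, 201, 204, 204],
})

# Question: "How many more nations participated in 2012 than in 2000?"

# Step 1: select the row for the year 2000 and read its nation count.
nations_2000 = table.loc[table["Year"] == 2000, "Nations"].iloc[0]

# Step 2: select the row for the year 2012 and read its nation count.
nations_2012 = table.loc[table["Year"] == 2012, "Nations"].iloc[0]

# Step 3: compute the difference to answer the comparison question.
answer = int(nations_2012 - nations_2000)
print(answer)  # → 5
```

Because the program operates on the table's actual rows and columns rather than a linearized string, the numeric computation is exact, and the step comments make each intermediate decision inspectable.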
Abstract
Table Question Answering (TableQA) poses a significant challenge for large language models (LLMs) because conventional linearization of tables often disrupts the two-dimensional relationships intrinsic to structured data. Existing methods, which depend on end-to-end answer generation or single-line program queries, typically exhibit limited numerical accuracy and reduced interpretability. This work introduces a commented, step-by-step code-generation framework that incorporates explicit reasoning into the Python program-generation process. The approach decomposes TableQA reasoning into multi-line executable programs with concise natural language comments, thereby promoting clearer reasoning and increasing the likelihood of generating correct code. On the WikiTableQuestions benchmark, the proposed method achieves 70.9% accuracy using Qwen2.5-Coder-7B-Instruct, surpassing the Repanda baseline (67.6%). Integrating the proposed framework with a robust end-to-end TableQA model via a lightweight answer-selection mechanism yields further improvements. This combined approach achieves up to 84.3% accuracy on the WikiTableQuestions benchmark.
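The abstract does not spell out how the answer-selection mechanism arbitrates between the program-generation pipeline and the end-to-end model, so the following is only one plausible, deliberately lightweight rule, sketched under the assumption that a failed or empty program execution should fall back to the end-to-end answer. The function name and signature are hypothetical.

```python
from typing import Optional


def select_answer(program_answer: Optional[str], end_to_end_answer: str) -> str:
    """Arbitrate between two candidate TableQA answers.

    Hypothetical rule for illustration only: trust the executed program's
    result when it produced a usable value; otherwise fall back to the
    end-to-end model's answer.
    """
    if program_answer is not None and program_answer.strip():
        return program_answer
    return end_to_end_answer


# Program crashed or returned nothing: fall back to the end-to-end model.
print(select_answer(None, "London"))  # → London
# Program produced a value: prefer the exact executed result.
print(select_answer("5", "six"))      # → 5
```

Even a simple fallback rule like this can explain why combining the two systems outperforms either alone: the program path contributes exact arithmetic when it succeeds, while the end-to-end model covers questions where code generation fails.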