ITUNLP at SemEval-2025 Task 8: Question-Answering over Tabular Data: A Zero-Shot Approach using LLM-Driven Code Generation

📅 2025-08-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses SemEval-2025 Task 8 (DataBench), a cross-domain table question answering (TQA) challenge comprising two subtasks: DataBench QA and DataBench Lite QA. We propose a zero-shot, large language model (LLM)-driven approach that generates executable Pandas code to answer natural-language questions over tabular data, without any model fine-tuning. Carefully engineered prompts guide open-source LLMs to synthesize robust, semantically accurate Python code that parses complex table structures and computes precise answers. The core contribution lies in reframing TQA as a program-synthesis problem under strict zero-shot conditions, enabling high-fidelity code generation without domain-specific adaptation. Our system placed 8th on Subtask I and 6th on Subtask II among the 30 systems that outperformed the baseline in the open-source models category, demonstrating the effectiveness and strong generalization capability of the code-generation paradigm for cross-domain tabular reasoning.
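The paper does not publish its exact prompts, so the template, function name, and example table below are illustrative assumptions only. A zero-shot prompt for Pandas code generation of the kind described might be assembled like this:

```python
import pandas as pd

def build_prompt(df: pd.DataFrame, question: str) -> str:
    """Assemble a zero-shot prompt asking an LLM to emit executable Pandas code.
    The template is a sketch, not the paper's actual prompt."""
    # Describe the table schema so the model can reference real column names/types.
    schema = ", ".join(f"{c} ({t})" for c, t in zip(df.columns, df.dtypes.astype(str)))
    # A few sample rows help the model ground value formats.
    sample = df.head(3).to_string(index=False)
    return (
        "You are given a Pandas DataFrame named `df`.\n"
        f"Columns: {schema}\n"
        f"First rows:\n{sample}\n\n"
        f"Question: {question}\n"
        "Write Python code that stores the answer in a variable named `answer`.\n"
        "Return only code, no explanation."
    )

# Hypothetical example table and question.
df = pd.DataFrame({"country": ["TR", "DE", "FR"], "population_m": [85, 84, 68]})
prompt = build_prompt(df, "Which country has the largest population?")
print(prompt)
```

The returned string would then be sent to an open-source LLM; the `answer` variable contract is one common convention for making the generated code's result easy to extract.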

📝 Abstract
This paper presents our system for SemEval-2025 Task 8: DataBench, Question-Answering over Tabular Data. The primary objective of this task is to perform question answering on given tabular datasets from diverse domains under two subtasks: DataBench QA (Subtask I) and DataBench Lite QA (Subtask II). To tackle both subtasks, we developed a zero-shot solution with a particular emphasis on leveraging Large Language Model (LLM)-based code generation. Specifically, we propose a Python code generation framework utilizing state-of-the-art open-source LLMs to generate executable Pandas code via optimized prompting strategies. Our experiments reveal that different LLMs exhibit varying levels of effectiveness in Python code generation. Additionally, results show that Python code generation achieves superior performance in tabular question answering compared to alternative approaches. Although our ranking among zero-shot systems is unknown at the time of this paper's submission, our system achieved eighth place in Subtask I and sixth place in Subtask II among the 30 systems that outperformed the baseline in the open-source models category.
Problem

Research questions and friction points this paper is trying to address.

Zero-shot QA over tabular data using LLM-driven code generation
Comparing LLMs' effectiveness in generating executable Pandas code
Assessing whether Python code generation outperforms alternative approaches for tabular QA
Innovation

Methods, ideas, or system contributions that make the work stand out.

Zero-shot LLM-driven code generation
Python code framework with Pandas
Optimized prompting for tabular QA
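To complete the picture, the generated snippet has to be executed against the table and its result extracted. The harness below is a minimal sketch under stated assumptions — the `answer` variable contract and the bare `exec` call are illustrative, not the paper's implementation, and a real system would sandbox execution with timeouts:

```python
import pandas as pd

def run_generated_code(df: pd.DataFrame, code: str):
    """Execute LLM-generated Pandas code in a dedicated namespace and return
    whatever it bound to `answer`. Illustrative only: no sandboxing shown."""
    namespace = {"df": df, "pd": pd}
    try:
        exec(code, namespace)
        return namespace.get("answer")
    except Exception:
        # Failed or malformed generations simply yield no answer.
        return None

# Hypothetical example table.
df = pd.DataFrame({"country": ["TR", "DE", "FR"], "population_m": [85, 84, 68]})
# A snippet an LLM might plausibly return for the question
# "Which country has the largest population?"
code = "answer = df.loc[df['population_m'].idxmax(), 'country']"
print(run_generated_code(df, code))  # → TR
```

Catching exceptions rather than crashing matters here: generation failures are expected in a zero-shot setting, and a system can fall back to a default answer or a retry when execution fails.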
Atakan Site
Department of Artificial Intelligence and Data Engineering, Istanbul Technical University
Emre Hakan Erdemir
Department of Artificial Intelligence and Data Engineering, Istanbul Technical University
Gülşen Eryiğit
Professor at Artificial Intelligence & Data Engineering, Istanbul Technical University
Natural Language Processing · CALL · Artificial Intelligence · Machine Learning · Deep Learning