🤖 AI Summary
Existing survey-analysis tools integrate poorly with large language models (LLMs), and there is little evidence-based guidance on how to structure questionnaire data for them. Method: We propose QASU, a benchmark designed to evaluate LLMs' ability to understand questionnaire structure, comprising six structured reasoning tasks, including answer retrieval, respondent statistics, and multi-hop inference. Through systematic experiments, we quantitatively assess the impact of six data serialization formats and multiple prompting strategies, and we introduce a lightweight "self-augmented prompting" technique that explicitly injects structural knowledge about the questionnaire. Contribution/Results: The best format-prompt combination improves accuracy by up to 8.8 percentage points over suboptimal formats. On specific tasks, self-augmented prompting yields average gains of 3-4 percentage points and makes LLM performance more robust across questionnaire formats.
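To make the setup concrete, here is a minimal sketch of the two ideas the summary mentions: serializing the same questionnaire in different formats and building a "self-augmented" prompt that first elicits a structural summary and then reuses it as a hint. The data, format names, prompt wording, and the `call_llm` helper are illustrative assumptions, not the paper's actual specification.

```python
# Illustrative sketch only: formats, prompt wording, and helpers are assumptions.
import json

# Hypothetical questionnaire: questions crossed with respondent rows.
questionnaire = {
    "questions": {
        "Q1": "How satisfied are you with the product?",
        "Q2": "Would you recommend it to a friend?",
    },
    "respondents": [
        {"id": "R1", "Q1": "Very satisfied", "Q2": "Yes"},
        {"id": "R2", "Q1": "Neutral", "Q2": "No"},
    ],
}

def serialize(data: dict, fmt: str = "json") -> str:
    """Serialize the questionnaire in one of several candidate formats."""
    if fmt == "json":
        return json.dumps(data, indent=2)
    if fmt == "markdown":
        # Flatten respondents into a Markdown table.
        qids = list(data["questions"])
        header = "| id | " + " | ".join(qids) + " |"
        sep = "|" + "---|" * (len(qids) + 1)
        rows = [
            "| " + r["id"] + " | " + " | ".join(r[q] for q in qids) + " |"
            for r in data["respondents"]
        ]
        return "\n".join([header, sep, *rows])
    raise ValueError(f"unknown format: {fmt}")

def self_augmented_prompt(data: dict, question: str, fmt: str = "json") -> list[str]:
    """Two-pass prompting: elicit a structural summary of the serialized data,
    then append that summary as a hint when asking the actual question."""
    doc = serialize(data, fmt)
    pass1 = f"Describe the structure of the following questionnaire data:\n\n{doc}"
    # structure_hint = call_llm(pass1)  # call_llm is a hypothetical LLM client
    structure_hint = "<structural summary returned by the model in pass 1>"
    pass2 = (
        f"{doc}\n\nStructural hint: {structure_hint}\n\n"
        f"Question: {question}\nAnswer concisely."
    )
    return [pass1, pass2]
```

Under these assumptions, a respondent-count query would run the first prompt once per serialized questionnaire and feed its output into the second; the benchmark's contribution is measuring how much the format choice and this extra structural hint actually change accuracy.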
📝 Abstract
Millions of people take surveys every day, from market polls and academic studies to medical questionnaires and customer feedback forms. These datasets capture valuable insights, but their scale and structure pose a unique challenge for large language models (LLMs), which otherwise excel at few-shot reasoning over open-ended text. Their ability to process questionnaire data, that is, lists of questions crossed with hundreds of respondent rows, remains underexplored. Current retrieval and survey analysis tools (e.g., Qualtrics, SPSS, REDCap) are typically designed with humans in the workflow, which limits the integration of such data with LLM- and AI-powered automation. This gap leaves scientists, surveyors, and everyday users without evidence-based guidance on how best to represent questionnaires for LLM consumption. We address this by introducing QASU (Questionnaire Analysis and Structural Understanding), a benchmark that probes six structural skills, including answer lookup, respondent count, and multi-hop inference, across six serialization formats and multiple prompt strategies. Experiments on contemporary LLMs show that choosing an effective format and prompt combination can improve accuracy by up to 8.8 percentage points compared to suboptimal formats. For specific tasks, carefully adding a lightweight structural hint through self-augmented prompting can yield further improvements of 3-4 percentage points on average. By systematically isolating format and prompting effects, our open-source benchmark offers a simple yet versatile foundation for advancing both research and real-world practice in LLM-based questionnaire analysis.