QSTN: A Modular Framework for Robust Questionnaire Inference with Large Language Models

📅 2025-12-09
🤖 AI Summary
Large language models (LLMs) exhibit inconsistent and unreliable responses to questionnaire-style prompts, hindering their use in survey simulation and data annotation. Method: We propose a modular, open-source framework enabling no-code construction of computer-simulated surveys and annotation experiments. It integrates structured prompt engineering, systematic prompt perturbation design, and multidimensional consistency evaluation—comparing over 40 million simulated responses against human answers. Contribution/Results: We present the first empirical evidence that questionnaire structural features and generation strategies critically impact response consistency. By optimizing prompt presentation and output constraints, we significantly reduce computational cost while improving alignment between LLM outputs and authentic human responses. The framework is publicly released and designed for non-technical users, enhancing reproducibility and scalability of virtual surveys and annotation tasks.

📝 Abstract
We introduce QSTN, an open-source Python framework for systematically generating responses from questionnaire-style prompts to support in-silico surveys and annotation tasks with large language models (LLMs). QSTN enables robust evaluation of questionnaire presentation, prompt perturbations, and response generation methods. Our extensive evaluation ($>40$ million survey responses) shows that question structure and response generation methods have a significant impact on the alignment of generated survey responses with human answers, and that comparable alignment can be obtained for a fraction of the compute cost. In addition, we offer a no-code user interface that allows researchers to set up robust experiments with LLMs without coding knowledge. We hope that QSTN will support the reproducibility and reliability of LLM-based research in the future.
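The abstract's emphasis on questionnaire presentation and prompt perturbations can be made concrete with a small example. QSTN's actual API is not shown on this page, so the snippet below is only an illustrative sketch in plain Python: the function name `build_prompt` and its parameters are hypothetical, but it demonstrates the kind of perturbation the paper studies (varying answer-option order while holding question content fixed).

```python
import random

def build_prompt(question, options, seed=None):
    """Render one questionnaire item as a prompt string.

    When `seed` is given, the answer options are deterministically
    shuffled -- a simple prompt perturbation that lets an experiment
    measure option-order effects on LLM responses.
    """
    opts = list(options)
    if seed is not None:
        random.Random(seed).shuffle(opts)  # reproducible perturbation
    lines = [question]
    lines += [f"{chr(ord('A') + i)}. {opt}" for i, opt in enumerate(opts)]
    lines.append("Answer with a single letter.")
    return "\n".join(lines)

# One base rendering plus a perturbed variant of the same item:
options = ["Very satisfied", "Satisfied", "Dissatisfied"]
base = build_prompt("How satisfied are you with your job?", options)
perturbed = build_prompt("How satisfied are you with your job?", options, seed=1)
```

Comparing a model's answers across such variants (same content, different presentation) is one way to quantify the response consistency the paper evaluates at scale.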
Problem

Research questions and friction points this paper is trying to address.

LLMs give inconsistent, unreliable responses to questionnaire-style prompts, limiting their use in survey simulation and annotation
Unclear how question structure and presentation affect alignment between LLM outputs and human answers
Setting up robust LLM survey experiments currently requires substantial coding expertise
Innovation

Methods, ideas, or system contributions that make the work stand out.

Modular, open-source framework for questionnaire inference with LLMs
Systematic evaluation of questionnaire presentation, prompt perturbations, and response generation methods
No-code user interface for setting up robust LLM experiments