🤖 AI Summary
Large language models (LLMs) excel at open-ended generation but struggle to produce the stable, closed-ended survey responses required for social science applications. Method: We conduct the first systematic evaluation of eight survey response generation methods across ten open-weight LLMs, using large-scale simulation experiments (32 million generated responses) based on political attitude survey data. Our methodology integrates prompt engineering, constrained text generation, and response mapping techniques. Results: Constrained generation consistently outperforms the alternatives across key metrics: individual prediction accuracy, subgroup representativeness, and population-level statistical consistency. In contrast, inference-time sampling strategies, including temperature tuning, do not reliably improve response consistency. We provide a reproducible, empirically grounded practical guide, establishing the first methodological benchmark and a robust technical pathway for high-fidelity survey simulation with LLMs in social science research.
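To make the core idea concrete: in a constrained (restricted) generation setup, the model never free-decodes at all; instead, each closed-ended answer option is scored under the model, and the highest-likelihood option is taken as the simulated response. The sketch below illustrates this under stated assumptions, using a Hugging Face causal LM; the model name, survey item, and answer scale are illustrative placeholders, not the paper's actual setup.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "Qwen/Qwen2.5-0.5B-Instruct"  # placeholder open-weight model
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

PROMPT = (
    "You are a survey respondent.\n"
    "Statement: The government should reduce income differences.\n"
    "Your answer:"
)
OPTIONS = ["Strongly agree", "Agree", "Neutral", "Disagree", "Strongly disagree"]

def option_log_likelihood(prompt: str, option: str) -> float:
    """Sum of log-probabilities of the option's tokens given the prompt."""
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    option_ids = tokenizer(" " + option, add_special_tokens=False,
                           return_tensors="pt").input_ids
    full_ids = torch.cat([prompt_ids, option_ids], dim=1)
    with torch.no_grad():
        logits = model(full_ids).logits
    # Position i of the logits predicts token i + 1 of the input sequence.
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    n_prompt = prompt_ids.shape[1]
    targets = full_ids[0, n_prompt:]                      # the option's tokens
    preds = log_probs[n_prompt - 1 : full_ids.shape[1] - 1]
    return preds.gather(1, targets.unsqueeze(1)).sum().item()

# The predicted closed-ended response is the highest-scoring option.
scores = {opt: option_log_likelihood(PROMPT, opt) for opt in OPTIONS}
print(max(scores, key=scores.get))
```

Scoring each full option sequence, rather than only its first token, keeps options distinguishable even when they share tokens (e.g. "Agree" and "Strongly agree").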
📝 Abstract
Many in-silico simulations of human survey responses with large language models (LLMs) focus on generating closed-ended survey responses, whereas LLMs are typically trained to generate open-ended text. Previous research has used a diverse range of methods for generating closed-ended survey responses with LLMs, and a standard practice has yet to be established. In this paper, we systematically investigate the impact that various Survey Response Generation Methods have on predicted survey responses. We present the results of 32 million simulated survey responses across 8 Survey Response Generation Methods, 4 political attitude surveys, and 10 open-weight language models. We find significant differences between the Survey Response Generation Methods in both individual-level and subpopulation-level alignment. Our results show that Restricted Generation Methods perform best overall and that reasoning output does not consistently improve alignment. Our work underlines the substantial impact that Survey Response Generation Methods have on simulated survey responses, and we develop practical recommendations for applying these methods.
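For contrast with the Restricted Generation Methods above, a response-mapping method lets the model answer in open-ended text and only afterwards maps that text onto the closed-ended scale. Below is a minimal sketch of one such mapping step, assuming a simple case-insensitive substring rule; the paper's actual mapping techniques are not reproduced here.

```python
OPTIONS = ["Strongly agree", "Agree", "Neutral", "Disagree", "Strongly disagree"]

def map_response(free_text: str) -> str | None:
    """Map an open-ended LLM answer onto the survey's closed-ended options."""
    text = free_text.lower()
    # Check longer options first so "strongly agree" is not matched as "agree".
    for option in sorted(OPTIONS, key=len, reverse=True):
        if option.lower() in text:
            return option
    return None  # unmappable answer, treated as an invalid response

print(map_response("I would say I strongly agree with that statement."))
# -> "Strongly agree"
```

Because the open-ended answer may paraphrase, hedge, or mention several options, the choice of mapping rule itself becomes a source of variation, which is one reason such methods can diverge from constrained generation.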