Large Language Models as Virtual Survey Respondents: Evaluating Sociodemographic Response Generation

πŸ“… 2025-09-08
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
To address the high cost and scalability limitations of traditional survey methodologies, this paper formally introduces the novel task of β€œvirtual respondent simulation,” proposing two paradigms: Partial Attribute Simulation (PAS) and Full Attribute Simulation (FAS). We establish LLM-SΒ³, a benchmark platform comprising 11 real-world datasets. Using models including GPT-3.5/4 Turbo and LLaMA 3.0/3.1-8B, we systematically evaluate generated responses for demographic consistency under zero-shot and context-augmented prompting strategies. Our analysis uncovers critical mechanisms by which prompt design governs simulation fidelity and identifies prevalent failure modes. Empirical validation across four sociological domains demonstrates the feasibility and promise of large language models (LLMs) in enabling low-cost, large-scale survey simulation. This work provides social scientists and policymakers with a scalable, economically efficient tool for survey research and impact assessment.

πŸ“ Abstract
Questionnaire-based surveys are foundational to social science research and public policymaking, yet traditional survey methods remain costly, time-consuming, and often limited in scale. This paper explores a new paradigm: simulating virtual survey respondents using Large Language Models (LLMs). We introduce two novel simulation settings, namely Partial Attribute Simulation (PAS) and Full Attribute Simulation (FAS), to systematically evaluate the ability of LLMs to generate accurate and demographically coherent responses. In PAS, the model predicts missing attributes based on partial respondent profiles, whereas FAS involves generating complete synthetic datasets under both zero-context and context-enhanced conditions. We curate a comprehensive benchmark suite, LLM-S^3 (Large Language Model-based Sociodemographic Survey Simulation), that spans 11 real-world public datasets across four sociological domains. Our evaluation of multiple mainstream LLMs (GPT-3.5/4 Turbo, LLaMA 3.0/3.1-8B) reveals consistent trends in prediction performance, highlights failure modes, and demonstrates how context and prompt design impact simulation fidelity. This work establishes a rigorous foundation for LLM-driven survey simulations, offering scalable and cost-effective tools for sociological research and policy evaluation. Our code and dataset are available at: https://github.com/dart-lab-research/LLM-S-Cube-Benchmark
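The Partial Attribute Simulation (PAS) setting described above can be sketched as a zero-shot prompt that presents a partial respondent profile and asks the model to predict one missing attribute. The profile fields, attribute names, and prompt wording below are illustrative assumptions, not the paper's actual template:

```python
# Minimal sketch of a zero-shot PAS prompt builder.
# Profile fields and wording are hypothetical, for illustration only.

def build_pas_prompt(profile: dict, target_attribute: str, options: list[str]) -> str:
    """Format a zero-shot prompt asking an LLM to predict one missing attribute."""
    known = "\n".join(f"- {k}: {v}" for k, v in profile.items())
    choices = ", ".join(options)
    return (
        "You are simulating a survey respondent with the following profile:\n"
        f"{known}\n"
        f"Predict the respondent's {target_attribute}. "
        f"Answer with exactly one of: {choices}."
    )

prompt = build_pas_prompt(
    {"age": 34, "region": "Midwest", "education": "Bachelor's degree"},
    "employment status",
    ["employed", "unemployed", "not in labor force"],
)
print(prompt)
```

In the FAS setting, the same idea extends to generating every attribute of a synthetic respondent, optionally with added survey context in the prompt.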
Problem

Research questions and friction points this paper is trying to address.

Simulating virtual survey respondents using Large Language Models
Evaluating whether LLMs can generate accurate, demographically coherent responses
Creating scalable tools for sociological research and policy evaluation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Simulates virtual respondents using LLMs
Introduces Partial and Full Attribute Simulation settings
Creates the LLM-S^3 benchmark suite (11 datasets spanning four sociological domains) for evaluation
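Evaluating demographic consistency, as the benchmark does, amounts to comparing simulated answers against real respondents' answers. A minimal sketch of two plausible checks follows; the metric names and scoring are assumptions for illustration, not the benchmark's actual evaluation code:

```python
# Hypothetical scoring helpers: per-attribute accuracy against ground truth,
# and marginal answer distributions for comparing simulated vs. real cohorts.
from collections import Counter

def attribute_accuracy(predicted: list[str], actual: list[str]) -> float:
    """Fraction of simulated answers that match the real respondents' answers."""
    assert len(predicted) == len(actual), "need one prediction per respondent"
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)

def marginal_distribution(answers: list[str]) -> dict[str, float]:
    """Share of each answer option, e.g. to compare simulated and real marginals."""
    counts = Counter(answers)
    total = len(answers)
    return {option: n / total for option, n in counts.items()}

pred = ["employed", "employed", "unemployed", "employed"]
real = ["employed", "unemployed", "unemployed", "employed"]
print(attribute_accuracy(pred, real))   # 0.75
print(marginal_distribution(pred))      # {'employed': 0.75, 'unemployed': 0.25}
```

Exact-match accuracy fits PAS, where one held-out attribute is predicted per respondent; distribution-level comparisons are more natural for FAS, where whole synthetic cohorts are generated.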