🤖 AI Summary
This paper addresses two problems in traditional public opinion surveys: declining response rates and growing nonresponse bias. To tackle them, we propose a method for synthesizing high-fidelity opinion data with large language models (LLMs). The approach is a knowledge-augmented role-generation framework with three components: domain-specific knowledge is injected via retrieval-augmented generation (RAG); multidimensional individual profiles are built by combining the HEXACO personality model with demographic attributes; and personalized opinions are simulated through dynamically generated prompts and in-context learning. In experiments using questions from the Cooperative Election Study (CES), our method significantly improves alignment between LLM-generated responses and actual human answers (+23.6% consistency) and increases answer compliance by 18.4%. The proposed framework offers a scalable, cost-effective way to simulate public opinion while preserving behavioral realism and representativeness.
📝 Abstract
This paper investigates the use of Large Language Models (LLMs) to synthesize public opinion data, addressing challenges in traditional survey methods such as declining response rates and non-response bias. We introduce a novel technique: role creation based on knowledge injection, a form of in-context learning that combines retrieval-augmented generation (RAG) with personality profiles specified using the HEXACO model and demographic information, and uses them to generate prompts dynamically. This method allows LLMs to simulate diverse opinions more accurately than existing prompt engineering approaches. We compare our approach against pre-trained models given standard few-shot prompts. Experiments using questions from the Cooperative Election Study (CES) demonstrate that our role-creation approach significantly improves the alignment of LLM-generated opinions with real-world human survey responses and increases answer adherence. In addition, we discuss challenges, limitations, and future research directions.
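To make the role-creation idea concrete, here is a minimal Python sketch of how a persona prompt might be assembled from demographic attributes, HEXACO scores, and retrieved background text. It is an illustration under stated assumptions, not the paper's implementation: the `Persona` class, the 1-to-5 trait scale and its thresholds, the prompt wording, and the word-overlap `retrieve` function (a toy stand-in for a real RAG retriever) are all hypothetical.

```python
import re
from dataclasses import dataclass

# The six HEXACO dimensions; the snake_case keys are an assumption of this sketch.
HEXACO_TRAITS = (
    "honesty_humility", "emotionality", "extraversion",
    "agreeableness", "conscientiousness", "openness",
)


@dataclass
class Persona:
    """Hypothetical container for one simulated respondent."""
    demographics: dict[str, str]
    hexaco: dict[str, float]  # assumed 1-5 scale, one score per trait


def describe_trait(name: str, score: float) -> str:
    """Turn a numeric score into a short phrase (thresholds are assumptions)."""
    level = "low" if score < 2.5 else "high" if score > 3.5 else "moderate"
    return f"{level} {name.replace('_', ' ')}"


def tokens(text: str) -> set[str]:
    """Lowercased word tokens, punctuation stripped."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve(question: str, corpus: list[str], k: int = 2) -> list[str]:
    """Toy retriever: rank documents by word overlap with the question.

    A real RAG pipeline would use embeddings and a vector index instead.
    """
    q = tokens(question)
    ranked = sorted(corpus, key=lambda doc: len(q & tokens(doc)), reverse=True)
    return ranked[:k]


def build_prompt(persona: Persona, question: str, options: list[str],
                 corpus: list[str], k: int = 2) -> str:
    """Assemble a persona prompt from profile, retrieved context, and question."""
    demo = "; ".join(f"{key}: {value}" for key, value in persona.demographics.items())
    traits = ", ".join(describe_trait(t, persona.hexaco[t]) for t in HEXACO_TRAITS)
    lines = [
        "You are answering a survey as the following person.",
        f"Demographics: {demo}",
        f"Personality (HEXACO): {traits}",
        "Background knowledge:",
        *[f"- {snippet}" for snippet in retrieve(question, corpus, k)],
        f"Question: {question}",
        "Answer with exactly one of: " + " | ".join(options),
    ]
    return "\n".join(lines)
```

Because the prompt is rebuilt for each persona and each question, the retrieved context and the trait description both vary per call, which is one way to read "dynamically generated prompts". Restricting the answer to a fixed option list is a design choice of this sketch that makes adherence to the survey's response format easy to check.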