🤖 AI Summary
This study investigates the statistical representativeness and bias of synthetic social data generated by large language models (LLMs) for cross-national measurement of social values. Method: We systematically evaluate outputs from six state-of-the-art LLMs (three open-weight and three closed-weight) against authoritative multi-country human survey data (e.g., the World Values Survey), conducting multidimensional statistical comparisons to assess the reliability of LLM-synthesized data relative to real-world sampling, including both demographic undercoverage and response biases. Contribution/Results: Even small-scale human surveys that exhibit sampling bias represent social values significantly more accurately than LLM-generated data; the algorithmic biases embedded in LLMs consistently exceed those inherent in conventional surveys. These findings expose a fundamental limitation of current LLMs as substitutes for empirical data in social science research, reaffirm the irreplaceable role of rigorously collected observational data, and establish methodological boundaries for the cautious use of AI-generated data in the social sciences.
📝 Abstract
Large Language Models are being used in conversational agents that simulate human conversations and generate social studies data. While concerns about the models' biases have been raised and discussed in the literature, much about the data they generate remains unknown. In this study we explore the statistical representation of social values across four countries (the UK, Argentina, the USA, and China) for six LLMs, with equal representation of open and closed weights. By comparing machine-generated outputs with actual human survey data, we assess whether algorithmic biases in LLMs outweigh the biases inherent in real-world sampling, including demographic and response biases. Our findings suggest that, despite the logistical and financial constraints of human surveys, even a small, skewed sample of real respondents may provide more reliable insights than synthetic data produced by LLMs. These results highlight the limitations of using AI-generated text for social research and emphasize the continued importance of empirical human data collection.
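The comparison described above hinges on measuring how far an LLM-generated response distribution sits from the human survey distribution on the same item. The paper's exact statistics are not specified in this summary, so the sketch below uses Jensen-Shannon divergence as one illustrative distance between discrete answer distributions; the Likert-scale shares shown are hypothetical, not the study's data.

```python
import numpy as np

def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions.

    Returns 0 for identical distributions and at most 1 for disjoint ones.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p, q = p / p.sum(), q / q.sum()  # normalize to proper distributions
    m = 0.5 * (p + q)

    def kl(a, b):
        # Kullback-Leibler divergence, skipping zero-probability cells
        mask = a > 0
        return np.sum(a[mask] * np.log2(a[mask] / b[mask]))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Hypothetical answer shares on a 5-point Likert item for one country
survey_shares = [0.10, 0.20, 0.30, 0.25, 0.15]  # human respondents
llm_shares    = [0.02, 0.08, 0.15, 0.45, 0.30]  # LLM-generated personas

print(round(js_divergence(survey_shares, llm_shares), 3))
```

Repeating such a per-item distance across many survey items and countries, for both LLM outputs and a small human subsample, is one way to operationalize the claim that a skewed real sample can still lie closer to the full survey than synthetic data does.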