United in Diversity? Contextual Biases in LLM-Based Predictions of the 2024 European Parliament Elections

📅 2024-08-29
🏛️ arXiv.org
📈 Citations: 2
Influential: 0

🤖 AI Summary
This study investigates whether synthetic samples generated by large language models (LLMs) exhibit context-dependent systematic biases in public opinion forecasting, specifically for the 2024 European Parliament elections. Method: We input individual-level demographic and attitudinal features of 26,000 real voters into GPT, Claude, and Llama models to predict vote choice, then benchmark predictions against actual election outcomes. Contribution/Results: We provide the first empirical evidence that LLM-generated synthetic samples yield significantly biased and uneven forecasts across countries—exhibiting markedly higher accuracy in Western than Eastern European member states—and that prediction error is highly sensitive to whether fine-grained political attitude cues are included in prompts. Overall predictive accuracy remains low, failing to meet fidelity requirements for high-stakes opinion modeling. These findings challenge the prevailing assumption that LLMs can substitute for traditional survey-based methods and establish critical boundaries for AI-driven social science modeling.

📝 Abstract
"Synthetic samples" based on large language models (LLMs) have been argued to serve as efficient alternatives to surveys of humans, assuming that their training data includes information on human attitudes and behavior. However, LLM-synthetic samples might exhibit bias, for example due to training data and fine-tuning processes being unrepresentative of diverse contexts. Such biases risk reinforcing existing biases in research, policymaking, and society. Therefore, researchers need to investigate if and under which conditions LLM-generated synthetic samples can be used for public opinion prediction. In this study, we examine to what extent LLM-based predictions of individual public opinion exhibit context-dependent biases by predicting the results of the 2024 European Parliament elections. Prompting three LLMs with individual-level background information of 26,000 eligible European voters, we ask the LLMs to predict each person's voting behavior. By comparing these predictions to the actual results, we show that LLM-based predictions of future voting behavior largely fail, that their accuracy is unequally distributed across national and linguistic contexts, and that they require detailed attitudinal information in the prompt. The findings emphasize the limited applicability of LLM-synthetic samples to public opinion prediction. In investigating their contextual biases, this study contributes to the understanding and mitigation of inequalities in the development of LLMs and their applications in computational social science.
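To make the prompting step concrete, here is a minimal sketch of how one respondent's record might be turned into a vote-choice prompt, with attitudinal information switched on or off. The field names, the prompt wording, and the `build_prompt` helper are all assumptions made for illustration; the abstract does not specify the variables or the prompt text the authors used.

```python
from typing import Mapping

# Hypothetical field names; the study's actual variables are not given here.
BACKGROUND_FIELDS = ("age", "gender", "country", "education")
ATTITUDE_FIELDS = ("left_right_position", "eu_attitude")


def build_prompt(respondent: Mapping[str, object], include_attitudes: bool = True) -> str:
    """Turn one respondent record into a vote-choice prompt (illustrative wording)."""
    fields = BACKGROUND_FIELDS + (ATTITUDE_FIELDS if include_attitudes else ())
    # One bullet per available field, e.g. "- left right position: 4".
    lines = [
        f"- {name.replace('_', ' ')}: {respondent[name]}"
        for name in fields
        if name in respondent
    ]
    profile = "\n".join(lines)
    return (
        "Here is a description of an eligible voter:\n"
        f"{profile}\n"
        "Which party did this person vote for in the 2024 European "
        "Parliament elections? Answer with the party name only."
    )


# Made-up respondent, used only to show the two prompt variants.
example = {
    "age": 42,
    "gender": "female",
    "country": "Germany",
    "education": "tertiary",
    "left_right_position": 4,
    "eu_attitude": "supportive",
}
prompt_full = build_prompt(example)
prompt_background_only = build_prompt(example, include_attitudes=False)
```

Comparing model outputs for `prompt_full` and `prompt_background_only` is one way to probe how much the predictions depend on attitudinal detail in the prompt.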
Problem

Research questions and friction points this paper is trying to address.

Detecting contextual biases in LLM-based election predictions
Assessing accuracy disparities across national and linguistic contexts
Evaluating LLM applicability for public opinion forecasting
Innovation

Methods, ideas, or system contributions that make the work stand out.

Using LLMs to predict voting behavior
Assessing context-dependent biases in LLMs
Comparing LLM predictions with actual results
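
The comparison of predictions with actual results can be sketched as follows: aggregate individual predicted votes into per-country vote shares, then score each country against its official shares. This is an illustration under stated assumptions, not the authors' evaluation code. Mean absolute error is a metric chosen here for simplicity, the function names are hypothetical, and the countries, parties, and numbers are made up.

```python
from collections import Counter, defaultdict


def vote_shares(predictions):
    """Aggregate (country, party) pairs into {country: {party: share}}."""
    counts = defaultdict(Counter)
    for country, party in predictions:
        counts[country][party] += 1
    return {
        country: {party: n / sum(tally.values()) for party, n in tally.items()}
        for country, tally in counts.items()
    }


def mean_absolute_error(predicted, actual):
    """Average absolute gap in vote share over all parties in either mapping."""
    parties = set(predicted) | set(actual)
    return sum(
        abs(predicted.get(p, 0.0) - actual.get(p, 0.0)) for p in parties
    ) / len(parties)


# Made-up toy data: four predicted votes per fictional country.
toy_predictions = [
    ("A", "Party X"), ("A", "Party X"), ("A", "Party Y"), ("A", "Party Y"),
    ("B", "Party X"), ("B", "Party X"), ("B", "Party X"), ("B", "Party Y"),
]
toy_actual = {
    "A": {"Party X": 0.5, "Party Y": 0.5},
    "B": {"Party X": 0.25, "Party Y": 0.75},
}

shares = vote_shares(toy_predictions)
errors = {c: mean_absolute_error(shares[c], toy_actual[c]) for c in toy_actual}
```

In this toy case the error is 0.0 for country A and 0.5 for country B, the kind of cross-country gap in accuracy that a per-context comparison is meant to surface.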