Is DeepSeek a New Voice Among LLMs in Public Opinion Simulation?

📅 2025-06-17
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study systematically evaluates bias in mainstream LLMs (DeepSeek-V3/R1, Qwen2.5, GPT-4o, Llama-3.3) when simulating public opinion on sociopolitically salient issues in the United States and China, including abortion, capitalism, foreign aid, and individualism. Methodologically, it uses survey data from ANES and Zuobiao to construct a persona-driven prompting framework, enabling quantitative assessment of model biases in partisan alignment, socioeconomic sensitivity, and intra-group diversity. The contribution is a multidimensional comparison of consistency and bias in cross-cultural opinion simulation that sets the open-source DeepSeek models against LLMs from major tech companies. Results show that DeepSeek-V3 performs best on U.S. abortion attitudes (particularly with Democratic or liberal personas) and on the Chinese topics of foreign aid and individualism, yet shows clear limitations in modeling views on capitalism, especially those of low-income and non-college-educated groups. All models also homogenize responses within demographic groups, which exposes limitations in cultural adaptability and sociodemographic representativeness.
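The persona-driven prompting mentioned in the summary can be pictured with a minimal sketch. The paper's actual prompt template is not given here, so the function name `build_persona_prompt`, the persona fields, and the question wording below are all hypothetical; the sketch only shows how a respondent's survey attributes might be turned into a prompt for an LLM.

```python
from typing import Mapping, Sequence


def build_persona_prompt(
    persona: Mapping[str, str],
    question: str,
    options: Sequence[str],
) -> str:
    """Render one respondent's attributes and a survey question as a prompt.

    Hypothetical template: the study's real wording is not shown in this page.
    """
    # Describe the respondent with one attribute per line, e.g. "- party: Democrat".
    profile = "\n".join(f"- {key}: {value}" for key, value in persona.items())
    # Number the answer options so the model can reply with a single index.
    choices = "\n".join(f"{i}. {text}" for i, text in enumerate(options, start=1))
    return (
        "Answer the survey question as the person described below.\n"
        f"{profile}\n\n"
        f"Question: {question}\n"
        f"{choices}\n"
        "Reply with the number of one option only."
    )


# Illustrative persona and question (made up, not taken from ANES or Zuobiao).
persona = {"party": "Democrat", "education": "College degree", "income": "Low"}
prompt = build_persona_prompt(
    persona,
    "Do you support or oppose stricter gun control laws?",
    ["Support", "Oppose", "Neither"],
)
print(prompt)
```

Each survey respondent would yield one such prompt, and the model's chosen option could then be compared with that respondent's recorded answer.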

📝 Abstract
This study evaluates the ability of DeepSeek, an open-source large language model (LLM), to simulate public opinions in comparison to LLMs developed by major tech companies. By comparing DeepSeek-R1 and DeepSeek-V3 with Qwen2.5, GPT-4o, and Llama-3.3 and utilizing survey data from the American National Election Studies (ANES) and the Zuobiao dataset of China, we assess these models' capacity to predict public opinions on social issues in both China and the United States, highlighting their comparative capabilities between countries. Our findings indicate that DeepSeek-V3 performs best in simulating U.S. opinions on the abortion issue compared to other topics such as climate change, gun control, immigration, and services for same-sex couples, primarily because it more accurately simulates responses when provided with Democratic or liberal personas. For Chinese samples, DeepSeek-V3 performs best in simulating opinions on foreign aid and individualism but shows limitations in modeling views on capitalism, particularly failing to capture the stances of low-income and non-college-educated individuals. It does not exhibit significant differences from other models in simulating opinions on traditionalism and the free market. Further analysis reveals that all LLMs exhibit the tendency to overgeneralize a single perspective within demographic groups, often defaulting to consistent responses within groups. These findings highlight the need to mitigate cultural and demographic biases in LLM-driven public opinion modeling, calling for approaches such as more inclusive training methodologies.
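The tendency to default to one answer within a demographic group can be made concrete with a simple diversity measure. The sketch below uses normalized Shannon entropy, which is an assumption chosen for illustration and not necessarily the metric used in the paper; the function name `normalized_entropy` and the toy responses are likewise hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def normalized_entropy(responses: Sequence[Hashable], n_options: int) -> float:
    """Shannon entropy of the answers in one group, scaled to [0, 1].

    0.0 means every member gave the same answer (full homogenization);
    1.0 means answers are spread evenly over all n_options.
    """
    if not responses:
        raise ValueError("responses must not be empty")
    if n_options < 2:
        raise ValueError("n_options must be at least 2")
    counts = Counter(responses)
    total = len(responses)
    # Sum p * log(1/p) over observed answers; unobserved options contribute 0.
    entropy = sum((c / total) * math.log(total / c) for c in counts.values())
    return entropy / math.log(n_options)


# Toy data for one demographic group (made up for illustration only).
survey_group = ["support", "support", "oppose", "neutral", "oppose", "support"]
simulated_group = ["support"] * 6  # the model gives everyone the same answer

# A positive gap means the simulation is less diverse than the survey group.
gap = normalized_entropy(survey_group, 3) - normalized_entropy(simulated_group, 3)
print(f"homogenization gap: {gap:.3f}")
```

Computed per demographic group and per model, a gap of this kind would show where simulated answers are less varied than the survey responses they are meant to reproduce.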
Problem

Research questions and friction points this paper is trying to address.

Evaluates LLMs' ability to simulate public opinions cross-culturally
Compares Chinese and U.S. models' performance on social issues
Identifies cultural and demographic biases in opinion simulation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Cross-cultural LLM comparison for opinion simulation
Utilizes survey data from ANES and Zuobiao datasets
Highlights demographic biases requiring mitigation strategies