Mitigating Social Desirability Bias in Random Silicon Sampling

📅 2025-12-27
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This study addresses social desirability bias (SDB)—a pervasive distortion in large language models (LLMs) during “silicon-based surveying,” where models simulate human responses to sensitive questions. We propose a psychology-inspired prompting framework. Systematic experiments reveal that neutral third-person restatement significantly reduces distributional divergence between model outputs and representative human survey data (ANES), whereas common meta-instructions—such as reverse coding, priming, and preamble—prove ineffective, challenging prevailing prompting intuitions. Distribution alignment is rigorously evaluated using Jensen–Shannon divergence and bootstrap confidence intervals. Empirical results across Llama-3.1 and GPT-4.1-mini show the restatement strategy reduces JS divergence by up to 42%, markedly attenuating response concentration toward socially acceptable endpoints. To our knowledge, this is the first approach to achieve targeted, verifiable, and reproducible SDB mitigation in silicon-based surveying, thereby enhancing representativeness and external validity.

Technology Category

Application Category

📝 Abstract
Large Language Models (LLMs) are increasingly used to simulate population responses, a method known as ``Silicon Sampling''. However, responses to socially sensitive questions frequently exhibit Social Desirability Bias (SDB), diverging from real human data toward socially acceptable answers. Existing studies on social desirability bias in LLM-based sampling remain limited. In this work, we investigate whether minimal, psychologically grounded prompt wording can mitigate this bias and improve alignment between silicon and human samples. We conducted a study using data from the American National Election Study (ANES) on three LLMs from two model families: the open-source Llama-3.1 series and GPT-4.1-mini. We first replicate a baseline silicon sampling study, confirming the persistent Social Desirability Bias. We then test four prompt-based mitigation methods: emph{reformulated} (neutral, third-person phrasing), emph{reverse-coded} (semantic inversion), and two meta-instructions, emph{priming} and emph{preamble}, respectively encouraging analytics and sincerity. Alignment with ANES is evaluated using Jensen-Shannon Divergence with bootstrap confidence intervals. Our results demonstrate that reformulated prompts most effectively improve alignment by reducing distribution concentration on socially acceptable answers and achieving distributions closer to ANES. Reverse-coding produced mixed results across eligible items, while the Priming and Preamble encouraged response uniformity and showed no systematic benefit for bias mitigation. Our findings validate the efficacy of prompt-based framing controls in mitigating inherent Social Desirability Bias in LLMs, providing a practical path toward more representative silicon samples.
Problem

Research questions and friction points this paper is trying to address.

Mitigates social desirability bias in LLM-based population simulations.
Improves alignment between silicon samples and real human survey data.
Tests prompt-based methods to reduce bias in sensitive question responses.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Using neutral third-person phrasing in prompts
Applying semantic inversion to reverse-code questions
Employing meta-instructions for analytic or sincere responses
🔎 Similar Papers
No similar papers found.
S
Sashank Chapala
Eindhoven University of Technology, The Netherlands
M
Maksym Mironov
Eindhoven University of Technology, The Netherlands
Songgaojun Deng
Songgaojun Deng
Eindhoven University of Technology
Machine LearningData Mining