Mitigating Social Desirability Bias in Random Silicon Sampling

📅 2025-12-27

📈 Citations: 0

✨ Influential: 0

career value

213K/year

🤖 AI Summary

This study addresses social desirability bias (SDB)—a pervasive distortion in large language models (LLMs) during “silicon-based surveying,” where models simulate human responses to sensitive questions. We propose a psychology-inspired prompting framework. Systematic experiments reveal that neutral third-person restatement significantly reduces distributional divergence between model outputs and representative human survey data (ANES), whereas common meta-instructions—such as reverse coding, priming, and preamble—prove ineffective, challenging prevailing prompting intuitions. Distribution alignment is rigorously evaluated using Jensen–Shannon divergence and bootstrap confidence intervals. Empirical results across Llama-3.1 and GPT-4.1-mini show the restatement strategy reduces JS divergence by up to 42%, markedly attenuating response concentration toward socially acceptable endpoints. To our knowledge, this is the first approach to achieve targeted, verifiable, and reproducible SDB mitigation in silicon-based surveying, thereby enhancing representativeness and external validity.

Technology Category

Application Category

📝 Abstract

Large Language Models (LLMs) are increasingly used to simulate population responses, a method known as ``Silicon Sampling''. However, responses to socially sensitive questions frequently exhibit Social Desirability Bias (SDB), diverging from real human data toward socially acceptable answers. Existing studies on social desirability bias in LLM-based sampling remain limited. In this work, we investigate whether minimal, psychologically grounded prompt wording can mitigate this bias and improve alignment between silicon and human samples. We conducted a study using data from the American National Election Study (ANES) on three LLMs from two model families: the open-source Llama-3.1 series and GPT-4.1-mini. We first replicate a baseline silicon sampling study, confirming the persistent Social Desirability Bias. We then test four prompt-based mitigation methods: emph{reformulated} (neutral, third-person phrasing), emph{reverse-coded} (semantic inversion), and two meta-instructions, emph{priming} and emph{preamble}, respectively encouraging analytics and sincerity. Alignment with ANES is evaluated using Jensen-Shannon Divergence with bootstrap confidence intervals. Our results demonstrate that reformulated prompts most effectively improve alignment by reducing distribution concentration on socially acceptable answers and achieving distributions closer to ANES. Reverse-coding produced mixed results across eligible items, while the Priming and Preamble encouraged response uniformity and showed no systematic benefit for bias mitigation. Our findings validate the efficacy of prompt-based framing controls in mitigating inherent Social Desirability Bias in LLMs, providing a practical path toward more representative silicon samples.

Problem

Research questions and friction points this paper is trying to address.

Mitigates social desirability bias in LLM-based population simulations.

Improves alignment between silicon samples and real human survey data.

Tests prompt-based methods to reduce bias in sensitive question responses.

Innovation

Methods, ideas, or system contributions that make the work stand out.

Using neutral third-person phrasing in prompts

Applying semantic inversion to reverse-code questions

Employing meta-instructions for analytic or sincere responses

🔎 Similar Papers

A Theory of LLM Sampling: Part Descriptive and Part Prescriptive