🤖 AI Summary
This work addresses systematic biases that undermine the fidelity of large language models (LLMs) in simulating human opinion dynamics. We propose the first Bayesian analytical framework that quantifies three critical bias types: topic preference bias, unconditional agreement bias, and anchoring bias. The method combines multi-turn dialogue modeling, fine-tuning on opinionated data, and dynamic analysis of opinion trajectories to disentangle interaction effects from bias contributions. Experiments show that LLM-generated opinions rapidly converge to shared attractors, that interaction effects decay over time, and that biases dominate the resulting dynamics, with the attractor proving plastic under fine-tuning on misinformation. Crucially, we find significant cross-model variation in bias magnitude and identify its structural influence on opinion dynamics. This study provides both a theoretical foundation and empirical benchmarks for trustworthy social simulation with LLMs.
📝 Abstract
Large language models (LLMs) are increasingly used to simulate human opinion dynamics, yet the effect of genuine interaction is often obscured by systematic biases. We present a Bayesian framework to disentangle and quantify three such biases: (i) a topic bias toward prior opinions in the training data; (ii) an agreement bias favoring agreement irrespective of the question; and (iii) an anchoring bias toward the initiating agent's stance. Applying this framework to multi-step dialogues reveals that opinion trajectories tend to converge quickly to a shared attractor, with the influence of the interaction fading over time and the impact of biases differing between LLMs. In addition, we fine-tune an LLM on different sets of strongly opinionated statements (including misinformation) and demonstrate that the opinion attractor shifts correspondingly. Exposing stark differences between LLMs and providing quantitative tools to compare them to human subjects in the future, our approach highlights both opportunities and pitfalls in using LLMs as proxies for human behavior.
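The qualitative picture the abstract describes, a decaying interaction pull competing with a persistent bias pull toward a model-specific attractor, can be illustrated with a toy two-agent simulation. This is a minimal sketch under assumed dynamics, not the paper's actual Bayesian model; all parameter names (`beta`, `alpha0`, `decay`, `attractor`) are illustrative choices.

```python
def simulate(x0, partner0, attractor, beta=0.3, alpha0=0.5, decay=0.7, steps=20):
    """Toy opinion dynamics for two agents on a scalar opinion scale.

    Each step mixes an interaction pull toward the partner's opinion
    (whose weight alpha decays geometrically, mimicking the fading
    interaction effect) with a constant bias pull of strength beta
    toward a fixed attractor (mimicking the model's intrinsic bias).
    Hypothetical illustration only -- not the paper's framework.
    """
    x, y = x0, partner0
    alpha = alpha0
    trajectory = [x]
    for _ in range(steps):
        x_new = x + alpha * (y - x) + beta * (attractor - x)
        y_new = y + alpha * (x - y) + beta * (attractor - y)
        x, y = x_new, y_new
        alpha *= decay  # interaction influence fades over time
        trajectory.append(x)
    return trajectory

# Two agents starting at opposite ends of the scale both end up
# near the attractor, regardless of their initial disagreement.
traj = simulate(x0=-1.0, partner0=1.0, attractor=0.4)
```

Under these assumed dynamics the bias term dominates once `alpha` has decayed, so the final opinions sit near the attractor rather than at a compromise of the starting positions, which is the qualitative behavior the paper reports.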