🤖 AI Summary
Large language model (LLM)-based agents used in social simulation are often assumed to reproduce realistic human collective behavior on their own, yet this assumption lacks grounding in behavioral validity and in mechanisms of agent-environment interaction. This position paper highlights the gap between role-playing plausibility and faithful human behavioral validity, and reframes social simulation as an environment-involved Markov game with explicit scheduling and information-exposure mechanisms. It emphasizes how initial conditions, interaction protocols, and environment dynamics shape emergent collective outcomes. The framework aims to improve the scientific rigor, auditability, and reproducibility of LLM-based multi-agent simulations, and it yields concrete guidance for design, evaluation, and interpretation.
📝 Abstract
Recent advances in large language models (LLMs) have spurred growing interest in using LLM-integrated agents for social simulation, often under the implicit assumption that realistic population dynamics will emerge once role-specified agents are placed in a networked multi-agent setting. This position paper argues that LLM-based agents alone are not (yet) sufficient for social simulation. We attribute this over-optimism to a systematic mismatch between what current agent pipelines are typically optimized and validated to produce and what simulation-as-science requires. Concretely, role-playing plausibility does not imply faithful human behavioral validity; collective outcomes are frequently mediated by agent-environment co-dynamics rather than agent-agent messaging alone; and results can be dominated by interaction protocols, scheduling, and initial information priors. To make these underlying mechanisms explicit and auditable, we propose a unified formulation of AI agent-based social simulation as an environment-involved Markov game with explicit exposure and scheduling mechanisms, from which we derive concrete actions for design, evaluation, and interpretation.
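To make the idea of explicit exposure and scheduling mechanisms more concrete, the following is a minimal Python sketch of one simulation loop in which the scheduler, the exposure rule, and the environment transition are separate, swappable, and logged components. It is an illustration under stated assumptions, not the paper's formulation: every name here (`EnvState`, `round_robin_scheduler`, `random_single_scheduler`, `recency_exposure`, `environment_transition`, `simulate`) is hypothetical, and the agent policies are plain functions standing in for LLM calls.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# All names below are hypothetical and chosen for illustration only.


@dataclass
class EnvState:
    """Shared environment state: a message feed plus a step counter."""
    feed: List[str] = field(default_factory=list)
    step: int = 0


# A policy maps what an agent is shown to the message it posts.
# In a real pipeline this would wrap an LLM call; here it is a stub.
Policy = Callable[[List[str]], str]


def round_robin_scheduler(agent_ids, step, rng):
    """Scheduling rule: every agent acts at every step."""
    return list(agent_ids)


def random_single_scheduler(agent_ids, step, rng):
    """Scheduling rule: exactly one randomly chosen agent acts per step."""
    return [rng.choice(agent_ids)]


def recency_exposure(state, agent_id, k=2):
    """Exposure rule: each agent sees only the k most recent feed items."""
    return state.feed[-k:]


def environment_transition(state, actions):
    """Environment dynamics: append posted messages and advance the clock."""
    for agent_id in sorted(actions):
        state.feed.append(f"{agent_id}:{actions[agent_id]}")
    state.step += 1
    return state


def simulate(policies: Dict[str, Policy], scheduler, exposure,
             n_steps, seed=0, initial_feed=()):
    """Run the loop and return the final state plus an auditable log."""
    rng = random.Random(seed)                  # fixed seed for reproducibility
    state = EnvState(feed=list(initial_feed))  # explicit initial conditions
    agent_ids = sorted(policies)
    log = []
    for _ in range(n_steps):
        active = scheduler(agent_ids, state.step, rng)
        observations = {a: exposure(state, a) for a in active}
        actions = {a: policies[a](observations[a]) for a in active}
        log.append({"step": state.step, "active": active,
                    "observations": observations, "actions": actions})
        state = environment_transition(state, actions)
    return state, log
```

Because the scheduler and exposure rule are passed in as arguments, the same agent policies can be rerun under `round_robin_scheduler` and `random_single_scheduler` with the same seed, and the per-step logs compared to see how much of a collective outcome is attributable to the protocol rather than to the agents.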