🤖 AI Summary
Current research on large language model (LLM)-based social simulation lacks a standardized architecture, which hinders reproducibility and complicates evaluation. This work proposes the EASE framework, which modularizes a simulator into four core components: Environments, Agents, Simulation engines, and Evaluation metrics. An EASE configuration is wrapped in an experimental study schema that organizes workflows around explicit research questions, and it is implemented in SiliSocS, an open-source Silicon Society Sandbox for configurable and reproducible LLM-based multi-agent social simulation. Three case studies built on the framework highlight limitations of current modeling approaches and isolate how individual design choices affect key results.
📝 Abstract
LLMs are increasingly deployed to simulate social interactions, yet many existing simulators remain ad hoc and monolithic. This lack of architectural standardization prevents reproducible research and complicates downstream evaluation. We advance a rigorous science of LLM-based multi-agent simulation by modularizing core components into Environments, Agents, Simulation engines, and Evaluation metrics (EASE). We demonstrate the utility of an EASE configuration by wrapping it in an experimental study schema that orchestrates workflows around answering explicit research questions in generated scenarios. We contribute SiliSocS, an open-source, research-ready Silicon Society Sandbox that implements a study-structured EASE configuration to enable highly configurable and reproducible LLM-based social simulations. Using SiliSocS and EASE, we present three case studies that respectively showcase the system's comprehensive assessment of existing questions, its ability to probe complex questions in greater depth, and its capacity to elaborate on existing studies. Together, these case studies highlight the limitations of current modeling approaches and isolate the impacts of design choices on key results.
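To make the four-part decomposition concrete, the following is a minimal sketch of how an environment, agents, a simulation engine, and an evaluation metric might be kept separate and tied together by a study object that carries an explicit research question. It is not the SiliSocS API: every class, function, and field name here (`Environment`, `StubAgent`, `round_robin_engine`, `agreement_rate`, `Study`) is a hypothetical stand-in chosen for illustration, and the agent picks a random action where a real one would call an LLM.

```python
# Illustrative sketch only: all names below are hypothetical, not SiliSocS code.
import random
from dataclasses import dataclass, field
from typing import Callable, Protocol


class Agent(Protocol):
    """Anything with a name that maps an observation to an action."""
    name: str

    def act(self, observation: str) -> str: ...


@dataclass
class StubAgent:
    """Stand-in for an LLM-backed agent; a real agent would query a model here."""
    name: str
    rng: random.Random

    def act(self, observation: str) -> str:
        return self.rng.choice(["agree", "disagree"])


@dataclass
class Environment:
    """Shared state: a public log of (agent name, action) pairs."""
    log: list[tuple[str, str]] = field(default_factory=list)

    def observe(self) -> str:
        # Agents see the most recent action, or a start marker on the first turn.
        return self.log[-1][1] if self.log else "start"

    def apply(self, agent_name: str, action: str) -> None:
        self.log.append((agent_name, action))


def round_robin_engine(env: Environment, agents: list[Agent], steps: int) -> None:
    """Simulation engine: schedules one agent per step in fixed rotation."""
    for step in range(steps):
        agent = agents[step % len(agents)]
        env.apply(agent.name, agent.act(env.observe()))


def agreement_rate(env: Environment) -> float:
    """Evaluation metric: fraction of logged actions that are 'agree'."""
    if not env.log:
        return 0.0
    return sum(action == "agree" for _, action in env.log) / len(env.log)


@dataclass
class Study:
    """Bundles a research question with a fixed, seeded configuration."""
    research_question: str
    seed: int
    steps: int
    agent_names: list[str]
    metrics: dict[str, Callable[[Environment], float]]

    def run(self) -> dict[str, float]:
        rng = random.Random(self.seed)  # fresh seeded RNG so reruns match
        env = Environment()
        agents: list[Agent] = [StubAgent(name, rng) for name in self.agent_names]
        round_robin_engine(env, agents, self.steps)
        return {name: metric(env) for name, metric in self.metrics.items()}


study = Study(
    research_question="How often do agents agree under round-robin turns?",
    seed=7,
    steps=20,
    agent_names=["a1", "a2", "a3"],
    metrics={"agreement_rate": agreement_rate},
)
print(study.run())
```

Because each component sits behind a small interface, swapping the engine or the metric leaves the rest of the study untouched, which is one way the effect of a single design choice could be isolated. Seeding inside `run` means repeated runs of the same study return the same metrics in this sketch; with real LLM calls, reproducibility would additionally depend on the model and its decoding settings.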