🤖 AI Summary
This study addresses the lack of systematic evaluation of mechanistic explanation in large language model (LLM)-driven agent-based modeling, where evaluations often fail to distinguish whether models merely predict outcomes or genuinely uncover the generative mechanisms underlying observed phenomena. To bridge this gap, the work integrates philosophical theories of scientific explanation and operationalizes mechanistic explanation into a four-tiered Mechanism Plausibility Scale. This framework explicitly differentiates the objectives and evaluation criteria for predictive versus explanatory models. By synthesizing work on LLMs, agent-based modeling, and interdisciplinary assessment methodologies, the proposed scale offers a structured benchmark for evaluating the explanatory power of generative agents, helping to move LLM-based simulation from mere prediction toward genuine explanation.
📝 Abstract
Large language models (LLMs) can generate diverse high-level phenomena without explicitly programmed rules. This capability has led to their adoption within different agent-based models (ABMs) and social simulations. Recent research has aimed to test whether they are capable of generating different phenomena of interest, for example, human behavior on social media platforms or performance in game-theoretic scenarios.
However, capability, prediction, and explanation are distinct. According to the philosophy of science and mechanisms literature, \textit{explanation} requires showing, to some degree, how a phenomenon is produced by organized entities and activities. For modelers, characterizing an experiment, or judging whether a simulation advances capability or explanation, can be difficult without grounding in potentially distant research areas.
We integrate recent work on LLM-ABMs with contemporary philosophy of science literature and use this synthesis to operationalize a definition of `plausibility' as a four-level scale. Our scale separates the evaluation of a model's generative sufficiency (ability to reproduce a phenomenon) from its mechanistic plausibility (how the phenomenon could be produced), and clarifies the distinct roles of different models, such as predictive and explanatory ones. We introduce this as the Mechanism Plausibility Scale.