🤖 AI Summary
Current research on LLM-driven multi-agent systems lacks controlled experimental methodologies, hindering systematic investigation of emergent behaviors. Method: This paper introduces Shachi, a framework that decouples an agent's policy into three cognitively grounded components (configuration, memory, and tools), orchestrated by an LLM-based reasoning engine. Grounded in formal behavioral modeling, this decomposition enables controlled ablation and intervention studies that isolate the impact of individual design choices on emergent collective behavior. Contribution/Results: Evaluated on a 10-task benchmark, Shachi achieves consistent performance gains over baseline agent designs. Crucially, it reproduces empirically observed market dynamics following a real-world U.S. tariff shock, but only when agents are equipped with memory and tools, demonstrating external validity. The open-source platform provides a foundation for reproducible, cumulative, experiment-driven research on multi-agent systems.
📝 Abstract
The study of emergent behaviors in large language model (LLM)-driven multi-agent systems is a critical research challenge, yet progress is limited by a lack of principled methodologies for controlled experimentation. To address this, we introduce Shachi, a formal methodology and modular framework that decomposes an agent's policy into core cognitive components: Configuration for intrinsic traits, Memory for contextual persistence, and Tools for expanded capabilities, all orchestrated by an LLM reasoning engine. This principled architecture moves beyond brittle, ad-hoc agent designs and enables the systematic analysis of how specific architectural choices influence collective behavior. We validate our methodology on a comprehensive 10-task benchmark and demonstrate its power through novel scientific inquiries. Critically, we establish the external validity of our approach by modeling a real-world U.S. tariff shock, showing that agent behaviors align with observed market reactions only when their cognitive architecture is appropriately configured with memory and tools. Our work provides a rigorous, open-source foundation for building and evaluating LLM agents, aimed at fostering more cumulative and scientifically grounded research.
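To make the described decomposition concrete, the sketch below illustrates one plausible way to structure an agent policy as Configuration (intrinsic traits), Memory (contextual persistence), and Tools (expanded capabilities), all orchestrated by an LLM reasoning engine. All class and method names here are hypothetical illustrations, not Shachi's actual API; the point is that ablations reduce to swapping out a single component.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical sketch of a modular agent policy in the spirit of the
# paper's Configuration / Memory / Tools decomposition. Names and
# signatures are illustrative assumptions, not Shachi's real interface.

@dataclass
class Configuration:
    """Intrinsic traits: persona, goals, preferences."""
    persona: str
    goals: list[str] = field(default_factory=list)

@dataclass
class Memory:
    """Contextual persistence across interaction steps."""
    events: list[str] = field(default_factory=list)

    def remember(self, event: str) -> None:
        self.events.append(event)

    def recall(self, k: int = 3) -> list[str]:
        return self.events[-k:]

@dataclass
class AgentPolicy:
    config: Configuration
    memory: Memory
    tools: dict[str, Callable[[str], str]]
    llm: Callable[[str], str]  # the LLM reasoning engine

    def act(self, observation: str) -> str:
        # Assemble the three components into a single prompt for the LLM.
        prompt = (
            f"Persona: {self.config.persona}\n"
            f"Goals: {', '.join(self.config.goals)}\n"
            f"Recent memory: {self.memory.recall()}\n"
            f"Available tools: {list(self.tools)}\n"
            f"Observation: {observation}\n"
            "Action:"
        )
        action = self.llm(prompt)
        self.memory.remember(f"obs={observation} act={action}")
        return action
```

Under this structure, a controlled experiment such as "does memory matter for market-shock behavior?" becomes a one-line intervention: replace `Memory` with a no-op variant while holding `Configuration`, `tools`, and the LLM fixed.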