🤖 AI Summary
This study investigates the inter-run instability of multi-large-language-model (LLM) collaborative systems under nominally deterministic zero-temperature (T=0) settings, an instability whose underlying mechanisms remain unclear. Modeling a five-agent LLM committee as a stochastic dynamical system, the authors quantify preference-trajectory divergence using empirical Lyapunov exponents and analyze stability through factorial experiments, role ablation, and memory-window manipulation. They show that role differentiation and model heterogeneity can each independently induce chaotic dynamics, with non-additive interaction effects between the two. On the HL-01 benchmark, the heterogeneous, role-free configuration shows the strongest divergence (λ̂ = 0.0947), while introducing a chairperson role or shortening the memory window significantly suppresses it. Critically, all scenarios yield positive Lyapunov exponents, indicating pervasive chaos and underscoring stability auditing as a core design principle in multi-LLM governance.
📝 Abstract
Collective AI systems increasingly rely on multi-LLM deliberation, but their stability under repeated execution remains poorly characterized. We model five-agent LLM committees as random dynamical systems and quantify inter-run sensitivity using an empirical Lyapunov exponent ($\hat{\lambda}$) derived from trajectory divergence in committee mean preferences. Across 12 policy scenarios, a factorial design at $T=0$ identifies two independent routes to instability: role differentiation in homogeneous committees and model heterogeneity in no-role committees. Critically, these effects appear even in the $T=0$ regime where practitioners often expect deterministic behavior. In the HL-01 benchmark, both routes produce elevated divergence ($\hat{\lambda}=0.0541$ and $0.0947$, respectively), while homogeneous no-role committees also remain in a positive-divergence regime ($\hat{\lambda}=0.0221$). The combined mixed+roles condition is less unstable than mixed+no-role ($\hat{\lambda}=0.0519$ vs $0.0947$), showing non-additive interaction. Mechanistically, Chair-role ablation reduces $\hat{\lambda}$ most strongly, and targeted protocol variants that shorten memory windows further attenuate divergence. These results support stability auditing as a core design requirement for multi-LLM governance systems.
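The abstract's core quantity, an empirical Lyapunov exponent estimated from inter-run trajectory divergence, can be sketched as follows. The paper does not publish its exact estimator, so this is an illustrative assumption: treat each run as a time series of committee mean-preference vectors, measure the distance between two runs at each deliberation round, and fit the slope of log-divergence versus round index. A positive slope indicates exponential inter-run divergence.

```python
import numpy as np

def empirical_lyapunov(traj_a, traj_b, eps=1e-12):
    """Estimate an empirical Lyapunov exponent from two repeated runs.

    traj_a, traj_b: arrays of shape (T, D) holding the committee
    mean-preference vector at each deliberation round of two runs
    started from the same prompt. Returns the least-squares slope of
    log divergence vs. round index (a simple proxy for lambda-hat).
    """
    traj_a, traj_b = np.asarray(traj_a), np.asarray(traj_b)
    d = np.linalg.norm(traj_a - traj_b, axis=1)      # per-round distance
    log_d = np.log(np.maximum(d, eps))               # guard against log(0)
    rounds = np.arange(len(log_d))
    slope, _intercept = np.polyfit(rounds, log_d, 1) # linear fit in log space
    return slope

# Synthetic check: a trajectory perturbed by a term growing at rate 0.09
# should recover a slope of ~0.09 (values here are illustrative, not the
# paper's data).
t = np.arange(20)
base = np.random.default_rng(0).normal(size=(20, 3))
pert = base + 1e-3 * np.exp(0.09 * t)[:, None] * np.ones((20, 3))
lam = empirical_lyapunov(base, pert)  # ~0.09
```

In practice one would average such pairwise slopes over many run pairs per scenario; the thresholds reported in the abstract (e.g. $\hat{\lambda}=0.0947$) come from the authors' own estimator, which may differ in detail.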