🤖 AI Summary
This study examines the reliability limits of LLM-based multi-agent planning, framed as a delegated decision problem. The multi-agent system is modeled as a finite acyclic decision network whose stages share model-context information, communicate through capacity-limited language interfaces, and may invoke human review. The authors show that, absent new exogenous signals, any such network is decision-theoretically dominated by a centralized Bayes decision maker with access to the same information. In the common-evidence regime, designing the network under a finite communication budget therefore amounts to choosing a budget-constrained stochastic experiment on the shared signal. Under proper scoring rules, the value lost to communication and compression is expressed as an expected posterior divergence, which reduces to conditional mutual information under logarithmic loss and to expected squared posterior error under the Brier score. Experiments with LLMs on a controlled problem set illustrate these characterizations.
📝 Abstract
This technical note studies the reliability limits of LLM-based multi-agent planning as a delegated decision problem. We model the LLM-based multi-agent architecture as a finite acyclic decision network in which multiple stages process shared model-context information, communicate through language interfaces with limited capacity, and may invoke human review. We show that, without new exogenous signals, any delegated network is decision-theoretically dominated by a centralized Bayes decision maker with access to the same information. In the common-evidence regime, this implies that optimizing over multi-agent directed acyclic graphs under a finite communication budget can be recast as choosing a budget-constrained stochastic experiment on the shared signal. We also characterize the loss induced by communication and information compression. Under proper scoring rules, the gap between the centralized Bayes value and the value after communication admits an expected posterior divergence representation, which reduces to conditional mutual information under logarithmic loss and to expected squared posterior error under the Brier score. These results characterize the fundamental reliability limits of delegated LLM planning. Experiments with LLMs on a controlled problem set further demonstrate these characterizations.
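The posterior divergence representation can be checked numerically on a toy problem. The sketch below is not taken from the paper: the joint distribution `JOINT`, the one-bit message map `compress`, and all helper names are hypothetical choices made for illustration. It assumes a binary state theta, a four-valued shared signal X, and a message M that is a deterministic function of X, and compares the Bayes risk of an agent who sees X with that of an agent who sees only M.

```python
import math

# Hypothetical joint distribution p(theta, x): 2 states, 4 signal values.
JOINT = {
    (0, 0): 0.30, (0, 1): 0.10, (0, 2): 0.08, (0, 3): 0.02,
    (1, 0): 0.05, (1, 1): 0.10, (1, 2): 0.12, (1, 3): 0.23,
}
THETAS = (0, 1)

def compress(x):
    """Hypothetical 1-bit message: keeps only whether x >= 2."""
    return 0 if x < 2 else 1

def posterior_given(key_of_x):
    """Return {key: (P(key), [P(theta | key)])} for key = key_of_x(x)."""
    mass = {}
    for (theta, x), p in JOINT.items():
        k = key_of_x(x)
        mass.setdefault(k, [0.0] * len(THETAS))[theta] += p
    return {k: (sum(v), [vi / sum(v) for vi in v]) for k, v in mass.items()}

def expected_loss(post, loss):
    """Bayes risk when the agent reports its own posterior under `loss`."""
    return sum(pk * sum(q[t] * loss(q, t) for t in THETAS)
               for pk, q in post.values())

def log_loss(q, t):
    return -math.log(q[t])

def brier_loss(q, t):
    return sum((qi - (1.0 if i == t else 0.0)) ** 2 for i, qi in enumerate(q))

def kl(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def sq_err(p, q):
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q))

def expected_divergence(post_x, post_m, div):
    """E_X[ div(P(theta | X), P(theta | M = compress(X))) ]."""
    return sum(px * div(p, post_m[compress(x)][1])
               for x, (px, p) in post_x.items())

post_x = posterior_given(lambda x: x)   # centralized: sees the full signal
post_m = posterior_given(compress)      # delegated: sees only the message

# Value gap = extra expected loss incurred after compression.
log_gap = expected_loss(post_m, log_loss) - expected_loss(post_x, log_loss)
brier_gap = expected_loss(post_m, brier_loss) - expected_loss(post_x, brier_loss)

# Expected posterior divergences that the gaps should match.
kl_term = expected_divergence(post_x, post_m, kl)      # equals I(theta; X | M)
sq_term = expected_divergence(post_x, post_m, sq_err)  # squared posterior error

print(f"log-loss gap {log_gap:.6f}  vs expected KL {kl_term:.6f}")
print(f"Brier gap    {brier_gap:.6f}  vs expected squared error {sq_term:.6f}")
```

In this toy setting the two printed pairs should agree up to floating-point error, and both gaps are nonnegative, consistent with the dominance of the centralized Bayes decision maker. A stochastic (noisy) message would need the message channel added to the joint distribution, which this sketch omits.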