AI Summary
This work addresses the limited coordination efficiency that decentralized partial observability imposes on cooperative multi-agent reinforcement learning. It formulates action coordination as a structured information integration problem and proposes a message-passing mechanism over an inter-agent coordination graph that uses graph transformer convolutions to fuse receiver-sensitive, content-dependent teammate information before action selection. By explicitly incorporating structured global information into decentralized decision-making (presented as a novel contribution to the field), the method substantially enhances collaborative performance. Evaluated on five cooperative tasks against twelve baselines, the model matches or outperforms the best baseline on every task. Parameter-matched ablations and aggregate statistical analyses indicate that the gains stem from the architecture, specifically the degree of content-dependence in the message-passing operator, rather than from increased model capacity.
Abstract
Cooperative multi-agent reinforcement learning agents that act on partial local observations face a fundamental information bottleneck: the knowledge needed to select jointly optimal actions is scattered across the team, yet each agent must commit to a decision without access to its teammates' observations, intentions, or chosen actions. Existing methods either ignore this bottleneck, compress it into a scalar mixing signal, or route around it with learned communication channels. Framing action coordination as a problem of structured information integration among agents, we propose structured agent coordination via holistic information integration, or SACHI, in which graph transformer convolutions over an inter-agent coordination graph enrich each agent's representation with receiver-sensitive, content-dependent signals from teammates prior to action selection. We evaluate SACHI across five cooperative tasks spanning spatial, communicative, and adversarial coordination challenges against twelve baselines. SACHI consistently matches or outperforms the best baseline on every task, and rigorous aggregate statistical analyses, including normalized metrics with bootstrap confidence intervals, Friedman ranking, and performance profiling, confirm that this advantage is statistically significant, robust across environments, and not attributable to increased model capacity. Parameter-matched ablations further trace the source of the gains to a single architectural property: the degree of content-dependence in the message-passing operator.
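To make "receiver-sensitive, content-dependent" message passing concrete, the following is a minimal sketch of one round of single-head attention over a coordination graph, written in plain Python. It is an illustration in the spirit of a graph transformer convolution, not the authors' SACHI implementation: the function names, the identity projection matrices, the plain residual update, and the three-agent fully connected graph are all assumptions chosen so the arithmetic is easy to follow.

```python
import math

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_message_passing(h, neighbors, Wq, Wk, Wv):
    """One round of single-head attention over a coordination graph.

    h         : list of per-agent feature vectors (hypothetical encodings
                of each agent's local observation)
    neighbors : dict mapping a receiver index to its list of sender indices
    Wq, Wk, Wv: query/key/value projections (assumed square here so that
                the residual update below is well defined)

    Returns (updated features, per-receiver attention weights).
    """
    q = [matvec(Wq, x) for x in h]
    k = [matvec(Wk, x) for x in h]
    v = [matvec(Wv, x) for x in h]
    dim_k, dim_v = len(k[0]), len(v[0])
    updated, attn = [], []
    for i, h_i in enumerate(h):
        senders = neighbors.get(i, [])
        if not senders:
            # Isolated agent: nothing to integrate, keep its features.
            updated.append(list(h_i))
            attn.append({})
            continue
        # The score depends on the receiver's query AND the sender's key,
        # so weights vary with both who listens and what is being said.
        scores = [
            sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(dim_k)
            for j in senders
        ]
        alpha = softmax(scores)
        msg = [
            sum(a * v[j][c] for a, j in zip(alpha, senders))
            for c in range(dim_v)
        ]
        # Residual update: own features plus the aggregated message.
        updated.append([x + m for x, m in zip(h_i, msg)])
        attn.append(dict(zip(senders, alpha)))
    return updated, attn

# Toy example (all values are illustrative assumptions).
identity = [[1.0, 0.0], [0.0, 1.0]]
h = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]   # three agents, 2-d features
neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1]}  # fully connected, no self-loops

updated, attn = attention_message_passing(h, neighbors, identity, identity, identity)
print(attn[0])  # agent 0 weights agent 1 (similar content) above agent 2
print(attn[2])  # agent 2 weights agents 0 and 1 equally
```

In this toy setting agent 0 attends more to agent 1 than agent 1 is attended to by agent 2, even though the sender is the same, which is the receiver-sensitivity the abstract refers to. Replacing the softmax weights with a fixed average over neighbors would remove the content-dependence while leaving the parameter count of the projections unchanged, which is the kind of contrast a parameter-matched ablation is meant to isolate.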