AI Summary
To address low collaboration efficiency and high communication and computation overhead in distributed multi-agent reinforcement learning (MARL), this paper proposes eQMARL, a novel quantum-enhanced MARL framework. Methodologically, it is the first to introduce Bell-state ($|\Psi^{+}\rangle$) quantum entanglement as a coordination medium among agents, enabling cooperative decision-making without sharing local observations via an entanglement-based decentralized critic. It further designs a hybrid quantum-classical training paradigm in which critics, coupled in a distributed fashion through shared quantum states, jointly estimate value functions via measurement-driven estimation of quantum observables. Empirically, eQMARL converges up to 17.8% faster and reaches higher final task performance than classical decentralized and centralized baselines, while using 25× fewer centralized parameters than the split classical baseline. These results demonstrate substantial alleviation of the communication and computational bottlenecks inherent in MARL.
Abstract
Collaboration is a key challenge in distributed multi-agent reinforcement learning (MARL) environments. Learning frameworks for these decentralized systems must weigh the benefits of explicit player coordination against the communication overhead and computational cost of sharing local observations and environmental data. Quantum computing has sparked a potential synergy between quantum entanglement and cooperation in multi-agent environments, which could enable more efficient distributed collaboration with minimal information sharing. This relationship is largely unexplored, however, as current state-of-the-art quantum MARL (QMARL) implementations rely on classical information sharing rather than entanglement over a quantum channel as a coordination medium. In contrast, in this paper, a novel framework dubbed entangled QMARL (eQMARL) is proposed. The proposed eQMARL is a distributed actor-critic framework that facilitates cooperation over a quantum channel and eliminates local observation sharing via a quantum entangled split critic. Introducing a quantum critic uniquely spread across the agents allows coupling of local observation encoders through entangled input qubits over a quantum channel, which requires no explicit sharing of local observations and reduces classical communication overhead. Further, agent policies are tuned through joint observation-value function estimation via joint quantum measurements, thereby reducing the centralized computational burden. Experimental results show that eQMARL with $\Psi^{+}$ entanglement converges to a cooperative strategy up to $17.8\%$ faster and with a higher overall score compared to split classical and fully centralized classical and quantum baselines. The results also show that eQMARL achieves this performance with a constant factor of $25$-times fewer centralized parameters compared to the split classical baseline.
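The core mechanism, local encoders acting on entangled input qubits followed by a joint measurement, can be illustrated with a minimal two-agent toy in plain numpy. This is a sketch under assumed simplifications (one qubit per agent, a single `Ry` rotation standing in for each agent's observation encoder, and $\langle Z \otimes Z \rangle$ as the joint value estimate), not the paper's actual circuit:

```python
import numpy as np

# Bell state |Psi+> = (|01> + |10>)/sqrt(2): the entangled input shared
# between the two agents' halves of the split critic (toy: 1 qubit each).
psi_plus = np.zeros(4, dtype=complex)
psi_plus[0b01] = psi_plus[0b10] = 1 / np.sqrt(2)

def ry(theta):
    """Single-qubit Y-rotation; stands in for an agent's local
    observation encoder (theta would be a function of its observation)."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]], dtype=complex)

def joint_value(theta_a, theta_b):
    """Each agent applies a strictly local unitary to its own qubit, so no
    observations are exchanged; the joint value estimate is the expectation
    of the global observable Z (x) Z on the resulting state."""
    U = np.kron(ry(theta_a), ry(theta_b))  # local ops only, no communication
    state = U @ psi_plus
    ZZ = np.kron(np.diag([1.0, -1.0]), np.diag([1.0, -1.0]))
    return float(np.real(np.conj(state) @ (ZZ @ state)))

# With identity encoders, |Psi+> gives perfectly anti-correlated qubits:
print(joint_value(0.0, 0.0))  # prints -1.0
```

The point of the toy is that `joint_value` depends jointly on both agents' parameters through the shared entangled state, even though each agent only ever touches its own qubit, which is the coordination-without-observation-sharing property the abstract describes.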