Quantum Advantage in Multi Agent Reinforcement Learning

📅 2026-05-13

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the lack of provable baselines in existing quantum multi-agent reinforcement learning (QMARL), which makes it hard to verify claims of quantum advantage rigorously. The authors evaluate a decentralized framework in which agents use variational quantum circuits (VQCs) with shared entangled states as their policies, and they compare quantum and classical critic configurations. On a task with a proven classical bound, they report empirical evidence of quantum advantage: in the CHSH game, entangled agents exceed the classical win-rate ceiling of 0.75 and approach the Tsirelson bound of approximately 0.854, while unentangled quantum circuits only match the classical baseline. In the CoopNav task, the entanglement-free quantum variant attains a success rate of approximately 0.85, roughly double that of the classical MAA2C baseline (~0.40), and a hybrid configuration that pairs a quantum actor with a classical centralised critic outperforms both the fully classical and the fully quantum setups. The choice of Bell state also matters: some Bell states improve coordination, while others actively harm it.

📝 Abstract

We present an empirical evaluation of quantum entanglement in agent coordination within quantum multi-agent reinforcement learning (QMARL). While QMARL has attracted growing interest recently, most prior work evaluates quantum policies without provable baselines, making it impossible to rigorously distinguish quantum advantage from algorithmic coincidence. We address this directly by evaluating a decentralized QMARL framework in which variational quantum circuit (VQC) actors share entangled states. In the CHSH game, which has a mathematically proven classical performance ceiling of 0.75 win rate, we show that entangled QMARL agents approach the Tsirelson limit of 0.854, providing clear evidence of their quantum advantage. We show that unentangled quantum circuits match the classical baseline, confirming that entanglement, and not the quantum circuit itself, is the active coordination mechanism. We also explore the effect of specific entanglement structures and find that some Bell states enable coordination gains while others actively harm performance. On cooperative navigation (CoopNav), QMARL without entanglement achieves $\sim2\times$ improvement in success rate over classical MAA2C ($\sim$0.85 versus $\sim$0.40), with the hybrid configuration (a quantum actor paired with a classical centralised critic) outperforming both fully classical and fully quantum solutions. We present our experimental analysis and discuss future work.
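The two CHSH reference values quoted above (0.75 and roughly 0.854) can be checked numerically. The following is a minimal sketch, not the authors' training code: it assumes the standard CHSH rules (uniform input bits x and y, win when a XOR b equals x AND y) and uses the textbook fixed measurement angles rather than learned VQC parameters. The function names and the choice of angles are illustrative assumptions.

```python
import itertools
import numpy as np

def classical_best_win_rate():
    """Brute-force the best deterministic classical strategy for CHSH.

    Shared randomness only mixes deterministic strategies, so the maximum
    over the 16 deterministic ones is the classical ceiling.
    """
    best = 0.0
    for a0, a1, b0, b1 in itertools.product((0, 1), repeat=4):
        alice, bob = (a0, a1), (b0, b1)
        wins = sum((alice[x] ^ bob[y]) == (x & y)
                   for x, y in itertools.product((0, 1), repeat=2))
        best = max(best, wins / 4)
    return best

def basis(theta):
    """Real measurement basis rotated by theta; row k is the outcome-k vector."""
    return np.array([[np.cos(theta), np.sin(theta)],
                     [-np.sin(theta), np.cos(theta)]])

def quantum_win_rate(state, alice_angles, bob_angles):
    """CHSH win rate for a two-qubit state under fixed local measurements."""
    total = 0.0
    for x, y in itertools.product((0, 1), repeat=2):
        A, B = basis(alice_angles[x]), basis(bob_angles[y])
        for a, b in itertools.product((0, 1), repeat=2):
            if (a ^ b) == (x & y):
                amplitude = np.kron(A[a], B[b]) @ state
                total += 0.25 * abs(amplitude) ** 2
    return total

# Textbook angles (an assumption here; the paper's agents learn their circuits).
ALICE = (0.0, np.pi / 4)
BOB = (np.pi / 8, -np.pi / 8)

# Amplitudes ordered as |00>, |01>, |10>, |11>.
phi_plus = np.array([1, 0, 0, 1]) / np.sqrt(2)    # Bell state (|00> + |11>)/sqrt(2)
psi_minus = np.array([0, 1, -1, 0]) / np.sqrt(2)  # Bell state (|01> - |10>)/sqrt(2)
product = np.array([1.0, 0, 0, 0])                # unentangled |00>

classical = classical_best_win_rate()                   # 0.75
entangled = quantum_win_rate(phi_plus, ALICE, BOB)      # cos^2(pi/8), about 0.854
unentangled = quantum_win_rate(product, ALICE, BOB)     # stays at or below 0.75
wrong_bell = quantum_win_rate(psi_minus, ALICE, BOB)    # sin^2(pi/8), about 0.146
```

With these fixed angles, swapping the shared state from one Bell state to another drops the win rate well below the classical ceiling, which is one simple way a "wrong" Bell state can hurt coordination. Whether the learned policies in the paper fail for the same reason is not something this sketch establishes.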

Problem

Research questions and friction points this paper is trying to address.

Quantum Advantage

Multi-Agent Reinforcement Learning

Entanglement

Classical Baseline

Coordination

Innovation

Methods, ideas, or system contributions that make the work stand out.

quantum entanglement

multi-agent reinforcement learning

variational quantum circuit