🤖 AI Summary
This work addresses two challenges: the high computational overhead of high-dimensional multi-agent reinforcement learning, and the infeasibility of centralized training on current quantum hardware. It proposes a distributed quantum reinforcement learning framework in which agents learn independently in their local environments while drawing on quantum computing capabilities such as compact encoding and random sampling. The framework is designed for settings where agents have disjoint observation and action spaces, and it can be extended to other systems with reasonable approximations. Empirical evaluation on the Cooperative-Pong task shows that the proposed method improves performance by approximately 10% over other distribution strategies and by about 5% over classical policy models, easing the constraints imposed by limited quantum resources.
📝 Abstract
Reinforcement learning (RL) is one of the most practical ways to learn from real-life use cases. Its grounding in the cognitive methods used by humans makes it a widely accepted strategy in the field of artificial intelligence. Most environments used for RL are high-dimensional, and traditional RL algorithms become computationally expensive and struggle to learn effectively from such systems. Recent practical demonstrations of quantum computing (QC) concepts, such as compact encoding, enhanced representation and learning algorithms, random sampling, and the inherent stochastic nature of quantum systems, have opened up new directions for tackling these challenges. Quantum reinforcement learning (QRL) has gained significant traction over the past few years. However, current quantum hardware is not sufficient for such high-dimensional environments with complex multi-agent setups. To tackle this issue, we propose a distributed framework for QRL in which multiple agents learn independently, distributing the load of joint training across individual machines. Our method works well for environments with disjoint sets of action and observation spaces, but it can also be extended to other systems with reasonable approximations. We analyze the proposed method on the Cooperative-Pong environment, and our results indicate a ~10% improvement over other distribution strategies and a ~5% improvement over classical models of policy representation.