🤖 AI Summary
This work addresses quantum policy evaluation (QPE) from classical observational data, advancing the exploration of quantum advantage in reinforcement learning. We propose an end-to-end framework that first learns a differentiable quantum environment model, parameterized via quantum circuits, from classical observational data, and then performs QPE directly on noisy intermediate-scale quantum (NISQ) hardware. Crucially, this is the first integration of classical data-driven quantum machine learning into the QPE pipeline, enabling joint optimization of quantum environment modeling, quantum state preparation, and policy evaluation. Experiments on real quantum devices validate the feasibility of the approach and substantially reduce reliance on prior knowledge of the quantum system. Moreover, the method provides a scalable pathway toward quantum-enhanced reinforcement learning on near-term hardware.
📝 Abstract
Quantum policy evaluation (QPE) is a reinforcement learning (RL) algorithm that is quadratically more efficient than analogous classical Monte Carlo estimation. It relies on a direct quantum-mechanical realization of a finite Markov decision process, in which the agent and the environment are modeled by unitary operators and exchange states, actions, and rewards in superposition. Previously, the quantum environment was implemented and parametrized manually for an illustrative benchmark on a quantum simulator. In this paper, we demonstrate how these environment parameters can be learned from a batch of classical observational data through quantum machine learning (QML) on quantum hardware. The learned quantum environment is then used in QPE to compute policy evaluations, likewise on quantum hardware. Our experiments reveal that, despite challenges such as noise and short coherence times, the integration of QML and QPE shows promising potential for achieving quantum advantage in RL.
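To make the "learn environment parameters from classical data" step concrete, here is a minimal toy sketch of the general idea (our illustration, not the paper's actual circuit or loss): a stochastic transition probability is encoded as the outcome probability of a single-qubit `RY(theta)` rotation, and `theta` is fitted to an observed transition frequency by gradient descent, with the gradient obtained via the parameter-shift rule that variational QML typically uses on hardware.

```python
import numpy as np

def transition_prob(theta):
    """Probability of measuring |1> after RY(theta)|0>, i.e. sin^2(theta/2)."""
    return np.sin(theta / 2.0) ** 2

def parameter_shift_grad(theta, p_obs, shift=np.pi / 2):
    """Gradient of the squared error (p(theta) - p_obs)^2, with
    dp/dtheta evaluated by the parameter-shift rule."""
    dp = 0.5 * (transition_prob(theta + shift) - transition_prob(theta - shift))
    return 2.0 * (transition_prob(theta) - p_obs) * dp

# "Classical observational data": a batch of binary transition outcomes,
# here sampled synthetically with a hidden true probability of 0.3.
rng = np.random.default_rng(0)
observations = rng.random(1000) < 0.3
p_obs = observations.mean()  # empirical transition frequency

theta = 0.1  # initial circuit parameter
for _ in range(200):
    theta -= 0.5 * parameter_shift_grad(theta, p_obs)

# After training, the circuit's outcome probability matches the data.
print(transition_prob(theta))  # close to p_obs
```

On real hardware, `transition_prob` would be estimated from measurement shots rather than computed analytically, and the learned environment unitary would then be plugged into the QPE circuit; the parameter-shift structure of the gradient is what keeps the model trainable on a quantum device.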