🤖 AI Summary
Problem: Variational quantum circuits (VQCs) in quantum reinforcement learning (QRL) have limited expressive power because they are read out through fixed local measurements, which hinders effective function approximation.
Method: We propose the Adaptive Non-local Observable (ANO) framework, which jointly optimizes the VQC parameters and multi-qubit, non-local measurement operators to increase expressivity without increasing circuit depth. ANO is the first approach to integrate adaptive multi-qubit measurements into QRL. It is trained end to end with gradients and plugs into classical RL algorithms (DQN and A3C) as the function approximator of a hybrid quantum-classical architecture.
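As a concrete illustration of the joint optimization, the following is a minimal two-qubit statevector sketch in plain Python. Everything in it is an assumption for illustration rather than the authors' implementation: the circuit (angle encoding, one CNOT, one trainable RY per qubit), the parameterization of the adaptive observable as a general 4×4 Hermitian matrix with 16 real parameters, and all function names. It shows why the observable adds no depth (its parameters enter only at readout) and why training stays differentiable: the expectation is linear in the observable's entries, and the RY angles admit the parameter-shift rule.

```python
# Illustrative sketch only: circuit layout, observable parameterization and names are hypothetical.
import math

N_QUBITS = 2
DIM = 2 ** N_QUBITS
PAIRS = [(i, j) for i in range(DIM) for j in range(i + 1, DIM)]
N_OBS_PARAMS = DIM + 2 * len(PAIRS)  # 4 diagonal + 12 off-diagonal = 16 real parameters

def apply_ry(state, theta, qubit):
    """RY(theta) = exp(-i*theta*Y/2) on one qubit; qubit 0 is the most significant bit."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    bit = 1 << (N_QUBITS - 1 - qubit)
    out = list(state)
    for i in range(DIM):
        if (i & bit) == 0:
            j = i | bit
            out[i] = c * state[i] - s * state[j]
            out[j] = s * state[i] + c * state[j]
    return out

def circuit_state(x, thetas):
    """Hypothetical VQC: angle-encode x, apply CNOT(0 -> 1), then one trainable RY per qubit."""
    state = [1 + 0j] + [0j] * (DIM - 1)  # |00>
    for q in range(N_QUBITS):
        state = apply_ry(state, x[q], q)
    state = [state[0], state[1], state[3], state[2]]  # CNOT: swap |10> and |11>
    for q in range(N_QUBITS):
        state = apply_ry(state, thetas[q], q)
    return state

def hermitian(params):
    """Map N_OBS_PARAMS reals to a DIM x DIM Hermitian matrix (the adaptive observable)."""
    h = [[0j] * DIM for _ in range(DIM)]
    for i in range(DIM):
        h[i][i] = complex(params[i])
    for k, (i, j) in enumerate(PAIRS):
        h[i][j] = complex(params[DIM + 2 * k], params[DIM + 2 * k + 1])
        h[j][i] = h[i][j].conjugate()
    return h

def expectation(state, h):
    """<psi|H|psi>; the result is real because H is Hermitian."""
    return sum(state[i].conjugate() * h[i][j] * state[j]
               for i in range(DIM) for j in range(DIM)).real

def obs_grad(state):
    """Exact gradient w.r.t. the observable parameters (the expectation is linear in them)."""
    g = [abs(a) ** 2 for a in state]
    for i, j in PAIRS:
        z = state[i].conjugate() * state[j]
        g += [2 * z.real, -2 * z.imag]
    return g

def circuit_grad(x, thetas, h):
    """Parameter-shift gradient w.r.t. the trainable RY angles (valid for any Hermitian H)."""
    grads = []
    for k in range(len(thetas)):
        plus, minus = list(thetas), list(thetas)
        plus[k] += math.pi / 2
        minus[k] -= math.pi / 2
        grads.append(0.5 * (expectation(circuit_state(x, plus), h)
                            - expectation(circuit_state(x, minus), h)))
    return grads

def joint_step(x, target, thetas, obs, lr=0.01):
    """One squared-error gradient step on the circuit angles AND the observable parameters."""
    h = hermitian(obs)
    state = circuit_state(x, thetas)
    err = expectation(state, h) - target
    new_thetas = [t - lr * 2 * err * g for t, g in zip(thetas, circuit_grad(x, thetas, h))]
    new_obs = [p - lr * 2 * err * g for p, g in zip(obs, obs_grad(state))]
    return new_thetas, new_obs

# Usage: start from the fixed local baseline (Z on qubit 0) and take one joint step toward 1.5.
z0 = [1.0, 1.0, -1.0, -1.0] + [0.0] * 12
x, thetas = [0.3, -0.7], [0.1, 0.2]
before = expectation(circuit_state(x, thetas), hermitian(z0))
thetas, obs = joint_step(x, 1.5, thetas, z0)
after = expectation(circuit_state(x, thetas), hermitian(obs))
print(before, after)
```

Starting from the fixed local readout makes the contrast explicit: the eigenvalues of Z are ±1, so its expectation can never reach a target such as 1.5, whereas the spectrum of the adaptive observable is itself trainable.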
Results: On multiple benchmark tasks, ANO-VQC agents outperform conventional VQC baselines. Ablation studies indicate that the gains come primarily from the adaptive measurements, which enlarge the representable function space without adding circuit depth, rather than from circuit-parameter optimization alone. The work thus offers a way past the measurement-induced bottleneck that limits VQC-based QRL.
📝 Abstract
Hybrid quantum-classical frameworks leverage quantum computing for machine learning; however, variational quantum circuits (VQCs) are limited by their reliance on local measurements. We introduce an adaptive non-local observable (ANO) paradigm within VQCs for quantum reinforcement learning (QRL), jointly optimizing circuit parameters and multi-qubit measurements. The ANO-VQC architecture serves as the function approximator in Deep Q-Network (DQN) and Asynchronous Advantage Actor-Critic (A3C) algorithms. On multiple benchmark tasks, ANO-VQC agents outperform baseline VQCs. Ablation studies reveal that adaptive measurements expand the function space without increasing circuit depth. Our results demonstrate that adaptive multi-qubit observables can enable practical quantum advantages in reinforcement learning.
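For the DQN case, one natural reading of "function approximator" is a Q-head with one adaptive observable per action, so that Q(s, a) = ⟨ψ(s)|H_a|ψ(s)⟩. The sketch below assumes exactly that, together with a hypothetical two-qubit circuit and hypothetical helper names; it is not the authors' code. It omits the replay buffer and target network, so `next_qs` stands in for the values a full DQN would compute with its target network.

```python
# Illustrative sketch only: the circuit, the per-action observables and all names are hypothetical.
import math
import random

DIM = 4  # two qubits; basis index = 2 * q0 + q1

def circuit(s, thetas):
    """Hypothetical VQC: RY(s[q] + thetas[q]) on each qubit of |00>, then CNOT(0 -> 1)."""
    a = [math.cos((s[0] + thetas[0]) / 2), math.sin((s[0] + thetas[0]) / 2)]
    b = [math.cos((s[1] + thetas[1]) / 2), math.sin((s[1] + thetas[1]) / 2)]
    psi = [complex(a[i] * b[j]) for i in range(2) for j in range(2)]
    psi[2], psi[3] = psi[3], psi[2]  # CNOT flips qubit 1 when qubit 0 is |1>
    return psi

def hermitian(p):
    """4x4 Hermitian matrix from 16 reals: 4 diagonal entries, then (re, im) per upper pair."""
    h = [[0j] * DIM for _ in range(DIM)]
    k = DIM
    for i in range(DIM):
        h[i][i] = complex(p[i])
        for j in range(i + 1, DIM):
            h[i][j] = complex(p[k], p[k + 1])
            h[j][i] = h[i][j].conjugate()
            k += 2
    return h

def q_values(s, thetas, obs_params):
    """Q(s, a) = <psi(s)|H_a|psi(s)>, with one adaptive observable H_a per action."""
    psi = circuit(s, thetas)
    qs = []
    for p in obs_params:
        h = hermitian(p)
        qs.append(sum(psi[i].conjugate() * h[i][j] * psi[j]
                      for i in range(DIM) for j in range(DIM)).real)
    return qs

def epsilon_greedy(qs, epsilon, rng):
    """Explore with probability epsilon, otherwise act greedily on the Q-values."""
    if rng.random() < epsilon:
        return rng.randrange(len(qs))
    return max(range(len(qs)), key=lambda a: qs[a])

def td_target(reward, next_qs, gamma, done):
    """DQN regression target r + gamma * max_a' Q(s', a'); next_qs would come from a target network."""
    return reward if done else reward + gamma * max(next_qs)

# Usage: two actions, each with its own randomly initialized 16-parameter observable.
rng = random.Random(0)
thetas = [0.1, -0.2]
obs_params = [[rng.uniform(-1.0, 1.0) for _ in range(16)] for _ in range(2)]
qs = q_values([0.4, 1.1], thetas, obs_params)
action = epsilon_greedy(qs, epsilon=0.1, rng=rng)
target = td_target(1.0, q_values([0.5, 1.0], thetas, obs_params), gamma=0.99, done=False)
```

In this sketch every Q-value is read out from the same prepared state ψ(s), so adding actions adds observables, not gates. In a full agent, the circuit angles and all observable parameters would be trained jointly on the squared TD error.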