🤖 AI Summary
Variational quantum algorithms (VQAs) suffer from ansatz designs that rely on manual, heuristic engineering. Method: This work models structured variational quantum circuit design as a sequential decision-making problem and applies Proximal Policy Optimization (PPO), the first application of reinforcement learning (RL) to this task. We propose a dual-path approach: RLVQC Block, built upon the QAOA framework, enhances generalization; RLVQC Global relaxes graph-structural constraints to minimize circuit depth. Both methods use experimentally measurable quantum states as state observations and incorporate graph-theoretic priors from the underlying QUBO problem. Contribution/Results: On diverse graph-based QUBO instances, RLVQC Block consistently outperforms QAOA while maintaining comparable circuit depth; RLVQC Global achieves significant depth reduction with modest performance trade-offs. This work establishes a new paradigm for automated, interpretable, and task-oriented quantum circuit synthesis.
📄 Abstract
Variational Quantum Algorithms (VQAs) are among the most promising approaches for leveraging near-term quantum hardware, yet their effectiveness strongly depends on the design of the underlying circuit ansatz, which is typically constructed with heuristic methods. In this work, we represent the synthesis of variational quantum circuits as a sequential decision-making problem, where gates are added iteratively to optimize an objective function, and we introduce two reinforcement learning-based methods, RLVQC Global and RLVQC Block, tailored to combinatorial optimization problems. RLVQC Block creates ansatzes that generalize the Quantum Approximate Optimization Algorithm (QAOA) by discovering a two-qubit block that is applied to all interacting qubit pairs, while RLVQC Global further generalizes the ansatz, adding gates unconstrained by the structure of the interacting qubits. Both methods adopt the Proximal Policy Optimization (PPO) algorithm and use empirical measurement outcomes as state observations to guide the agent. We evaluate the proposed methods on a broad set of QUBO instances derived from classical graph-based optimization problems. Our results show that both RLVQC methods perform strongly, with RLVQC Block consistently outperforming QAOA and generally surpassing RLVQC Global. While RLVQC Block produces circuits with depth comparable to QAOA, the Global variant finds significantly shorter ones. These findings suggest that reinforcement learning can be an effective tool for discovering new ansatz structures tailored to specific problems, and that the most effective circuit design strategy lies between rigid predefined architectures and completely unconstrained ones, offering a favourable trade-off between structure and adaptability.
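The "sequential decision-making" framing above can be illustrated with a minimal sketch: an episodic environment whose actions append gates to a growing circuit and whose observations are the gate sequence built so far. This is not the paper's implementation; the names (`CircuitEnv`, `GATE_SET`) are hypothetical, and the reward is a placeholder for the empirical measurement outcome an RL agent such as PPO would actually optimize.

```python
# Toy sketch of circuit synthesis as a sequential decision process.
# Hypothetical names; the zero reward stands in for the measured
# QUBO objective value of the executed circuit.
from dataclasses import dataclass, field
from itertools import combinations

N_QUBITS = 3
# Discrete action space: one-qubit rotations plus two-qubit entanglers.
GATE_SET = [("rx", (q,)) for q in range(N_QUBITS)] + \
           [("cz", pair) for pair in combinations(range(N_QUBITS), 2)]

@dataclass
class CircuitEnv:
    max_depth: int = 5
    gates: list = field(default_factory=list)

    def reset(self):
        self.gates = []
        return tuple(self.gates)  # observation: gate sequence so far

    def step(self, action: int):
        self.gates.append(GATE_SET[action])  # action index selects a gate
        done = len(self.gates) >= self.max_depth
        reward = 0.0  # placeholder: would come from circuit measurements
        return tuple(self.gates), reward, done

env = CircuitEnv()
obs = env.reset()
done = False
while not done:
    obs, reward, done = env.step(0)  # a trained policy would pick actions
print(len(env.gates))  # prints 5
```

In the paper's setting, the RLVQC Block variant would restrict actions to a reusable two-qubit block applied across interacting pairs, while the Global variant would leave the full action space open, trading structure for shorter circuits.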