AI Summary
This work addresses barren plateaus and local minima in variational quantum algorithms, as well as the poor scalability of existing Clifford-based initialization methods on large problems. It introduces CRiSP, a framework that formulates the discrete selection of Clifford gate prefixes as a sequential decision-making problem, combining stabilizer simulation with reinforcement learning. A Transformer-based policy network guides a Monte Carlo tree search that generates high-quality initial states through self-play and curriculum learning, all without altering the original circuit architecture. Evaluated on QAOA instances with up to 22 qubits and 1,370 parameters, the method improves average energy accuracy over state-of-the-art Clifford initialization methods by a mean of 3.17× (up to 45.02×) and best-achieved energy accuracy by a mean of 2.44× (up to 16.01×). Experiments on VQE tasks further demonstrate its ability to generalize.
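The summary describes a policy network steering a Monte Carlo tree search over discrete Clifford-gate choices. One common way to combine a learned prior with tree search is the AlphaZero-style PUCT rule, which scores each child by its mean value Q plus an exploration bonus proportional to the policy prior. The abstract does not state which selection rule CRiSP uses, so the following is only a sketch under that assumption: `uniform_policy` is a hypothetical stand-in for the Transformer, `toy_value` is a hypothetical reward that counts matches against an invented target gate sequence (in place of a stabilizer-energy reward), and the gate tokens are made up for illustration.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Sequence, Tuple


@dataclass
class Node:
    prior: float                      # P(a | parent), supplied by the policy
    visits: int = 0                   # N(s, a)
    value_sum: float = 0.0            # W(s, a)
    children: Dict[str, "Node"] = field(default_factory=dict)

    def q(self) -> float:
        return self.value_sum / self.visits if self.visits else 0.0


def puct_search(
    horizon: int,
    policy: Callable[[Sequence[str]], Dict[str, float]],
    value: Callable[[Sequence[str]], float],
    n_simulations: int = 1000,
    c_puct: float = 1.5,
) -> Tuple[Optional[Tuple[str, ...]], float]:
    """PUCT tree search over gate-token prefixes; returns the best complete prefix seen."""
    root = Node(prior=1.0)
    best_seq: Optional[Tuple[str, ...]] = None
    best_val = -math.inf
    for _ in range(n_simulations):
        node, seq, path = root, [], [root]
        # Selection: follow argmax of Q + U while the node is expanded and the prefix is incomplete.
        while node.children and len(seq) < horizon:
            sqrt_n = math.sqrt(node.visits)
            tok, node = max(
                node.children.items(),
                key=lambda kv: kv[1].q() + c_puct * kv[1].prior * sqrt_n / (1 + kv[1].visits),
            )
            seq.append(tok)
            path.append(node)
        # Expansion: one child per candidate next gate, weighted by the policy prior.
        if len(seq) < horizon:
            for tok, p in policy(seq).items():
                node.children[tok] = Node(prior=p)
        # Evaluation: reward for a complete prefix, heuristic estimate for a partial one.
        v = value(seq)
        if len(seq) == horizon and v > best_val:
            best_seq, best_val = tuple(seq), v
        # Backup: add the estimate to every node on the selected path.
        for visited in path:
            visited.visits += 1
            visited.value_sum += v
    return best_seq, best_val


# Hypothetical toy problem: length-3 prefixes over an invented gate vocabulary.
VOCAB = ["H q0", "S q0", "CNOT q0 q1"]
TARGET = ("H q0", "CNOT q0 q1", "S q0")


def uniform_policy(seq: Sequence[str]) -> Dict[str, float]:
    return {tok: 1.0 / len(VOCAB) for tok in VOCAB}


def toy_value(seq: Sequence[str]) -> float:
    return sum(a == b for a, b in zip(seq, TARGET)) / len(TARGET)
```

Calling `puct_search(3, uniform_policy, toy_value, n_simulations=500)` should recover `TARGET` with value 1.0. In a real agent the prior would come from the trained network's output distribution, which is what lets the search concentrate visits on promising gates instead of exploring the combinatorial space uniformly.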
Abstract
Variational Quantum Algorithms (VQAs) potentially offer a pathway to practical quantum advantage, but their optimization is heavily hindered by barren plateaus and numerous local minima. While classically simulable Clifford circuits can warm-start VQAs to accelerate convergence, existing heuristic-based initialization methods struggle to scale within vast combinatorial search spaces. To overcome this bottleneck, we propose CRiSP (a Clifford Reinforcement Learning agent for State Preparation), a framework that formulates the discrete selection of Clifford gate prefixes as a sequential decision-making problem. CRiSP uses neural-guided Monte Carlo tree search (MCTS), driven by a Transformer-based policy trained via self-play, to insert learned Clifford gates before fixed parameterized rotations. This enables the construction of high-quality initial states entirely through polynomial-time classical stabilizer simulation, without altering the underlying circuit architecture. By integrating a curriculum learning strategy that progressively expands the search horizon, the agent scales efficiently to deep circuits. Evaluated on QAOA benchmarks of up to $22$ qubits and $1{,}370$ parameters, CRiSP outperforms state-of-the-art Clifford initialization methods by a mean of $3.17\times$ (max $45.02\times$) in average energy accuracy and $2.44\times$ (max $16.01\times$) in best-achieved energy accuracy. Assessments on VQE tasks further demonstrate the framework's robustness and generalizability.
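The enabling fact behind "polynomial-time classical stabilizer simulation" is that Clifford circuits applied to $|0\cdots0\rangle$ yield stabilizer states, whose Pauli expectation values are always $+1$, $-1$ or $0$ and can be computed efficiently. Fixed rotations can stay inside this formalism when held at Clifford angles, for instance $R_Z(k\pi/2)$ equals $S^k$ up to a global phase; the abstract does not spell out how CRiSP handles the rotations, so treat that as an assumption. Below is a minimal sketch of scoring a candidate prefix, assuming an Aaronson-Gottesman (CHP) tableau restricted to H, S and CNOT and a MaxCut-style cost $H_C = \sum_{(i,j)} Z_i Z_j$. The gate-tuple format and the helpers `run`, `x_gate` and `zz_energy` are hypothetical, not CRiSP's code or any library's API.

```python
from typing import List, Sequence, Tuple


def _g(x1: int, z1: int, x2: int, z2: int) -> int:
    """Power of i picked up when Pauli (x1, z1) multiplies Pauli (x2, z2) (CHP convention)."""
    if x1 == 0 and z1 == 0:
        return 0
    if x1 == 1 and z1 == 1:
        return z2 - x2
    if x1 == 1:
        return z2 * (2 * x2 - 1)
    return x2 * (1 - 2 * z2)


class Tableau:
    """CHP stabilizer tableau on n qubits, initialised to |0...0>.

    Rows 0..n-1 are destabilizers, rows n..2n-1 are stabilizers. Row i stands for
    (-1)^r[i] times the Pauli with X bits x[i] and Z bits z[i] (x = z = 1 means Y).
    Each gate touches O(n) bits, so simulation stays polynomial in circuit size.
    """

    def __init__(self, n: int) -> None:
        self.n = n
        self.x = [[0] * n for _ in range(2 * n)]
        self.z = [[0] * n for _ in range(2 * n)]
        self.r = [0] * (2 * n)
        for q in range(n):
            self.x[q][q] = 1      # destabilizer q = X_q
            self.z[n + q][q] = 1  # stabilizer q = Z_q

    def h(self, a: int) -> None:
        for i in range(2 * self.n):
            self.r[i] ^= self.x[i][a] & self.z[i][a]
            self.x[i][a], self.z[i][a] = self.z[i][a], self.x[i][a]

    def s(self, a: int) -> None:
        for i in range(2 * self.n):
            self.r[i] ^= self.x[i][a] & self.z[i][a]
            self.z[i][a] ^= self.x[i][a]

    def cx(self, a: int, b: int) -> None:
        for i in range(2 * self.n):
            x, z = self.x[i], self.z[i]
            self.r[i] ^= x[a] & z[b] & (x[b] ^ z[a] ^ 1)
            x[b] ^= x[a]
            z[a] ^= z[b]

    def _anticommutes(self, row: int, px: Sequence[int], pz: Sequence[int]) -> bool:
        return sum(self.x[row][j] * pz[j] + self.z[row][j] * px[j] for j in range(self.n)) % 2 == 1

    def expectation(self, px: Sequence[int], pz: Sequence[int]) -> int:
        """<psi|P|psi> for P = +Pauli(px, pz); returns +1, -1 or 0."""
        n = self.n
        if any(self._anticommutes(n + k, px, pz) for k in range(n)):
            return 0  # P anticommutes with a stabilizer, so <P> vanishes
        # Otherwise +/-P is the product of stabilizers whose destabilizers anticommute with P.
        sx, sz, phase = [0] * n, [0] * n, 0
        for k in range(n):
            if self._anticommutes(k, px, pz):
                row = n + k
                phase += 2 * self.r[row] + sum(
                    _g(self.x[row][j], self.z[row][j], sx[j], sz[j]) for j in range(n))
                sx = [a ^ b for a, b in zip(sx, self.x[row])]
                sz = [a ^ b for a, b in zip(sz, self.z[row])]
        assert sx == list(px) and sz == list(pz)
        return 1 if phase % 4 == 0 else -1


def run(n: int, gates: Sequence[Tuple]) -> Tableau:
    """Apply gates given as ("h", q), ("s", q) or ("cx", control, target)."""
    tab = Tableau(n)
    for name, *qubits in gates:
        getattr(tab, name)(*qubits)
    return tab


def x_gate(q: int) -> List[Tuple]:
    return [("h", q), ("s", q), ("s", q), ("h", q)]  # X = H Z H and Z = S S


def zz_energy(tab: Tableau, edges: Sequence[Tuple[int, int]]) -> int:
    """<H_C> for H_C = sum over edges (i, j) of Z_i Z_j."""
    total = 0
    for i, j in edges:
        pz = [0] * tab.n
        pz[i] = pz[j] = 1
        total += tab.expectation([0] * tab.n, pz)
    return total
```

For example, the Bell-state prefix `[("h", 0), ("cx", 0, 1)]` gives $\langle Z_0 Z_1\rangle = +1$, and appending `x_gate(1)` flips it to $-1$. A search agent could compare such energies across candidate prefixes without ever building a $2^n$-dimensional state vector.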