Replay-buffer engineering for noise-robust quantum circuit optimization

📅 2026-04-23

🤖 AI Summary
This work addresses three limitations of deep reinforcement learning for quantum circuit optimization: replay buffers that disregard the reliability of temporal-difference (TD) targets, the cost of a full quantum-classical evaluation at every step of curriculum-based architecture search, and the discarding of noiseless trajectories when retraining under hardware noise. Centered on the replay buffer, the authors propose ReaPER+, an annealed prioritized replay rule that shifts from TD-error-driven prioritization early in training to reliability-aware sampling as value estimates mature; OptCRLQAS, which amortizes expensive evaluations over multiple architectural edits; and a lightweight replay-buffer transfer mechanism that requires no network-weight transfer. On quantum compilation and quantum architecture search benchmarks, ReaPER+ achieves 4–32× higher sample efficiency than fixed PER, ReaPER, and uniform replay. OptCRLQAS cuts wall-clock time per episode by up to 67.5% on a 12-qubit problem, and buffer transfer reduces the steps needed to reach chemical accuracy by up to 85–90% and the final energy error by up to 90% on 6-, 8-, and 12-qubit molecular tasks.
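
The weight-free buffer transfer can be pictured with a minimal sketch. It assumes a plain FIFO buffer and a hypothetical `Transition` record; neither is taken from the paper, and the sketch reuses stored rewards unchanged, which the summary does not confirm. Only the experience is carried over, so the agent for the noisy setting would still start from freshly initialized network weights.

```python
from collections import deque
from typing import Any, Iterable, NamedTuple


class Transition(NamedTuple):
    """Hypothetical transition record; field names are illustrative."""
    state: Any
    action: int
    reward: float
    next_state: Any
    done: bool


def warm_start_buffer(noiseless: Iterable[Transition], capacity: int) -> deque:
    """Seed a fresh buffer for the noisy setting with noiseless transitions.

    No network weights are involved. If the source holds more than
    `capacity` items, only the most recent ones are kept.
    """
    buffer = deque(maxlen=capacity)
    buffer.extend(noiseless)
    return buffer
```

Transitions collected under noise are then appended as usual, and with a FIFO buffer they gradually evict the oldest noiseless ones.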

📝 Abstract
Deep reinforcement learning (RL) for quantum circuit optimization faces three fundamental bottlenecks: replay buffers that ignore the reliability of temporal-difference (TD) targets, curriculum-based architecture search that triggers a full quantum-classical evaluation at every environment step, and the routine discard of noiseless trajectories when retraining under hardware noise. We address all three by treating the replay buffer as a primary algorithmic lever for quantum optimization. We introduce ReaPER$+$, an annealed replay rule that transitions from TD-error-driven prioritization early in training to reliability-aware sampling as value estimates mature, achieving $4$–$32\times$ gains in sample efficiency over fixed PER, ReaPER, and uniform replay while consistently discovering more compact circuits across quantum compilation and QAS benchmarks; validation on LunarLander-v3 confirms the principle is domain-agnostic. Furthermore, we eliminate the quantum-classical evaluation bottleneck in curriculum RL by introducing OptCRLQAS, which amortizes expensive evaluations over multiple architectural edits, cutting wall-clock time per episode by up to $67.5\%$ on a 12-qubit optimization problem without degrading solution quality. Finally, we introduce a lightweight replay-buffer transfer scheme that warm-starts noisy-setting learning by reusing noiseless trajectories, without network-weight transfer or $\epsilon$-greedy pretraining. This reduces steps to chemical accuracy by up to $85$–$90\%$ and final energy error by up to $90\%$ over from-scratch baselines on 6-, 8-, and 12-qubit molecular tasks. Together, these results establish that experience storage, sampling, and transfer are decisive levers for scalable, noise-robust quantum circuit optimization.
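
The abstract does not state the ReaPER$+$ rule itself, so the following is only an illustrative sketch of an annealed replay priority, not the authors' formula. It assumes each transition carries a reliability score in $(0, 1]$ and uses a hypothetical linear schedule `lam = step / anneal_steps`; the names `annealed_priorities`, `sample_indices`, `alpha`, and `anneal_steps` are all invented for the example. At `lam = 0` sampling depends on TD error alone, and at `lam = 1` each priority is scaled by the full reliability score.

```python
import numpy as np


def annealed_priorities(td_errors, reliabilities, step, anneal_steps,
                        alpha=0.6, eps=1e-6):
    """Sampling probabilities that shift from TD error to reliability.

    Assumed rule (not from the paper): p_i ~ (|td_i| + eps)^alpha * r_i^lam.
    """
    lam = min(1.0, step / anneal_steps)  # 0 early in training, 1 once annealed
    td_term = (np.abs(td_errors) + eps) ** alpha
    rel_term = np.clip(reliabilities, eps, 1.0) ** lam
    priorities = td_term * rel_term
    return priorities / priorities.sum()


def sample_indices(probs, batch_size, rng):
    """Draw buffer indices according to the given probabilities."""
    return rng.choice(len(probs), size=batch_size, p=probs)
```

With two transitions of equal TD error and reliabilities 1.0 and 0.1, the sketch samples both equally at `step = 0` and favors the reliable one once `step` reaches `anneal_steps`.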
Problem

Research questions and friction points this paper is trying to address.

quantum circuit optimization
replay buffer
noise robustness
deep reinforcement learning
curriculum learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

replay buffer engineering
noise-robust quantum optimization
sample-efficient reinforcement learning
curriculum architecture search
experience transfer