🤖 AI Summary
CNOT gate minimization is a fundamental open problem in quantum circuit optimization, whose computational complexity remains unresolved. This paper introduces a unified reinforcement learning framework that achieves cross-scale generalization for CNOT circuits of size $n = 3$–$15$, leveraging matrix embedding and Gaussian striping as preprocessing steps. The method performs policy learning over the space of functionally equivalent CNOT circuits, applying only Clifford-preserving transformations, to reduce gate count without altering logical functionality. Experimental results demonstrate consistent superiority over state-of-the-art algorithms across all tested scales; the improvement is particularly pronounced for larger instances ($n \geq 10$), where reductions in gate count exceed those of prior approaches by significant margins. By enabling scalable, learning-based compilation without reliance on hand-crafted heuristics or exhaustive search, this work establishes a novel paradigm for practical quantum circuit optimization and compilation.
📝 Abstract
CNOT gates are fundamental to quantum computing, as they facilitate entanglement, a crucial resource for quantum algorithms. Certain classes of quantum circuits are constructed exclusively from CNOT gates. Given their widespread use, it is imperative to minimise the number of CNOT gates employed. This problem, known as CNOT minimisation, remains an open challenge, with its computational complexity yet to be fully characterised. In this work, we introduce a novel reinforcement learning approach to address this task. Instead of training multiple reinforcement learning agents for different circuit sizes, we use a single agent trained up to a fixed size $m$. Matrices whose size differs from $m$ are preprocessed using either embedding or Gaussian striping. To assess the efficacy of our approach, we trained an agent with $m = 8$ and evaluated it on matrices of size $n$ ranging from 3 to 15. The results show that our method outperforms the state-of-the-art algorithm as the value of $n$ increases.
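To make the matrix view of the problem concrete, the sketch below relies on the standard fact that an $n$-qubit CNOT circuit corresponds to an invertible $n \times n$ matrix over GF(2), with each CNOT acting as a row addition. It is not the paper's agent: `gaussian_synthesis` is plain Gauss-Jordan elimination, shown only as a classical baseline, and `embed` assumes that "embedding" means placing a smaller matrix in the top-left block of an $m \times m$ identity, which the abstract does not spell out. Gaussian striping is not sketched because the abstract does not define it. All function names are hypothetical.

```python
import numpy as np


def apply_cnot(matrix, control, target):
    """Apply CNOT(control, target) as a GF(2) row addition: row[target] ^= row[control]."""
    matrix[target] ^= matrix[control]


def embed(matrix, m):
    """Assumed embedding: place an n x n matrix in the top-left block of an m x m identity."""
    n = matrix.shape[0]
    if n > m:
        raise ValueError("embedding requires n <= m")
    out = np.eye(m, dtype=np.uint8)
    out[:n, :n] = matrix
    return out


def gaussian_synthesis(matrix):
    """Reduce an invertible GF(2) matrix to the identity; return the CNOTs used as (control, target)."""
    a = matrix.copy()
    n = a.shape[0]
    gates = []
    for col in range(n):
        if a[col, col] == 0:
            # Invertibility guarantees a row below the diagonal with a 1 in this column.
            pivot = next(r for r in range(col + 1, n) if a[r, col])
            apply_cnot(a, pivot, col)
            gates.append((pivot, col))
        for row in range(n):
            if row != col and a[row, col]:
                # Clear every other 1 in this column using the pivot row.
                apply_cnot(a, col, row)
                gates.append((col, row))
    return gates


# Build a 3-qubit example from two CNOTs, then embed it into the agent's size m = 8.
example = np.eye(3, dtype=np.uint8)
apply_cnot(example, 0, 1)
apply_cnot(example, 1, 2)
padded = embed(example, 8)
gates = gaussian_synthesis(padded)
```

The returned gate list reduces the matrix to the identity; since each CNOT is its own inverse, reading the list in reverse order gives a circuit that implements the matrix. The length of that list is the kind of gate count a learned policy would aim to beat.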