🤖 AI Summary
Scalable training of quantum neural networks (QNNs) faces a fundamental trade-off between gradient measurement efficiency and model expressivity. This work rigorously proves that trade-off for deep QNNs: the more expressive the network, the more quantum measurements each parameter's gradient estimate requires. To approach the limit this imposes, we propose the stabilizer-logical product ansatz (SLPA), which exploits circuit symmetries and the stabilizer formalism to reach the upper bound of the trade-off. We provide a constructive design procedure for the SLPA and a formal proof of its optimality. Numerical experiments show that the SLPA drastically reduces the sample complexity needed for training while maintaining accuracy and trainability, compared with well-designed circuits trained with the parameter-shift method.
📝 Abstract
Quantum neural networks (QNNs) require an efficient training algorithm to achieve practical quantum advantages. A promising approach is gradient-based optimization, where gradients are estimated by quantum measurements. However, QNNs currently lack general quantum algorithms for efficiently measuring gradients, which limits their scalability. To elucidate the fundamental limits and potentials of efficient gradient estimation, we rigorously prove a trade-off between gradient measurement efficiency (the mean number of simultaneously measurable gradient components) and expressivity in deep QNNs. This trade-off indicates that more expressive QNNs require higher measurement costs per parameter for gradient estimation, while reducing QNN expressivity to suit a given task can increase gradient measurement efficiency. We further propose a general QNN ansatz called the stabilizer-logical product ansatz (SLPA), which achieves the trade-off upper bound by exploiting the symmetric structure of the quantum circuit. Numerical experiments show that the SLPA drastically reduces the sample complexity needed for training while maintaining accuracy and trainability compared to well-designed circuits based on the parameter-shift method.
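For context, the parameter-shift baseline mentioned in the abstract can be sketched on a toy circuit. The example below is an illustration only: the single-qubit circuit, the gate choice, and the function names (`expectation`, `parameter_shift_gradient`) are assumptions made for this sketch and do not come from the paper, and exact statevector simulation stands in for repeated quantum measurements. It shows that each gradient component needs its own pair of shifted circuit evaluations.

```python
import numpy as np

# Pauli-Z observable measured at the end of the circuit.
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def rx(theta):
    """Single-qubit rotation about X: exp(-i * theta * X / 2)."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -1j * s], [-1j * s, c]])

def ry(theta):
    """Single-qubit rotation about Y: exp(-i * theta * Y / 2)."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]], dtype=complex)

def expectation(params):
    """Toy circuit (hypothetical): RX(params[0]) then RY(params[1]) on |0>, measure <Z>."""
    state = np.array([1, 0], dtype=complex)
    state = ry(params[1]) @ rx(params[0]) @ state
    return float(np.real(np.vdot(state, Z @ state)))

def parameter_shift_gradient(f, params):
    """Parameter-shift rule for gates of the form exp(-i * theta * P / 2), P a Pauli.

    Each component costs two evaluations of f, at theta_k + pi/2 and theta_k - pi/2.
    """
    params = np.asarray(params, dtype=float)
    grad = np.zeros_like(params)
    for k in range(len(params)):
        shift = np.zeros_like(params)
        shift[k] = np.pi / 2
        grad[k] = 0.5 * (f(params + shift) - f(params - shift))
    return grad

theta = np.array([0.3, 1.1])
grad = parameter_shift_gradient(expectation, theta)

# For this circuit <Z> = cos(theta[0]) * cos(theta[1]), so the gradient is known in closed form.
analytic = np.array([
    -np.sin(theta[0]) * np.cos(theta[1]),
    -np.cos(theta[0]) * np.sin(theta[1]),
])
print(grad, analytic)
```

On hardware, every call to `expectation` would itself be an average over many measurement shots, so this approach estimates the gradient components one at a time. That per-parameter cost is what the gradient measurement efficiency in the abstract quantifies.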