🤖 AI Summary
This work investigates how the shape of a variational quantum circuit (the split between qubit count and encoding layers under a fixed encoding budget \(E\)) affects trainability. The root cause is identified as structural rank deficiency of the coefficient-matching Jacobian, which motivates the concept of “structural gradient starvation”: in serial architectures the Jacobian nullspace grows without bound as the parameter count increases at fixed depth, whereas parallel architectures avoid this issue. Starting from a Fourier-analytic view of expressivity, and combining analysis of the Jacobian, the spectrum of the quantum Fisher information matrix, and the dimension of the Jacobian kernel, the study shows that parallel architectures generically maintain a strictly positive minimum singular value of the Jacobian when the number of parameters does not exceed \(2E + 1\). Moreover, adding feature map layers reaches \(R^2 \geq 0.95\) with 1.6–2.2× fewer parameters than adding trainable blocks, a substantial gain in parameter efficiency.
📝 Abstract
Variational quantum circuits with angle encoding implement truncated Fourier series. All architectures that arrange $N$ qubits with $L$ encoding layers each and share the same encoding budget $E = NL$ generate identical frequency spectra, have identical frequency redundancy, and require the same minimum parameter count for coefficient control. Despite this equivalence, trainability varies substantially with architecture shape $(N,L)$ at fixed $E$. We identify structural rank deficiency of the coefficient-matching Jacobian $J$ as the mechanism responsible. For serial single-qubit architectures, we prove $\mathrm{rank}(J) \leq 2L+1$ regardless of parameter count $P$, so that $\dim(\ker J) \geq P-(2L+1)$ grows without bound as $P$ increases at fixed $L$. We term this phenomenon \emph{structural gradient starvation}: a growing fraction of parameters becomes structurally decoupled from the loss. Parallel architectures avoid this via independent phase trajectories, ensuring $\sigma_{\min}(J^{(\mathrm{par})}) > 0$ generically for $P \leq 2E+1$, so no parameter lies in $\ker J$. For practitioners, we further show that the two natural routes to increasing parameter count have fundamentally different effects: adding feature map (FM) layers monotonically strengthens the Jacobian QFIM eigenvalue spectrum and achieves $R^2 \geq 0.95$ with $1.6$--$2.2\times$ fewer parameters than adding trainable blocks across all tested architectures, while trainable blocks improve training only through the classical interpolation mechanism with no quantum-specific benefit.
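The serial rank bound can be illustrated numerically. The sketch below is not the paper's construction: it assumes a single-qubit circuit with `RZ(x)` encoding layers interleaved with `RZ`-`RY`-`RZ` trainable blocks, a Pauli-$Z$ readout, and a Jacobian taken with respect to the $2L+1$ real Fourier coefficients of the output. All function names are hypothetical. In this toy setting the bound is just the counting argument (the Jacobian has only $2L+1$ rows), so the kernel dimension grows as trainable blocks are added at fixed $L$.

```python
import numpy as np

def rz(a):
    """Single-qubit Z rotation exp(-i a Z / 2)."""
    return np.array([[np.exp(-0.5j * a), 0], [0, np.exp(0.5j * a)]])

def ry(a):
    """Single-qubit Y rotation exp(-i a Y / 2)."""
    c, s = np.cos(a / 2), np.sin(a / 2)
    return np.array([[c, -s], [s, c]], dtype=complex)

def trainable_block(theta):
    """Product of general rotations RZ-RY-RZ; theta has length 3 * reps."""
    U = np.eye(2, dtype=complex)
    for a, b, c in theta.reshape(-1, 3):
        U = rz(c) @ ry(b) @ rz(a) @ U
    return U

def model(x, params, L):
    """<Z> after L encoding layers RZ(x) interleaved with L + 1 trainable blocks."""
    blocks = params.reshape(L + 1, -1)
    psi = trainable_block(blocks[0]) @ np.array([1, 0], dtype=complex)
    for layer in range(L):
        psi = trainable_block(blocks[layer + 1]) @ (rz(x) @ psi)
    return np.vdot(psi, np.diag([1, -1]) @ psi).real

def fourier_coefficients(params, L):
    """Real vector [c_0, Re c_1..Re c_L, Im c_1..Im c_L] of the output f(x)."""
    n = 2 * L + 1                      # enough samples for frequencies -L..L
    xs = 2 * np.pi * np.arange(n) / n
    f = np.array([model(x, params, L) for x in xs])
    c = np.fft.fft(f) / n              # c[k] multiplies exp(i k x) for k = 0..L
    return np.concatenate([[c[0].real], c[1:L + 1].real, c[1:L + 1].imag])

def jacobian(params, L, eps=1e-5):
    """Central finite-difference Jacobian, shape (2L + 1, P)."""
    cols = []
    for i in range(params.size):
        step = np.zeros_like(params)
        step[i] = eps
        diff = fourier_coefficients(params + step, L) - fourier_coefficients(params - step, L)
        cols.append(diff / (2 * eps))
    return np.stack(cols, axis=1)

def rank_and_kernel(L, reps, seed=0):
    """Return (P, numerical rank of J, dim ker J) at a random parameter point."""
    rng = np.random.default_rng(seed)
    P = 3 * reps * (L + 1)
    params = rng.uniform(0, 2 * np.pi, P)
    rank = int(np.linalg.matrix_rank(jacobian(params, L), tol=1e-6))
    return P, rank, P - rank

L = 2
for reps in (1, 2, 4):
    P, rank, kernel_dim = rank_and_kernel(L, reps)
    print(f"L={L} P={P} rank(J)={rank} bound={2 * L + 1} dim ker J={kernel_dim}")
```

Increasing `reps` raises $P$ while leaving $L$ unchanged, so the printed kernel dimension must grow at least as fast as $P-(2L+1)$. The numerical rank depends on the finite-difference step and the tolerance passed to `matrix_rank`, both of which are choices made for this sketch.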