🤖 AI Summary
This work addresses the challenges of over-parameterization, computational instability, and barren plateaus in training quantum neural networks (QNNs) within the contextual bandit setting by proposing the QNTK-UCB algorithm. The method freezes a randomly initialized QNN and uses its static Quantum Neural Tangent Kernel (QNTK) to perform ridge regression, thereby sidestepping explicit training while retaining the quantum inductive bias that benefits online learning. As the first approach to integrate the QNTK into the contextual bandit framework, it reduces the required parameter scaling to Ω((TK)³), improving on the Ω((TK)⁸) requirement of the classical NeuralUCB algorithm. The study further reveals the implicit regularization and spectral decay properties of the QNTK, demonstrating superior sample efficiency on both synthetic nonlinear tasks and variational quantum eigensolver benchmarks.
📝 Abstract
Stochastic contextual bandits are fundamental for sequential decision-making but pose significant challenges for existing neural network-based algorithms, particularly when scaling to quantum neural networks (QNNs), due to issues such as massive over-parameterization, computational instability, and the barren plateau phenomenon. This paper introduces the Quantum Neural Tangent Kernel-Upper Confidence Bound (QNTK-UCB) algorithm, which leverages the Quantum Neural Tangent Kernel (QNTK) to address these limitations. By freezing the QNN at a random initialization and using its static QNTK as the kernel for ridge regression, QNTK-UCB bypasses the unstable training dynamics inherent in explicit parameterized quantum circuit training while fully exploiting the unique quantum inductive bias. For a time horizon $T$ and $K$ actions, our theoretical analysis reveals a significantly improved parameter scaling of $\Omega((TK)^3)$ for QNTK-UCB, a substantial reduction compared to the $\Omega((TK)^8)$ required by the classical NeuralUCB algorithm for similar regret guarantees. Empirical evaluations on non-linear synthetic benchmarks and quantum-native variational quantum eigensolver tasks demonstrate QNTK-UCB's superior sample efficiency in low-data regimes. This work highlights how the inherent properties of the QNTK provide implicit regularization and a sharper spectral decay, paving the way toward ``quantum advantage'' in online learning.
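The core mechanism described above — kernel ridge regression with a fixed kernel, combined with a UCB-style exploration bonus — can be sketched classically. This is an illustrative sketch only: it substitutes an RBF kernel for the actual QNTK (which the paper obtains from a frozen, randomly initialized quantum circuit), and the `FrozenKernelUCB` class, the regularization `lam`, and the exploration weight `beta` are assumed names and hyperparameters, not the paper's implementation.

```python
import numpy as np


def surrogate_kernel(X, Y, gamma=1.0):
    """RBF kernel as a stand-in for the static QNTK (assumption:
    the true kernel is Gram matrix of a frozen QNN's gradients)."""
    sq = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-gamma * sq)


class FrozenKernelUCB:
    """Kernel ridge regression + UCB with a fixed (frozen) kernel."""

    def __init__(self, lam=1.0, beta=1.0, gamma=1.0):
        self.lam, self.beta, self.gamma = lam, beta, gamma
        self.X, self.y = [], []  # observed contexts and rewards

    def select(self, contexts):
        """contexts: (K, d) array, one feature vector per arm."""
        if not self.X:  # no data yet: pick arbitrarily
            return 0
        X = np.asarray(self.X)
        y = np.asarray(self.y)
        # Ridge-regularized Gram matrix of past contexts
        K_inv = np.linalg.inv(
            surrogate_kernel(X, X, self.gamma) + self.lam * np.eye(len(X))
        )
        k_star = surrogate_kernel(contexts, X, self.gamma)  # (K, t)
        mean = k_star @ K_inv @ y                            # posterior mean
        # Exploration width from the kernel's posterior variance
        var = (surrogate_kernel(contexts, contexts, self.gamma).diagonal()
               - np.sum((k_star @ K_inv) * k_star, axis=1))
        return int(np.argmax(mean + self.beta * np.sqrt(np.maximum(var, 0.0))))

    def update(self, x, r):
        """Record the pulled arm's context x and observed reward r."""
        self.X.append(x)
        self.y.append(r)


# Minimal usage: one bandit round with K = 3 arms in d = 2 dimensions
rng = np.random.default_rng(0)
agent = FrozenKernelUCB()
contexts = rng.normal(size=(3, 2))
arm = agent.select(contexts)
agent.update(contexts[arm], reward := 1.0)
next_arm = agent.select(contexts)
```

Because the kernel is frozen, no gradient steps are ever taken on the (quantum) model, which is precisely how the paper avoids barren plateaus; only the $t \times t$ Gram matrix grows with the horizon.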