🤖 AI Summary
Training binary neural networks (BiNNs) remains challenging due to discrete optimization and poor gradient flow. Method: This paper proposes a quantum-enhanced variational inference framework—the first to couple a quantum hypernetwork with Bayesian variational inference—where parameterized quantum circuits generate binary weights, leveraging quantum superposition and entanglement to expand the optimization search space. Theoretically, we establish the first explicit connection between quantum hypernetworks and the evidence lower bound (ELBO), deriving an analytically tractable lower bound. For implicit distributions, we design an MMD-based surrogate ELBO, circumventing the conventional maximum likelihood estimation (MLE) requirement for explicit likelihood models. Results: Experiments demonstrate significant improvements over standard MLE in both simulated and implicit-distribution settings, achieving breakthroughs in BiNN trainability and generalization performance.
📝 Abstract
Binary Neural Networks (BiNNs), which employ single-bit precision weights, have emerged as a promising solution to reduce memory usage and power consumption while maintaining competitive performance in large-scale systems. However, training BiNNs remains a significant challenge due to the limitations of conventional training algorithms. Quantum HyperNetworks offer a novel paradigm for enhancing the optimization of BiNN by leveraging quantum computing. Specifically, a Variational Quantum Algorithm is employed to generate binary weights through quantum circuit measurements, while key quantum phenomena such as superposition and entanglement facilitate the exploration of a broader solution space. In this work, we establish a connection between this approach and Bayesian inference by deriving the Evidence Lower Bound (ELBO), when direct access to the output distribution is available (i.e., in simulations), and introducing a surrogate ELBO based on the Maximum Mean Discrepancy (MMD) metric for scenarios involving implicit distributions, as commonly encountered in practice. Our experimental results demonstrate that the proposed methods outperform standard Maximum Likelihood Estimation (MLE), improving trainability and generalization.