Gradient-free variational learning with conditional mixture networks

📅 2024-08-29
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
🤖 AI Summary
In supervised learning, deep models often lack probabilistic calibration and uncertainty quantification, while Bayesian approaches, though theoretically principled, can be computationally prohibitive; existing gradient-free variational methods are typically limited to simple models. This paper introduces CAVI-CMN, a gradient-free variational training framework that integrates coordinate-ascent variational inference (CAVI) with Pólya-Gamma data augmentation to enable closed-form posterior updates for all parameters of conditional mixture networks (CMNs), a probabilistic mixture-of-experts architecture that explicitly models input-dependent uncertainty. As input dimensionality or the number of experts grows, its computation time scales competitively with maximum likelihood estimation (MLE) and gradient-based alternatives such as black-box variational inference (BBVI). On UCI multiclass benchmarks, it matches or exceeds MLE with backpropagation in accuracy while providing full posterior distributions over all model parameters.

📝 Abstract
Balancing computational efficiency with robust predictive performance is crucial in supervised learning, especially for critical applications. Standard deep learning models, while accurate and scalable, often lack probabilistic features like calibrated predictions and uncertainty quantification. Bayesian methods address these issues but can be computationally expensive as model and data complexity increase. Previous work shows that fast variational methods can reduce the compute requirements of Bayesian methods by eliminating the need for gradient computation or sampling, but are often limited to simple models. We introduce CAVI-CMN, a fast, gradient-free variational method for training conditional mixture networks (CMNs), a probabilistic variant of the mixture-of-experts (MoE) model. CMNs are composed of linear experts and a softmax gating network. By exploiting conditional conjugacy and Pólya-Gamma augmentation, we furnish Gaussian likelihoods for the weights of both the linear layers and the gating network. This enables efficient variational updates using coordinate ascent variational inference (CAVI), avoiding traditional gradient-based optimization. We validate this approach by training two-layer CMNs on standard classification benchmarks from the UCI repository. CAVI-CMN achieves competitive and often superior predictive accuracy compared to maximum likelihood estimation (MLE) with backpropagation, while maintaining competitive runtime and full posterior distributions over all model parameters. Moreover, as input size or the number of experts increases, computation time scales competitively with MLE and other gradient-based solutions like black-box variational inference (BBVI), making CAVI-CMN a promising tool for deep, fast, and gradient-free Bayesian networks.
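As a rough sketch of the architecture described in the abstract (not the paper's code; the dimensions and random weights below are placeholders), a CMN's predictive distribution mixes per-expert class probabilities through a softmax gating network:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

# Hypothetical sizes: D input features, K experts, C classes.
D, K, C = 4, 3, 2
W_gate = rng.normal(size=(K, D))        # softmax gating network weights
W_experts = rng.normal(size=(K, C, D))  # one linear classifier per expert

def cmn_predict(x):
    """Mixture-of-experts predictive distribution:
    p(y | x) = sum_k softmax(W_gate x)_k * softmax(W_experts[k] x)_y."""
    gates = softmax(W_gate @ x)                     # (K,) mixing weights
    expert_probs = softmax(W_experts @ x, axis=-1)  # (K, C) per-expert classes
    return gates @ expert_probs                     # (C,) mixture over classes

x = rng.normal(size=D)
p = cmn_predict(x)  # a valid class distribution (non-negative, sums to 1)
```

In CAVI-CMN the point weights `W_gate` and `W_experts` would instead carry posterior distributions; this sketch only shows the forward mixture structure.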
Problem

Research questions and friction points this paper is trying to address.

Gradient-free variational learning that scales beyond simple models
Balancing computational efficiency with calibrated, robust predictions
Training conditional mixture networks with full posteriors over all parameters
Innovation

Methods, ideas, or system contributions that make the work stand out.

CAVI-CMN, a fast, gradient-free variational training method
Conditional mixture networks: linear experts with a softmax gating network
Closed-form coordinate ascent variational inference updates via Pólya-Gamma augmentation
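The closed-form CAVI updates enabled by Pólya-Gamma augmentation can be illustrated on the simplest special case, Bayesian logistic regression (a single "expert"). The toy data, prior, and iteration count below are illustrative assumptions, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy binary classification data (hypothetical dimensions).
N, D = 200, 3
X = rng.normal(size=(N, D))
w_true = np.array([1.5, -2.0, 0.5])
y = (rng.random(N) < 1 / (1 + np.exp(-X @ w_true))).astype(float)

# CAVI with Polya-Gamma augmentation: conditioned on the auxiliary
# variables omega, the logistic likelihood in w is Gaussian, so both
# factor updates are closed form -- no gradients, no sampling.
prior_prec = np.eye(D)   # N(0, I) prior on the weights
kappa = y - 0.5
m = np.zeros(D)          # q(w) = N(m, V)
V = np.eye(D)

for _ in range(50):
    # Update q(omega_i): E[omega_i] = tanh(c_i / 2) / (2 c_i),
    # where c_i^2 = E[(x_i^T w)^2] under the current q(w).
    c = np.sqrt(np.einsum("nd,de,ne->n", X, V, X) + (X @ m) ** 2)
    E_omega = np.tanh(c / 2) / (2 * c)
    # Update q(w): Gaussian with precision X^T diag(E[omega]) X + prior.
    V = np.linalg.inv(X.T @ (E_omega[:, None] * X) + prior_prec)
    m = V @ (X.T @ kappa)
```

The paper applies this same conjugacy trick to every layer of the CMN (gating network and linear experts), so each coordinate update remains a Gaussian or Pólya-Gamma expectation in closed form.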