🤖 AI Summary
Deep probabilistic models often struggle to balance expressive power with efficient Bayesian inference due to the absence of closed-form solutions. This work proposes a constructive framework based on five factor graph primitives—bilinear factors, exponential links, Gamma priors, Gaussian likelihoods, and equality nodes—and demonstrates for the first time that non-conjugate factor graphs can admit closed-form mean-field variational inference in arbitrarily deep architectures through specific combinations of these primitives. Leveraging the closure properties of Gaussian and Gamma families, moment-generating functions, and sufficient statistics, the approach enables Bayesian deep networks with universal function approximation capabilities. Empirical validation across five time-series benchmarks confirms its effectiveness, yielding well-calibrated uncertainty estimates and expert-level Bayesian ensemble predictions.
📝 Abstract
Stacking probabilistic building blocks into deeper architectures typically breaks closed-form inference. We show that closed-form inference can be preserved. We identify five factor-graph primitives: a bilinear factor, an exponential link, a Gamma prior, a Gaussian likelihood, and an equality node, and prove that any model composed from them admits closed-form variational message passing. The construction works because each primitive preserves a small set of message families: under mean-field factorization, messages on Gaussian variables remain Gaussian and messages on precision variables remain Gamma, while the only non-conjugate interface, the exponential link, remains tractable through the Gaussian moment-generating function and the sufficient statistics of the Gamma family. We demonstrate composition at increasing depth, from static ensembles through input-dependent gating to split-branch routing, and show that stacking routing layers encodes arbitrary decision trees, establishing universal function approximation with closed-form inference. Applied to ensemble time-series forecasting, the framework yields a Bayesian mixture of experts in which gating functions are inferred rather than learned, providing calibrated uncertainty over expert selection across five benchmark datasets.