🤖 AI Summary
This work addresses the lack of a solid theoretical foundation for dependency networks, whose model distribution is defined only implicitly as the stationary distribution of pseudo-Gibbs sampling and therefore admits no closed-form expression. From the perspective of information geometry, the paper interprets each step of pseudo-Gibbs sampling as an m-projection onto the manifold of full conditionals, which reformulates structure and parameter learning as an optimization problem that decomposes into independent per-node subproblems. The authors introduce the full conditional divergence, together with a tight upper bound on it, to characterize where the stationary distribution lies within the probability simplex, and they prove that the learned model converges uniformly to the true distribution as the sample size tends to infinity. Experiments confirm that the upper bound is tight in practice, complementing the statistical consistency established by the theory.
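To make the sampling procedure concrete, here is a minimal sketch of pseudo-Gibbs sampling over independently learned local conditionals. The function names (`fit_local_conditionals`, `pseudo_gibbs`), the toy data, and the choice of logistic regression as the per-node conditional model are illustrative assumptions, not the paper's actual local models.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def fit_local_conditionals(X):
    """Independently fit one logistic model per node, predicting x_i from x_{-i}.

    A stand-in for the independently learned local conditionals of a
    dependency network; the paper's local models may be parameterized
    differently.
    """
    models = []
    for i in range(X.shape[1]):
        others = np.delete(X, i, axis=1)
        models.append(LogisticRegression().fit(others, X[:, i]))
    return models

def pseudo_gibbs(models, x0, n_sweeps=2000):
    """Pseudo-Gibbs sampling: cyclically resample each node from its learned
    full conditional. The empirical distribution of the visited states
    approximates the stationary distribution that implicitly defines the
    model distribution of the dependency network."""
    x = x0.copy()
    samples = []
    for _ in range(n_sweeps):
        for i in range(len(x)):
            others = np.delete(x, i).reshape(1, -1)
            p1 = models[i].predict_proba(others)[0, 1]  # P(x_i = 1 | x_{-i})
            x[i] = int(rng.random() < p1)
        samples.append(x.copy())
    return np.asarray(samples)

# Toy data: three binary variables, the first two strongly correlated.
z = rng.integers(0, 2, size=500)
noise = rng.random(500) < 0.1
X = np.column_stack([z, z ^ noise, rng.integers(0, 2, size=500)]).astype(int)

chain = pseudo_gibbs(fit_local_conditionals(X), x0=X[0].copy())
print(chain.mean(axis=0))  # approximate marginals of the stationary distribution
```

Because each conditional is fit separately, the local models need not be mutually consistent with any single joint distribution; this is exactly why the stationary distribution lacks a closed form and why the paper's geometric characterization is needed.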
📝 Abstract
Dependency networks (Heckerman et al., 2000) provide a flexible framework for modeling complex systems with many variables by combining independently learned local conditional distributions through pseudo-Gibbs sampling. Despite their computational advantages over Bayesian and Markov networks, the theoretical foundations of dependency networks remain incomplete, primarily because their model distributions -- defined as stationary distributions of pseudo-Gibbs sampling -- lack closed-form expressions. This paper develops an information-geometric analysis of pseudo-Gibbs sampling, interpreting each sampling step as an m-projection onto a full conditional manifold. Building on this interpretation, we introduce the full conditional divergence and derive an upper bound that characterizes the location of the stationary distribution in the space of probability distributions. We then reformulate both structure and parameter learning as optimization problems that decompose into independent subproblems for each node, and prove that the learned model distribution converges to the true underlying distribution as the number of training samples grows to infinity. Experiments confirm that the proposed upper bound is tight in practice.
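The m-projection reading can be made explicit with the standard chain-rule decomposition of the KL divergence. The display below is a plausible reconstruction under assumed notation, not the authors' exact formulas: $q$ is the current distribution, $p_i$ the learned local conditional at node $i$, and $M_i = \{\, r : r(x_i \mid x_{-i}) = p_i(x_i \mid x_{-i}) \,\}$ the corresponding full conditional manifold; the definition of $D_i$ is a natural candidate for the per-node term of the full conditional divergence.

```latex
\begin{align*}
% For any r \in M_i, i.e. r(x) = r(x_{-i})\, p_i(x_i \mid x_{-i}),
% the KL divergence splits into a marginal term and a conditional term:
D(q \,\|\, r)
  &= D\!\left( q(x_{-i}) \,\middle\|\, r(x_{-i}) \right)
   + \mathbb{E}_{q(x_{-i})}\!\left[
       D\!\left( q(x_i \mid x_{-i}) \,\middle\|\, p_i(x_i \mid x_{-i}) \right)
     \right] \\
% The first term vanishes exactly at r(x_{-i}) = q(x_{-i}), i.e. at the
% pseudo-Gibbs update q'(x) = q(x_{-i})\, p_i(x_i \mid x_{-i}); so one
% sampling step is the m-projection of q onto M_i, with residual
D_i(q)
  &:= \min_{r \in M_i} D(q \,\|\, r)
   = \mathbb{E}_{q(x_{-i})}\!\left[
       D\!\left( q(x_i \mid x_{-i}) \,\middle\|\, p_i(x_i \mid x_{-i}) \right)
     \right]
\end{align*}
```

Summing $D_i$ over nodes yields a criterion that splits into independent per-node terms, which is consistent with the paper's claim that structure and parameter learning decompose into independent subproblems; again, the precise objective used by the authors may differ.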