🤖 AI Summary
This work seeks a unified understanding of the intrinsic mechanisms underlying deep generative models, focusing on the energy landscape of Restricted Boltzmann Machines (RBMs) and its physical analogies to diffusion processes and coupled bosonic systems.
Method: We model the RBM initial state as a saddle point and combine reciprocal-space analysis, random matrix theory, and a mean-field approximation to characterize its curvature spectrum; we further propose a symmetry-breaking criterion based on the singular values and eigenvector matrices of the weights, drawing parallels with Landau's theory of phase transitions.
Contribution/Results: We rigorously prove that, in the thermodynamic limit, the reciprocal variables follow a Gaussian distribution, while certain modes fail to converge to the Boltzmann distribution. Empirical validation on MNIST confirms that hidden-layer size governs hierarchical feature extraction and symmetry evolution, with the curvature spectrum at initialization obeying the Marchenko–Pastur law.
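As a quick numerical illustration of the initialization claim, the sketch below (a minimal example, not from the paper; the layer sizes, seed, and weight scale are illustrative assumptions) compares the squared-singular-value spectrum of a Gaussian-initialized RBM weight matrix against the Marchenko–Pastur density:

```python
import numpy as np

# Illustrative sizes: n_v visible units (MNIST-like), n_h hidden units.
# The aspect ratio q = n_h / n_v fixes the Marchenko-Pastur support.
n_v, n_h = 784, 400
sigma = 0.01                                   # std of the Gaussian init (assumed)
rng = np.random.default_rng(0)

W = rng.normal(0.0, sigma, size=(n_v, n_h))    # untrained weight matrix
s = np.linalg.svd(W, compute_uv=False)         # singular values of W

# Rescaled eigenvalues of W^T W: lam = s^2 / (n_v * sigma^2) should follow
# the MP law with ratio q on the support [(1-sqrt(q))^2, (1+sqrt(q))^2].
lam = s**2 / (n_v * sigma**2)
q = n_h / n_v
lo, hi = (1 - np.sqrt(q))**2, (1 + np.sqrt(q))**2

def mp_density(x, q):
    """Marchenko-Pastur density for aspect ratio q <= 1."""
    a, b = (1 - np.sqrt(q))**2, (1 + np.sqrt(q))**2
    out = np.zeros_like(x)
    m = (x > a) & (x < b)
    out[m] = np.sqrt((b - x[m]) * (x[m] - a)) / (2 * np.pi * q * x[m])
    return out

# Empirical histogram vs. the MP prediction at the bin centers.
hist, edges = np.histogram(lam, bins=20, range=(lo, hi), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
for c, h in zip(centers, hist):
    print(f"lambda = {c:5.2f}   empirical = {h:5.3f}   MP = {mp_density(np.array([c]), q)[0]:5.3f}")
```

At these sizes the empirical histogram already tracks the MP curve closely; deviations shrink as n_v and n_h grow at fixed q, which is the thermodynamic limit invoked above.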
📝 Abstract
Deep generative models have become ubiquitous due to their ability to learn and sample from complex distributions. Despite the proliferation of frameworks, the relationships among these models remain largely unexplored, a gap that hinders the development of a unified theory of AI learning. We address two central challenges: clarifying the connections between different deep generative models and deepening our understanding of their learning mechanisms. We focus on Restricted Boltzmann Machines (RBMs), known for their universal approximation capabilities for discrete distributions. By introducing a reciprocal-space formulation, we reveal a connection between RBMs, diffusion processes, and coupled bosons. We show that at initialization, the RBM operates at a saddle point, where the local curvature is determined by the singular values, whose distribution follows the Marchenko–Pastur law and exhibits rotational symmetry. During training, this rotational symmetry is broken by hierarchical learning, in which different degrees of freedom progressively capture features at multiple levels of abstraction. The result is a symmetry breaking in the energy landscape, reminiscent of Landau theory, characterized by the singular values and the eigenvector matrix of the weights. We derive the corresponding free energy in a mean-field approximation and show that, in the limit of an infinitely large RBM, the reciprocal variables are Gaussian distributed. Our findings indicate that in this regime there are modes for which the diffusion process does not converge to the Boltzmann distribution. To illustrate our results, we train RBM replicas with different hidden-layer sizes on the MNIST dataset. These results bridge the gap between disparate generative frameworks and shed light on the processes underpinning learning in generative models.
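To make the symmetry-breaking diagnostic concrete, here is a minimal, self-contained sketch in the spirit of the MNIST experiment. It is an assumption-laden illustration, not the paper's code: it trains a small binary RBM with standard CD-1 (the abstract does not specify the training algorithm) on synthetic binary prototypes standing in for binarized MNIST, and tracks how a few singular values detach from the random-matrix bulk as training proceeds:

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

n_v, n_h = 784, 128            # one replica; vary n_h across replicas

# Toy stand-in for binarized MNIST: a handful of binary prototypes plus
# bit-flip noise, giving the data genuine low-rank structure to learn.
protos = (rng.random((10, n_v)) < 0.5).astype(float)
def make_batch(size=100, flip=0.05):
    v = protos[rng.integers(0, len(protos), size)]
    return (v != (rng.random(v.shape) < flip)).astype(float)  # XOR bit flips

def cd1_step(W, a, b, v0, lr=0.05):
    """One contrastive-divergence (CD-1) update on a batch v0 of shape (batch, n_v)."""
    ph0 = sigmoid(v0 @ W + b)                          # positive-phase hidden probs
    h0 = (rng.random(ph0.shape) < ph0).astype(float)   # sampled hidden states
    v1 = (rng.random((len(v0), n_v)) < sigmoid(h0 @ W.T + a)).astype(float)
    ph1 = sigmoid(v1 @ W + b)                          # negative-phase hidden probs
    W += lr * (v0.T @ ph0 - v1.T @ ph1) / len(v0)
    a += lr * (v0 - v1).mean(axis=0)
    b += lr * (ph0 - ph1).mean(axis=0)

W = rng.normal(0.0, 0.01, (n_v, n_h))                  # MP-distributed spectrum at init
a, b = np.zeros(n_v), np.zeros(n_h)

for epoch in range(10):
    for _ in range(50):
        cd1_step(W, a, b, make_batch())
    s = np.linalg.svd(W, compute_uv=False)
    # Outlier singular values leaving the bulk signal the broken rotational symmetry.
    print(f"epoch {epoch:2d}  top-3 = {np.round(s[:3], 3)}  median(bulk) = {np.median(s):.3f}")
```

One would expect the leading singular values to pull away from the bulk as the prototypes are learned, mirroring the hierarchical, mode-by-mode feature acquisition described above; on real binarized MNIST the same diagnostic applies with batches drawn from the dataset.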