AI Summary
Energy-based models suffer from intractable partition functions, which hinders maximum-likelihood estimation. Method: the paper investigates pseudo-likelihood training in the zero-temperature limit as a route to associative memory. Theoretically, it proves that pseudo-likelihood maximization inherently yields strong basins of attraction, enabling associative memory without requiring symmetric synaptic couplings and thus generalizing beyond classical Hopfield networks. Experiments across diverse domains (random feature models, MNIST, spin glasses, and protein sequences) demonstrate that stable attractors emerge from few training samples and generalize continuously as the sample size increases. Moreover, the retrieved attractors exhibit nontrivial statistical correlations with test samples, significantly outperforming classical Hopfield update rules. Contribution: this work establishes the first unified theoretical framework reconciling inference, memory storage, and generalization within energy-based models, providing both novel theoretical insights and a practical paradigm for memory-augmented learning systems.
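To make the mechanism concrete, here is a minimal toy sketch (not the paper's code) of pseudo-likelihood training for a pairwise spin model: each conditional p(s_i | s_-i) = sigma(2 s_i h_i), with local field h_i = sum_{j != i} J_ij s_j, is tractable, so the summed conditional log-likelihoods can be maximized by plain gradient ascent. All hyperparameters (pattern count, learning rate, epochs) are illustrative assumptions; note that J is never symmetrized.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_pseudolikelihood(patterns, lr=0.1, epochs=500):
    """Gradient ascent on sum_i log sigma(2 s_i h_i), h_i = sum_{j!=i} J_ij s_j,
    over +-1 patterns. J is NOT forced to be symmetric, matching the claim
    that symmetric synaptic couplings are not required."""
    P, N = patterns.shape
    J = np.zeros((N, N))
    for _ in range(epochs):
        H = patterns @ J.T                       # local fields, shape (P, N)
        flip = sigmoid(-2.0 * patterns * H)      # prob. that spin i disagrees with its field
        J += lr * 2.0 * (flip * patterns).T @ patterns / P
        np.fill_diagonal(J, 0.0)                 # forbid self-couplings
    return J

# Toy check (hypothetical setup): with few patterns, each trained pattern
# should become a fixed point of the zero-temperature update s_i <- sign(h_i),
# i.e. every spin agrees with its local field.
rng = np.random.default_rng(0)
patterns = rng.choice([-1, 1], size=(3, 50))
J = train_pseudolikelihood(patterns)
print((patterns * (patterns @ J.T) > 0).all())
```

Because each site's conditional is just a logistic regression on the other spins, a small set of patterns is linearly separable per site, and gradient ascent drives all local fields to align with the stored spins.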
Abstract
Energy-based probabilistic models learned by maximizing the likelihood of the data are limited by the intractability of the partition function. A widely used workaround is to maximize the pseudo-likelihood, which replaces the global normalization with tractable local normalizations. Here we show that, in the zero-temperature limit, a network trained to maximize pseudo-likelihood naturally implements an associative memory: if the training set is small, patterns become fixed-point attractors whose basins of attraction exceed those of any classical Hopfield rule. We explain this effect quantitatively for uncorrelated random patterns. Moreover, we show that, for structured datasets from computer science (random feature model, MNIST), physics (spin glasses), and biology (proteins), as the number of training examples increases the learned network goes beyond memorization, developing meaningful attractors with non-trivial correlations with test examples, thus showing the ability to generalize. Our results therefore reveal that pseudo-likelihood maximization works both as an efficient inference tool and as a principled mechanism for memory and generalization.
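The zero-temperature retrieval dynamics the abstract refers to can be sketched as follows. For self-containedness this toy uses the classical Hopfield (Hebbian) rule as the coupling matrix, i.e. the symmetric baseline the abstract compares against; the sizes and corruption level are illustrative assumptions, not the paper's experiments.

```python
import numpy as np

def hebbian_couplings(patterns):
    """Classical Hopfield rule: J = (1/N) sum_p xi_p xi_p^T with zero diagonal.
    This symmetric baseline is the update rule the abstract compares against."""
    P, N = patterns.shape
    J = patterns.T @ patterns / N
    np.fill_diagonal(J, 0.0)
    return J

def zero_temperature_retrieval(J, s0, max_sweeps=100):
    """Asynchronous zero-temperature dynamics: repeatedly set
    s_i <- sign(sum_j J_ij s_j) until a fixed point (an attractor) is reached."""
    s = s0.copy()
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(s)):
            h = J[i] @ s
            new = 1 if h > 0 else -1         # tie-break h == 0 toward -1
            if new != s[i]:
                s[i] = new
                changed = True
        if not changed:                      # fixed point reached
            break
    return s

# Toy retrieval: corrupt 10% of a stored pattern and let the dynamics
# fall back into its basin of attraction.
rng = np.random.default_rng(1)
patterns = rng.choice([-1, 1], size=(3, 100))
J = hebbian_couplings(patterns)
probe = patterns[0].copy()
probe[rng.choice(100, size=10, replace=False)] *= -1
final = zero_temperature_retrieval(J, probe)
print(abs(final @ patterns[0]) / 100)        # overlap with the stored pattern
```

With only 3 patterns over 100 spins the load is far below the Hopfield capacity, so the corrupted probe relaxes back to (or very near) the stored pattern; the paper's point is that pseudo-likelihood-trained couplings enlarge these basins beyond what this symmetric rule achieves.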