🤖 AI Summary
This study investigates how the choice of hidden-unit activation function in Restricted Boltzmann Machines (RBMs) affects their capacity to represent and learn high-order interaction structures, particularly in data with strong correlations beyond pairwise, which are hard to model. Leveraging the statistical-physics duality between RBMs and models of interacting binary variables, the work characterizes RBM representational power analytically through the moments of the distribution of induced interactions. It systematically examines the statistical properties of the interactions generated by four activation functions (linear, step, ReLU, and exponential) and shows analytically that rapidly growing nonlinearities, such as the exponential function, can facilitate the representation and learning of high-order structures within a parameter range that is itself determined analytically. The analysis also identifies a class of data with large high-order interactions that is inherently difficult for any RBM to represent and learn. The theoretical predictions agree closely with simulations of the training process, offering a new analytical framework for understanding RBM representational capabilities.
📝 Abstract
The great success of neural networks in recognizing hidden patterns and correlations in complex data lies in the way they jointly exploit a large number of parameters and nonlinear single-unit activations. Restricted Boltzmann Machines (RBMs) provide a simple yet powerful framework for studying the impact of activation nonlinearities on performance and representation. In this work, we exploit the duality between RBMs and models of interacting binary variables to study the statistics of the interactions induced by RBM ensembles with different hidden-unit activation functions. We characterize the space of representable models analytically in terms of the moments of the distribution of induced interactions for four commonly used activation functions: Linear, Step, ReLU, and Exponential. The analytical calculations yield quantitative predictions about learning that agree very well with simulations of the training process. In particular, our analysis shows that certain data structures, namely those generated by models of interacting variables with large interaction terms beyond pairwise, are difficult for any RBM to represent, and thus to learn. Yet, we find that rapidly increasing nonlinearities, such as the Exponential function, can facilitate the representation and learning of such data structures within a range of parameters that is determined analytically.
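A minimal sketch of the RBM-to-interacting-variables duality, under assumptions that are ours rather than the paper's: visible units are spins s_i ∈ {−1, +1}, and marginalizing out hidden unit μ adds a term φ(h_μ), with h_μ = Σ_i w_{μi} s_i, to log P(s) (for example, a binary {0, 1} hidden unit without bias gives φ(h) = log(1 + e^h)). Any function of N spins has a unique multilinear expansion Σ_S J_S Π_{i∈S} s_i, so the interactions induced by one hidden unit are the Walsh-Hadamard coefficients of φ(h). The code computes them by brute force over all 2^N configurations (practical only for small N) and averages J_S² per interaction order |S| over Gaussian weights with variance σ²/N. The function names, the weight ensemble, and the choice to plug the four named shapes in directly as φ are hypothetical illustrations, not the paper's definitions.

```python
import numpy as np


def fwht(values):
    """Unnormalized fast Walsh-Hadamard transform (Sylvester order).

    out[m] = sum_x (-1)**popcount(x & m) * values[x]; length must be a power of 2.
    """
    a = np.asarray(values, dtype=float).copy()
    h = 1
    while h < a.shape[0]:
        for start in range(0, a.shape[0], 2 * h):
            left = a[start:start + h].copy()
            right = a[start + h:start + 2 * h].copy()
            a[start:start + h] = left + right
            a[start + h:start + 2 * h] = left - right
        h *= 2
    return a


def induced_interactions(w, phi):
    """Coefficients J[mask] of phi(sum_i w_i s_i) in the basis prod_{i in mask} s_i.

    Assumption: phi(h) is one hidden unit's additive contribution to log P(s).
    Spins are +/-1; bit i of a configuration index set to 1 means s_i = -1.
    """
    n = len(w)
    idx = np.arange(2 ** n)
    bits = (idx[:, None] >> np.arange(n)) & 1
    spins = 1 - 2 * bits                       # shape (2**n, n), entries +/-1
    field = spins @ np.asarray(w, dtype=float)  # h(s) for every configuration
    return fwht(phi(field)) / 2 ** n


def order_second_moments(n, phi, sigma=1.0, draws=200, seed=0):
    """Mean of J_S**2 at each order |S| = 0..n, over hypothetical Gaussian weights."""
    rng = np.random.default_rng(seed)
    order = np.array([bin(m).count("1") for m in range(2 ** n)])
    acc = np.zeros(n + 1)
    for _ in range(draws):
        w = rng.normal(0.0, sigma / np.sqrt(n), size=n)
        J = induced_interactions(w, phi)
        acc += np.array([np.mean(J[order == k] ** 2) for k in range(n + 1)])
    return acc / draws


# Hypothetical: the four shapes named in the abstract used directly as phi.
ACTIVATIONS = {
    "linear": lambda h: h,
    "step": lambda h: (h > 0).astype(float),
    "relu": lambda h: np.maximum(h, 0.0),
    "exponential": np.exp,
}

if __name__ == "__main__":
    for name, phi in ACTIVATIONS.items():
        moments = order_second_moments(n=6, phi=phi, sigma=2.0)
        print(name, np.round(moments, 5))
```

Under this mapping a few patterns follow from symmetry and make useful sanity checks: the linear shape induces only the fields J_{i} = w_i; since ReLU(h) = (h + |h|)/2, every odd-order coupling above order one vanishes; the step function is a constant plus an odd function of h, so every even-order coupling beyond the constant vanishes (provided no configuration gives h = 0); and the exponential factorizes, giving J_S = Π_{i∈S} sinh w_i Π_{i∉S} cosh w_i, which is nonzero at every order.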