🤖 AI Summary
This work investigates how macroscopic ordered structures spontaneously emerge from microscopic randomness in ensembles of random classifiers, focusing on the existence and universality of an optimal temperature parameter under the Gibbs measure induced by the classification loss.
Method: Leveraging statistical physics modeling, analytical mean-field theory, numerical simulations, and empirical validation on MNIST, the study rigorously analyzes ensemble behavior across varying temperatures.
Contribution/Results: The authors prove the existence of a unique finite optimal temperature that maximizes ensemble classification accuracy. Crucially, this temperature depends neither on the (unknown) teacher classifier nor on the number of base classifiers, exhibiting strong universality and robustness. The work establishes, for the first time, a rigorous correspondence between stochastic ensembles and statistical-physics self-organization phenomena such as phase transitions and criticality, revealing that Gibbs-weighted aggregation fundamentally drives a transition from disorder to order. The findings provide a novel theoretical framework for understanding generalization in deep learning and collective intelligence.
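As a minimal illustration of the Gibbs-weighted aggregation described above, the sketch below weights each base classifier by exp(-β·loss), treating the classification loss as an energy; β = 0 recovers a uniform vote, while β → ∞ keeps only the lowest-loss classifier. The function name, array shapes, and NumPy implementation are illustrative assumptions, not code from the paper.

```python
# Hedged sketch of Gibbs-weighted ensemble aggregation (not the paper's code).
import numpy as np

def gibbs_ensemble_predict(predictions, losses, beta):
    """Aggregate base-classifier votes under the Gibbs measure.

    predictions: (n_classifiers, n_samples) array of +/-1 votes.
    losses: (n_classifiers,) classification losses, playing the role of energies.
    beta: inverse temperature; beta = 0 is a uniform vote,
          beta -> infinity keeps only the lowest-loss classifier.
    """
    # Gibbs weights exp(-beta * loss), shifted by the minimum loss for
    # numerical stability (the shift cancels after normalization).
    weights = np.exp(-beta * (losses - losses.min()))
    weights /= weights.sum()
    # Weighted majority vote over the ensemble.
    return np.sign(weights @ predictions)
```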
📝 Abstract
Randomness is ubiquitous in many applications across data science and machine learning. Remarkably, systems composed of random components often display emergent global behaviors that appear deterministic, manifesting a transition from microscopic disorder to macroscopic organization. In this work, we introduce a theoretical model for studying the emergence of collective behaviors in ensembles of random classifiers. We argue that, if the ensemble is weighted through the Gibbs measure defined by adopting the classification loss as an energy, then there exists a finite temperature parameter for the distribution such that the classification is optimal with respect to the loss (or the energy). Interestingly, for the case in which samples are generated by a Gaussian distribution and labels are constructed by employing a teacher perceptron, we analytically prove and numerically confirm that such an optimal temperature depends neither on the teacher classifier (which is, by construction of the learning problem, unknown) nor on the number of random classifiers, highlighting the universal nature of the observed behavior. Experiments on the MNIST dataset underline the relevance of this phenomenon for high-quality, noiseless datasets. Finally, a physical analogy allows us to shed light on the self-organizing nature of the studied phenomenon.
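For concreteness, the following is a hedged numerical sketch of the teacher-student setting described above: Gaussian inputs, binary labels from a hidden teacher perceptron, an ensemble of purely random perceptrons, and a sweep over the inverse temperature. All dimensions, sample sizes, and temperature values are illustrative choices, not the paper's experimental settings.

```python
# Illustrative teacher-student experiment (assumed setup, not the paper's code).
import numpy as np

rng = np.random.default_rng(0)
d, n_train, n_test, n_classifiers = 50, 200, 2000, 500

teacher = rng.standard_normal(d)                 # unknown teacher perceptron
X_train = rng.standard_normal((n_train, d))      # Gaussian samples
y_train = np.sign(X_train @ teacher)             # teacher-assigned labels
X_test = rng.standard_normal((n_test, d))
y_test = np.sign(X_test @ teacher)

# Ensemble of purely random perceptrons.
W = rng.standard_normal((n_classifiers, d))
train_preds = np.sign(W @ X_train.T)             # (n_classifiers, n_train)
losses = (train_preds != y_train).mean(axis=1)   # 0-1 loss as the energy
test_preds = np.sign(W @ X_test.T)

# Sweep the inverse temperature and record ensemble test accuracy;
# per the paper's claim, an intermediate beta should outperform both extremes.
for beta in [0.0, 2.0, 10.0, 50.0, 200.0]:
    weights = np.exp(-beta * (losses - losses.min()))
    weights /= weights.sum()
    y_hat = np.sign(weights @ test_preds)
    print(f"beta={beta:6.1f}  test accuracy={np.mean(y_hat == y_test):.3f}")
```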