🤖 AI Summary
Conventional machine learning often neglects physical principles, leading to suboptimal generalization and limited theoretical grounding for optimization.
Method: We model neural networks as one-dimensional non-interacting particle systems and introduce statistical mechanical entropy to characterize model states. Using the Wang–Landau algorithm, we construct entropy–generalization landscapes for million-parameter networks.
Contribution/Results: We discover a pronounced “entropy advantage”: high-entropy solutions consistently outperform low-entropy minima found by SGD and other standard optimizers—by up to 2.3× in narrow networks—challenging the universality assumption of SGD. Across arithmetic reasoning, tabular data, image classification, and language modeling, high-entropy states yield average test accuracy gains of 3.2–7.8%. This work establishes a new physics-informed paradigm for optimizer design, grounded in statistical mechanics and providing both theoretical justification and empirical validation for entropy-driven optimization.
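To make the method concrete, the sketch below illustrates a flat-histogram Wang–Landau run on a toy fully connected network: it estimates the log density of states ln g(E) over binned training loss, which is the entropy landscape S(E) up to an additive constant. This is a minimal sketch only; the toy task, network size, loss-bin edges, proposal scale, and flatness threshold are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression task and a tiny fully connected network; all weights are
# flattened into one vector, echoing the "one particle per parameter" picture.
X = rng.normal(size=(64, 4))
y = np.sin(X.sum(axis=1))
n_params = 4 * 8 + 8  # hypothetical 4->8 hidden layer plus an 8->1 readout

def loss(theta):
    W1 = theta[:32].reshape(4, 8)
    w2 = theta[32:]
    pred = np.tanh(X @ W1) @ w2
    return float(np.mean((pred - y) ** 2))

# "Energy" (training-loss) bins over which the entropy landscape is built.
edges = np.linspace(0.0, 2.0, 41)  # assumed loss range; larger losses clip into the last bin

def bin_of(E):
    return int(np.clip(np.digitize(E, edges) - 1, 0, len(edges) - 2))

log_g = np.zeros(len(edges) - 1)  # running estimate of ln g(E), i.e. S(E)
hist = np.zeros_like(log_g)       # visit histogram used for the flatness check
ln_f = 1.0                        # modification factor, reduced as sampling converges

theta = rng.normal(scale=0.5, size=n_params)
E = loss(theta)

while ln_f > 1e-3:
    for _ in range(20_000):
        proposal = theta.copy()
        proposal[rng.integers(n_params)] += rng.normal(scale=0.1)  # perturb one "particle"
        E_new = loss(proposal)
        # Accept with probability min(1, g(E)/g(E_new)): drives a flat histogram over loss.
        if np.log(rng.random()) < log_g[bin_of(E)] - log_g[bin_of(E_new)]:
            theta, E = proposal, E_new
        log_g[bin_of(E)] += ln_f
        hist[bin_of(E)] += 1
    visited = hist[hist > 0]
    if visited.size and visited.min() > 0.8 * visited.mean():  # histogram "flat enough"
        hist[:] = 0
        ln_f *= 0.5  # standard Wang-Landau refinement of the modification factor

entropy = log_g - log_g.max()  # S(E) = ln g(E), defined up to an additive constant
```

Each proposal perturbs a single parameter, and acceptance with probability min(1, g(E_old)/g(E_new)) pushes the walk toward a flat histogram over loss bins, so rarely visited low- and high-loss states are sampled as often as typical ones.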
📝 Abstract
While the 2024 Nobel Prize in Physics ignited a worldwide discussion on the origins of neural networks and their foundational links to physics, modern machine learning research predominantly focuses on computational and algorithmic advancements, overlooking the underlying physical picture. Here we introduce the concept of entropy into neural networks by reconceptualizing them as hypothetical physical systems in which each parameter is a non-interacting 'particle' within a one-dimensional space. By employing the Wang–Landau algorithm, we construct the entropy landscapes of neural networks (with up to 1 million parameters) as functions of training loss and test accuracy (or loss) across four distinct machine learning tasks: arithmetic questions, real-world tabular data, image recognition, and language modeling. Our results reveal the existence of an "entropy advantage", whereby high-entropy states generally outperform the states reached via classical training optimizers such as stochastic gradient descent. We also find that this advantage is more pronounced in narrower networks, indicating a need for training optimizers tailored to neural networks of different sizes.