Law of Neural Interaction: Depth-Width Shape, Interaction Efficiency, and Generalization

📅 2026-05-27

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study asks whether large language models use their resources efficiently under a fixed computational budget, and how that efficiency relates to generalization. Building on the Neural Feature Ansatz, it extends the notion of superposition from parameter space to gradient space and calls the result “neural interaction.” The work investigates how a model's depth-width ratio ($R_{D/W}$) influences interaction efficiency and generalization. Through gradient-space analysis and a comparison of existing small-scale dense LLMs on the MMLU-Pro benchmark, the authors identify an efficient interaction interval: under the same budget, models operating near this interval tend to perform better. This interval remains relatively stable as the computational budget scales up. The findings point to model shape, specifically the depth-width ratio, as a factor in both resource efficiency and generalization.

📝 Abstract

Guided by scaling laws, modern large language models (LLMs) have grown increasingly resource-hungry, yet it remains unclear whether these models use resources effectively under a fixed budget. Previous research has identified superposition as a key contributor to loss. By leveraging the Neural Feature Ansatz, we extend superposition from parameter space to gradient space and define it as neural interaction. We find that, under a fixed budget, good generalization is usually accompanied by efficient neural interactions, and that a model can be placed in an efficient interaction interval by adjusting its depth-width ratio ($R_{D/W}$). In addition, as the budget scales up, the model's efficient interaction interval remains relatively stable. Comparing existing small-scale dense LLMs, we observe that models operating near this interval tend to perform better on the MMLU-Pro benchmark. Our findings indicate that $R_{D/W}$ influences resource utilization efficiency and thereby affects generalization, providing insights into model shape initialization and the mechanisms of model generalization. Code for Neural Interaction Law is available at: https://anonymous.4open.science/r/Neural_Interaction_Law-D788
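
To make "adjusting $R_{D/W}$ under a fixed budget" concrete, the following is a minimal sketch of how candidate model shapes could be enumerated. It is not the authors' code. It assumes that $R_{D/W}$ is simply the number of layers divided by the hidden width, and that the budget is a non-embedding parameter count approximated by the common 12 · depth · width² rule of thumb for dense transformers. The function names `approx_params` and `shapes_under_budget` are hypothetical.

```python
import math


def approx_params(depth: int, width: int) -> int:
    """Rough non-embedding parameter count of a dense transformer.

    Assumption: 12 * depth * width**2 (attention plus a 4x-wide MLP).
    This is a rule of thumb, not the paper's budget definition.
    """
    return 12 * depth * width * width


def shapes_under_budget(budget: int, depths, width_multiple: int = 64):
    """For each depth, pick the largest width that fits within the budget.

    Widths are rounded down to a multiple of `width_multiple`.
    Returns one dict per feasible depth, including the assumed ratio
    R_{D/W} = depth / width.
    """
    rows = []
    for depth in depths:
        # Largest integer width with 12 * depth * width**2 <= budget.
        max_width = math.isqrt(budget // (12 * depth))
        width = (max_width // width_multiple) * width_multiple
        if width == 0:
            continue  # budget too small for this depth
        rows.append(
            {
                "depth": depth,
                "width": width,
                "ratio": depth / width,
                "params": approx_params(depth, width),
            }
        )
    return rows


if __name__ == "__main__":
    # Illustrative budget of 100M non-embedding parameters.
    for row in shapes_under_budget(100_000_000, depths=[4, 8, 16, 32]):
        print(row)
```

Each row is a shape with roughly the same parameter count but a different depth-width ratio. A study of the kind described above would train or evaluate one model per row and compare them. How interaction efficiency is measured for each shape is defined in the paper and is not reproduced here.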

Problem

Research questions and friction points this paper is trying to address.

neural interaction

depth-width ratio

generalization

resource efficiency

scaling laws

Innovation

Methods, ideas, or system contributions that make the work stand out.

Neural Interaction

Depth-Width Ratio

Superposition