Over-Alignment vs Over-Fitting: The Role of Feature Learning Strength in Generalization

📅 2026-01-31
📈 Citations: 0
Influential Citations: 0
🤖 AI Summary
This study challenges the prevailing assumption that stronger feature learning always improves generalization, investigating how feature learning strength (FLS) affects the generalization of deep networks under realistic training-termination conditions. The authors conduct large-scale experiments in non-asymptotic training regimes and complement them with a theoretical analysis of the gradient-flow dynamics of two-layer ReLU networks trained with logistic loss, modulating FLS through the initialization scale. They reveal, for the first time, the existence of an optimal FLS: an excessively large FLS leads to “over-alignment,” while an overly small FLS results in “over-fitting,” both degrading generalization. The work establishes a non-monotonic relationship between FLS and generalization, both theoretically and empirically, and introduces a dual “over-alignment vs. over-fitting” perspective that offers new insight into the mechanisms underlying generalization in deep learning.
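
As a rough formalization of the key quantity (a reading of the abstract's definition, not necessarily the paper's exact parameterization): if the network output carries an explicit scale factor $\gamma$,

$$f_\theta(x) = \gamma \, g_\theta(x), \qquad \mathrm{FLS} \propto \frac{1}{\gamma},$$

so shrinking the effective output scale $\gamma$ (here, by shrinking the initialization scale) increases FLS and moves training away from the lazy, NTK-like regime toward stronger feature learning.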

📝 Abstract
Feature learning strength (FLS), i.e., the inverse of the effective output scaling of a model, plays a critical role in shaping the optimization dynamics of neural nets. While its impact has been extensively studied in asymptotic regimes -- in both training time and FLS -- existing theory offers limited insight into how FLS affects generalization in practical settings, such as when training is stopped upon reaching a target training risk. In this work, we investigate the impact of FLS on generalization in deep networks under such practical conditions. Through empirical studies, we first uncover the emergence of an $\textit{optimal FLS}$ -- neither too small nor too large -- that yields substantial generalization gains. This finding runs counter to the prevailing intuition that stronger feature learning universally improves generalization. To explain this phenomenon, we develop a theoretical analysis of gradient flow dynamics in two-layer ReLU nets trained with logistic loss, where FLS is controlled via the initialization scale. Our main theoretical result establishes the existence of an optimal FLS arising from a trade-off between two competing effects: an excessively large FLS induces an $\textit{over-alignment}$ phenomenon that degrades generalization, while an overly small FLS leads to $\textit{over-fitting}$.
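
For concreteness, here is a minimal sketch (illustrative, not the authors' code) of the setup the abstract describes: a two-layer ReLU network trained with logistic loss by full-batch gradient descent, with FLS controlled via the initialization scale and training stopped once a target training risk is reached. The toy data, the `alpha` grid, the learning rate, and the step budget are all assumptions made for illustration.

```python
# Minimal sketch (illustrative, not the authors' code): two-layer ReLU net,
# logistic loss, full-batch gradient descent as a discretization of gradient
# flow. The initialization scale `alpha` modulates feature learning strength;
# training stops at a target training risk, mirroring the paper's setting.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
d, width = 10, 128
n_train, n_test = 128, 1024

# Toy linearly separable binary data; labels in {0, 1}.
X_tr, X_te = torch.randn(n_train, d), torch.randn(n_test, d)
y_tr, y_te = (X_tr[:, 0] > 0).float(), (X_te[:, 0] > 0).float()

def train_to_target_risk(alpha, target_risk=0.05, lr=0.05, max_steps=20_000):
    """Train f(x) = a^T relu(W x) until the training risk reaches `target_risk`."""
    W = (alpha * torch.randn(width, d)).requires_grad_()
    a = (alpha * torch.randn(width)).requires_grad_()
    for _ in range(max_steps):
        logits = torch.relu(X_tr @ W.T) @ a
        risk = F.binary_cross_entropy_with_logits(logits, y_tr)
        if risk.item() <= target_risk:   # practical stopping condition
            break
        risk.backward()
        with torch.no_grad():            # plain gradient-descent step
            W -= lr * W.grad
            a -= lr * a.grad
            W.grad = a.grad = None
    with torch.no_grad():
        test_logits = torch.relu(X_te @ W.T) @ a
        return F.binary_cross_entropy_with_logits(test_logits, y_te).item()

# Sweeping the initialization scale probes the FLS/generalization trade-off;
# under the paper's thesis, test risk at the stopping time should be
# non-monotonic in alpha (small alpha: strong feature learning; large
# alpha: weak feature learning / lazy regime).
for alpha in [0.01, 0.05, 0.2, 0.5, 1.0]:
    print(f"alpha={alpha:<4}  test risk at stop: {train_to_target_risk(alpha):.4f}")
```

Gradient flow is approximated here by small-step full-batch gradient descent; the paper's analysis is in continuous time, and its experiments are at a much larger scale than this toy sweep.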
Problem

Research questions and friction points this paper is trying to address.

feature learning strength
generalization
over-alignment
over-fitting
neural networks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Feature Learning Strength
Over-Alignment
Over-Fitting
Generalization
Gradient Flow Dynamics
Authors

Taesun Yeom
POSTECH
Deep Learning

Taehyeok Ha
Pohang University of Science and Technology (POSTECH)

Jaeho Lee
POSTECH
Machine Learning · Efficient AI