🤖 AI Summary
This study examines how data quality and quantity, specifically the signal-to-noise ratio (SNR) and the training sample size, affect the alignment of latent representations, that is, their structural similarity across networks trained with different architectures, training protocols, or datasets. An ensemble of networks is trained on regression and classification tasks, using synthetic and real-world training sets perturbed by independent realizations of a noise process. Analytical estimates for a linear network with a single hidden layer are compared with experiments on nonlinear networks trained on real-world datasets. Across these settings, alignment varies monotonically with SNR but non-monotonically with sample size, reaching its minimum near the interpolation threshold. Stronger alignment does not necessarily correspond to lower generalization error, so the dependence of alignment on data quality and quantity is decoupled from generalization performance.
📝 Abstract
Neural networks are known to develop latent representations that are aligned, namely structurally similar across networks trained with different architectures, training protocols, or training datasets. We study this phenomenon in a controlled setting, where we train an ensemble of networks on regression and classification tasks using training sets perturbed by independent realizations of a noise process. We show that the signal-to-noise ratio (SNR) and the training sample size influence the alignment in qualitatively similar ways in networks trained on real-world datasets and in an extremely simple linear network with a single hidden layer, for which the alignment can be estimated analytically. Across linear and nonlinear networks, regression and classification tasks, and both synthetic and real-world data, we consistently observe that alignment varies monotonically with SNR but non-monotonically with training sample size. In particular, the alignment is minimized near the interpolation threshold, and a stronger alignment does not necessarily correspond to a lower generalization error. These findings reveal a non-trivial dependence of alignment on data quality and quantity, decoupled from generalization performance.
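The controlled setting described above can be illustrated with a minimal sketch. It is not the authors' setup: it assumes a plain linear regressor fitted by minimum-norm least squares instead of a hidden-layer network, a shared design matrix whose labels are perturbed by independent Gaussian noise, and the mean pairwise cosine similarity between fitted weight vectors as a stand-in for an alignment measure. The function name `noisy_ensemble_alignment` and all of its parameters are hypothetical.

```python
import numpy as np


def noisy_ensemble_alignment(n_samples, dim=50, snr=4.0, n_models=8, seed=0):
    """Mean pairwise cosine similarity between linear models fitted on the
    same inputs with independent label-noise realizations (toy proxy only)."""
    rng = np.random.default_rng(seed)

    # Hypothetical ground-truth direction with unit norm, so the clean
    # signal X @ w_true has roughly unit variance.
    w_true = rng.standard_normal(dim)
    w_true /= np.linalg.norm(w_true)

    # Shared inputs; only the label noise differs between ensemble members.
    X = rng.standard_normal((n_samples, dim))
    signal = X @ w_true
    noise_std = 1.0 / np.sqrt(snr)  # assumes SNR = signal var / noise var

    # The pseudo-inverse gives the least-squares fit when n_samples >= dim
    # and the minimum-norm interpolating fit when n_samples < dim.
    X_pinv = np.linalg.pinv(X)

    weights = []
    for _ in range(n_models):
        y = signal + noise_std * rng.standard_normal(n_samples)
        weights.append(X_pinv @ y)

    W = np.stack(weights)
    W /= np.linalg.norm(W, axis=1, keepdims=True)

    # Average the off-diagonal entries of the cosine-similarity matrix.
    gram = W @ W.T
    upper = np.triu_indices(n_models, k=1)
    return float(gram[upper].mean())


if __name__ == "__main__":
    # Sweep the sample size across the interpolation threshold (n = dim).
    for n in (10, 25, 50, 100, 400):
        print(n, round(noisy_ensemble_alignment(n), 3))
```

Sweeping `n_samples` through values below, at, and above `dim`, or varying `snr` at fixed `n_samples`, gives a quick way to inspect how this proxy responds to data quantity and quality. Any behavior it shows is specific to this toy model and should not be read as reproducing the paper's results.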