🤖 AI Summary
This work investigates the mechanistic underpinnings of how generalization evolves during deep neural network (DNN) training, aiming to disentangle and characterize the interplay between generalizable and non-generalizable features. We propose an interpretable analytical framework based on AND-OR interaction rewriting, which characterizes generalization evolution as a three-stage dynamical process: (i) early pruning of noisy interactions, (ii) mid-stage progressive acquisition of simple generalizable interactions, and (iii) late-stage learning of complex non-generalizable interactions. Experiments across multiple benchmark datasets validate the universality of this pattern. Quantitatively, we demonstrate a strong positive correlation between the proportion of non-generalizable interactions and the generalization error gap, and show that such interactions are a direct cause of generalization failure. Our findings suggest an interpretable intervention pathway for generalization control.
📝 Abstract
This paper proposes a new perspective for analyzing the generalization power of deep neural networks (DNNs), i.e., directly disentangling and analyzing the dynamics of generalizable and non-generalizable interactions encoded by a DNN through the training process. Specifically, this work builds upon a recent theoretical achievement in explainable AI, which proves that the detailed inference logic of a DNN can be strictly rewritten as a small number of AND-OR interaction patterns. Based on this, we propose an efficient method to quantify the generalization power of each interaction, and we discover a distinct three-phase dynamics of the generalization power of interactions during training. In particular, the early phase of training typically removes noisy and non-generalizable interactions and learns simple and generalizable ones. The second and third phases tend to capture increasingly complex interactions that are harder to generalize. Experimental results verify that the learning of non-generalizable interactions is the direct cause of the gap between the training and testing losses.
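To make the idea of interaction patterns concrete, the following is a minimal sketch of the standard Harsanyi-dividend form of AND interactions, which this line of work builds on. It is not the paper's efficient quantification method (which also covers OR interactions and generalization scoring); the function `v` here is a hypothetical toy surrogate for a network's scalar output on a masked input.

```python
from itertools import combinations

def and_interactions(v, n):
    """Compute AND-interaction effects I(S) for every subset S of n input
    variables, given v: tuple of kept variable indices -> scalar output.
    Uses the Harsanyi-dividend form
        I(S) = sum_{T subseteq S} (-1)^{|S|-|T|} * v(T),
    so that v(N) decomposes exactly into the sum of all I(S)."""
    effects = {}
    for k in range(n + 1):
        for S in combinations(range(n), k):
            total = 0.0
            for j in range(len(S) + 1):
                for T in combinations(S, j):
                    total += (-1) ** (len(S) - len(T)) * v(T)
            effects[S] = total
    return effects

# Toy "model": fires 1.0 when variables 0 AND 1 are both present,
# plus 0.5 when variable 2 is present (two ground-truth interactions).
def v(T):
    T = set(T)
    return 1.0 * ({0, 1} <= T) + 0.5 * (2 in T)

effects = and_interactions(v, 3)
# The decomposition recovers exactly the two underlying patterns:
# effects[(0, 1)] == 1.0, effects[(2,)] == 0.5, all other subsets == 0.
```

The exponential loop over subsets is exact but only feasible for small `n`; in practice such analyses restrict to a handful of salient input regions per sample.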