Towards the Three-Phase Dynamics of Generalization Power of a DNN

📅 2025-05-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work investigates the mechanistic underpinnings of dynamic generalization evolution during deep neural network (DNN) training, aiming to disentangle and characterize the interplay between generalizable and non-generalizable features. We propose an interpretable analytical framework based on AND-OR interaction rewriting, which formally characterizes generalization evolution as a three-stage dynamical process: (i) early pruning of noisy interactions, (ii) mid-stage progressive acquisition of simple generalizable interactions, and (iii) late-stage forced learning of complex non-generalizable interactions. Experiments across multiple benchmark datasets validate the universality of this pattern. Quantitatively, we demonstrate a strong positive correlation between the proportion of non-generalizable interactions and the generalization error gap—establishing them as a direct cause of generalization failure. Our findings provide a causally interpretable intervention pathway for generalization control.


📝 Abstract
This paper proposes a new perspective for analyzing the generalization power of deep neural networks (DNNs), i.e., directly disentangling and analyzing the dynamics of generalizable and non-generalizable interactions encoded by a DNN through the training process. Specifically, this work builds upon a recent theoretical achievement in explainable AI, which proves that the detailed inference logic of a DNN can be strictly rewritten as a small number of AND-OR interaction patterns. Based on this, we propose an efficient method to quantify the generalization power of each interaction, and we discover a distinct three-phase dynamics of the generalization power of interactions during training. In particular, the early phase of training typically removes noisy and non-generalizable interactions and learns simple and generalizable ones. The second and third phases tend to capture increasingly complex interactions that are harder to generalize. Experimental results verify that the learning of non-generalizable interactions is the direct cause of the gap between the training and testing losses.
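The AND-interaction formalism the abstract builds on can be illustrated with a minimal sketch. Here, `v(T)` stands for the model output when only the input variables in `T` are present (the rest masked to a baseline), and the AND interaction of a set `S` is the inclusion-exclusion sum over its subsets. The toy model `v` below is a hypothetical stand-in, not the paper's actual networks or its generalization-power metric:

```python
from itertools import chain, combinations

def subsets(s):
    """All subsets of an index set s, as tuples."""
    return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))

def and_interaction(v, S):
    """AND interaction effect I(S) = sum over T ⊆ S of (-1)^{|S|-|T|} * v(T),
    where v(T) is the (masked) model output on variable subset T."""
    return sum((-1) ** (len(S) - len(T)) * v(set(T)) for T in subsets(S))

# Toy "model": the output fires only when features 0 AND 1 are both present.
def v(T):
    return 1.0 if {0, 1} <= T else 0.0

print(and_interaction(v, {0, 1}))  # 1.0 — the pair {0, 1} carries the full effect
print(and_interaction(v, {0}))     # 0.0 — singletons carry no effect
```

This inclusion-exclusion form is what lets a model's inference logic be decomposed into a small number of salient interaction patterns; the paper's contribution is then to track how generalizable each such pattern is as training proceeds.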
Problem

Research questions and friction points this paper is trying to address.

Analyzing DNN generalization power dynamics during training
Quantifying generalization power of interaction patterns
Identifying three-phase dynamics in interaction learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Disentangling generalizable and non-generalizable DNN interactions
Quantifying generalization power of each interaction
Discovering three-phase dynamics in training
Yuxuan He
University of Electronic Science and Technology of China
Junpeng Zhang
Hebei Normal University
Information Security · Privacy-Preserving · Differential Privacy
Hongyuan Zhang
Institute of Artificial Intelligence, China Telecom
Quanshi Zhang
Shanghai Jiao Tong University
Interpretable Machine Learning