🤖 AI Summary
This study investigates how neural classifiers achieve strong generalization through implicit hypothesis testing during supervised training. Modeling the classification task as a binary hypothesis test between class-conditional distributions in the representation space, we use an information-theoretic framework to analyze how the learned decision rule aligns with the Neyman–Pearson optimal criterion throughout training. Both theoretical analysis and empirical evidence demonstrate that well-generalizing networks consistently increase the Kullback–Leibler (KL) divergence between class-conditional distributions, thereby approaching the optimal detector and exponentially reducing classification error. Our findings uncover an intrinsic link between the dynamic evolution of KL divergence and generalization performance, offering a novel perspective for understanding neural network generalization and informing the design of new regularization strategies.
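As a concrete illustration of the quantities involved, the sketch below fits a Gaussian to each class's representations and computes the KL divergence between them; by Stein's lemma, this divergence is the exponential decay rate of the type-II error of the optimal Neyman–Pearson test. The Gaussian approximation and the helper names (`gaussian_kl`, `class_conditional_kl`) are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def gaussian_kl(mu0, cov0, mu1, cov1):
    """Closed-form KL( N(mu0, cov0) || N(mu1, cov1) )."""
    d = mu0.shape[0]
    diff = mu1 - mu0
    trace_term = np.trace(np.linalg.solve(cov1, cov0))
    quad_term = diff @ np.linalg.solve(cov1, diff)
    _, logdet0 = np.linalg.slogdet(cov0)
    _, logdet1 = np.linalg.slogdet(cov1)
    return 0.5 * (trace_term + quad_term - d + logdet1 - logdet0)

def class_conditional_kl(reps, labels):
    """Fit one Gaussian per class to representation vectors and return KL(P0 || P1).

    reps: (n_samples, dim) array of penultimate-layer features.
    labels: binary array with entries in {0, 1}.
    """
    r0, r1 = reps[labels == 0], reps[labels == 1]
    ridge = 1e-6 * np.eye(reps.shape[1])  # keeps sample covariances invertible
    mu0, cov0 = r0.mean(axis=0), np.cov(r0, rowvar=False) + ridge
    mu1, cov1 = r1.mean(axis=0), np.cov(r1, rowvar=False) + ridge
    return gaussian_kl(mu0, cov0, mu1, cov1)

# Stein's lemma: for the Neyman-Pearson test at a fixed type-I error level,
# the type-II error over n i.i.d. samples decays as exp(-n * KL(P0 || P1)),
# so growth of this divergence during training means a larger error exponent.
```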
📝 Abstract
We study the supervised training dynamics of neural classifiers through the lens of binary hypothesis testing. We model classification as a set of binary tests between class-conditional distributions of representations and empirically show that, along training trajectories, well-generalizing networks increasingly align with Neyman–Pearson optimal decision rules via monotonic improvements in KL divergence that relate to error-rate exponents. Finally, we discuss how this perspective explains generalization behavior and suggests training or regularization strategies for different classes of neural networks.
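One minimal way to probe the claimed alignment is to compare a network's decisions against the Neyman–Pearson likelihood-ratio test built from the same fitted class-conditional Gaussians, while tracking the KL divergence across epochs. The decision rule below and the monitoring recipe in the comments are hypothetical scaffolding under the Gaussian assumption, not the authors' code.

```python
import numpy as np

def log_gauss(x, mu, cov):
    """Log-density of a multivariate Gaussian N(mu, cov) evaluated at x."""
    diff = x - mu
    _, logdet = np.linalg.slogdet(cov)
    quad = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (quad + logdet + mu.shape[0] * np.log(2.0 * np.pi))

def np_decide(x, mu0, cov0, mu1, cov1, threshold=0.0):
    """Neyman-Pearson rule: declare class 1 iff log p1(x) - log p0(x) > threshold."""
    return int(log_gauss(x, mu1, cov1) - log_gauss(x, mu0, cov0) > threshold)

# Hypothetical per-epoch monitoring: after each training epoch, extract
# held-out representations, fit the class-conditional Gaussians, record the
# KL divergence (e.g. with class_conditional_kl from the sketch above), and
# measure how often the network's predictions agree with np_decide. Under the
# paper's claim, both quantities should rise for well-generalizing networks.
```

The threshold plays the role of the Neyman–Pearson operating point; setting it to zero corresponds to the maximum-likelihood (equal-prior) decision boundary.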