🤖 AI Summary
This work investigates the origins of a DNN's confusing samples—i.e., hard instances represented by non-generalizable interactions—and their impact on generalization in deep neural networks (DNNs). Methodologically, it employs interaction-complexity analysis, confusing-sample attribution, and cross-model parameter-comparison experiments. Results reveal that randomness in low-layer weights predominantly governs the composition of confusing samples; moreover, generalization-relevant feature interactions are primarily determined by low-layer parameters, whereas high-layer parameters and network architecture have far less influence. This study is the first to establish the decisive role of low-layer parameters in shaping confusing samples, thereby extending the lottery ticket hypothesis. It further uncovers a fundamental source of representational divergence across DNNs: even when test accuracy is comparable, models with distinct low-layer parameters exhibit nearly disjoint sets of confusing samples. These findings provide theoretical foundations and practical pathways for interpretable modeling and robustness optimization.
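The "nearly disjoint sets of confusing samples" claim can be quantified with a simple set-overlap metric on the two models' misclassified test indices. The sketch below uses intersection-over-union; the function name and the choice of IoU are illustrative assumptions, not taken from the paper.

```python
def confusing_overlap(wrong_a, wrong_b):
    """Intersection-over-union of two models' misclassified test-sample
    index sets. Values near 0 indicate nearly disjoint confusing samples."""
    a, b = set(wrong_a), set(wrong_b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0

# Two hypothetical models with similar error counts but mostly different errors:
print(confusing_overlap([1, 2, 3], [3, 4]))  # 0.25
```

Under the paper's finding, two models trained from different low-layer parameters would score near 0 on this metric despite comparable accuracy, while models sharing low-layer parameters would score much higher.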
📝 Abstract
In this paper, we find that the complexity of interactions encoded by a deep neural network (DNN) can explain its generalization power. We also discover that the confusing samples of a DNN, which are represented by non-generalizable interactions, are determined by its low-layer parameters. In comparison, other factors, such as high-layer parameters and network architecture, have much less impact on the composition of confusing samples. Two DNNs with different low-layer parameters usually have fully different sets of confusing samples, even though they achieve similar performance. This finding extends the understanding of the lottery ticket hypothesis and explains the distinctive representation power of different DNNs.
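The abstract does not specify its interaction metric here; a standard choice in the interaction-based-explanation literature is the Harsanyi dividend, where \(I(S)=\sum_{T\subseteq S}(-1)^{|S|-|T|}\,v(T)\) and \(v(T)\) is the network output on an input with all variables outside \(T\) masked. The sketch below computes it on a toy surrogate output; the toy `v` and all names are illustrative assumptions, not the paper's implementation.

```python
from itertools import chain, combinations

def subsets(s):
    """All subsets of the index tuple s, as tuples."""
    return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))

def harsanyi_interaction(v, S):
    """Harsanyi dividend I(S) = sum over T subset of S of (-1)^(|S|-|T|) v(T)."""
    return sum((-1) ** (len(S) - len(T)) * v(T) for T in subsets(S))

# Toy surrogate for a network output (a real v(T) would run the DNN on an
# input with variables outside T masked):
w = [1.0, 3.0, 5.0]
def v(T):
    main = sum(w[i] for i in T)               # additive main effects
    pair = 2.0 if {0, 1} <= set(T) else 0.0   # one pairwise interaction term
    return main + pair

print(harsanyi_interaction(v, (0, 1)))  # recovers the pairwise term: 2.0
```

A useful sanity check of this definition is the decomposition property \(v(N)=\sum_{S\subseteq N} I(S)\): the network output is exactly the sum of all interaction effects, so "interaction complexity" can be read off from how many high-order terms are non-negligible.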