When Flatness Does (Not) Guarantee Adversarial Robustness

📅 2025-10-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper investigates whether flat minima guarantee adversarial robustness in neural networks. Through a geometric analysis of the loss landscape, the authors establish that flatness ensures only *local* robustness, not global robustness, and rigorously prove this limitation. They derive, for the first time, a closed-form expression for the *relative flatness* of the penultimate-layer weights, and further show that maintaining global robustness requires *steep curvature* in regions off the data manifold. The theory reveals that adversarial examples frequently arise in high-confidence, broadly flat yet misclassified regions, uncovering an intrinsic link between flatness and overconfident misprediction. Extensive experiments across diverse architectures (ResNet, ViT) and datasets (CIFAR-10/100, ImageNet subsets) validate these theoretical findings, providing a unified geometric explanation for the persistent adversarial vulnerability of flat minima.

📝 Abstract
Despite their empirical success, neural networks remain vulnerable to small adversarial perturbations. A longstanding hypothesis suggests that flat minima, regions of low curvature in the loss landscape, offer increased robustness. While intuitive, this connection has remained largely informal and incomplete. By rigorously formalizing the relationship, we show that this intuition is only partially correct: flatness implies local but not global adversarial robustness. To arrive at this result, we first derive a closed-form expression for relative flatness in the penultimate layer, and then show that it can be used to constrain the variation of the loss in input space. This allows us to formally analyze the adversarial robustness of the entire network. We then show that maintaining robustness beyond a local neighborhood requires the loss to curve sharply away from the data manifold. We validate our theoretical predictions empirically across architectures and datasets, uncovering the geometric structure that governs adversarial vulnerability and linking flatness to model confidence: adversarial examples often lie in large, flat regions where the model is confidently wrong. Our results challenge simplified views of flatness and provide a nuanced understanding of its role in robustness.
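The notion of penultimate-layer relative flatness can be probed numerically. The sketch below is a hedged toy, not the paper's closed-form expression: it estimates the trace of the loss Hessian with respect to a toy last layer by finite differences and scales it by the squared Frobenius norm of the weights, a common reparameterization-invariant flatness surrogate. All sizes, seeds, and function names are illustrative assumptions.

```python
import numpy as np

def loss(W, F, y):
    """Mean cross-entropy of softmax(F @ W.T) over a small batch."""
    logits = F @ W.T
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    logp = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -logp[np.arange(len(y)), y].mean()

def trace_hessian(W, F, y, eps=1e-3, probes=200):
    """Hutchinson-style finite-difference estimate of Tr(H) w.r.t. W."""
    rng = np.random.default_rng(123)  # fixed probe directions, so repeated calls are comparable
    base = loss(W, F, y)
    acc = 0.0
    for _ in range(probes):
        v = rng.standard_normal(W.shape)
        v /= np.linalg.norm(v)        # unit direction
        acc += (loss(W + eps * v, F, y) - 2 * base + loss(W - eps * v, F, y)) / eps ** 2
    return W.size * acc / probes      # E[v^T H v] * dim estimates Tr(H)

def relative_flatness(W, F, y):
    """||W||_F^2 * Tr(H): invariant under the rescaling W -> a*W, F -> F/a."""
    return np.linalg.norm(W) ** 2 * trace_hessian(W, F, y)

rng = np.random.default_rng(0)
n, d, classes = 32, 8, 3
F = rng.standard_normal((n, d))               # toy penultimate-layer features
y = rng.integers(0, classes, n)
W = 0.1 * rng.standard_normal((classes, d))   # toy last-layer weights

k1 = relative_flatness(W, F, y)
k2 = relative_flatness(2.0 * W, F / 2.0, y)   # reparameterized net, same function
print(k1, k2)
```

Because rescaling the weights by `a` while dividing the features by `a` leaves the network function unchanged, a meaningful flatness measure should not change either; the two printed values agree up to finite-difference error, which plain `Tr(H)` alone would not.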
Problem

Research questions and friction points this paper is trying to address.

Analyzing the relationship between flatness and adversarial robustness in neural networks
Formalizing the local versus global robustness implications of flat minima
Identifying the geometric structures that cause confident misclassifications in flat regions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Closed-form expression for the relative flatness of the penultimate-layer weights
Bounding the variation of the loss in input space via penultimate-layer flatness
Analyzing adversarial robustness through the geometric structure of the loss landscape
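The local-versus-global distinction above can be seen in a one-dimensional caricature. The toy below is purely illustrative and not from the paper: a score function with a flat, high-confidence, correct basin around the clean input and a second, equally flat basin farther away where the model is confidently wrong. Small perturbations are harmless (local robustness), while a larger shift lands in the flat misclassified region.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def score(x):
    # Toy 1-D score: flat, confident, correct basin at x=0 and an
    # equally flat, confident *wrong* basin at x=3.
    return 4.0 * np.cos(np.pi * x / 3.0)

x0 = 0.0                          # clean input, true label = 1
p_clean = sigmoid(score(x0))      # confident and correct

# Local robustness: a small perturbation keeps the prediction.
p_small = sigmoid(score(x0 + 0.3))

# Global failure: a larger shift lands in a region where the model is
# confidently wrong -- the "flat but misclassified" case.
adv = x0 + 3.0
p_adv = sigmoid(score(adv))

# The wrong basin is itself flat: the score's slope there is ~0.
h = 1e-4
slope = (score(adv + h) - score(adv - h)) / (2 * h)

print(p_clean, p_small, p_adv, slope)
```

The point of the caricature is that curvature at the minimum says nothing about what the loss does far away: keeping the wrong basin out of reach would require the score to curve steeply between the two basins, mirroring the paper's claim that global robustness needs steep curvature off the data manifold.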