An accurate flatness measure to estimate the generalization performance of CNN models

2026-03-09
AI Summary
Existing flatness measures often fail to characterize the generalization ability of convolutional neural networks (CNNs) accurately, because they are either tailored to fully connected networks or overlook the geometric structure inherent to CNNs. This work addresses the limitation for a canonical class of CNN architectures that use global average pooling followed by a linear classifier. The authors derive a closed-form expression for the trace of the Hessian of the cross-entropy loss with respect to the convolutional kernels in this setting, and propose a structure-aware relative flatness metric that explicitly accounts for the scaling symmetries and filter interactions induced by convolution and pooling. Empirical evaluations on standard image-classification benchmarks suggest that the metric can be used to assess and compare the generalization performance of CNN models and to guide architecture and training choices.

Abstract
Flatness measures based on the spectrum or the trace of the Hessian of the loss are widely used as proxies for the generalization ability of deep networks. However, most existing definitions are either tailored to fully connected architectures, relying on stochastic estimators of the Hessian trace, or ignore the specific geometric structure of modern Convolutional Neural Networks (CNNs). In this work, we develop a flatness measure that is both exact and architecturally faithful for a broad and practically relevant class of CNNs. We first derive a closed-form expression for the trace of the Hessian of the cross-entropy loss with respect to convolutional kernels in networks that use global average pooling followed by a linear classifier. Building on this result, we then specialize the notion of relative flatness to convolutional layers and obtain a parameterization-aware flatness measure that properly accounts for the scaling symmetries and filter interactions induced by convolution and pooling. Finally, we empirically investigate the proposed measure on families of CNNs trained on standard image-classification benchmarks. The results obtained suggest that the proposed measure can serve as a robust tool to assess and compare the generalization performance of CNN models, and to guide the design of architecture and training choices in practice.
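To make the idea of an exact (non-stochastic) Hessian trace concrete, here is a minimal sketch for a much simpler case than the one the abstract treats. The paper's closed form is taken with respect to the convolutional kernels and is not reproduced in this summary. The sketch below covers only the final linear classifier applied to globally average-pooled features, where the standard softmax cross-entropy identity gives tr(H_W) = (1 - ||p||^2) ||h||^2, with p the softmax output and h the pooled feature vector. All function names, the toy feature maps, and the weights are illustrative assumptions, not taken from the paper.

```python
import math

def global_average_pool(feature_maps):
    # feature_maps: list of channels, each a 2D list (height x width).
    return [sum(sum(row) for row in fm) / (len(fm) * len(fm[0]))
            for fm in feature_maps]

def logits(W, h):
    # Bias-free linear classifier: z = W h.
    return [sum(w * x for w, x in zip(row, h)) for row in W]

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def cross_entropy(W, h, y):
    # Numerically stable log-sum-exp minus the true-class logit.
    z = logits(W, h)
    m = max(z)
    return m + math.log(sum(math.exp(v - m) for v in z)) - z[y]

def hessian_trace_closed_form(W, h):
    # Exact trace of the Hessian w.r.t. W: (1 - ||p||^2) * ||h||^2.
    p = softmax(logits(W, h))
    return (1.0 - sum(q * q for q in p)) * sum(x * x for x in h)

def hessian_trace_finite_diff(W, h, y, eps=1e-4):
    # Reference value: sum of central second differences over every weight.
    base = cross_entropy(W, h, y)
    total = 0.0
    for i in range(len(W)):
        for j in range(len(W[0])):
            orig = W[i][j]
            W[i][j] = orig + eps
            up = cross_entropy(W, h, y)
            W[i][j] = orig - eps
            down = cross_entropy(W, h, y)
            W[i][j] = orig  # restore the weight
            total += (up - 2.0 * base + down) / eps ** 2
    return total

# Hypothetical toy input: 3 channels of 2x2 feature maps, 3 classes.
feature_maps = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.0, 1.0], [1.0, 0.0]],
    [[-1.0, 0.5], [0.5, 2.0]],
]
W = [[0.2, -0.1, 0.4], [-0.3, 0.5, 0.1], [0.0, 0.3, -0.2]]

h = global_average_pool(feature_maps)
exact = hessian_trace_closed_form(W, h)
approx = hessian_trace_finite_diff(W, h, y=1)
```

The closed form does not depend on the label, because the Hessian of softmax cross-entropy with respect to the logits is diag(p) - p p^T regardless of the target class. Comparing `exact` with `approx` is a quick sanity check of the identity. Extending it to convolutional kernels, as the abstract describes, requires the paper's own derivation.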
Problem

Research questions and friction points this paper is trying to address.

flatness measure
generalization performance
Convolutional Neural Networks
Hessian trace
architectural fidelity
Innovation

Methods, ideas, or system contributions that make the work stand out.

flatness measure
Hessian trace
convolutional neural networks
generalization
global average pooling
Rahman Taleghani
Department of Mathematics "Tullio Levi-Civita", University of Padova, Via Trieste 63, Padova, 35121, Veneto, Italy; Department of Computer Science, Ruhr University Bochum, University street 140, Bochum, 44801, North Rhine-Westphalia, Germany
Maryam Mohammadi
Department of Mathematical Sciences and Computer, Kharazmi University, Mofateh Avenue, Tehran, 15719-14911, Tehran, Iran
Francesco Marchetti
Fixed-term researcher, Università degli Studi di Padova