AI Summary
This work investigates how equivariance, locality, and weight sharing affect the sample complexity of single-hidden-layer neural networks for position-aware learning in image modeling. Through the lens of statistical learning theory, we derive upper and lower bounds on the generalization error that quantify the impact of each property separately. First, we show that, depending on the weight-sharing mechanism, non-equivariant weight sharing can achieve generalization bounds comparable to those of equivariant architectures. Second, we quantify the generalization benefit conferred by locality and reveal its fundamental trade-off with expressivity, as implied by the uncertainty principle. Third, we extend the analysis to architectures with max-pooling and to multi-layer networks, obtaining bounds with only mild dimension dependence. The results are established via Rademacher complexity and norm constraints on the filters, making the single-layer bounds dimension-free for a large class of activation functions. Experiments confirm that locality improves generalization, subject to its inherent trade-off with expressivity.
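As a concrete illustration of the three design choices being compared, the sketch below contrasts a fully-connected hidden layer, a weight-shared equivariant layer (circular convolution with a full-width filter), and a local equivariant layer (a filter supported on k << d entries). This is a hypothetical minimal setup, assuming a 1-D signal of dimension d and a ReLU activation; the function names and shapes are illustrative, not the paper's exact construction.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 32  # input dimension (illustrative)

def fully_connected(x, W):
    """No weight sharing: d*d free parameters."""
    return np.maximum(W @ x, 0.0)  # ReLU hidden layer

def equivariant(x, w):
    """Weight sharing + translation equivariance:
    circular cross-correlation with a full-width filter (d parameters)."""
    out = np.array([w @ np.roll(x, -t) for t in range(d)])
    return np.maximum(out, 0.0)

def local_equivariant(x, w_small, k=5):
    """Weight sharing + locality: the filter lives on k << d entries,
    so any norm constraint involves only k weights."""
    w = np.zeros(d)
    w[:k] = w_small
    return equivariant(x, w)

x = rng.standard_normal(d)
W = rng.standard_normal((d, d))
w = rng.standard_normal(d)
w_small = rng.standard_normal(5)
print(fully_connected(x, W).shape,
      equivariant(x, w).shape,
      local_equivariant(x, w_small).shape)
```

Counting parameters makes the trade-off concrete: d^2 for the fully-connected map, d for the shared full-width filter, and k for the local filter, whose small support is also what limits expressivity.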
Abstract
Weight sharing, equivariance, and local filters, as in convolutional neural networks, are believed to contribute to the sample efficiency of neural networks. However, it is not clear how each of these design choices contributes to the generalization error. Through the lens of statistical learning theory, we aim to provide insight into this question by characterizing the relative impact of each choice on the sample complexity. We obtain lower and upper sample complexity bounds for a class of single-hidden-layer networks. For a large class of activation functions, the bounds depend only on the norms of the filters and are dimension-independent. We also provide bounds for max-pooling and an extension to multi-layer networks, both with mild dimension dependence. We draw a few takeaways from the theoretical results. It can be shown that, depending on the weight-sharing mechanism, non-equivariant weight sharing can yield a generalization bound similar to the equivariant one. We show that locality has generalization benefits; however, the uncertainty principle implies a trade-off between locality and expressivity. We conduct extensive experiments and highlight some consistent trends for these models.
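The bounds above are established via Rademacher complexity. As a standard-definition reminder (not the paper's derivation), the empirical Rademacher complexity of a function class F on a sample x_1, ..., x_n is E_sigma sup_{f in F} (1/n) sum_i sigma_i f(x_i), with sigma_i uniform on {-1, +1}. The sketch below Monte-Carlo-estimates this quantity for a norm-bounded linear class, where the supremum has a closed form; the function name and sample sizes are illustrative assumptions.

```python
import numpy as np

def empirical_rademacher_linear(X, B, n_draws=2000, seed=0):
    """Monte-Carlo estimate of the empirical Rademacher complexity of
    {x -> <w, x> : ||w||_2 <= B} on a sample X of shape (n, d).
    For this class, sup_w (1/n) sum_i sigma_i <w, x_i>
    equals B * ||sum_i sigma_i x_i||_2 / n."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    total = 0.0
    for _ in range(n_draws):
        sigma = rng.choice([-1.0, 1.0], size=n)  # Rademacher signs
        total += B * np.linalg.norm(sigma @ X) / n
    return total / n_draws

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 32))
# Upper-bounded by B * sqrt(sum_i ||x_i||^2) / n, independent of d.
print(empirical_rademacher_linear(X, B=1.0))
```

The resulting estimate scales with the norm bound B and the norms of the sample points but not directly with the dimension d, which is the sense in which norm-based bounds are dimension-free.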