🤖 AI Summary
This study asks whether modern feedforward neural networks, including MLPs, CNNs, GNNs, and fixed-length Transformers, have finite sample complexity in the distribution-free agnostic PAC learning framework. Using o-minimal structures from model theory, the authors prove that any fixed finite feedforward architecture is agnostically PAC learnable from finitely many samples whenever every layer is definable in an o-minimal structure, even when the parameters are unbounded. The result gives a unified learnability guarantee for mainstream acyclic architectures that does not depend on particular activation functions or architecture-specific VC-dimension arguments, and it positions finite-sample PAC learnability as a baseline property rather than a differentiator between architectures.
📝 Abstract
We show that, in a precise sense, a broad class of feedforward neural networks learn (have finite sample complexity) in the PAC model: every fixed finite feedforward architecture whose layers are definable in an o-minimal structure has finite sample complexity in the agnostic PAC setting, even with unbounded parameters. This covers standard fixed-size MLPs, CNNs, GNNs, and transformers with fixed sequence length, together with the operations and layers typically used in such architectures, including linear projections, residual connections, attention mechanisms, pooling layers, normalization layers, and admissible positional encodings. Hence, distribution-free learnability for modern non-recurrent architectures is not an exceptional property of particular activations or architecture-specific VC arguments, but a consequence of tame feedforward computation. Our results reposition finite-sample PAC learnability as a baseline rather than a differentiator: they shift the focus of architectural comparison toward inductive biases, symmetries and geometric priors, scalability, and optimization behaviour.
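For reference, "finite sample complexity in the agnostic PAC setting" can be written out as follows. This is a sketch of the standard textbook definition for binary classification with 0-1 loss; the abstract does not state the paper's exact loss or output setting, so the symbols $\mathcal{H}$, $\mathcal{D}$, $L_{\mathcal{D}}$, $m_{\mathcal{H}}$, and $d$ are assumptions introduced here for illustration.

```latex
% Agnostic PAC learnability (textbook binary-classification form; an assumed setting).
% \mathcal{H}: hypothesis class realized by a fixed architecture over all parameter values.
% \mathcal{D}: arbitrary distribution over X \times \{0,1\} (distribution-free).
A class $\mathcal{H} \subseteq \{0,1\}^{X}$ is agnostically PAC learnable if there exist
a function $m_{\mathcal{H}} : (0,1)^2 \to \mathbb{N}$ and a learner $A$ such that for all
$\varepsilon, \delta \in (0,1)$ and every distribution $\mathcal{D}$ over $X \times \{0,1\}$,
given $m \ge m_{\mathcal{H}}(\varepsilon, \delta)$ i.i.d. samples $S \sim \mathcal{D}^m$,
\[
  \Pr_{S \sim \mathcal{D}^m}\!\left[
    L_{\mathcal{D}}\bigl(A(S)\bigr) \le \inf_{h \in \mathcal{H}} L_{\mathcal{D}}(h) + \varepsilon
  \right] \ge 1 - \delta,
  \qquad
  L_{\mathcal{D}}(h) = \Pr_{(x,y) \sim \mathcal{D}}\bigl[h(x) \ne y\bigr].
\]
% Fundamental theorem of statistical learning: finite VC dimension d = \mathrm{VCdim}(\mathcal{H})
% is equivalent to agnostic PAC learnability, with sample complexity
\[
  m_{\mathcal{H}}(\varepsilon, \delta)
  = \Theta\!\left( \frac{d + \log(1/\delta)}{\varepsilon^{2}} \right).
\]
```

In this form, "finite sample complexity" means that $m_{\mathcal{H}}(\varepsilon, \delta)$ is finite for every $\varepsilon$ and $\delta$, which in the binary case is equivalent to $d < \infty$. The definition places no bound on the parameters, which is consistent with the abstract's "even with unbounded parameters".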