Every Feedforward Neural Network Definable in an o-Minimal Structure Has Finite Sample Complexity

📅 2026-05-07

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study asks whether modern feedforward neural networks, including MLPs, CNNs, GNNs, and fixed-length Transformers, have finite sample complexity in the distribution-free agnostic PAC learning framework. Drawing on o-minimal structures from model theory, the authors require each network layer to be definable in an o-minimal structure. They then prove that every fixed architecture meeting this condition is agnostic PAC learnable from finitely many samples, whether or not its parameters are bounded. The result gives one learnability guarantee for mainstream acyclic architectures. It does not depend on particular activation functions or on architecture-specific VC-dimension arguments, and it offers a new theoretical perspective on the foundations of deep learning.
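The summary compresses a chain of standard facts. The block below states one standard route from layer-wise definability to a sample-complexity bound. It assumes binary classification obtained by thresholding the network output, and the symbols L_i, w_i, f, h_w, p, d, C_1, C_2 are introduced here for illustration only. The paper's own proof and loss setting may differ. Step (2) uses the classical fact that every definable family of sets in an o-minimal structure has finite VC dimension. Step (3) is the fundamental theorem of statistical learning, with absolute constants C_1 and C_2.

```latex
\begin{aligned}
&\text{(1) Composition: if each layer } L_i(\cdot\,; w_i) \text{ is definable, then } f(x; w) = \big(L_k(\cdot\,; w_k) \circ \cdots \circ L_1(\cdot\,; w_1)\big)(x) \text{ is definable in } (x, w).\\
&\text{(2) Finite VC dimension: } \mathcal{H} = \{\, h_w(x) = \mathbf{1}[f(x; w) > 0] : w \in \mathbb{R}^p \,\} \text{ is a definable family, so } d = \operatorname{VCdim}(\mathcal{H}) < \infty.\\
&\text{(3) Agnostic PAC bound: } C_1\,\frac{d + \log(1/\delta)}{\epsilon^2} \;\le\; m_{\mathcal{H}}(\epsilon, \delta) \;\le\; C_2\,\frac{d + \log(1/\delta)}{\epsilon^2}.
\end{aligned}
```

Because w ranges over all of ℝ^p, no bound on parameter norms enters d. This is why the guarantee survives with unbounded parameters.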

📝 Abstract

We show that, in a precise sense, a broad class of feedforward neural networks learn (have finite sample complexity) in the PAC model: every fixed finite feedforward architecture whose layers are definable in an o-minimal structure has finite sample complexity in the agnostic PAC setting, even with unbounded parameters. This covers standard fixed-size MLPs, CNNs, GNNs, and transformers with fixed sequence length, together with the operations and layers typically used in such architectures, including linear projections, residual connections, attention mechanisms, pooling layers, normalization layers, and admissible positional encodings. Hence, distribution-free learnability for modern non-recurrent architectures is not an exceptional property of particular activations or architecture-specific VC arguments, but a consequence of tame feedforward computation. Our results reposition finite-sample PAC learnability as a baseline rather than a differentiator: they shift the focus of architectural comparison toward inductive biases, symmetries and geometric priors, scalability, and optimization behaviour.
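The abstract attributes learnability to "tame feedforward computation." As an illustration only, the following pure-Python sketch builds a fixed-length, single-head attention classifier from projections, scaled dot-product attention, a residual connection, layer normalization, ReLU, mean pooling, and a thresholded linear readout. Every primitive it uses is definable in ℝ_exp, the real field expanded by the exponential function, which Wilkie proved o-minimal. The layer choices, the parameter names Wq, Wk, Wv, w_out, and b, and the binary output are hypothetical assumptions, not the paper's construction.

```python
import math
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]

# Illustrative sketch only: every primitive below is definable in R_exp
# (the real field with exp, o-minimal by Wilkie's theorem). Semialgebraic
# pieces are already definable in the real field R itself.


def matvec(W: Matrix, x: Vector) -> Vector:
    # Linear projection: polynomial in the entries of W and x (semialgebraic).
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def softmax(z: Vector) -> Vector:
    # Subtracting max(z) leaves the output unchanged; max is semialgebraic.
    m = max(z)
    e = [math.exp(v - m) for v in z]  # exp: definable in R_exp
    s = sum(e)                         # strictly positive, so division is safe
    return [v / s for v in e]


def layer_norm(x: Vector, eps: float = 1e-5) -> Vector:
    # Mean, variance, and sqrt of a strictly positive number: semialgebraic.
    mu = sum(x) / len(x)
    var = sum((v - mu) ** 2 for v in x) / len(x)
    return [(v - mu) / math.sqrt(var + eps) for v in x]


def relu(x: Vector) -> Vector:
    # max(0, t) is semialgebraic.
    return [max(0.0, v) for v in x]


def attention_block(X: Matrix, Wq: Matrix, Wk: Matrix, Wv: Matrix) -> Matrix:
    # Single-head self-attention over a FIXED number of tokens, followed by a
    # residual connection and layer normalization.
    d = len(X[0])
    Q = [matvec(Wq, x) for x in X]
    K = [matvec(Wk, x) for x in X]
    V = [matvec(Wv, x) for x in X]
    out = []
    for i, q in enumerate(Q):
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in K]
        weights = softmax(scores)
        attended = [sum(weights[j] * V[j][c] for j in range(len(X))) for c in range(d)]
        residual = [x_c + h_c for x_c, h_c in zip(X[i], attended)]
        out.append(layer_norm(residual))
    return out


def classify(X: Matrix, params: Dict) -> int:
    # Hypothetical parameter names: Wq, Wk, Wv, w_out, b.
    H = [relu(h) for h in attention_block(X, params["Wq"], params["Wk"], params["Wv"])]
    n, d = len(H), len(H[0])
    pooled = [sum(h[c] for h in H) / n for c in range(d)]  # mean pooling: linear
    score = sum(w * p for w, p in zip(params["w_out"], pooled)) + params["b"]
    return 1 if score > 0 else 0  # thresholding yields a binary hypothesis
```

For example, with X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], identity matrices for Wq, Wk, and Wv, w_out = [1.0, -1.0], and b = 0.1, classify returns 1. The code never clips or bounds a parameter, in line with the abstract's unbounded-parameter setting. What the definability argument needs is only that the architecture, including the sequence length, stays fixed.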

Problem

sample complexity

feedforward neural networks

PAC learnability

o-minimal structures

distribution-free learnability

Innovation

o-minimal structure

finite sample complexity

agnostic PAC learnability