🤖 AI Summary
Deep neural networks suffer from generalization bounds that scale unfavorably with network depth. This work addresses the strong depth dependence of these bounds.
Method: We propose the first unified generalization analysis framework applicable to arbitrary pseudo-metric spaces. Hidden-layer mappings are modeled as continuous semigroups, and their geometric dynamics—characterized by the word-ball growth function β(k) (exhibiting polynomial or exponential growth)—are quantitatively linked to generalization rates.
Contributions/Results: (i) First attribution of generalization bounds to the dynamical properties of hidden-layer mappings; (ii) An architecture- and input-agnostic geometric criterion for generalization; (iii) Proof that expansive dynamics enable exponential parameter savings; (iv) An explicit generalization bound O(√((α + log β(k))/n)), rigorously explaining sublinear and depth-independent rates via geometric principles; (v) Transferable theoretical guarantees for diffusion models and test-time inference.
📝 Abstract
Recent theory has reduced the depth dependence of generalization bounds from exponential to polynomial and even depth-independent rates, yet these results remain tied to specific architectures and Euclidean inputs. We present a unified framework for arbitrary pseudo-metric spaces in which a depth-\(k\) network is the composition of continuous hidden maps \(f:\mathcal{X}\to\mathcal{X}\) and an output map \(h:\mathcal{X}\to\mathbb{R}\). The resulting bound $O(\sqrt{(\alpha + \log \beta(k))/n})$ isolates the sole depth contribution in \(\beta(k)\), the word-ball growth of the semigroup generated by the hidden layers. By Gromov's theorem, polynomial (resp. exponential) growth corresponds to virtually nilpotent (resp. expanding) dynamics, revealing a geometric dichotomy behind existing $O(\sqrt{k})$ (sublinear depth) and $\tilde{O}(1)$ (depth-independent) rates. We further provide covering-number estimates showing that expanding dynamics yield an exponential parameter saving via compositional expressivity. Our results decouple specification from implementation, offering architecture-agnostic and dynamical-systems-aware guarantees applicable to modern deep-learning paradigms such as test-time inference and diffusion models.
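The depth dependence claimed by the bound can be illustrated numerically. The sketch below (an illustration only, not the paper's code) evaluates $O(\sqrt{(\alpha + \log \beta(k))/n})$ with constants dropped, under two assumed growth regimes for the word-ball function: polynomial growth $\beta(k) \sim k^d$ (virtually nilpotent dynamics, yielding the depth-independent $\tilde{O}(1)$ rate) and exponential growth $\beta(k) \sim c^k$ (expanding dynamics, yielding the sublinear $O(\sqrt{k})$ rate). The exponents `d`, `c`, and the values of `alpha` and `n` are arbitrary choices for illustration.

```python
import math

def gen_bound(alpha, log_beta_k, n):
    """Bound shape sqrt((alpha + log beta(k)) / n); absolute constants dropped."""
    return math.sqrt((alpha + log_beta_k) / n)

def log_beta_poly(k, d=3):
    # Polynomial word-ball growth beta(k) ~ k^d (virtually nilpotent dynamics):
    # log beta(k) = d log k, so the bound grows only logarithmically in depth.
    return d * math.log(k)

def log_beta_exp(k, c=2.0):
    # Exponential word-ball growth beta(k) ~ c^k (expanding dynamics):
    # log beta(k) = k log c, so the bound scales like sqrt(k).
    return k * math.log(c)

alpha, n = 10.0, 10_000  # hypothetical complexity term and sample size
for k in (4, 16, 64, 256):
    b_poly = gen_bound(alpha, log_beta_poly(k), n)
    b_exp = gen_bound(alpha, log_beta_exp(k), n)
    print(f"k={k:4d}  poly-growth bound={b_poly:.4f}  exp-growth bound={b_exp:.4f}")
```

As depth k grows, the polynomial-growth bound stays nearly flat while the exponential-growth bound increases like √k, matching the geometric dichotomy described in the abstract.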