🤖 AI Summary
Deep neural networks suffer from generalization bounds that scale unfavorably with network depth. This work addresses the strong depth dependence of these bounds.
Method: We propose the first unified generalization analysis framework applicable to arbitrary pseudo-metric spaces. Hidden-layer mappings are modeled as continuous semigroups, and their geometric dynamics—characterized by the word-ball growth function β(k) (exhibiting polynomial or exponential growth)—are quantitatively linked to generalization rates.
Contributions/Results: (i) First attribution of generalization bounds to the dynamical properties of hidden-layer mappings; (ii) An architecture- and input-agnostic geometric criterion for generalization; (iii) Proof that expansive dynamics enable exponential parameter savings; (iv) An explicit generalization bound O(√((α + log β(k))/n)), rigorously explaining sublinear and depth-independent rates via geometric principles; (v) Transferable theoretical guarantees for diffusion models and test-time inference.
📝 Abstract
Recent theory has reduced the depth dependence of generalization bounds from exponential to polynomial and even depth-independent rates, yet these results remain tied to specific architectures and Euclidean inputs. We present a unified framework for arbitrary pseudo-metric spaces in which a depth-\(k\) network is the composition of continuous hidden maps \(f:\mathcal{X}\to\mathcal{X}\) and an output map \(h:\mathcal{X}\to\mathbb{R}\). The resulting bound $O(\sqrt{(\alpha + \log \beta(k))/n})$ isolates the sole depth contribution in \(\beta(k)\), the word-ball growth of the semigroup generated by the hidden layers. By Gromov's theorem, polynomial (resp. exponential) growth corresponds to virtually nilpotent (resp. expanding) dynamics, revealing a geometric dichotomy behind existing $O(\sqrt{k})$ (sublinear depth) and $\tilde{O}(1)$ (depth-independent) rates. We further provide covering-number estimates showing that expanding dynamics yield an exponential parameter saving via compositional expressivity. Our results decouple specification from implementation, offering architecture-agnostic and dynamical-systems-aware guarantees applicable to modern deep-learning paradigms such as test-time inference and diffusion models.
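The depth dependence claimed by the bound can be illustrated numerically. The sketch below (an illustration only, not the paper's code) evaluates $O(\sqrt{(\alpha + \log \beta(k))/n})$ with constants dropped, under two assumed growth regimes for the word-ball function: polynomial growth $\beta(k) \sim k^d$ (virtually nilpotent dynamics, yielding the depth-independent $\tilde{O}(1)$ rate) and exponential growth $\beta(k) \sim c^k$ (expanding dynamics, yielding the sublinear $O(\sqrt{k})$ rate). The exponents `d`, `c`, and the values of `alpha` and `n` are arbitrary choices for illustration.

```python
import math

def gen_bound(alpha, log_beta_k, n):
    """Bound shape sqrt((alpha + log beta(k)) / n); absolute constants dropped."""
    return math.sqrt((alpha + log_beta_k) / n)

def log_beta_poly(k, d=3):
    # Polynomial word-ball growth beta(k) ~ k^d (virtually nilpotent dynamics):
    # log beta(k) = d log k, so the bound grows only logarithmically in depth.
    return d * math.log(k)

def log_beta_exp(k, c=2.0):
    # Exponential word-ball growth beta(k) ~ c^k (expanding dynamics):
    # log beta(k) = k log c, so the bound scales like sqrt(k).
    return k * math.log(c)

alpha, n = 10.0, 10_000  # hypothetical complexity term and sample size
for k in (4, 16, 64, 256):
    b_poly = gen_bound(alpha, log_beta_poly(k), n)
    b_exp = gen_bound(alpha, log_beta_exp(k), n)
    print(f"k={k:4d}  poly-growth bound={b_poly:.4f}  exp-growth bound={b_exp:.4f}")
```

As depth k grows, the polynomial-growth bound stays nearly flat while the exponential-growth bound increases like √k, matching the geometric dichotomy described in the abstract.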