🤖 AI Summary
This work addresses the lack of theoretical foundations for adaptive optimizers (e.g., Adam) in deep ReLU networks—non-smooth and non-convex settings—by establishing their first global convergence guarantees and generalization bounds. Methodologically, we introduce a hierarchical region-crossing compression strategy coupled with a Kakeya-type geometric analysis framework, integrating hierarchical Morse theory and directional complexity modeling to reduce the number of region crossings induced by ReLU activations from exponential to near-linear. Theoretical contributions include: (1) a generalization bound of $\tilde{O}(\sqrt{d_{\mathrm{eff}}/n})$ without the Polyak–Łojasiewicz (PL) condition or convexity assumptions, improving upon existing PAC-Bayes results; and (2) globally optimal convergence under a weak low-barrier assumption. These advances resolve a fundamental bottleneck in optimization theory for non-smooth deep learning.
📝 Abstract
First-order adaptive optimization methods like Adam are the default choice for training modern deep neural networks. Despite their empirical success, the theoretical understanding of these methods in non-smooth settings, particularly in deep ReLU networks, remains limited. ReLU activations create exponentially many region boundaries where standard smoothness assumptions break down. **We derive the first $\tilde{O}\bigl(\sqrt{d_{\mathrm{eff}}/n}\bigr)$ generalization bound for Adam in deep ReLU networks and the first globally optimal convergence result for Adam in the non-smooth, non-convex ReLU landscape without a global PL or convexity assumption.** Our analysis is based on stratified Morse theory and novel results on Kakeya sets. We develop a multi-layer refinement framework that progressively tightens bounds on region crossings, proving that the number of region crossings collapses from exponential to near-linear in the effective dimension. Using a Kakeya-based method, we obtain a generalization bound tighter than PAC-Bayes approaches and establish convergence under a mild uniform low-barrier assumption.
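To make the notion of "region crossings" concrete, here is a minimal empirical sketch (not the paper's method; the network, dimensions, and segment are illustrative assumptions): a ReLU network partitions input space into linear regions, and walking along a line segment crosses a region boundary each time some pre-activation changes sign, i.e. the activation pattern changes.

```python
import numpy as np

rng = np.random.default_rng(0)

d, width = 16, 64                      # input dim and hidden width (illustrative choices)
W1 = rng.standard_normal((width, d))   # random first-layer weights
b1 = rng.standard_normal(width)

def activation_pattern(x):
    """Sign pattern of the first-layer pre-activations at input x."""
    return W1 @ x + b1 > 0

# Walk along the segment x(t) = (1 - t) * a + t * b and count the grid
# steps at which the activation pattern changes: each such step is a
# "region crossing" of the ReLU partition.
a, b = rng.standard_normal(d), rng.standard_normal(d)
ts = np.linspace(0.0, 1.0, 20001)
patterns = [activation_pattern((1 - t) * a + t * b) for t in ts]
crossings = sum(not np.array_equal(p, q) for p, q in zip(patterns, patterns[1:]))

print(f"region crossings along the segment: {crossings}")
# For a single layer, each unit's pre-activation is affine in t and so
# changes sign at most once: crossings <= width. Composing layers
# multiplies these counts, which is where exponential worst-case bounds
# on region crossings come from.
```

For one hidden layer the count is trivially at most `width`; the exponential blow-up the abstract refers to arises only with depth, which is what the multi-layer refinement framework is designed to control.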