🤖 AI Summary
Existing generalization bounds for heavy-tailed stochastic optimization either rely on non-computable information-theoretic quantities or hold only in expectation. This paper proves high-probability generalization bounds for heavy-tailed stochastic differential equations (SDEs), used as proxies for stochastic optimizers, that contain no nontrivial information-theoretic terms. The proofs rest on a new technique: estimating the entropy flows associated with the fractional Fokker–Planck equation, the partial differential equation that governs how the distribution of the heavy-tailed SDE evolves. The resulting bounds depend more favorably on the parameter dimension than those in prior work. The analysis also identifies a phase transition: heavy tails can be either beneficial or harmful for generalization, depending on the problem structure. Experiments in a variety of settings support the theory.
📝 Abstract
Understanding the generalization properties of heavy-tailed stochastic optimization algorithms has attracted increasing attention in recent years. While illuminating interesting aspects of stochastic optimizers by using heavy-tailed stochastic differential equations (SDEs) as proxies, prior works either provided generalization bounds that hold only in expectation or introduced non-computable information-theoretic terms. To address these drawbacks, we prove high-probability generalization bounds for heavy-tailed SDEs that do not contain any nontrivial information-theoretic terms. To achieve this goal, we develop new proof techniques based on estimating the entropy flows associated with the so-called fractional Fokker-Planck equation (a partial differential equation that governs the evolution of the distribution of the corresponding heavy-tailed SDE). In addition to obtaining high-probability bounds, we show that our bounds have a better dependence on the dimension of the parameters than prior art. Our results further identify a phase transition phenomenon, which suggests that heavy tails can be either beneficial or harmful depending on the problem structure. We support our theory with experiments conducted in a variety of settings.
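For orientation, the pairing of a heavy-tailed SDE with its fractional Fokker-Planck equation can be written in a generic textbook form. This is a sketch under stated assumptions, not the paper's exact setup: the objective F, the noise scale σ, the tail index α, and the driving process L_t^α (a rotationally symmetric α-stable Lévy process normalized so that E exp(i ξ·L_t^α) = exp(-t|ξ|^α)) are illustrative symbols that do not appear in the abstract, and the paper's own formulation may include additional terms.

```latex
% Assumed generic form; F, \sigma, \alpha and L_t^\alpha are illustrative symbols,
% not notation taken from the paper.
\begin{align}
  % Heavy-tailed SDE: gradient flow on F perturbed by alpha-stable Levy noise
  \mathrm{d}\theta_t &= -\nabla F(\theta_t)\,\mathrm{d}t
      + \sigma\,\mathrm{d}L_t^{\alpha}, \qquad \alpha \in (0, 2), \\
  % Fractional Fokker-Planck equation for the density p(\theta, t) of \theta_t
  \partial_t p(\theta, t) &= \nabla \cdot \big( p(\theta, t)\,\nabla F(\theta) \big)
      - \sigma^{\alpha}\,(-\Delta)^{\alpha/2}\, p(\theta, t).
\end{align}
```

Here (-Δ)^{α/2} is the fractional Laplacian. In the limiting case α = 2 it reduces to -Δ, and the second equation becomes the classical Fokker-Planck equation with diffusion term σ²Δp, which is the sense in which the fractional equation generalizes the Gaussian-noise case.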