🤖 AI Summary
Proximal stochastic gradient descent (PSGD) struggles to identify underlying low-dimensional structures—such as support sets or low-rank manifolds—in stochastic composite optimization and lacks finite-time manifold identification guarantees.
Method: We propose the normal map-based proximal stochastic gradient method (NSGD), a simple variant of PSGD built upon Robinson's normal map and designed for general nonconvex stochastic settings.
Contributions/Results: NSGD is the first method to achieve finite-time active manifold identification and almost-sure convergence to stationary points without convexity assumptions or variance-reduction techniques. By combining a Kurdyka–Łojasiewicz-based analysis with almost-sure iterate convergence guarantees, NSGD converges globally to stationary points with iteration complexity bounds matching those of PSGD. Crucially, it identifies the active manifold exactly in finitely many steps with probability one, overcoming a fundamental structural identification limitation of conventional PSGD.
📝 Abstract
The proximal stochastic gradient method (PSGD) is one of the state-of-the-art approaches for stochastic composite-type problems. In contrast to its deterministic counterpart, PSGD has been found to have difficulties with the correct identification of underlying substructures (such as supports, low-rank patterns, or active constraints) and it does not possess a finite-time manifold identification property. Existing solutions rely on convexity assumptions or on additional variance reduction techniques. In this paper, we address these limitations and present a simple variant of PSGD based on Robinson's normal map. The proposed normal map-based proximal stochastic gradient method (NSGD) is shown to converge globally, i.e., accumulation points of the generated iterates correspond to stationary points almost surely. In addition, we establish complexity bounds for NSGD that match the known results for PSGD, and we prove that NSGD can almost surely identify active manifolds in finite time in a general nonconvex setting. Our derivations are built on almost sure iterate convergence guarantees and utilize analysis techniques based on the Kurdyka–Łojasiewicz inequality.
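To make the normal map idea concrete, here is a minimal, hedged sketch of a normal map-based stochastic update on an ℓ1-regularized least-squares toy problem. The iteration below follows the standard normal map construction (iterate on an auxiliary point `z`, evaluate the stochastic gradient at `x = prox(z)`, and step along `g + (z - x)/lam`); the toy data, parameter values, and variable names are illustrative assumptions, not taken from the paper. The point of interest for manifold identification is that the returned iterate is the output of the proximal operator and is therefore exactly sparse.

```python
import numpy as np

def soft_threshold(z, t):
    # proximal operator of t * ||.||_1 (soft-thresholding)
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

rng = np.random.default_rng(0)
n, d = 200, 20
A = rng.standard_normal((n, d))
x_true = np.zeros(d)
x_true[:3] = [1.0, -2.0, 0.5]          # sparse ground truth
b = A @ x_true + 0.01 * rng.standard_normal(n)

mu = 0.1        # l1 weight (illustrative)
lam = 0.5       # prox parameter lambda (illustrative)
alpha = 1e-3    # step size (illustrative)
batch = 10

def stoch_grad(x):
    # unbiased minibatch estimate of grad of (1/(2n))||Ax - b||^2
    idx = rng.choice(n, batch, replace=False)
    Ai = A[idx]
    return Ai.T @ (Ai @ x - b[idx]) / batch

# Normal map-based iteration: z^{k+1} = z^k - alpha * (g^k + (z^k - x^k)/lam),
# where x^k = prox_{lam*mu*||.||_1}(z^k) and g^k is a stochastic gradient at x^k.
z = np.zeros(d)
for _ in range(20000):
    x = soft_threshold(z, lam * mu)
    g = stoch_grad(x)
    z = z - alpha * (g + (z - x) / lam)

x_out = soft_threshold(z, lam * mu)
print("support of final iterate:", np.nonzero(x_out)[0])
```

Because `x_out` is a proximal-operator output, its zero entries are exactly zero, which is what allows a finite-time identification statement in the first place; a plain PSGD-style last iterate perturbed by stochastic noise generically has no exact zeros.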