🤖 AI Summary
This paper studies nonsmooth nonconvex finite-sum coupled compositional optimization (FCCO), where the outer function is nonsmooth weakly convex (or convex) and the inner functions are smooth (or weakly convex). To address the high iteration complexity ($O(1/\varepsilon^6)$) and poor adaptability of existing algorithms to deep learning settings, the authors propose the first stochastic momentum methods for nonsmooth FCCO with rigorous convergence guarantees. The approach integrates momentum-based updates, weak convexity analysis, a smoothed hinge penalty technique, and compositional gradient estimation, and further extends to computing KKT points under (weakly) convex inequality constraints. The algorithms reach an $\varepsilon$-KKT point in $O(1/\varepsilon^5)$ iterations, a new state-of-the-art complexity that improves on prior work by an order of $1/\varepsilon$. Empirical evaluation on three representative tasks confirms both effectiveness and practical applicability.
📝 Abstract
Finite-sum Coupled Compositional Optimization (FCCO), characterized by its coupled compositional objective structure, has emerged as an important optimization paradigm for addressing a wide range of machine learning problems. In this paper, we focus on a challenging class of non-convex non-smooth FCCO, where the outer functions are non-smooth weakly convex or convex and the inner functions are smooth or weakly convex. Existing state-of-the-art results face two key limitations: (1) a high iteration complexity of $O(1/\epsilon^6)$ under the assumption that the stochastic inner functions are Lipschitz continuous in expectation; (2) reliance on vanilla SGD-type updates, which are not suitable for deep learning applications. Our main contributions are twofold: (i) we propose stochastic momentum methods tailored for non-smooth FCCO that come with provable convergence guarantees; (ii) we establish a new state-of-the-art iteration complexity of $O(1/\epsilon^5)$. Moreover, we apply our algorithms to non-convex optimization problems with multiple smooth or weakly convex functional inequality constraints. By optimizing a smoothed hinge-penalty formulation, we achieve a new state-of-the-art complexity of $O(1/\epsilon^5)$ for finding a (nearly) $\epsilon$-level KKT solution. Experiments on three tasks demonstrate the effectiveness of the proposed algorithms.
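To make the problem structure concrete, below is a minimal, illustrative sketch of an FCCO instance $\min_x \frac{1}{n}\sum_i f(g_i(x))$ with a non-smooth convex outer hinge $f(u)=\max(u,0)$ and smooth (linear) inner functions, solved by a generic stochastic method that tracks the inner values with a moving average and applies momentum to the chain-rule gradient estimate. This is not the paper's algorithm or its parameter choices; all names, step sizes, and the toy data are hypothetical.

```python
import numpy as np

# Hypothetical toy FCCO instance: min_x (1/n) * sum_i f(g_i(x)),
# outer f(u) = max(u, 0) (non-smooth, convex),
# inner g_i(x) = a_i . x - b_i (smooth). Data are synthetic.
rng = np.random.default_rng(0)
n, d = 50, 10
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)

def inner(i, x):
    """Return g_i(x) and its gradient."""
    return A[i] @ x - b[i], A[i]

def outer_subgrad(u):
    """A subgradient of f(u) = max(u, 0)."""
    return 1.0 if u > 0 else 0.0

x = np.zeros(d)
u = np.zeros(n)   # moving-average estimates of the inner values g_i(x)
v = np.zeros(d)   # momentum buffer for the gradient estimate
eta, beta, gamma = 0.05, 0.9, 0.5  # step size, momentum, averaging (illustrative)

for t in range(2000):
    i = rng.integers(n)                      # sample one inner index
    gi, dgi = inner(i, x)
    u[i] = (1 - gamma) * u[i] + gamma * gi   # track g_i(x) with a moving average
    grad_est = outer_subgrad(u[i]) * dgi     # chain-rule stochastic gradient estimate
    v = beta * v + (1 - beta) * grad_est     # momentum-style update
    x -= eta * v

obj = np.mean(np.maximum(A @ x - b, 0.0))    # final compositional objective
```

The moving average `u` is what distinguishes compositional problems from ordinary finite sums: the outer subgradient must be evaluated at an estimate of $g_i(x)$, which a single sample cannot provide unbiasedly through the non-linear outer function.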