🤖 AI Summary
Non-convex, non-smooth composite optimization, ubiquitous in deep learning, poses fundamental challenges for global optimality: standard methods typically converge only to stationary points rather than globally optimal solutions. Method: To address this, we introduce two novel generalized geometric properties, $\mathcal{H}(\phi)$-convexity and $\mathcal{H}(\Phi)$-smoothness, which unify and extend classical strong convexity and Lipschitz smoothness, thereby endowing general objective functions with analyzable geometric structure. Building on this framework, we propose the Gradient Structure Control (GSC) algorithm, the first method provably achieving a controllable approximation to the global optimum in non-convex, non-smooth settings. Contribution/Results: We establish that GSC's optimal convergence rate depends solely on the homogeneity degree of $\Phi$. Empirical evaluation demonstrates substantial improvements in training stability and generalization performance across canonical deep learning tasks.
📝 Abstract
This paper introduces an optimization framework aimed at providing a theoretical foundation for a class of composite optimization problems, particularly those encountered in deep learning. In this framework, we introduce $\mathcal{H}(\phi)$-convexity and $\mathcal{H}(\Phi)$-smoothness to generalize the existing concepts of strong convexity and Lipschitz smoothness. Furthermore, we analyze and establish the convergence of both gradient descent and stochastic gradient descent for objective functions that are $\mathcal{H}(\Phi)$-smooth. We prove that the optimal convergence rates of these methods depend solely on the homogeneity degree of $\Phi$. Based on these findings, we construct two types of non-convex and non-smooth optimization problems: deterministic composite and stochastic composite optimization problems, which encompass the majority of optimization problems in deep learning. To address these problems, we develop the Gradient Structure Control (GSC) algorithm and prove that it can locate approximate global optima. This marks a significant departure from traditional non-convex analysis frameworks, which typically settle for stationary points. Therefore, with the introduction of $\mathcal{H}(\phi)$-convexity and $\mathcal{H}(\Phi)$-smoothness, along with the GSC algorithm, the non-convex optimization mechanisms in deep learning can be theoretically explained and supported. Finally, the effectiveness of the proposed framework is substantiated through empirical experiments.
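To make the baseline setting concrete, below is a minimal sketch of fixed-step gradient descent on a composite objective $f = g + h$, the kind of problem the abstract's convergence analysis concerns. This is an illustrative toy only, not the paper's GSC algorithm or its $\mathcal{H}(\Phi)$-smoothness analysis; the objective, step size, and iteration count are all assumptions chosen for the example.

```python
# Illustrative sketch: fixed-step gradient descent on a composite
# objective f(x) = g(x) + h(x). NOT the paper's GSC algorithm; it only
# shows the deterministic composite setting the abstract refers to.
import numpy as np

def gradient_descent(grad_f, x0, lr=0.1, steps=200):
    """Run fixed-step gradient descent from x0 and return the iterate."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        x = x - lr * grad_f(x)
    return x

# Toy composite objective (assumed for illustration):
#   g(x) = ||x - 1||^2,  h(x) = 0.5 * ||x||^2
# so grad f(x) = 2(x - 1) + x, with minimizer x = 2/3 per coordinate.
grad_f = lambda x: 2.0 * (x - 1.0) + x
x_star = gradient_descent(grad_f, x0=np.zeros(2))
```

For this smooth, strongly convex toy problem the iterates contract toward the unique minimizer; the paper's contribution is precisely to extend such guarantees beyond the smooth, convex regime.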