🤖 AI Summary
This work studies stochastic nonconvex nonsmooth composite optimization without smoothness assumptions such as Lipschitz continuity of gradients, a setting that arises in applications like regularized ReLU neural networks and sparse support matrix machines. To address the lack of theoretical guarantees for existing zeroth-order algorithms in this regime, we propose two new notions of approximate stationarity and establish, for the first time, finite-time convergence rates for zeroth-order stochastic methods in the fully nonsmooth nonconvex setting. Methodologically, the approach integrates stochastic difference estimation with composite-structure decoupling and requires no gradient information whatsoever. Numerical experiments demonstrate the efficacy and robustness of the proposed algorithms on real-world machine learning tasks. This work provides both new theoretical foundations and practical tools for black-box optimization of nondifferentiable models.
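To make the gradient-free ingredient concrete, below is a minimal sketch of a two-point randomized difference estimator, the standard building block behind zeroth-order methods. This is an illustration under assumed conventions, not the paper's estimator; the names `oracle`, `mu`, and `num_dirs` are hypothetical.

```python
import numpy as np

def zo_gradient_estimate(oracle, x, mu=1e-3, num_dirs=10, rng=None):
    """Two-point randomized difference estimate built only from function values.

    oracle(x) returns a (possibly noisy) function value f(x; xi);
    no gradient information is used anywhere.
    """
    rng = np.random.default_rng() if rng is None else rng
    d = x.shape[0]
    g = np.zeros(d)
    for _ in range(num_dirs):
        u = rng.standard_normal(d)
        u /= np.linalg.norm(u)                # sample a direction on the unit sphere
        # directional difference quotient along u
        g += (oracle(x + mu * u) - oracle(x - mu * u)) / (2.0 * mu) * u
    return d * g / num_dirs                   # dimension scaling for sphere sampling
```

Averaging over several random directions reduces the variance of the estimate; the smoothing radius `mu` trades off bias against numerical stability.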
📝 Abstract
This work aims to solve a stochastic nonconvex nonsmooth composite optimization problem. Previous works on composite optimization require the major part to satisfy Lipschitz smoothness or some relaxed smoothness condition, which excludes machine learning examples such as regularized ReLU networks and the sparse support matrix machine. In this work, we focus on the stochastic nonconvex composite optimization problem without any smoothness assumptions. In particular, we propose two new notions of approximate stationary points for this problem and obtain finite-time convergence results of two zeroth-order algorithms to these two kinds of approximate stationary points, respectively. Finally, we demonstrate the effectiveness of these algorithms through numerical experiments.
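As a rough illustration of how such a zeroth-order estimate can be paired with the composite structure, the sketch below combines the estimator above with a proximal step on an ℓ1 regularizer. This is a generic zeroth-order proximal scheme under assumed settings (an ℓ1 regularized objective, hypothetical names `lam`, `step`, `iters`), not the paper's algorithms.

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1, available in closed form."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def zo_proximal_descent(oracle, x0, lam=0.1, step=0.05, iters=200, mu=1e-3, rng=None):
    """Generic zeroth-order proximal loop for min_x f(x) + lam * ||x||_1.

    f is accessed only through noisy function evaluations via `oracle`;
    `zo_gradient_estimate` is the two-point estimator sketched earlier.
    """
    x = x0.copy()
    for _ in range(iters):
        g = zo_gradient_estimate(oracle, x, mu=mu, rng=rng)   # gradient-free estimate for f
        x = soft_threshold(x - step * g, step * lam)          # prox step handles the regularizer exactly
    return x
```

The point of the decoupling is that only the black-box part f is handled by function-value queries, while the structured nonsmooth regularizer is treated exactly through its proximal map.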