🤖 AI Summary
Can permutation randomization overcome the local stagnation bottleneck inherent in gradient-based algorithms for dimension-free nonsmooth nonconvex optimization? This paper provides the first theoretical proof that permutation randomization breaks gradient contraction while preserving the original convergence rate, thereby ensuring sustained progress toward global optimality. To this end, we propose a novel dimension-free analytical framework for nonsmooth nonconvex stochastic optimization, integrating nonsmooth analysis, convergence theory, and permutation-based sampling. Our theoretical analysis establishes both convergence guarantees and rate preservation. Empirical evaluation on deep neural network training and noisy objective optimization demonstrates consistent and significant improvements over three state-of-the-art baselines. The core contribution lies in uncovering the “de-stagnation” mechanism of permutation randomization—offering both rigorous theoretical justification and demonstrable practical gains.
📝 Abstract
While gradient-based optimizers that incorporate randomization often exhibit superior performance on complex optimization problems, the theoretical foundations underlying this superiority remain insufficiently understood. A particularly pressing question has emerged: what is the role of randomization in dimension-free nonsmooth nonconvex optimization? To address this gap, we investigate the theoretical and empirical impact of permutation randomization within gradient-based optimization frameworks, using it as a representative case to explore broader implications. From a theoretical perspective, our analysis reveals that permutation randomization disrupts the shrinkage (contraction) behavior of gradient-based optimizers, enabling continued progress toward the global optimum given a sufficiently large number of iterations. Additionally, we prove that permutation randomization preserves the convergence rate of the underlying optimizer. On the empirical side, we conduct extensive numerical experiments comparing a permutation-randomized optimizer against three baseline methods. These experiments span tasks such as training deep neural networks with stacked architectures and optimizing noisy objective functions. The results not only corroborate our theoretical insights but also highlight the practical benefits of permutation randomization. In summary, this work delivers both rigorous theoretical justification and compelling empirical evidence for the effectiveness of permutation randomization, and our findings lay a foundation for extending the analysis to a broader class of randomization schemes.
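To make the idea concrete, the sketch below shows one purely illustrative way permutation randomization could enter a gradient-based update: a fresh random permutation is applied to the coordinates of the (sub)gradient at every iteration of a plain subgradient method on a noisy nonsmooth objective. The function names, step size, toy objective, and the exact placement of the permutation in the update are assumptions made here for illustration; the paper's actual algorithm and analysis may differ.

```python
import numpy as np

def permutation_randomized_subgradient(grad_fn, x0, lr=0.05, steps=500, seed=0):
    """Illustrative sketch only: subgradient descent where each update
    direction is scrambled by a fresh random coordinate permutation.
    This is a hypothetical reading of "permutation randomization",
    not the paper's stated algorithm."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        g = grad_fn(x)                    # (sub)gradient of the objective at x
        perm = rng.permutation(x.size)    # random permutation of coordinate indices
        x = x - lr * g[perm]              # permuted step: the update map changes every iteration
    return x

if __name__ == "__main__":
    # Toy nonsmooth noisy objective: f(x) = ||x||_1 plus small gradient noise.
    noise_rng = np.random.default_rng(1)
    grad = lambda x: np.sign(x) + 0.01 * noise_rng.normal(size=x.shape)
    print(permutation_randomized_subgradient(grad, x0=np.ones(10)))
```

In this toy reading, the permutation scrambles which coordinate each gradient component updates, so the iteration is not driven by a single fixed contraction map; whether this matches the mechanism analyzed in the paper is an assumption.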