🤖 AI Summary
Sampling-based gradient-free optimization methods are widely used on non-convex problems, but their convergence is poorly understood and they lack guarantees of global convergence. This work interprets such methods as implicit gradient descent on a smoothed objective function and, drawing a parallel to noise-conditioned score ascent in diffusion models, develops a non-asymptotic convergence analysis. Characterizing the landscape of the smoothed objective reveals a fundamental trade-off between coverage and optimality, which motivates DIDA, a new algorithm built on a dual-annealing mechanism. Theoretically, sampling-based optimization is shown to converge to a neighborhood of the global minimizer, and DIDA is proven to converge to the global optimum; empirically, DIDA compares favorably with other gradient-free optimization methods.
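The "implicit gradient descent" reading can be checked numerically in one dimension. This is an illustrative sketch resting on assumptions not stated above: the SBO update is taken to be a softmin (Boltzmann)-weighted average of Gaussian samples, y_i ~ N(x, σ²) with weights proportional to exp(−f(y_i)/τ), and the smoothed objective is taken to be F_σ(x) = −τ log E_u[exp(−f(x + σu)/τ)] with u ~ N(0, 1). For p ∝ exp(−f/τ) and p_σ = p ∗ N(0, σ²), F_σ equals −τ log p_σ up to a constant, so Tweedie's formula gives x_new = x + σ²∇log p_σ(x) = x − (σ²/τ)∇F_σ(x): a gradient step whose direction is a noise-conditioned score. The helper names `sbo_step` and `smoothed_score_quadratic` are hypothetical, and the quadratic f is chosen only because its smoothed score has a closed form.

```python
import math
import random


def sbo_step(f, x, sigma, tau, n_samples, rng):
    """Assumed SBO update (hypothetical helper, not necessarily the paper's):
    x_new = sum_i w_i * y_i, with y_i ~ N(x, sigma^2) and w_i ∝ exp(-f(y_i) / tau)."""
    ys = [x + sigma * rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    fs = [f(y) for y in ys]
    f_min = min(fs)  # shift before exponentiating so the best sample gets weight 1
    ws = [math.exp(-(fy - f_min) / tau) for fy in fs]
    total = sum(ws)
    return sum(w * y for w, y in zip(ws, ys)) / total


def smoothed_score_quadratic(x, a, sigma, tau):
    """Exact score d/dx log p_sigma(x) when p ∝ exp(-a x^2 / (2 tau)).

    Here p is N(0, tau/a), so p_sigma = p * N(0, sigma^2) is N(0, tau/a + sigma^2)."""
    return -x / (tau / a + sigma ** 2)


a, sigma, tau, x0 = 2.0, 0.5, 1.0, 1.5
f = lambda y: 0.5 * a * y ** 2  # toy quadratic objective
rng = random.Random(0)  # fixed seed for reproducibility

# Monte Carlo SBO update versus a score step of size sigma^2 (Tweedie's formula).
mc_update = sbo_step(f, x0, sigma, tau, 200_000, rng)
exact_update = x0 + sigma ** 2 * smoothed_score_quadratic(x0, a, sigma, tau)  # == 1.0
print(f"sampling update: {mc_update:.3f}, smoothed-gradient step: {exact_update:.3f}")
```

With 200,000 samples the sampled update should agree closely with the exact smoothed-gradient step, which is 1.0 for these parameters. Shrinking σ shortens the step, consistent with the σ² step size in the identity.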
📝 Abstract
Sampling-based optimization (SBO), such as the cross-entropy method and evolutionary algorithms, has been successful at solving non-convex problems without gradients, yet its convergence is poorly understood. In this paper, we establish a non-asymptotic convergence analysis for SBO through the lens of smoothing. Specifically, we recast SBO as gradient descent on a smoothed objective, mirroring noise-conditioned score ascent in diffusion models. Our first contribution is a landscape analysis of the smoothed objective, demonstrating how smoothing helps escape local minima and uncovering a fundamental coverage-optimality trade-off: smoothing renders the landscape more benign by enlarging the locally convex region around the global minimizer, but at the cost of introducing an optimality gap. Building on this insight, we establish non-asymptotic convergence guarantees for SBO algorithms to a neighborhood of the global minimizer. Furthermore, we propose an annealed SBO algorithm, Diffusion-Inspired Dual-Annealing (DIDA), which is provably convergent to the global optimum. We conduct extensive numerical experiments to verify our landscape results and to demonstrate the strong performance of DIDA compared with other gradient-free optimization methods. Lastly, we discuss implications of our results for diffusion models.
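The coverage-optimality trade-off can be seen on a toy function whose Gaussian smoothing has a closed form. This sketch is an assumption-laden illustration, not the paper's landscape analysis or DIDA: it uses plain Gaussian smoothing f_σ(x) = E_u[f(x + σu)] with u ~ N(0, 1), a hypothetical shifted 1-D Rastrigin function f(x) = (x − 0.3)² + 10(1 − cos 2πx) whose global minimizer lies near x ≈ 0.0015, and exact gradients in place of sampled ones. Smoothing damps the cosine term by exp(−2π²σ²), so for large σ the landscape has a single basin (coverage) but its minimizer drifts toward 0.3 (optimality gap). The hypothetical `annealed_descent` helper then shrinks σ to 0 while descending, a simple continuation scheme; it anneals only σ and does not model DIDA's second annealed quantity.

```python
import math

A, C = 10.0, 0.3  # hypothetical test function: shifted 1-D Rastrigin


def f_smoothed(x, sigma):
    """Closed-form E_u[f(x + sigma*u)], u ~ N(0, 1), for
    f(x) = (x - C)^2 + A * (1 - cos(2*pi*x)); sigma = 0 recovers f itself."""
    damp = math.exp(-2 * math.pi ** 2 * sigma ** 2)
    return (x - C) ** 2 + sigma ** 2 + A * (1 - damp * math.cos(2 * math.pi * x))


def grad_smoothed(x, sigma):
    """Derivative of f_smoothed with respect to x."""
    damp = math.exp(-2 * math.pi ** 2 * sigma ** 2)
    return 2 * (x - C) + 2 * math.pi * A * damp * math.sin(2 * math.pi * x)


def smoothness(sigma):
    """Upper bound L on |f_sigma''|, giving the stable step size 1/L."""
    return 2 + 4 * math.pi ** 2 * A * math.exp(-2 * math.pi ** 2 * sigma ** 2)


def landscape(sigma, lo=-5.0, hi=5.0, n=10001):
    """Count strict grid local minima of f_sigma and return its grid argmin."""
    xs = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
    vals = [f_smoothed(x, sigma) for x in xs]
    n_min = sum(
        1 for i in range(1, n - 1) if vals[i] < vals[i - 1] and vals[i] < vals[i + 1]
    )
    return n_min, xs[min(range(n), key=vals.__getitem__)]


def gradient_descent(x, sigma, steps):
    """Fixed-sigma gradient descent on f_sigma with step size 1/L(sigma)."""
    eta = 1.0 / smoothness(sigma)
    for _ in range(steps):
        x -= eta * grad_smoothed(x, sigma)
    return x


def annealed_descent(x, sigmas, steps_per_level=200):
    """Continuation: descend on f_sigma at each level while sigma shrinks."""
    for sigma in sigmas:
        x = gradient_descent(x, sigma, steps_per_level)
    return x


for sigma in (0.0, 0.1, 0.3, 0.5, 1.0):
    n_min, x_best = landscape(sigma)
    print(f"sigma={sigma:.1f}: {n_min:2d} local minima, argmin ~ {x_best:+.3f}")

schedule = [1.0 - k / 50 for k in range(51)]  # 1.0, 0.98, ..., 0.0
x_plain = gradient_descent(4.0, 0.0, 10_000)  # no smoothing at all
x_annealed = annealed_descent(4.0, schedule)
print(f"plain GD from x=4: {x_plain:+.4f}; annealed from x=4: {x_annealed:+.4f}")
```

Starting from x = 4, plain gradient descent on f stalls in the local minimum near x ≈ 3.98, whereas the annealed run first settles near the smoothed minimizer at 0.3 and then follows it down to the global minimizer near 0 as σ shrinks. In this toy case f_σ is convex for σ above roughly 0.52, which is where the "coverage" side of the trade-off is strongest and the optimality gap is largest.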