🤖 AI Summary
Existing parameter-free stochastic optimization methods still rely on prior bounds on problem-specific parameters, so they are only partially parameter-free. This work proposes Grasp, a general grid search framework whose self-bounding analysis determines the parameter search ranges, yielding fully parameter-free methods whose inputs need not satisfy any unverifiable condition on the true problem parameters. In the non-convex case, the resulting method attains a near-optimal convergence rate, up to logarithmic factors. In the convex case, the methods are competitive in terms of acceleration and universality. The work also gives a sharper guarantee for model ensemble, the final step of the grid search framework, under an interpolated variance characterization.
📝 Abstract
Parameter-free stochastic optimization aims to design algorithms that are agnostic to the underlying problem parameters while still achieving convergence rates competitive with optimally tuned methods. Some existing parameter-free methods do not need the exact values of the problem parameters, yet they still rely on prior knowledge such as lower or upper bounds on those parameters. We refer to such methods as "partially parameter-free". In this work, we target "fully parameter-free" methods, i.e., methods whose algorithmic inputs need not satisfy any unverifiable condition related to the true problem parameters. We propose a powerful and general grid search framework, named Grasp, with a novel self-bounding analysis technique that effectively determines the search ranges of the parameters, in contrast to previous work. Our method demonstrates generality in: (i) the non-convex case, where we propose a fully parameter-free method that achieves a near-optimal convergence rate, up to logarithmic factors; (ii) the convex case, where our parameter-free methods are competitive and perform strongly in terms of acceleration and universality. Finally, we contribute a sharper guarantee for model ensemble, the final step of the grid search framework, under an interpolated variance characterization.
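To make the "grid search followed by model ensemble" structure concrete, here is a minimal Python sketch of a generic step-size grid search for SGD. It is not the Grasp algorithm. The function names, the toy quadratic objective, the geometric grid, and the "pick the candidate with the lowest estimated loss" selection rule are all illustrative assumptions. In particular, the grid range `lo`..`hi` is supplied by hand here, which is exactly the kind of prior knowledge that a fully parameter-free method is meant to avoid; the self-bounding analysis that determines the range is not implemented.

```python
import random


def sgd(step_size, grad_oracle, x0, num_steps, rng):
    """Run plain SGD with a constant step size and return the last iterate."""
    x = x0
    for _ in range(num_steps):
        x = x - step_size * grad_oracle(x, rng)
    return x


def geometric_grid(lo, hi, factor=2.0):
    """Candidate step sizes lo, lo*factor, lo*factor^2, ... not exceeding hi."""
    grid = []
    eta = lo
    while eta <= hi:
        grid.append(eta)
        eta *= factor
    return grid


def grid_search_sgd(grad_oracle, loss_estimate, x0, lo, hi, num_steps, seed=0):
    """Run SGD once per candidate step size, then select one output.

    The selection step (lowest estimated loss) is a simple stand-in for a
    model-ensemble step; it is an assumption, not the paper's procedure.
    """
    rng = random.Random(seed)
    candidates = [
        (eta, sgd(eta, grad_oracle, x0, num_steps, rng))
        for eta in geometric_grid(lo, hi)
    ]
    return min(candidates, key=lambda c: loss_estimate(c[1]))


# Toy problem (assumed for illustration): f(x) = 0.5 * x^2 with noisy gradients.
def noisy_grad(x, rng):
    return x + rng.gauss(0.0, 0.1)


def loss(x):
    # Exact loss is used for selection here; in a stochastic setting this
    # would itself be an estimate computed from samples.
    return 0.5 * x * x


best_eta, best_x = grid_search_sgd(
    noisy_grad, loss, x0=5.0, lo=1e-4, hi=1.0, num_steps=200
)
print(best_eta, loss(best_x))
```

Because the grid is geometric, the number of candidates grows only logarithmically in `hi / lo`, which is the usual reason grid search costs only logarithmic factors when the search range is known.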