🤖 AI Summary
This work addresses limitations in the existing analysis of the boosted Frank-Wolfe algorithm, which has been restricted to deterministic convex problems and to step sizes that require either line search or knowledge of the gradient's Lipschitz constant. It introduces a step-size strategy that does not depend on the Lipschitz constant, which makes it possible to extend boosted Frank-Wolfe to the stochastic setting. The analysis shows that boosting with this step size is compatible with a broad class of gradient estimators, including SAGA, L-SVRG, SAG, Heavy Ball momentum, and zeroth-order estimators, while retaining the worst-case convergence rates of ordinary stochastic Frank-Wolfe. It also gives the first convergence rates for boosted Frank-Wolfe on nonconvex and quasar-convex objectives. Experiments on sparse logistic regression and quantum process tomography show that stochastic boosted Frank-Wolfe converges faster than the non-boosted baseline, both per gradient oracle call and in wall-clock time.
📝 Abstract
The boosted Frank-Wolfe algorithm accelerates the classical Frank-Wolfe algorithm by better aligning the update direction with the negative gradient. Its analysis, however, has been limited to deterministic convex problems, with step sizes that require either line search or knowledge of the Lipschitz constant of the gradient. We develop a novel step size strategy that does not depend on the Lipschitz constant of the gradient, which allows us to extend the boosted Frank-Wolfe algorithm to the stochastic setting. We prove that boosting with this step size strategy can be combined with many modern gradient estimators, including SAGA, L-SVRG, SAG, Heavy Ball momentum, and zeroth-order estimators, among others, while retaining the worst-case convergence rates of ordinary stochastic Frank-Wolfe. Our analysis also yields the first convergence rates for boosted Frank-Wolfe on nonconvex and quasar-convex objectives, results which are new even for deterministic problems. Experiments on sparse logistic regression and quantum process tomography show that stochastic boosted Frank-Wolfe achieves faster convergence per gradient oracle call (and on wall-clock) compared to the non-boosted baseline.
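To make the phrase "better aligning the update direction with the negative gradient" concrete, here is a minimal sketch of a boosting step. It assumes the gradient-pursuit idea from the earlier deterministic boosted Frank-Wolfe literature, in a simplified form that uses only forward Frank-Wolfe directions and omits the backward direction. The feasible set (an l1 ball), the helper names `lmo_l1`, `alignment`, and `boosted_direction`, and the parameters `max_rounds` and `delta` are all illustrative assumptions. The step-size strategy proposed in the paper is not described in the abstract and is not reproduced here.

```python
import math


def dot(a, b):
    """Euclidean inner product of two equal-length lists."""
    return sum(ai * bi for ai, bi in zip(a, b))


def lmo_l1(c, radius=1.0):
    """Linear maximization oracle: a vertex of the l1 ball maximizing <c, v>."""
    i = max(range(len(c)), key=lambda j: abs(c[j]))
    v = [0.0] * len(c)
    if c[i] != 0.0:
        v[i] = math.copysign(radius, c[i])
    return v


def alignment(target, d):
    """Cosine of the angle between target and d; -1 by convention if either is zero."""
    nt, nd = math.sqrt(dot(target, target)), math.sqrt(dot(d, d))
    if nt == 0.0 or nd == 0.0:
        return -1.0
    return dot(target, d) / (nt * nd)


def boosted_direction(x, grad, lmo, max_rounds=10, delta=1e-3):
    """Greedily combine Frank-Wolfe directions to chase the negative gradient.

    Simplified sketch (assumption): only forward directions v - x are used.
    Returns g such that x + g is a convex combination of oracle outputs.
    """
    neg_grad = [-gi for gi in grad]
    d = [0.0] * len(x)      # running direction estimate
    total = 0.0             # sum of accepted weights, used to normalize d
    for _ in range(max_rounds):
        r = [ni - di for ni, di in zip(neg_grad, d)]   # residual still to be matched
        v = lmo(r)
        u = [vi - xi for vi, xi in zip(v, x)]
        uu = dot(u, u)
        if uu == 0.0:
            break                                      # oracle returned x itself
        lam = dot(r, u) / uu                           # nonnegative because x is feasible
        d_new = [di + lam * ui for di, ui in zip(d, u)]
        # Accept the round only if alignment improves by at least delta.
        if alignment(neg_grad, d_new) - alignment(neg_grad, d) < delta:
            break
        d, total = d_new, total + lam
    if total == 0.0:
        return [0.0] * len(x)
    return [di / total for di in d]


# Toy comparison on the unit l1 ball in three dimensions (made-up numbers).
x = [1.0, 0.0, 0.0]
grad = [0.5, -2.0, -1.5]
neg_grad = [-gi for gi in grad]
g_fw = [vi - xi for vi, xi in zip(lmo_l1(neg_grad), x)]   # classical direction
g_boost = boosted_direction(x, grad, lmo_l1)              # boosted direction
print(alignment(neg_grad, g_fw), alignment(neg_grad, g_boost))
```

In this sketch the boosted direction spends extra oracle calls per iteration to obtain a direction that points closer to the negative gradient than the single classical Frank-Wolfe direction, while the point `x + g_boost` stays inside the feasible set because it is a convex combination of vertices.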