Adaptive Lipschitz-Free Conditional Gradient Methods for Stochastic Composite Nonconvex Optimization

📅 2026-03-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of designing projection-free algorithms for stochastic composite nonconvex optimization without requiring a global smoothness constant or line search. It proposes ALFCG, an adaptive Frank-Wolfe-type framework that dynamically estimates local smoothness via a self-normalized cumulative difference of historical gradients and minimizes a quadratic surrogate model at each iteration to accommodate unknown geometric structure. ALFCG is the first such method to achieve adaptivity without global Lipschitz constants or line searches, incorporating variants with SPIDER (ALFCG-FS) and single- or double-batch momentum-based variance reduction (ALFCG-MVR1/MVR2). Theoretically, as the noise variance σ→0, ALFCG attains a near-optimal convergence rate of Õ(ε⁻²), significantly improving upon classical rates of O(ε⁻³) or O(ε⁻⁴). Empirical results on multiclass classification tasks confirm its practical superiority.

📝 Abstract
We propose ALFCG (Adaptive Lipschitz-Free Conditional Gradient), the first \textit{adaptive} projection-free framework for stochastic composite nonconvex minimization that \textit{requires neither global smoothness constants nor line search}. Unlike prior conditional gradient methods that use open-loop diminishing stepsizes, conservative Lipschitz constants, or costly backtracking, ALFCG maintains a self-normalized accumulator of historical iterate differences to estimate local smoothness and minimizes a quadratic surrogate model at each step. This retains the simplicity of Frank-Wolfe while adapting to unknown geometry. We study three variants. ALFCG-FS addresses finite-sum problems with a SPIDER estimator. ALFCG-MVR1 and ALFCG-MVR2 handle stochastic expectation problems by using momentum-based variance reduction with single-batch and two-batch updates, and operate under average and individual smoothness, respectively. To reach an $\epsilon$-stationary point, ALFCG-FS attains $\mathcal{O}(N+\sqrt{N}\epsilon^{-2})$ iteration complexity, while ALFCG-MVR1 and ALFCG-MVR2 achieve $\tilde{\mathcal{O}}(\sigma^2\epsilon^{-4}+\epsilon^{-2})$ and $\tilde{\mathcal{O}}(\sigma\epsilon^{-3}+\epsilon^{-2})$, where $N$ is the number of components and $\sigma$ is the noise level. In contrast to typical $\mathcal{O}(\epsilon^{-4})$ or $\mathcal{O}(\epsilon^{-3})$ rates, our bounds reduce to the optimal rate up to logarithmic factors, $\tilde{\mathcal{O}}(\epsilon^{-2})$, as the noise level $\sigma \to 0$. Extensive experiments on multiclass classification over nuclear norm balls and $\ell_p$ balls show that ALFCG generally outperforms state-of-the-art conditional gradient baselines.
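To make the high-level idea concrete, below is a minimal, illustrative sketch of an adaptive Frank-Wolfe loop in the spirit the abstract describes: a linear minimization oracle replaces projection, the step size comes from minimizing a quadratic surrogate, and the smoothness parameter is estimated from observed gradient differences rather than a global Lipschitz constant. This is a deterministic toy (no SPIDER or momentum-based variance reduction), and all function names and details are assumptions for illustration, not the paper's ALFCG algorithm.

```python
import numpy as np

def lmo_l2_ball(grad, radius=1.0):
    # Linear minimization oracle over an l2 ball:
    # argmin_{||s|| <= radius} <grad, s> = -radius * grad / ||grad||.
    norm = np.linalg.norm(grad)
    return -radius * grad / norm if norm > 0 else np.zeros_like(grad)

def adaptive_fw(grad_fn, x0, radius=1.0, iters=100, eps=1e-12):
    """Illustrative adaptive Frank-Wolfe loop (toy sketch, not ALFCG).

    The local smoothness estimate L is updated from observed gradient
    differences along the trajectory, in place of a global constant.
    """
    x = x0.copy()
    g = grad_fn(x)
    L = 1.0  # initial smoothness guess
    for _ in range(iters):
        s = lmo_l2_ball(g, radius)
        d = s - x
        # Step size from minimizing the quadratic surrogate
        # <g, gamma d> + (L/2) * gamma^2 ||d||^2 over gamma in [0, 1].
        gamma = min(1.0, max(0.0, -g.dot(d) / (L * d.dot(d) + eps)))
        x_new = x + gamma * d
        g_new = grad_fn(x_new)
        # Secant-style local Lipschitz estimate from successive gradients.
        step = np.linalg.norm(x_new - x)
        if step > eps:
            L = max(L, np.linalg.norm(g_new - g) / step)
        x, g = x_new, g_new
    return x
```

For example, minimizing $f(x) = \tfrac12\|x - c\|^2$ with $c = (2, 0)$ over the unit $\ell_2$ ball drives the iterate to the boundary point $(1, 0)$; because the true smoothness here is $1$, the estimate never needs to grow.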
Problem

Research questions and friction points this paper is trying to address.

stochastic composite nonconvex optimization
conditional gradient
adaptive methods
Lipschitz-free
projection-free
Innovation

Methods, ideas, or system contributions that make the work stand out.

adaptive conditional gradient
Lipschitz-free
stochastic nonconvex optimization
variance reduction
projection-free optimization