🤖 AI Summary
This work addresses the local minima that arise from non-convex parametrization in continuous sparse regression problems, such as the Beurling LASSO (BLASSO), by proposing the Fast Spawn&Prune (FS&P) algorithm. The method integrates conic particle gradient descent with a birth-death stochastic process: it introduces new particles in regions that violate first-order optimality conditions, which lets it explore the global solution space, and it prunes redundant particles to preserve computational efficiency. Notably, it provides the first global convergence guarantee for this class of discrete-time stochastic algorithms without requiring exponentially many initializations. The analysis yields an explicit excess risk rate of 𝒪((log K / K)^{1/(2(2+d))}), where K is the number of iterations and d the dimension of the domain, and a sample complexity of 𝒪(N^{-1/(4(2+d))}) up to logarithmic factors. A horizon-free variant removes the need to know the iteration budget in advance.
📝 Abstract
We investigate the global optimization of the objective function arising in continuous sparse regression, specifically the Beurling LASSO (BLASSO), over the space of measures. While Conic Particle Gradient Descent (CPGD) methods are computationally efficient, they may become trapped in local minima due to the non-convexity of the parameterization. To overcome this limitation, we introduce Fast Spawn&Prune (FS&P), a stochastic algorithm that extends FastPart introduced in De Castro et al. (2025) and combines CPGD with a birth-death process. The birth mechanism ensures asymptotic global exploration by introducing particles in regions where first-order optimality conditions are violated, while the death process preserves computational efficiency by pruning non-informative particles. We provide the first theoretical guarantee of global convergence for this class of discrete-time stochastic algorithms, without requiring exponentially large initializations. Furthermore, we derive explicit convergence rates for the excess risk, which scale as $\mathcal{O}\big(\left(\log K / K\right)^{\frac{1}{2(2+d)}}\big)$, where $K$ denotes the number of iterations and $d$ the dimension of the domain, thereby quantifying the trade-off between global exploration and local refinement. Moreover, the sample complexity is $\mathcal{O}\big(N^{-\frac{1}{4(2+d)}}\big)$ (up to logarithmic factors). We also propose a horizon-free variant that does not require prior knowledge of the iteration budget.
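To get a feel for how the stated rates depend on the dimension, the short Python sketch below evaluates the two expressions from the abstract with all constants and hidden logarithmic factors dropped. The function names `excess_risk_rate` and `sample_rate` are illustrative, not taken from the paper, and $N$ is read here as a sample size, which the abstract does not define. The values are therefore only indicative of the scaling, not of the actual bounds.

```python
import math


def excess_risk_rate(K: int, d: int) -> float:
    """Evaluate (log K / K) ** (1 / (2 * (2 + d))).

    Constants hidden by the O(.) notation are omitted (assumption:
    leading constant set to 1 purely for illustration).
    """
    if K < 2:
        raise ValueError("K must be at least 2 so that log K > 0")
    if d < 0:
        raise ValueError("d must be non-negative")
    return (math.log(K) / K) ** (1.0 / (2 * (2 + d)))


def sample_rate(N: int, d: int) -> float:
    """Evaluate N ** (-1 / (4 * (2 + d))), logarithmic factors omitted."""
    if N < 1:
        raise ValueError("N must be at least 1")
    if d < 0:
        raise ValueError("d must be non-negative")
    return N ** (-1.0 / (4 * (2 + d)))


if __name__ == "__main__":
    # Print the rate for a few dimensions and iteration budgets.
    for d in (1, 2, 3):
        for K in (10**3, 10**6):
            print(f"d={d}, K={K}: rate ~ {excess_risk_rate(K, d):.4f}")
```

Because the exponent $\frac{1}{2(2+d)}$ shrinks as $d$ grows, the same iteration budget $K$ gives a weaker guarantee in higher dimension, which is the exploration cost the abstract refers to.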