Stochastic Difference-of-Convex Optimization with Momentum

๐Ÿ“… 2025-10-20
๐Ÿ“ˆ Citations: 0
โœจ Influential: 0
๐Ÿ“„ PDF
๐Ÿค– AI Summary
Existing stochastic difference-of-convex (DC) optimization methods lack convergence guarantees in the mini-batch setting, typically requiring either large batch sizes or strong noise assumptions. Method: The paper proposes a momentum-based stochastic DC optimization framework under standard smoothness and bounded gradient-variance assumptions, using momentum-driven gradient updates to mitigate the variance amplification inherent in small-batch sampling. Contribution/Results: It establishes the first provably convergent algorithm for stochastic DC optimization that accommodates arbitrary batch sizes, without relying on large batches or restrictive noise conditions. The analysis shows that momentum suppresses mini-batch-induced gradient variance, ensuring convergence to a stationary point, and empirical evaluations confirm substantial improvements in convergence speed and stability over state-of-the-art baselines.

๐Ÿ“ Abstract
Stochastic difference-of-convex (DC) optimization is prevalent in numerous machine learning applications, yet its convergence properties under small batch sizes remain poorly understood. Existing methods typically require large batches or strong noise assumptions, which limit their practical use. In this work, we show that momentum enables convergence under standard smoothness and bounded variance assumptions (of the concave part) for any batch size. We prove that without momentum, convergence may fail regardless of stepsize, highlighting its necessity. Our momentum-based algorithm achieves provable convergence and demonstrates strong empirical performance.
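The setting above is minimizing f(x) = g(x) − h(x) with g and h convex, where only noisy mini-batch gradients of the concave part −h are available, and momentum is applied to stabilize that noisy estimate. A minimal illustrative sketch on a hypothetical toy instance (quadratic g and h; the noise model, `beta`, and `lr` are assumptions for illustration, not the paper's algorithm or settings):

```python
import numpy as np

# Toy DC instance: g(x) = ||x||^2, h(x) = ||x - c||^2 / 2,
# so f(x) = g(x) - h(x) has stationary point 2x - (x - c) = 0, i.e. x = -c.
rng = np.random.default_rng(0)
c = np.array([1.0, -2.0])

def grad_g(x):
    return 2.0 * x

def stoch_grad_h(x, batch_size=1):
    # Mini-batch stochastic gradient of h: exact gradient plus
    # bounded-variance noise, averaged over the batch.
    noise = rng.normal(scale=1.0, size=(batch_size, x.size)).mean(axis=0)
    return (x - c) + noise

x = np.zeros(2)
v = np.zeros(2)        # momentum estimate of grad h
beta, lr = 0.9, 0.05   # hypothetical hyperparameters
for t in range(2000):
    # Momentum on the concave part's gradient damps small-batch noise.
    v = beta * v + (1 - beta) * stoch_grad_h(x, batch_size=1)
    # DC-style gradient step: grad g minus the smoothed estimate of grad h.
    x = x - lr * (grad_g(x) - v)

print(x)  # hovers near the stationary point -c = [-1, 2]
```

Even with batch size 1, the exponential averaging in `v` keeps the gradient-of-h estimate low-variance, which is the informal mechanism the abstract credits for convergence at any batch size.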
Problem

Research questions and friction points this paper is trying to address.

Optimizing stochastic difference-of-convex functions with small batches
Addressing convergence failure without momentum in DC optimization
Enabling convergence under standard assumptions for any batch size
Innovation

Methods, ideas, or system contributions that make the work stand out.

Momentum enables convergence for any batch size
Proves convergence failure without momentum regardless of stepsize
Algorithm achieves provable convergence with strong empirical performance
๐Ÿ”Ž Similar Papers
No similar papers found.
El Mahdi Chayti
Machine Learning and Optimization Laboratory (MLO), EPFL
Martin Jaggi
EPFL
Machine Learning · Optimization