Stochastic Difference-of-Convex Optimization with Momentum

๐Ÿ“… 2025-10-20
๐Ÿ“ˆ Citations: 0
โœจ Influential: 0
๐Ÿ“„ PDF
๐Ÿค– AI Summary
Existing stochastic difference-of-convex (DC) optimization methods lack convergence guarantees in the mini-batch setting, typically requiring either large batch sizes or strong noise assumptions. Method: The paper proposes a momentum-based stochastic DC optimization framework under standard smoothness and bounded gradient-variance assumptions, using momentum-driven gradient updates to mitigate the variance amplification inherent in small-batch sampling. Contribution/Results: It establishes the first provably convergent algorithm for stochastic DC optimization that accommodates arbitrary batch sizes, without relying on large batches or restrictive noise conditions. The analysis shows that momentum suppresses mini-batch-induced gradient variance, ensuring convergence to a stationary point, and empirical evaluations confirm substantial improvements in convergence speed and stability over state-of-the-art baselines.

๐Ÿ“ Abstract
Stochastic difference-of-convex (DC) optimization is prevalent in numerous machine learning applications, yet its convergence properties under small batch sizes remain poorly understood. Existing methods typically require large batches or strong noise assumptions, which limit their practical use. In this work, we show that momentum enables convergence under standard smoothness and bounded variance assumptions (of the concave part) for any batch size. We prove that without momentum, convergence may fail regardless of stepsize, highlighting its necessity. Our momentum-based algorithm achieves provable convergence and demonstrates strong empirical performance.
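The setting above is minimizing f(x) = g(x) − h(x) with g and h convex, where only noisy mini-batch gradients of the concave part −h are available, and momentum is applied to stabilize that noisy estimate. A minimal illustrative sketch on a hypothetical toy instance (quadratic g and h; the noise model, `beta`, and `lr` are assumptions for illustration, not the paper's algorithm or settings):

```python
import numpy as np

# Toy DC instance: g(x) = ||x||^2, h(x) = ||x - c||^2 / 2,
# so f(x) = g(x) - h(x) has stationary point 2x - (x - c) = 0, i.e. x = -c.
rng = np.random.default_rng(0)
c = np.array([1.0, -2.0])

def grad_g(x):
    return 2.0 * x

def stoch_grad_h(x, batch_size=1):
    # Mini-batch stochastic gradient of h: exact gradient plus
    # bounded-variance noise, averaged over the batch.
    noise = rng.normal(scale=1.0, size=(batch_size, x.size)).mean(axis=0)
    return (x - c) + noise

x = np.zeros(2)
v = np.zeros(2)        # momentum estimate of grad h
beta, lr = 0.9, 0.05   # hypothetical hyperparameters
for t in range(2000):
    # Momentum on the concave part's gradient damps small-batch noise.
    v = beta * v + (1 - beta) * stoch_grad_h(x, batch_size=1)
    # DC-style gradient step: grad g minus the smoothed estimate of grad h.
    x = x - lr * (grad_g(x) - v)

print(x)  # hovers near the stationary point -c = [-1, 2]
```

Even with batch size 1, the exponential averaging in `v` keeps the gradient-of-h estimate low-variance, which is the informal mechanism the abstract credits for convergence at any batch size.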
Problem

Research questions and friction points this paper is trying to address.

Optimizing stochastic difference-of-convex functions with small batches
Addressing convergence failure without momentum in DC optimization
Enabling convergence under standard assumptions for any batch size
Innovation

Methods, ideas, or system contributions that make the work stand out.

Momentum enables convergence for any batch size
Proves convergence failure without momentum regardless of stepsize
Algorithm achieves provable convergence with strong empirical performance
๐Ÿ”Ž Similar Papers
No similar papers found.
El Mahdi Chayti
Machine Learning and Optimization Laboratory (MLO), EPFL
Martin Jaggi
EPFL
Machine Learning · Optimization