🤖 AI Summary
In performative prediction (PP), deploying a model shifts the distribution of the data it is trained on, which makes it harder for non-convex optimization to converge to a stationary performative stable (SPS) solution. The existing analysis of greedy stochastic gradient descent (SGD-GD) under smooth, non-convex losses relies on a bounded variance assumption and yields an $O(1/\sqrt{T})$ rate with a non-vanishing error neighborhood that scales with the stochastic gradient variance. This work proposes SPRINT (stochastic performative prediction with variance reduction), which converges to an SPS solution at a rate of $O(1/T)$ with an error neighborhood independent of the stochastic gradient variance. Experiments on multiple real datasets with non-convex models show that SPRINT outperforms SGD-GD in both convergence rate and stability.
📝 Abstract
Performative prediction (PP) is an algorithmic framework for optimizing machine learning (ML) models where the model's deployment affects the distribution of the data it is trained on. Because of these model-induced distribution shifts, designing algorithms that converge to a stable point, known as a stationary performative stable (SPS) solution, is more challenging in PP than in conventional ML tasks with fixed data. While considerable efforts have been made to find SPS solutions using methods such as repeated gradient descent (RGD) and greedy stochastic gradient descent (SGD-GD), most prior studies assumed a strongly convex loss until a recent work established $\mathcal{O}(1/\sqrt{T})$ convergence of SGD-GD to SPS solutions under smooth, non-convex losses. However, this latest progress still relies on the restrictive bounded variance assumption on the stochastic gradient estimates and yields convergence bounds with a non-vanishing error neighborhood that scales with the variance. This limitation motivates us to improve convergence rates and reduce error in stochastic optimization for PP, particularly in non-convex settings. Thus, we propose a new algorithm called stochastic performative prediction with variance reduction (SPRINT) and establish its convergence to an SPS solution at a rate of $\mathcal{O}(1/T)$. Notably, the resulting error neighborhood is **independent** of the variance of the stochastic gradients. Experiments on multiple real datasets with non-convex models demonstrate that SPRINT outperforms SGD-GD in both convergence rate and stability.
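To make the greedy-deploy setting concrete, the following is a minimal sketch of an SGD-GD-style loop on a toy one-dimensional problem. It is not SPRINT and not the paper's experimental setup: the Gaussian shift model, the quadratic loss, the step-size schedule, and every name and constant (`sgd_gd_toy`, `stable_point`, `mu0`, `eps`, `sigma`) are assumptions chosen for illustration. The loss here is strongly convex, unlike the non-convex losses the abstract targets, so that the stable point has a closed form to compare against.

```python
import random


def sgd_gd_toy(steps=5000, mu0=1.0, eps=0.5, sigma=0.1, seed=0):
    """Greedy-deploy SGD on a toy problem (all names and constants are hypothetical).

    Data:  z ~ D(theta) = N(mu0 + eps * theta, sigma^2)   (model-induced shift)
    Loss:  l(theta; z) = 0.5 * (theta - z)^2
    """
    rng = random.Random(seed)
    theta = 0.0
    for t in range(1, steps + 1):
        # Deploy the current model, then observe one sample from D(theta).
        z = rng.gauss(mu0 + eps * theta, sigma)
        # Gradient of the loss in theta with the sample held fixed,
        # i.e. the dependence of D(theta) on theta is not differentiated.
        grad = theta - z
        # Decaying step size 2 / (t + 1); an arbitrary choice for this sketch.
        theta -= 2.0 / (t + 1) * grad
    return theta


def stable_point(mu0=1.0, eps=0.5):
    """Closed-form solution of E_{z ~ D(theta)}[theta - z] = 0."""
    return mu0 / (1.0 - eps)


if __name__ == "__main__":
    print(sgd_gd_toy(), stable_point())
```

In this toy model the stable point is the parameter at which the expected gradient vanishes under the distribution that the parameter itself induces. The loop uses a single noisy sample per deployment, which is the kind of stochastic gradient estimate whose variance enters the SGD-GD bounds discussed above.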