🤖 AI Summary
This paper addresses the inefficiency of uniform gradient sampling in stochastic gradient descent (SGD). We first introduce Oracle Gradient Querying (OGQ), an idealized method that, at each iteration, selects the single user gradient yielding the largest expected improvement (EI); because OGQ requires oracle access to every user's gradient, we then propose Strategic Gradient Querying (SGQ), a practical variant that makes only one gradient query per iteration. Under standard smoothness and Polyak–Łojasiewicz assumptions, together with an EI-heterogeneity condition, our analysis shows that OGQ improves transient-state convergence and reduces steady-state variance, while SGQ achieves faster transient-state convergence than vanilla SGD. Numerical experiments across several optimization tasks validate these theoretical findings.
📝 Abstract
This paper considers a finite-sum optimization problem under first-order queries and investigates the benefits of strategic querying for stochastic gradient-based methods compared to a uniform querying strategy. We first introduce Oracle Gradient Querying (OGQ), an idealized algorithm that, at each step, selects the one user's gradient yielding the largest possible expected improvement (EI). However, OGQ assumes oracle access to the gradients of all users to make such a selection, which is impractical in real-world scenarios. To address this limitation, we propose Strategic Gradient Querying (SGQ), a practical algorithm that achieves better transient-state performance than SGD while making only one query per iteration. For smooth objective functions satisfying the Polyak–Łojasiewicz condition, we show that, under an EI-heterogeneity assumption, OGQ enhances transient-state performance and reduces steady-state variance, while SGQ improves transient-state performance over SGD. Our numerical experiments validate our theoretical findings.
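To make the querying idea concrete, here is a minimal toy sketch of the idealized OGQ selection rule against uniform SGD on a one-dimensional finite-sum quadratic. The objective, coefficients, step size, and the proxy used for expected improvement (the realized one-step decrease `f(x) - f(x - eta * g_i)`) are all illustrative assumptions, not the paper's actual construction.

```python
import numpy as np

# Toy finite sum: f(x) = (1/n) * sum_i 0.5 * a_i * (x - b_i)^2
# (hypothetical coefficients for illustration only)
rng = np.random.default_rng(0)
n = 5
a = rng.uniform(0.5, 3.0, n)
b = rng.normal(0.0, 1.0, n)

def f(x):
    return np.mean(0.5 * a * (x - b) ** 2)

def grad_i(x, i):
    # Gradient of the i-th user's component f_i(x) = 0.5 * a_i * (x - b_i)^2
    return a[i] * (x - b[i])

eta = 0.1
x0 = 5.0
x_ogq = x_sgd = x0
for t in range(200):
    # OGQ (idealized): inspect all n candidate steps and take the one
    # with the largest realized improvement, a stand-in for the EI rule.
    candidates = [x_ogq - eta * grad_i(x_ogq, i) for i in range(n)]
    x_ogq = min(candidates, key=f)
    # Vanilla SGD: query one user uniformly at random.
    j = rng.integers(n)
    x_sgd = x_sgd - eta * grad_i(x_sgd, j)

x_star = np.sum(a * b) / np.sum(a)  # exact minimizer of this quadratic
print(f(x_ogq), f(x_sgd), f(x_star))
```

On this toy problem the greedy query typically drives the objective down faster in the transient phase, which is the qualitative behavior the paper's analysis formalizes; the practical SGQ algorithm avoids OGQ's n-gradient inspection by making only a single query per iteration.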