Gradient Compressed Sensing: A Query-Efficient Gradient Estimator for High-Dimensional Zeroth-Order Optimization

📅 2024-05-27
🏛️ International Conference on Machine Learning
📈 Citations: 1
Influential: 0

🤖 AI Summary
For zeroth-order optimization of high-dimensional functions with sparse gradients, conventional methods have a per-step query complexity that scales linearly or logarithmically with the dimension $d$, which makes them inefficient. This paper proposes a nonlinear gradient estimation framework grounded in compressed sensing, enhancing the Indyk–Price–Woodruff (IPW) algorithm via dependent random partitioning and adaptive grouping. Under weaker assumptions, the method is the first to achieve a doubly logarithmic dependence of query complexity on dimension, $O(s \log\log(d/s))$ queries per step, and it reduces the IPW constant by a factor of nearly 4300. Theoretically, it attains an $O(1/T)$ convergence rate over $T$ steps. Empirically, it significantly outperforms twelve existing zeroth-order optimizers on $10^4$-dimensional benchmark functions.

📝 Abstract
We study nonconvex zeroth-order optimization (ZOO) in a high-dimensional space $\mathbb{R}^d$ for functions with approximately $s$-sparse gradients. To reduce the dependence on the dimensionality $d$ in the query complexity, high-dimensional ZOO methods seek to leverage gradient sparsity to design gradient estimators. The previous best method needs $O\big(s\log\frac{d}{s}\big)$ queries per step to achieve $O\big(\frac{1}{T}\big)$ rate of convergence w.r.t. the number $T$ of steps. In this paper, we propose *Gradient Compressed Sensing* (GraCe), a query-efficient and accurate estimator for sparse gradients that uses only $O\big(s\log\log\frac{d}{s}\big)$ queries per step and still achieves $O\big(\frac{1}{T}\big)$ rate of convergence. To our best knowledge, we are the first to achieve a *double-logarithmic* dependence on $d$ in the query complexity under weaker assumptions. Our proposed GraCe generalizes the Indyk–Price–Woodruff (IPW) algorithm in compressed sensing from linear measurements to nonlinear functions. Furthermore, since the IPW algorithm is purely theoretical due to its impractically large constant, we improve the IPW algorithm via our *dependent random partition* technique together with our corresponding novel analysis and successfully reduce the constant by a factor of nearly 4300. Our GraCe is not only theoretically query-efficient but also achieves strong empirical performance. We benchmark our GraCe against 12 existing ZOO methods with 10000-dimensional functions and demonstrate that GraCe significantly outperforms existing methods.
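
The link between function queries and compressed-sensing measurements can be illustrated with a small sketch. This is not GraCe or the IPW algorithm; it only shows the standard observation that a forward difference of $f$ along a direction $a$ approximates the linear measurement $\langle a, \nabla f(x) \rangle$, with an error that comes from the nonlinearity of $f$. The test function `sparse_quadratic`, its support, the step size `delta`, and the helper name `directional_measurement` are all hypothetical choices made for illustration.

```python
import random


def directional_measurement(f, x, a, delta=1e-6):
    """Forward-difference estimate of <a, grad f(x)> from two queries of f."""
    x_shift = [xi + delta * ai for xi, ai in zip(x, a)]
    return (f(x_shift) - f(x)) / delta


def sparse_quadratic(x, support=(3, 7), weights=(2.0, -1.5)):
    """Hypothetical test function; its gradient is nonzero only on `support`."""
    return sum(0.5 * w * x[i] ** 2 for i, w in zip(support, weights))


d = 20
rng = random.Random(0)
x = [rng.uniform(-1.0, 1.0) for _ in range(d)]
a = [rng.choice((-1.0, 1.0)) for _ in range(d)]  # random sign direction

# Exact gradient of sparse_quadratic: w_i * x_i on the support, 0 elsewhere.
grad = [0.0] * d
grad[3] = 2.0 * x[3]
grad[7] = -1.5 * x[7]

estimate = directional_measurement(sparse_quadratic, x, a)
exact = sum(ai * gi for ai, gi in zip(a, grad))

# The gap between the two is the finite-difference (nonlinearity) error.
print(abs(estimate - exact))
```

Each such measurement costs two function queries here, so the number of measurements a sparse-recovery scheme needs translates directly into query complexity per step.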
Problem

Research questions and friction points this paper is trying to address.

High-dimensional Space
Zeroth-order Optimization
Sparse Gradient
Innovation

Methods, ideas, or system contributions that make the work stand out.

Gradient Compressed Sensing
Zeroth-order Optimization
Query Efficiency