🤖 AI Summary
This paper studies empirical risk minimization (ERM) over a compact convex set subject to linear equality constraints \(A^\top x = b\): \(\min \langle c, x \rangle\). For large-scale instances, the authors propose an algorithm that combines an interior-point method, dynamic spectral sparsification, and leverage score estimation. The key contribution is an efficient data structure that dynamically maintains overestimates of leverage scores while the matrix undergoes row updates, which in turn supports fast spectral sparsification. Used inside an interior-point method that converges in \(\widetilde{O}(\sqrt{n})\) iterations, this gives a total running time of \(\widetilde{O}(nd + d^6 \sqrt{n})\); when \(A\) is dense and \(n \geq d^{10}\), the algorithm runs in nearly-linear time, substantially improving upon prior state-of-the-art methods. Experiments demonstrate both theoretical guarantees and practical speedups on high-dimensional constrained ERM tasks.
📝 Abstract
Consider the empirical risk minimization (ERM) problem, which is stated as follows. Let $K_1, \dots, K_m$ be compact convex sets with $K_i \subseteq \mathbb{R}^{n_i}$ for $i \in [m]$, $n = \sum_{i=1}^m n_i$, and $n_i \le C_K$ for some absolute constant $C_K$. Also, consider a matrix $A \in \mathbb{R}^{n \times d}$ and vectors $b \in \mathbb{R}^d$ and $c \in \mathbb{R}^n$. Then the ERM problem asks to find \[ \min_{\substack{x \in K_1 \times \dots \times K_m \\ A^\top x = b}} c^\top x. \] We give an algorithm to solve this to high accuracy in time $\widetilde{O}(nd + d^6\sqrt{n}) \le \widetilde{O}(nd + d^{11})$, which is nearly-linear time in the input size when $A$ is dense and $n \ge d^{10}$.
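The comparison between the two running-time bounds above can be checked with a short case analysis on $n$ versus $d^{10}$. The derivation below is a reader's sketch, not taken from the paper:

\[
d^6\sqrt{n} \;\le\;
\begin{cases}
d \cdot \sqrt{n} \cdot \sqrt{n} = nd, & \text{if } n \ge d^{10} \text{ (so } d^5 \le \sqrt{n}\text{)},\\[2pt]
d^6 \cdot d^5 = d^{11}, & \text{if } n < d^{10} \text{ (so } \sqrt{n} < d^5\text{)}.
\end{cases}
\]

In both cases $d^6\sqrt{n} \le nd + d^{11}$. In the first case the total is $\widetilde{O}(nd)$, which matches the $nd$ entries of a dense $A$ up to logarithmic factors.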
Our result is achieved by implementing an $\widetilde{O}(\sqrt{n})$-iteration interior point method (IPM) efficiently using dynamic data structures. In this direction, our key technical advance is a new algorithm for maintaining leverage score overestimates of matrices undergoing row updates. Formally, given a matrix $A \in \mathbb{R}^{n \times d}$ undergoing $T$ batches of row updates of total size $n$, we give an algorithm which can maintain leverage score overestimates of the rows of $A$ summing to $\widetilde{O}(d)$ in total time $\widetilde{O}(nd + Td^6)$. This data structure is used to sample a spectral sparsifier within a robust IPM framework to establish the main result.
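To make the objects in this paragraph concrete, here is a minimal static sketch of leverage scores and of textbook leverage-score row sampling. It is not the paper's dynamic data structure: it recomputes everything from scratch, and the names `gram`, `solve`, `leverage_scores`, `sample_sparsifier`, and the `oversample` parameter are all hypothetical, chosen for illustration. It assumes $A$ has full column rank and uses the standard definition $\sigma_i = a_i^\top (A^\top A)^{-1} a_i$ for the $i$-th row $a_i$.

```python
from fractions import Fraction
import math
import random


def gram(A):
    """Return A^T A for a matrix given as a list of rows."""
    d = len(A[0])
    return [[sum(row[i] * row[j] for row in A) for j in range(d)]
            for i in range(d)]


def solve(M, v):
    """Solve M y = v by Gauss-Jordan elimination (M square and invertible)."""
    d = len(M)
    aug = [list(M[i]) + [v[i]] for i in range(d)]
    for col in range(d):
        # Find a row with a nonzero pivot and move it into place.
        pivot = next(r for r in range(col, d) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        # Eliminate this column from every other row.
        for r in range(d):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][d] for i in range(d)]


def leverage_scores(A):
    """Exact leverage scores sigma_i = a_i^T (A^T A)^{-1} a_i, as Fractions."""
    A = [[Fraction(x) for x in row] for row in A]
    G = gram(A)
    return [sum(x * y for x, y in zip(row, solve(G, row))) for row in A]


def sample_sparsifier(A, overestimates, oversample, rng):
    """Keep row i with probability p_i = min(1, oversample * tau_i),
    rescaled by 1/sqrt(p_i) so the sampled Gram matrix equals A^T A
    in expectation."""
    kept = []
    for row, tau in zip(A, overestimates):
        p = min(1.0, oversample * float(tau))
        if rng.random() < p:
            kept.append([x / math.sqrt(p) for x in row])
    return kept


# Toy instance with n = 4 rows and d = 2 columns.
A = [[1, 0], [0, 1], [1, 1], [1, 1]]
scores = leverage_scores(A)   # exact scores; they sum to d = 2
total = sum(scores)
kept = sample_sparsifier(A, scores, 10, random.Random(0))
```

Any values $\tau_i \ge \sigma_i$ can replace the exact scores in `sample_sparsifier`; the expected number of kept rows then scales with $\sum_i \tau_i$, which is why the abstract asks for overestimates summing to $\widetilde{O}(d)$. In standard analyses `oversample` is taken on the order of $\log d / \varepsilon^2$ for a $(1 \pm \varepsilon)$ spectral approximation.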