🤖 AI Summary
This work addresses the problem of constructing small coresets for ℓₚ subspace embeddings with deterministic guarantees for any \( p \in [1, \infty) \). It proposes the first iterative reweighted sampling algorithm that, at each iteration, carefully controls upper and lower bounds on the loss function in order to construct a weighted subset of rows as a coreset. The method yields the first deterministic ℓₚ coreset whose size is free of logarithmic factors, resolving a long-standing open question about whether such log terms in the coreset size are unavoidable. The resulting ε-coreset has size \( O(d^{\max\{1,p/2\}}/\varepsilon^2) \), matching the known lower bound, and can be computed in \( O(\mathrm{poly}(n,d,\varepsilon^{-1})) \) time, enabling direct use in deterministic approximation algorithms for ℓₚ regression.
📝 Abstract
We introduce the first iterative algorithm for constructing an $\varepsilon$-coreset with a deterministic $\ell_p$ subspace-embedding guarantee for any $p \in [1,\infty)$ and any $\varepsilon>0$. For a given full-rank matrix $\mathbf{X} \in \mathbb{R}^{n \times d}$ with $n \gg d$, a matrix $\mathbf{X}'\in \mathbb{R}^{m \times d}$ is an $(\varepsilon,\ell_p)$-subspace embedding of $\mathbf{X}$ if for every $\mathbf{q} \in \mathbb{R}^d$, $(1-\varepsilon)\|\mathbf{Xq}\|_{p}^{p} \leq \|\mathbf{X'q}\|_{p}^{p} \leq (1+\varepsilon)\|\mathbf{Xq}\|_{p}^{p}$. In this paper, $\mathbf{X}'$ is specifically a weighted subset of rows of $\mathbf{X}$, commonly known in the literature as a coreset. In every iteration, the algorithm ensures that the loss on the maintained set is upper and lower bounded by appropriately scaled losses on the original dataset. Because the loss is bounded in this way, our coreset gives a deterministic guarantee for the $\ell_p$ subspace embedding, unlike typical coreset guarantees, which hold only with high probability. For an error parameter $\varepsilon$, our algorithm takes $O(\mathrm{poly}(n,d,\varepsilon^{-1}))$ time and returns a deterministic $\varepsilon$-coreset for $\ell_p$ subspace embedding whose size is $O\left(\frac{d^{\max\{1,p/2\}}}{\varepsilon^{2}}\right)$. This removes the $\log$ factors in the coreset size, which had been a long-standing open problem. Our coresets are optimal, as they match the known lower bound. As an application, our coreset can also be used to approximately solve the $\ell_p$ regression problem in a deterministic manner.
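To make the $(\varepsilon,\ell_p)$-subspace-embedding condition above concrete, here is a minimal NumPy sketch (not from the paper) that empirically checks the inequality on random directions $\mathbf{q}$. The checker name `is_lp_subspace_embedding` is our own illustrative helper. As a sanity check we use the classical $p=2$ fact that for a thin QR factorization $\mathbf{X}=\mathbf{QR}$, $\|\mathbf{Xq}\|_2^2=\|\mathbf{Rq}\|_2^2$ for every $\mathbf{q}$, so $\mathbf{R}$ is an exact (but non-row-subset) $\ell_2$ subspace embedding; the paper's coresets are instead weighted row subsets.

```python
import numpy as np

def is_lp_subspace_embedding(X, Xp, p, eps, trials=200, seed=0):
    """Empirically test the (eps, l_p)-subspace-embedding condition
    on random directions q. Passing random trials is a necessary,
    not sufficient, check of the for-all-q guarantee."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    for _ in range(trials):
        q = rng.standard_normal(d)
        full = np.sum(np.abs(X @ q) ** p)    # ||Xq||_p^p
        small = np.sum(np.abs(Xp @ q) ** p)  # ||X'q||_p^p
        if not ((1 - eps) * full <= small <= (1 + eps) * full):
            return False
    return True

rng = np.random.default_rng(1)
X = rng.standard_normal((500, 5))

# p = 2 sanity check: with X = QR (thin QR), ||Xq||_2^2 = ||Rq||_2^2
# for all q, since Q has orthonormal columns. So R passes even with
# a tiny eps (tolerance only for floating-point round-off).
_, R = np.linalg.qr(X, mode="reduced")
print(is_lp_subspace_embedding(X, R, p=2, eps=1e-9))   # True

# An arbitrary small row subset drops most of the mass and fails.
print(is_lp_subspace_embedding(X, X[:3], p=2, eps=0.5))  # False
```

Note that the checker only samples finitely many directions; the paper's guarantee is the stronger statement that the inequality holds simultaneously for *every* $\mathbf{q} \in \mathbb{R}^d$.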