AI Summary
For perfect $L_p$ sampling ($0 < p < 2$) in the turnstile streaming model (i.e., sampling coordinate $i$ with probability exactly $|x_i|^p / \|x\|_p^p$, up to additive error $n^{-C}$), existing algorithms suffer from superpolynomial update time $\Omega(n^C)$. This work presents the first perfect $L_p$ sampler achieving both the optimal space complexity $\tilde{O}(\log^2 n)$ and $\mathrm{poly}(\log n)$ update time. The method simulates a sum of reciprocals of powers of truncated exponential random variables by approximating its characteristic function, combined with the Gil-Pelaez inversion formula and an improved trapezoidal integration scheme, to efficiently approximate the cumulative distribution function. This approach breaks a long-standing update-time bottleneck, enabling practical deployment in streaming spectral analysis, sparse recovery, and related applications.
Abstract
Perfect $L_p$ sampling in a stream was introduced by Jayaram and Woodruff (FOCS 2018) as a streaming primitive which, given turnstile updates to a vector $x \in \{-\mathrm{poly}(n), \ldots, \mathrm{poly}(n)\}^n$, outputs an index $i^* \in \{1, 2, \ldots, n\}$ such that the probability of returning index $i$ is exactly $$\Pr[i^* = i] = \frac{|x_i|^p}{\|x\|_p^p} \pm \frac{1}{n^C},$$ where $C > 0$ is an arbitrarily large constant. Jayaram and Woodruff achieved the optimal $\tilde{O}(\log^2 n)$ bits of memory for $0 < p < 2$, but their update time is at least $n^C$ per stream update. Thus, an important open question is to achieve efficient update time while maintaining optimal space. For $0 < p < 2$, we give the first perfect $L_p$ sampler with the same optimal amount of memory but only $\mathrm{poly}(\log n)$ update time. Crucial to our result is an efficient simulation of a sum of reciprocals of powers of truncated exponential random variables: we approximate its characteristic function, apply the Gil-Pelaez inversion formula, and use variants of the trapezoid rule to quickly approximate the resulting integral.
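The numerical core described above, recovering a CDF from a characteristic function via the Gil-Pelaez inversion formula with trapezoidal quadrature, can be illustrated with a minimal sketch. This is not the paper's construction: the function names, the choice of Exp(1) as a test distribution (whose characteristic function and CDF have closed forms, so the inversion can be checked), and the truncation and step-size parameters are all illustrative assumptions; the paper applies the same inversion idea to sums of reciprocals of powers of truncated exponentials with a refined trapezoidal scheme.

```python
import cmath
import math

def gil_pelaez_cdf(phi, x, t_max=200.0, n_steps=20000):
    """Approximate the CDF F(x) of a random variable from its
    characteristic function phi via the Gil-Pelaez inversion formula:
        F(x) = 1/2 - (1/pi) * integral_0^inf Im[exp(-i t x) phi(t)] / t dt.
    The oscillatory integral is truncated at t_max and evaluated with
    the composite trapezoid rule (illustrative parameters, not tuned)."""
    def integrand(t):
        return (cmath.exp(-1j * t * x) * phi(t)).imag / t

    # The integrand extends continuously to t = 0, so we start the
    # trapezoid grid at a tiny eps instead of dividing by zero.
    eps = 1e-9
    ts = [eps] + [t_max * k / n_steps for k in range(1, n_steps + 1)]
    vals = [integrand(t) for t in ts]
    integral = sum(
        (ts[k + 1] - ts[k]) * (vals[k] + vals[k + 1]) / 2.0
        for k in range(len(ts) - 1)
    )
    return 0.5 - integral / math.pi

# Sanity check against a distribution with a known closed form:
# Exp(1) has characteristic function phi(t) = 1/(1 - i t)
# and CDF F(x) = 1 - exp(-x).
phi_exp = lambda t: 1.0 / (1.0 - 1j * t)
approx = gil_pelaez_cdf(phi_exp, 1.0)
exact = 1.0 - math.exp(-1.0)
```

Here `approx` agrees with `exact` to within the quadrature and truncation error; shrinking the step size and enlarging `t_max` trades running time for accuracy, which is exactly the tension the paper's improved trapezoidal scheme addresses.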