A Unified Approach to Memory-Sample Tradeoffs for Detecting Planted Structures

📅 2026-02-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work investigates trade-offs between memory and sample complexity for multi-pass streaming algorithms that detect planted structures, such as k-bicliques in random bipartite graphs, and high-dimensional sparse signals, including sparse Gaussian means and sparse PCA. The authors propose a unified framework that formulates these problems as distribution-testing tasks over matrices, leveraging an improved distributed data processing inequality, likelihood-ratio analysis, and multi-round information-cost techniques. They establish the first memory–sample complexity lower bound for sparse PCA and extend the framework to graph streaming problems in the vertex-arrival model. In the low-memory regime of $O(\log n)$ space, they obtain nearly tight memory lower bounds, which in turn imply new multi-pass lower bounds for approximating the size of the largest biclique and the maximum density of bounded-size subgraphs.
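To make the planted k-biclique distinguishing problem concrete, here is a minimal sketch of the two input distributions: a null random bipartite graph versus the same graph with a complete k-by-k biclique planted on random vertex subsets. The function names and the edge probability 1/2 are illustrative assumptions, not the paper's exact parameterization.

```python
import numpy as np

def null_bipartite(n, rng):
    """Biadjacency matrix of a random bipartite graph: each edge i.i.d. w.p. 1/2."""
    return rng.integers(0, 2, size=(n, n))

def planted_biclique(n, k, rng):
    """Draw from the null, then plant a complete k-by-k biclique on
    uniformly random left/right vertex subsets of size k."""
    A = null_bipartite(n, rng)
    left = rng.choice(n, size=k, replace=False)
    right = rng.choice(n, size=k, replace=False)
    A[np.ix_(left, right)] = 1  # force every edge inside the planted block
    return A, left, right

rng = np.random.default_rng(0)
A, left, right = planted_biclique(200, 10, rng)
```

A streaming detector in the vertex-arrival model would see the rows of this biadjacency matrix one at a time; the paper's lower bounds constrain how much memory any such multi-pass algorithm needs to tell the two distributions apart.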

📝 Abstract
We present a unified framework for proving memory lower bounds for multi-pass streaming algorithms that detect planted structures. Planted structures -- such as cliques or bicliques in graphs, and sparse signals in high-dimensional data -- arise in numerous applications, and our framework yields multi-pass memory lower bounds for many such fundamental settings. We show memory lower bounds for the planted $k$-biclique detection problem in random bipartite graphs and for detecting sparse Gaussian means. We also show the first memory-sample tradeoffs for the sparse principal component analysis (PCA) problem in the spiked covariance model. For all these problems to which we apply our unified framework, we obtain bounds which are nearly tight in the low, $O(\log n)$ memory regime. We also leverage our bounds to establish new multi-pass streaming lower bounds, in the vertex arrival model, for two well-studied graph streaming problems: approximating the size of the largest biclique and approximating the maximum density of bounded-size subgraphs. To show these bounds, we study a general distinguishing problem over matrices, where the goal is to distinguish a null distribution from one that plants an outlier distribution over a random submatrix. Our analysis builds on a new distributed data processing inequality that provides sufficient conditions for memory hardness in terms of the likelihood ratio between the averaged planted and null distributions. This result generalizes the inequality of [Braverman et al., STOC 2016] and may be of independent interest. The inequality enables us to measure information cost under the null distribution -- a key step for applying subsequent direct-sum-type arguments and incorporating the multi-pass information cost framework of [Braverman et al., STOC 2024].
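As an illustration of the null-versus-planted distinguishing setup in the sparse PCA setting, the sketch below draws samples from the standard spiked covariance model $N(0, I_d + \beta v v^\top)$ with a $k$-sparse unit spike $v$, against the isotropic null $N(0, I_d)$. The function names and the random-sign construction of the spike are illustrative assumptions, not the paper's notation.

```python
import numpy as np

def sample_null(n, d, rng):
    """n samples from the null distribution N(0, I_d)."""
    return rng.standard_normal((n, d))

def sample_planted(n, d, k, beta, rng):
    """n samples from the spiked covariance model N(0, I_d + beta * v v^T),
    where v is a k-sparse unit vector with random support and signs."""
    support = rng.choice(d, size=k, replace=False)
    v = np.zeros(d)
    v[support] = rng.choice([-1.0, 1.0], size=k) / np.sqrt(k)
    z = rng.standard_normal((n, d))  # isotropic noise
    g = rng.standard_normal(n)       # Gaussian coefficients along the spike
    return z + np.sqrt(beta) * np.outer(g, v)

rng = np.random.default_rng(0)
X0 = sample_null(1000, 50, rng)
X1 = sample_planted(1000, 50, k=5, beta=4.0, rng=rng)
# With unbounded memory, the top eigenvalue of the empirical covariance
# separates the two distributions; the paper asks what happens when a
# streaming algorithm cannot afford to store the d x d covariance.
lam0 = np.linalg.eigvalsh(X0.T @ X0 / 1000)[-1]
lam1 = np.linalg.eigvalsh(X1.T @ X1 / 1000)[-1]
```

The gap between `lam1` and `lam0` shows why detection is easy without memory constraints; the memory-sample tradeoff quantifies the cost once the algorithm must process samples in a low-space stream.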
Problem

Research questions and friction points this paper is trying to address.

memory-sample tradeoffs
planted structures
streaming algorithms
memory lower bounds
sparse PCA
Innovation

Methods, ideas, or system contributions that make the work stand out.

memory-sample tradeoffs
streaming algorithms
planted structures
distributed data processing inequality
sparse PCA