🤖 AI Summary
Existing sublinear spectral clustering oracles need Ω(√n) memory to build their data structure, which limits them on massive graphs and in memory-constrained settings. This work proposes spectral clustering oracles for well-clusterable graphs that build the data structure with far less than O(√n) memory (e.g., O(n^0.01)) while still answering queries in sublinear time. It characterizes the trade-off between memory S and query time T, showing for example that S·T = Õ(n) for clusterable graphs with a logarithmic conductance gap, and shows that this trade-off is optimal up to logarithmic factors for a natural class of approaches. Experiments on synthetic networks validate the oracles' performance.
📝 Abstract
We study the problem of designing \emph{sublinear spectral clustering oracles} for well-clusterable graphs. Such an oracle is an algorithm that, given query access to the adjacency list of a graph $G$, first constructs a compact data structure $\mathcal{D}$ that captures the clustering structure of $G$. Once built, $\mathcal{D}$ enables sublinear-time responses to \textsc{WhichCluster}$(G,x)$ queries for any vertex $x$. A major limitation of existing oracles is that constructing $\mathcal{D}$ requires $\Omega(\sqrt{n})$ memory, which becomes a bottleneck for massive graphs and memory-limited settings. In this paper, we break this barrier and establish a memory-time trade-off for sublinear spectral clustering oracles. Specifically, for well-clusterable graphs, we present oracles that construct $\mathcal{D}$ using far less than $O(\sqrt{n})$ memory (e.g., $O(n^{0.01})$) while still answering membership queries in sublinear time. We also characterize the trade-off frontier between memory usage $S$ and query time $T$, showing, for example, that $S\cdot T=\widetilde{O}(n)$ for clusterable graphs with a logarithmic conductance gap, and we show that this trade-off is nearly optimal (up to logarithmic factors) for a natural class of approaches. Finally, to complement our theory, we validate the performance of our oracles through experiments on synthetic networks.
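As a worked reading of the stated trade-off, the following sketch assumes that $S\cdot T=\widetilde{O}(n)$ can be rearranged as $T=\widetilde{O}(n/S)$, with $\widetilde{O}(\cdot)$ hiding polylogarithmic factors. The budget $S=n^{0.01}$ is the abstract's own example; $S=n^{1/4}$ is an illustrative value chosen here, and whether it falls within the range covered by the paper's theorems is an assumption.

$$
S = n^{0.01} \;\Longrightarrow\; T = \widetilde{O}\!\left(\frac{n}{n^{0.01}}\right) = \widetilde{O}\!\left(n^{0.99}\right),
\qquad
S = n^{1/4} \;\Longrightarrow\; T = \widetilde{O}\!\left(n^{3/4}\right).
$$

Under this reading, both query times are $o(n)$, since a polylogarithmic factor grows more slowly than any fixed polynomial such as $n^{0.01}$. This is consistent with the claim that queries stay sublinear even when memory is far below $\sqrt{n}$.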