🤖 AI Summary
This work investigates the minimum mean separation $\Delta$ required for partial recovery of cluster structure in isotropic Gaussian mixture models in the moderate-dimensional regime ($n \geq dK$), aiming to narrow the gap between the information-theoretic limit and the performance of known polynomial-time algorithms. By establishing a new low-degree polynomial lower bound in this regime when $d \geq K$, the study shows that, unlike the high-dimensional regime ($n \leq dK$), where the difficulty is driven primarily by dimension reduction and spectral methods, the moderate-dimensional regime involves more delicate phenomena that lead to a "non-parametric rate". The authors then give a novel non-spectral clustering algorithm that matches this rate. Together, these results shed new light on the computational limits of clustering in moderate dimension.
📝 Abstract
We study the fundamental problem of clustering $n$ points into $K$ groups drawn from a mixture of isotropic Gaussians in $\mathbb{R}^d$. Specifically, we investigate the requisite minimal distance $\Delta$ between mean vectors to partially recover the underlying partition. While the minimax-optimal threshold for $\Delta$ is well-established, a significant gap exists between this information-theoretic limit and the performance of known polynomial-time procedures. Although this gap was recently characterized in the high-dimensional regime ($n \leq dK$), it remains largely unexplored in the moderate-dimensional regime ($n \geq dK$). In this manuscript, we address this regime by establishing a new low-degree polynomial lower bound for the moderate-dimensional case when $d \geq K$. We show that while the difficulty of clustering for $n \leq dK$ is primarily driven by dimension reduction and spectral methods, the moderate-dimensional regime involves more delicate phenomena leading to a "non-parametric rate". We provide a novel non-spectral algorithm matching this rate, shedding new light on the computational limits of the clustering problem in moderate dimension.
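For readers who want the setting in symbols, the model described above can be sketched as follows. This is a formalization under assumptions the abstract does not spell out: the latent labels $z_i$, the cluster means $\mu_k$, the identity noise covariance, and the reading of $\Delta$ as the smallest pairwise distance between means are illustrative notation, not necessarily the paper's exact definitions.

```latex
% Assumed notation: z_i is the latent cluster label of point i,
% \mu_k is the mean vector of cluster k, and the noise has identity covariance.
\[
  X_i = \mu_{z_i} + \varepsilon_i,
  \qquad \varepsilon_i \sim \mathcal{N}(0, I_d) \ \text{independently},
  \qquad z_i \in \{1, \dots, K\},
  \qquad i = 1, \dots, n,
\]
% Assumed reading of "minimal distance between mean vectors":
\[
  \Delta = \min_{k \neq l} \lVert \mu_k - \mu_l \rVert_2 .
\]
```

In this notation, the high-dimensional regime is $n \leq dK$ and the moderate-dimensional regime is $n \geq dK$, with the new lower bound stated for $d \geq K$.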