🤖 AI Summary
This work investigates the impact of substituting the true probability matrix with a plug-in estimator in the stochastic block model, a substitution that introduces both simple and composite perturbations affecting the asymptotic behavior of spectral statistics. The authors develop a unified decomposition framework that precisely disentangles the composite perturbation into three components: the bias of the normalized adjacency matrix, the simple perturbation, and the bias of the simple perturbation itself, thereby revealing the distinct asymptotic behavior of the cross terms under the two types of perturbation. Leveraging spectral analysis, matrix perturbation theory, and trace techniques, they relax the growth condition on the number of communities for the largest eigenvalue statistic from $K = O(n^{1/6 - \tau})$ to the optimal rate $K = o(n^{1/6})$, and establish, for the first time, the asymptotic normality of linear spectral statistics under plug-in estimation.
📝 Abstract
Statistical inference for stochastic block models typically relies on the spectrum of the normalized adjacency matrix $\A^*$. In practice, the true probability matrix $\mathbf{B}$ is unknown and must be replaced by a plug-in estimator $\hat{\mathbf{B}}$. This substitution introduces two distinct types of estimation error: a simple perturbation $\bDelta$, arising when $\hat{\mathbf{B}}$ replaces $\mathbf{B}$ only in the numerator, and a composite perturbation $\tilde{\bDelta}$, arising when the replacement occurs in both the numerator and the denominator.
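For concreteness, one common realization of these objects writes, for $i \neq j$ and with $s(x) = \sqrt{(n-1)\,x(1-x)}$,

$$
\A^*_{ij} = \frac{A_{ij} - P_{ij}}{s(P_{ij})}, \qquad
\bDelta_{ij} = \frac{P_{ij} - \hat{P}_{ij}}{s(P_{ij})}, \qquad
\tilde{\bDelta}_{ij} = \frac{A_{ij} - \hat{P}_{ij}}{s(\hat{P}_{ij})} - \A^*_{ij},
$$

where $P_{ij}$ and $\hat{P}_{ij}$ are the entrywise edge probabilities induced by $\mathbf{B}$ and $\hat{\mathbf{B}}$ under the community assignment. The specific scaling $s(\cdot)$, standard in SBM goodness-of-fit testing, is an illustrative assumption here; the abstract itself does not fix the normalization.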
Under both perturbation regimes, we decompose the total sum of squares into three components and conduct a detailed analysis of their asymptotic properties. This reveals a key, and perhaps surprising, distinction between simple and composite perturbations: the cross term $\tr({\A^*}\bDelta)$ is asymptotically negligible, whereas its composite counterpart $\tr({\A^*}\tilde{\bDelta})$ is not.
Motivated by this, we develop a unified decomposition framework, expressing the composite perturbation matrix as $\tilde{\bDelta}=\check{\A}+\bDelta+\check{\bDelta}$, where $\check{\A}$ is a bias matrix of the normalized adjacency matrix, $\bDelta$ is the simple perturbation, and $\check{\bDelta}$ is a bias matrix of $\bDelta$. This structured decomposition allows us to precisely isolate and control each source of error, leading to a refined limiting theory for two key classes of test statistics.
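Under the illustrative normalization above, this split is exact rather than asymptotic: adding and subtracting $(A_{ij} - P_{ij})/s(\hat{P}_{ij})$ gives

$$
\tilde{\bDelta}_{ij}
= \underbrace{(A_{ij} - P_{ij})\left(\frac{1}{s(\hat{P}_{ij})} - \frac{1}{s(P_{ij})}\right)}_{\check{\A}_{ij}}
+ \underbrace{\frac{P_{ij} - \hat{P}_{ij}}{s(P_{ij})}}_{\bDelta_{ij}}
+ \underbrace{(P_{ij} - \hat{P}_{ij})\left(\frac{1}{s(\hat{P}_{ij})} - \frac{1}{s(P_{ij})}\right)}_{\check{\bDelta}_{ij}},
$$

a denominator-bias term on the centered adjacency, the simple perturbation, and a denominator-bias term on $\bDelta$. This labeling of the three terms is our reading of the decomposition as described; the paper's exact definitions may differ.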
Concretely, for the largest eigenvalue statistic, we improve the existing condition from $K = O(n^{1/6 - \tau})$ to the optimal rate $K = o(n^{1/6})$ under both simple and composite perturbations. For the linear spectral statistic, our unified decomposition framework provides the necessary structure to systematically control these errors term by term, leading to a complete and rigorous proof of asymptotic normality.
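To ground both statistic classes, here is a minimal self-contained simulation sketch (our construction under the assumed normalization, not the authors' code). The Tracy–Widom-type centering $n^{2/3}(\lambda_1 - 2)$ and the test function $f(x) = x^3$ are illustrative choices:

```python
# Hedged sketch: K-block SBM, plug-in normalized adjacency (composite
# regime: B_hat in both numerator and denominator), and the two statistics.
import numpy as np

rng = np.random.default_rng(0)
n, K = 600, 3
g = rng.integers(0, K, size=n)                  # community labels
B = np.full((K, K), 0.1) + 0.2 * np.eye(K)      # true connectivity matrix
P = B[np.ix_(g, g)]                             # entrywise edge probabilities

U = rng.random((n, n))                          # symmetric Bernoulli(P) adjacency,
A = (np.triu(U, 1) < np.triu(P, 1)).astype(float)   # no self-loops
A = A + A.T

B_hat = np.zeros((K, K))                        # block-average plug-in estimator
for a in range(K):
    for b in range(K):
        mask = np.outer(g == a, g == b)
        np.fill_diagonal(mask, False)           # exclude the diagonal
        B_hat[a, b] = A[mask].mean()
P_hat = B_hat[np.ix_(g, g)]

M = (A - P_hat) / np.sqrt((n - 1) * P_hat * (1 - P_hat))
np.fill_diagonal(M, 0.0)                        # plug-in normalized adjacency
eigs = np.linalg.eigvalsh(M)

theta = n ** (2 / 3) * (eigs[-1] - 2)   # largest-eigenvalue statistic
lss = np.sum(eigs ** 3)                 # linear spectral statistic, f(x) = x^3
print(f"largest-eigenvalue statistic: {theta:.3f}")
print(f"linear spectral statistic:    {lss:.3f}")
```

With the decomposition above, the same sketch covers the simple-perturbation regime by keeping $s(P_{ij})$ in the denominator while replacing only the numerator.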