🤖 AI Summary
This paper addresses the problem of detecting latent structure, such as communities or principal submatrices, in large symmetric data matrices. We propose a parameter-free and distribution-free spectral testing method that is computationally efficient and insensitive to extreme data variation. The core methodological contribution is the introduction and systematic study of Wilcoxon–Wigner random matrices, symmetric matrices whose entries are normalized rank statistics computed from an i.i.d. sample of absolutely continuous random variables. Because the entries are ranks, the resulting procedures do not depend on the particular underlying distribution or on moment conditions. Theoretically, we establish that the leading eigenvalue and its corresponding eigenvector have asymptotically Gaussian fluctuations with explicit centering and scaling terms. Practically, these results support hypothesis tests for two problems: community detection and principal submatrix detection. The findings are compared throughout with existing results for independent-entry symmetric random matrices in signal-plus-noise settings.
📝 Abstract
This paper considers the problem of testing for latent structure in large symmetric data matrices. The goal here is to develop statistically principled methodology that is flexible in its applicability, computationally efficient, and insensitive to extreme data variation, thereby overcoming limitations facing existing approaches. To do so, we introduce and systematically study certain symmetric matrices, called Wilcoxon–Wigner random matrices, whose entries are normalized rank statistics derived from an underlying independent and identically distributed sample of absolutely continuous random variables. These matrices naturally arise as the matricization of one-sample problems in statistics and conceptually lie at the interface of nonparametrics, multivariate analysis, and data reduction. Among our results, we establish that the leading eigenvalue and corresponding eigenvector of Wilcoxon–Wigner random matrices admit asymptotically Gaussian fluctuations with explicit centering and scaling terms. These asymptotic results enable rigorous parameter-free and distribution-free spectral methodology for addressing two hypothesis testing problems, namely community detection and principal submatrix detection. Numerical examples illustrate the performance of the proposed approach. Throughout, our findings are juxtaposed with existing results based on the spectral properties of independent entry symmetric random matrices in signal-plus-noise data settings.
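The construction described in the abstract, a symmetric matrix whose entries are normalized ranks of the underlying data, can be illustrated with a minimal sketch. The details here are assumptions rather than the paper's definitions: the function name `wilcoxon_wigner` is hypothetical, the ranks are taken over the upper triangle including the diagonal, and they are normalized by dividing by N + 1, where N is the number of ranked entries. The paper's exact normalization and treatment of the diagonal may differ, and the centering and scaling terms needed for an actual test are not reproduced here.

```python
import numpy as np


def wilcoxon_wigner(data):
    """Rank-transform a symmetric matrix (illustrative sketch, not the paper's code).

    Assumptions: ranks are computed jointly over the upper triangle,
    diagonal included, and normalized to lie strictly inside (0, 1).
    """
    data = np.asarray(data, dtype=float)
    n = data.shape[0]
    iu = np.triu_indices(n)           # upper triangle, diagonal included
    values = data[iu]
    N = values.size                   # N = n (n + 1) / 2

    # Ranks 1..N; continuous data have no ties almost surely.
    ranks = np.empty(N)
    ranks[np.argsort(values, kind="stable")] = np.arange(1, N + 1)

    out = np.zeros((n, n))
    out[iu] = ranks / (N + 1)         # normalized ranks in (0, 1)
    return out + np.triu(out, k=1).T  # mirror to the lower triangle


# Heavy-tailed symmetric noise: Cauchy entries have no finite moments.
rng = np.random.default_rng(0)
n = 200
raw = rng.standard_cauchy((n, n))
raw = np.triu(raw) + np.triu(raw, k=1).T

W = wilcoxon_wigner(raw)
top = np.linalg.eigvalsh(W)[-1]       # leading eigenvalue (eigvalsh sorts ascending)
print(f"leading eigenvalue: {top:.3f}")
```

Because only ranks enter the matrix, applying any strictly increasing transformation to the raw data leaves `W` unchanged, which is the sense in which a rank-based statistic is distribution-free and insensitive to extreme values. In this sketch the entries average about 1/2, so the leading eigenvalue sits near n/2 under pure noise.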