🤖 AI Summary
Single-round communication in distributed PCA suffers from substantial statistical efficiency loss. Method: This paper proposes a two-round fixed-point iteration algorithm that, under moderate local signal-to-noise ratios, achieves estimation accuracy asymptotically matching that of centralized PCA, at the price of only doubling the communication cost. The method rests on a first-order perturbation analysis of the eigenspace and draws on tools from random matrix theory and distributed optimization, requiring no additional distributional assumptions or strong synchronization mechanisms. Contribution/Results: For the first time, the paper rigorously establishes the asymptotic efficiency gap between one-round distributed PCA and centralized PCA under moderate signal-to-noise ratios and shows that a second round eliminates it, with formal convergence guarantees. Experiments on synthetic and benchmark datasets confirm that, after two rounds, the statistical error approaches the centralized baseline, consistently outperforming the existing one-round approach.
📝 Abstract
We enhance Fan et al.'s (2019) one-round distributed principal component analysis algorithm by adding a second fixed-point iteration round. Random matrix theory reveals that the one-round estimator exhibits higher asymptotic error than the pooling estimator under moderate local signal-to-noise ratios. Remarkably, our second iteration round eliminates this efficiency gap, a result that follows from a careful analysis of the first-order perturbation of eigenspaces. Empirical experiments on synthetic and benchmark datasets consistently demonstrate the two-round method's statistical advantage over the one-round approach.
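
To make the two-round structure concrete, the following is a minimal NumPy sketch under stated assumptions. The one-round step follows the usual description of the Fan et al. (2019) estimator: each machine sends its top-k local eigenvectors, and the center eigendecomposes the averaged projection matrices. The exact fixed-point update of the second round is not given in the abstract, so `second_round` below uses a power-iteration-style step (average the local products of covariance and current estimate, then re-orthonormalize) as a stand-in. All function names, the spiked-covariance simulation, and the equal per-machine sample sizes are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def top_k_eigvecs(sym, k):
    """Return the k leading eigenvectors of a symmetric matrix as columns."""
    # np.linalg.eigh returns eigenvalues in ascending order, so reverse.
    _, vecs = np.linalg.eigh(sym)
    return vecs[:, ::-1][:, :k]


def local_cov(x):
    """Uncentered sample covariance of one machine's data, shape (n, d)."""
    return x.T @ x / x.shape[0]


def one_round(local_data, k):
    """Round 1: average the local projection matrices, then eigendecompose."""
    d = local_data[0].shape[1]
    avg_proj = np.zeros((d, d))
    for x in local_data:
        v = top_k_eigvecs(local_cov(x), k)
        avg_proj += v @ v.T
    avg_proj /= len(local_data)
    return top_k_eigvecs(avg_proj, k)


def second_round(local_data, v):
    """Round 2 (ASSUMED update, not taken from the paper): each machine
    returns local_cov @ v; the center averages and re-orthonormalizes.
    The unweighted average assumes equal sample sizes per machine."""
    avg = sum(local_cov(x) @ v for x in local_data) / len(local_data)
    q, _ = np.linalg.qr(avg)
    return q


def subspace_error(v, u):
    """Frobenius distance between the projection matrices of two bases."""
    return np.linalg.norm(v @ v.T - u @ u.T)


def simulate(m, n, d, k, seed=0):
    """Hypothetical spiked model: covariance I + U diag(spikes) U^T."""
    rng = np.random.default_rng(seed)
    u, _ = np.linalg.qr(rng.standard_normal((d, k)))
    scale = np.sqrt(np.linspace(4.0, 2.0, k))
    data = [
        (rng.standard_normal((n, k)) * scale) @ u.T
        + rng.standard_normal((n, d))
        for _ in range(m)
    ]
    return data, u


if __name__ == "__main__":
    data, u_true = simulate(m=20, n=50, d=30, k=2)
    v1 = one_round(data, k=2)
    v2 = second_round(data, v1)
    pooled = top_k_eigvecs(local_cov(np.vstack(data)), 2)
    print("one-round error :", subspace_error(v1, u_true))
    print("two-round error :", subspace_error(v2, u_true))
    print("pooled error    :", subspace_error(pooled, u_true))
```

Each round costs one d-by-k matrix per machine in each direction, which matches the doubling of communication described in the summary. With equal sample sizes, the averaged local covariances equal the pooled covariance, so the assumed second step moves the one-round estimate toward the pooled eigenspace; whether it reproduces the paper's asymptotic guarantee depends on the actual update the authors use.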