🤖 AI Summary
Estimating a shared low-dimensional subspace from multi-view data is difficult because views differ in signal-to-noise ratio, dimensionality, and structural composition (e.g., individual components), and this heterogeneity biases the subspace estimate. Method: We propose a weighted two-stage spectral algorithm that replaces equal-weight aggregation (as in AJIVE) with a data-driven, adaptive weighting scheme. This corrects for statistical and structural heterogeneity jointly, without iterative optimization. Contribution/Results: Theoretically, under generic geometric conditions the estimator attains the $O(K^{-1/2})$ rate in the number of views $K$, and the error bounds separate the effects of the two layers of heterogeneity. Empirically, it recovers shared subspaces more accurately on both synthetic benchmarks and real multi-omics datasets. The approach offers a more robust and precise way to perform joint dimensionality reduction of high-dimensional heterogeneous matrix data.
📝 Abstract
Many modern datasets consist of multiple related matrices measured on a common set of units, where the goal is to recover the shared low-dimensional subspace. While the Angle-based Joint and Individual Variation Explained (AJIVE) framework provides a solution, it relies on equal-weight aggregation, which can be strictly suboptimal when views exhibit significant statistical heterogeneity (arising from varying SNR and dimensions) and structural heterogeneity (arising from individual components). In this paper, we propose HeteroJIVE, a weighted two-stage spectral algorithm tailored to such heterogeneity. Theoretically, we first revisit the "non-diminishing" error barrier with respect to the number of views $K$ identified in recent literature for the equal-weight case. We demonstrate that this barrier is not universal: under generic geometric conditions, the bias term vanishes and our estimator achieves the $O(K^{-1/2})$ rate without the need for iterative refinement. Extending this to the general-weight case, we establish error bounds that explicitly disentangle the two layers of heterogeneity. Based on this, we derive an oracle-optimal weighting scheme implemented via a data-driven procedure. Extensive simulations corroborate our theoretical findings, and an application to TCGA-BRCA multi-omics data validates the superiority of HeteroJIVE in practice.
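To make the phrase "weighted two-stage spectral algorithm" concrete, here is a minimal Python sketch of one plausible reading: stage one extracts a per-view signal basis by truncated SVD, and stage two takes the leading eigenvectors of a weighted sum of the resulting projection matrices. The function name, its parameters, and the assumption that per-view ranks are known are all hypothetical. The weights are supplied by the caller; the abstract does not state the oracle-optimal or data-driven weighting rule, so it is not reproduced here.

```python
import numpy as np


def weighted_two_stage_subspace(views, ranks, joint_rank, weights=None):
    """Illustrative weighted two-stage spectral estimate of a shared subspace.

    views      : list of (n, p_k) arrays measured on the same n units
    ranks      : assumed signal rank of each view (joint + individual)
    joint_rank : assumed dimension of the shared subspace
    weights    : nonnegative per-view weights (hypothetical input; the
                 paper's data-driven weights are NOT implemented here)
    """
    K = len(views)
    if weights is None:
        weights = np.ones(K)  # equal-weight aggregation
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # normalize to sum to one

    n = views[0].shape[0]
    aggregate = np.zeros((n, n))
    for X, r, w in zip(views, ranks, weights):
        # Stage 1: top-r left singular vectors span the view's signal space.
        U, _, _ = np.linalg.svd(X, full_matrices=False)
        U_k = U[:, :r]
        # Accumulate the weighted projection onto that space.
        aggregate += w * (U_k @ U_k.T)

    # Stage 2: leading eigenvectors of the weighted sum of projections.
    # eigh returns eigenvalues in ascending order, so take the last columns.
    _, eigvecs = np.linalg.eigh(aggregate)
    return eigvecs[:, -joint_rank:]
```

Calling the function with `weights=None` gives the equal-weight aggregation that the abstract contrasts against; passing unequal weights shows where a heterogeneity-aware scheme would plug in. Forming the full n-by-n matrix keeps the sketch short but is only practical for a moderate number of units.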