HeteroJIVE: Joint Subspace Estimation for Heterogeneous Multi-View Data

📅 2025-12-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
Problem: Estimating a shared low-dimensional subspace from multi-view data is difficult when views differ in signal-to-noise ratio, dimensionality, and structural composition (e.g., individual components); under such heterogeneity, equal-weight aggregation can be suboptimal. Method: We propose a weighted two-stage spectral algorithm that departs from equal-weight aggregation (e.g., AJIVE) by using a data-driven, adaptive weighting scheme, which addresses statistical and structural heterogeneity jointly without iterative optimization. Contribution/Results: Theoretically, under generic geometric conditions the estimator achieves the $O(K^{-1/2})$ rate in the number of views $K$, and the error bounds explicitly separate the two layers of heterogeneity. Empirically, simulations and an application to TCGA-BRCA multi-omics data show improved recovery of the shared subspace.

📝 Abstract
Many modern datasets consist of multiple related matrices measured on a common set of units, where the goal is to recover the shared low-dimensional subspace. While the Angle-based Joint and Individual Variation Explained (AJIVE) framework provides a solution, it relies on equal-weight aggregation, which can be strictly suboptimal when views exhibit significant statistical heterogeneity (arising from varying SNR and dimensions) and structural heterogeneity (arising from individual components). In this paper, we propose HeteroJIVE, a weighted two-stage spectral algorithm tailored to such heterogeneity. Theoretically, we first revisit the "non-diminishing" error barrier with respect to the number of views $K$ identified in recent literature for the equal-weight case. We demonstrate that this barrier is not universal: under generic geometric conditions, the bias term vanishes and our estimator achieves the $O(K^{-1/2})$ rate without the need for iterative refinement. Extending this to the general-weight case, we establish error bounds that explicitly disentangle the two layers of heterogeneity. Based on this, we derive an oracle-optimal weighting scheme implemented via a data-driven procedure. Extensive simulations corroborate our theoretical findings, and an application to TCGA-BRCA multi-omics data validates the superiority of HeteroJIVE in practice.
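The two-stage idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes stage one takes the leading left singular vectors of each view, and stage two takes the top eigenvectors of a weighted sum of the per-view projection matrices. The function names, the `view_ranks` argument, and the `weights` argument are hypothetical. The paper's data-driven weighting rule is not reproduced here, so weights are simply passed in; `weights=None` gives the equal-weight aggregation attributed to AJIVE.

```python
import numpy as np

def top_left_singular_vectors(X, rank):
    """Stage 1: orthonormal basis for the leading left singular subspace of one view."""
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :rank]

def weighted_joint_subspace(views, view_ranks, joint_rank, weights=None):
    """Stage 2: top eigenvectors of the weighted sum of per-view projections.

    views: list of (n x p_k) arrays that share the same n rows (units).
    view_ranks: assumed signal rank of each view (joint plus individual).
    weights: hypothetical nonnegative view weights; None means equal weights.
    """
    if weights is None:
        weights = np.ones(len(views))
    n = views[0].shape[0]
    aggregate = np.zeros((n, n))
    for X, rank, w in zip(views, view_ranks, weights):
        U = top_left_singular_vectors(X, rank)
        aggregate += w * (U @ U.T)
    # eigh returns eigenvalues in ascending order, so reverse before slicing.
    _, eigvecs = np.linalg.eigh(aggregate)
    return eigvecs[:, ::-1][:, :joint_rank]
```

In this sketch a direction shared by all views accumulates the full sum of the weights, while a direction present in only one view receives that view's weight alone, which is why the top eigenvectors pick out the shared subspace when the individual components do not overlap.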
Problem

Research questions and friction points this paper is trying to address.

Develop a weighted spectral algorithm for heterogeneous multi-view data
Overcome suboptimal equal-weight aggregation in subspace estimation
Address statistical and structural heterogeneity in multi-view datasets
Innovation

Methods, ideas, or system contributions that make the work stand out.

Weighted two-stage spectral algorithm for heterogeneous data
Oracle-optimal weighting via data-driven procedure
Achieves the $O(K^{-1/2})$ rate without iterative refinement
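A rate such as $O(K^{-1/2})$ refers to the distance between an estimated and a true subspace. This summary does not state which distance the paper uses; one standard choice, assumed here purely for illustration, is the spectral sin-theta distance, the sine of the largest principal angle between the two subspaces. The helper name `sin_theta_distance` is hypothetical.

```python
import numpy as np

def sin_theta_distance(U, V):
    """Sine of the largest principal angle between span(U) and span(V).

    U, V: (n x r) arrays with orthonormal columns.
    """
    # Singular values of U^T V are the cosines of the principal angles.
    cosines = np.linalg.svd(U.T @ V, compute_uv=False)
    cosines = np.clip(cosines, 0.0, 1.0)  # guard against rounding above 1
    return float(np.sqrt(1.0 - cosines.min() ** 2))
```

The distance is 0 when the two bases span the same subspace, regardless of how the basis is rotated within it, and 1 when some direction of one subspace is orthogonal to the other.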