🤖 AI Summary
Existing canonical correlation analysis (CCA) and generalized CCA (GCCA) methods address at most two of three needs at once: modeling nonlinear dependence, enforcing sparsity for variable selection, and handling more than two views. This limits their use in high-dimensional multi-omics analysis. The paper proposes HSIC-SGCCA, which combines Hilbert–Schmidt independence criterion (HSIC)-based nonlinear dependence modeling, sparsity-inducing regularization, and multi-view GCCA, alongside SA-KGCCA and TS-KGCCA, the multi-view extensions of SA-KCCA and TS-KCCA. HSIC-SGCCA adds a unit-variance constraint on the canonical projections that the two-view SCCA-HSIC omitted; the resulting nonconvex, non-multiconvex problem is solved by integrating the block prox-linear method with the linearized alternating direction method of multipliers (ADMM). In simulations and an analysis of TCGA-BRCA multi-omics data, HSIC-SGCCA outperforms competing methods in multi-view variable selection.
📝 Abstract
Motivation: Biomedical studies increasingly produce multi-view high-dimensional datasets (e.g., multi-omics) that demand integrative analysis. Existing canonical correlation analysis (CCA) and generalized CCA methods address at most two of the following three key aspects simultaneously: (i) nonlinear dependence, (ii) sparsity for variable selection, and (iii) generalization to more than two data views. There is a pressing need for CCA methods that integrate all three aspects to effectively analyze multi-view high-dimensional data.
Results: We propose three nonlinear, sparse, generalized CCA methods, HSIC-SGCCA, SA-KGCCA, and TS-KGCCA, for variable selection in multi-view high-dimensional data. These methods extend existing SCCA-HSIC, SA-KCCA, and TS-KCCA from two-view to multi-view settings. While SA-KGCCA and TS-KGCCA yield multi-convex optimization problems solved via block coordinate descent, HSIC-SGCCA introduces a necessary unit-variance constraint previously ignored in SCCA-HSIC, resulting in a nonconvex, non-multiconvex problem. We efficiently address this challenge by integrating the block prox-linear method with the linearized alternating direction method of multipliers. Simulations and TCGA-BRCA data analysis demonstrate that HSIC-SGCCA outperforms competing methods in multi-view variable selection.
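To make the "nonlinear dependence" aspect concrete, the sketch below computes a standard biased empirical HSIC between two one-dimensional samples, standing in for two canonical projections. It is an illustration only and not the paper's implementation: the Gaussian kernel, the median-heuristic bandwidth, and the helper names `gaussian_gram` and `empirical_hsic` are assumptions made here, and the sparsity penalty and unit-variance constraint are not modeled.

```python
import numpy as np

def gaussian_gram(x, sigma=None):
    """Gaussian-kernel Gram matrix for a 1-D sample x."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    sq_dists = (x - x.T) ** 2
    if sigma is None:
        # Median heuristic for the bandwidth (an assumption, not from the paper)
        med = np.median(sq_dists[np.triu_indices_from(sq_dists, k=1)])
        sigma = np.sqrt(0.5 * med) if med > 0 else 1.0
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

def empirical_hsic(x, y):
    """Biased empirical HSIC: trace(K H L H) / (n - 1)^2."""
    n = len(x)
    H = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    K = gaussian_gram(x)
    L = gaussian_gram(y)
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

rng = np.random.default_rng(0)
n = 200
x = rng.standard_normal(n)
# Nonlinear dependence: y is a function of x, yet the population correlation is zero
y_dep = x ** 2 + 0.1 * rng.standard_normal(n)
# Independent baseline
y_ind = rng.standard_normal(n)

hsic_dep = empirical_hsic(x, y_dep)
hsic_ind = empirical_hsic(x, y_ind)
print(f"HSIC (dependent):   {hsic_dep:.4f}")
print(f"HSIC (independent): {hsic_ind:.4f}")
```

In this toy setting a linear correlation coefficient would miss the quadratic relationship, while HSIC is larger for the dependent pair than for the independent one. This is the property that an HSIC-based objective exploits when choosing projections.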