Nonlinear Sparse Generalized Canonical Correlation Analysis for Multi-view High-dimensional Data

📅 2025-02-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing canonical correlation analysis (CCA) and generalized CCA (GCCA) methods struggle to jointly model nonlinear dependencies, enforce sparse variable selection, and scale to three or more views, which limits their utility in high-dimensional multi-omics data analysis. To address this, we propose HSIC-SGCCA: the first unified framework integrating Hilbert–Schmidt independence criterion (HSIC)-based nonlinear dependence modeling, sparsity-inducing regularization, and multi-view GCCA. We impose a unit-variance constraint on the canonical projections and develop a hybrid optimization algorithm combining block-wise proximal linearization with the linearized alternating direction method of multipliers (ADMM). Extensive experiments on synthetic data and real TCGA-BRCA multi-omics data demonstrate that HSIC-SGCCA significantly improves the accuracy and consistency of cross-view variable selection. It outperforms state-of-the-art two-view extensions across comprehensive performance metrics, establishing a scalable, interpretable paradigm for multi-view biomedical data modeling.

📝 Abstract
Motivation: Biomedical studies increasingly produce multi-view high-dimensional datasets (e.g., multi-omics) that demand integrative analysis. Existing canonical correlation analysis (CCA) and generalized CCA methods address at most two of the following three key aspects simultaneously: (i) nonlinear dependence, (ii) sparsity for variable selection, and (iii) generalization to more than two data views. There is a pressing need for CCA methods that integrate all three aspects to effectively analyze multi-view high-dimensional data.

Results: We propose three nonlinear, sparse, generalized CCA methods, HSIC-SGCCA, SA-KGCCA, and TS-KGCCA, for variable selection in multi-view high-dimensional data. These methods extend existing SCCA-HSIC, SA-KCCA, and TS-KCCA from two-view to multi-view settings. While SA-KGCCA and TS-KGCCA yield multi-convex optimization problems solved via block coordinate descent, HSIC-SGCCA introduces a necessary unit-variance constraint previously ignored in SCCA-HSIC, resulting in a nonconvex, non-multiconvex problem. We efficiently address this challenge by integrating the block prox-linear method with the linearized alternating direction method of multipliers. Simulations and TCGA-BRCA data analysis demonstrate that HSIC-SGCCA outperforms competing methods in multi-view variable selection.
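The dependence measure at the core of these methods, the Hilbert–Schmidt independence criterion, has a simple empirical form: with kernel matrices K and L on the two samples and centering matrix H, the biased estimate is tr(KHLH)/(n-1)². A minimal sketch with Gaussian kernels, assuming a fixed bandwidth `sigma` (the function names and bandwidth choice are illustrative, not the paper's implementation):

```python
import numpy as np

def rbf_kernel(Z, sigma):
    # Gaussian (RBF) kernel matrix from pairwise squared distances
    sq = np.sum(Z**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T
    return np.exp(-d2 / (2.0 * sigma**2))

def hsic(X, Y, sigma=1.0):
    """Biased empirical HSIC estimate: tr(K H L H) / (n - 1)^2."""
    n = X.shape[0]
    K = rbf_kernel(X, sigma)
    L = rbf_kernel(Y, sigma)
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2
```

Unlike Pearson correlation, this statistic is sensitive to nonlinear dependence (e.g., it is large for y = x², where linear correlation is near zero), which is what lets HSIC-SGCCA capture nonlinear cross-view associations.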
Problem

Research questions and friction points this paper is trying to address.

Integrating nonlinear dependence in multi-view data.
Ensuring sparsity for effective variable selection.
Generalizing canonical correlation analysis to multiple views.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Nonlinear sparse generalized CCA
Multi-view high-dimensional data analysis
Block prox-linear method integration
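To make the block prox-linear bullet concrete: for an ℓ1-penalized block, each prox-linear update reduces to a gradient step on the smooth (linearized) part followed by the ℓ1 proximal map, i.e., soft-thresholding. A hedged sketch of one such block update, assuming an ℓ1 penalty (the function names and step-size handling are illustrative, not taken from the paper):

```python
import numpy as np

def soft_threshold(v, tau):
    # Proximal operator of tau * ||.||_1
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def block_prox_linear_step(w, grad_w, step, lam):
    """One block prox-linear update for an l1-penalized block:
    linearize the smooth loss at w (gradient grad_w), take a gradient
    step of size `step`, then apply soft-thresholding with lam."""
    return soft_threshold(w - step * grad_w, step * lam)
```

Cycling such updates over the blocks (one per data view) sparsifies each canonical weight vector; the paper couples this with a linearized ADMM to handle the unit-variance constraint, which a plain proximal step cannot enforce.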
Rong Wu
Zhejiang University
Ziqi Chen
School of Statistics, KLATASDS-MOE, East China Normal University, Shanghai, China
Gen Li
Department of Biostatistics, University of Michigan, Ann Arbor, MI, USA
Hai Shu
Department of Biostatistics, School of Global Public Health, New York University
High-dimensional data · neuroimaging · machine learning/deep learning