🤖 AI Summary
High-dimensional canonical correlation analysis (CCA) often fails due to the curse of dimensionality, and existing sparse CCA methods struggle to deliver computational efficiency and statistical rigor at the same time. To address this, the authors propose ECCAR, a fast, theoretically grounded sparse CCA algorithm. The method recasts CCA as a high-dimensional reduced-rank regression problem, which avoids Fantope projection entirely; sparsity and the low-rank constraint are instead handled within a projection-free, scalable framework. The resulting estimators are shown to be consistent, with high-probability error bounds. In simulations, ECCAR runs significantly faster than competing methods, and it identifies reliable, interpretable cross-domain associations in two biological datasets and in an ML interpretability task. A companion R package implementing ECCAR is available.
📝 Abstract
In high-dimensional settings, Canonical Correlation Analysis (CCA) often fails, and existing sparse methods force an untenable choice between computational speed and statistical rigor. This work introduces a fast and provably consistent sparse CCA algorithm (ECCAR) that resolves this trade-off. We formulate CCA as a high-dimensional reduced-rank regression problem, which allows us to derive consistent estimators with high-probability error bounds without relying on computationally expensive techniques like Fantope projections. The resulting algorithm is scalable, projection-free, and significantly faster than its competitors. We validate our method through extensive simulations and demonstrate its power to uncover reliable and interpretable associations in two complex biological datasets, as well as in an ML interpretability task. Our work makes sparse CCA a practical and trustworthy tool for large-scale multimodal data analysis. A companion R package has been made available.
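The link between CCA and reduced-rank regression mentioned above can be illustrated in the classical, unpenalized setting (more samples than variables). The sketch below is not the ECCAR algorithm: it omits the sparsity penalty and the high-dimensional machinery, and the function names `inv_sqrt` and `cca_via_rrr` are hypothetical, chosen only for this illustration. It relies on the standard fact that regressing the whitened response Y Σ_Y^{-1/2} on X and truncating the fit to rank r recovers the top r canonical directions.

```python
import numpy as np


def inv_sqrt(S):
    """Inverse symmetric square root of a positive-definite matrix."""
    w, V = np.linalg.eigh(S)
    return (V / np.sqrt(w)) @ V.T


def cca_via_rrr(X, Y, r):
    """Classical CCA computed through a reduced-rank regression (illustrative only).

    Assumes n > p, q and full-rank sample covariances; no sparsity is imposed.
    Returns directions U (p x r), V (q x r) and canonical correlations rho (r,).
    """
    n = X.shape[0]
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)

    # Whiten the response so that its sample covariance is the identity.
    Sy_inv_sqrt = inv_sqrt(Y.T @ Y / n)
    Y_white = Y @ Sy_inv_sqrt

    # Unconstrained least-squares fit of the whitened response on X.
    B_ols, *_ = np.linalg.lstsq(X, Y_white, rcond=None)

    # The rank-r constraint keeps the top r right singular vectors of the fit.
    _, s, Qt = np.linalg.svd(X @ B_ols, full_matrices=False)
    Q_r = Qt[:r].T
    rho = s[:r] / np.sqrt(n)  # sample canonical correlations

    # The rank-r coefficient matrix is U diag(rho) Q_r^T, so U is read off directly.
    U = (B_ols @ Q_r) / rho
    V = Sy_inv_sqrt @ Q_r
    return U, V, rho
```

With this normalization, U and V satisfy UᵀΣ_X U = I and VᵀΣ_Y V = I, and UᵀΣ_XY V is diagonal with the canonical correlations on the diagonal. In the high-dimensional regime that the paper targets, the least-squares step is ill-posed without regularization, which is where a sparse reduced-rank formulation would come in.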