🤖 AI Summary
Conventional PCA disregards response variables, while existing supervised PCA (SPCA) methods struggle to ensure that projection directions are simultaneously informative with respect to the response and yield interpretable principal components.
Method: We propose covariance-supervised principal component analysis (CSPCA), the first SPCA framework with a closed-form solution that jointly optimizes projections for maximal covariance with the response variable and maximal variance explained by the components, bypassing manifold optimization to improve numerical stability and reproducibility. CSPCA solves a regularized objective via eigendecomposition of a structured matrix; the Nyström approximation is incorporated to accelerate computation in high-dimensional settings.
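The paper's exact structured matrix is not reproduced in this summary, so the following is only a minimal sketch of the idea: form a convex combination of a response-alignment term (built from the cross-covariance `C_xy`) and the feature covariance `C_xx`, then take the top-k eigenvectors as the projection. The specific blend `M = alpha * C_xy @ C_xy.T + (1 - alpha) * C_xx` and the function name `cspca_sketch` are illustrative assumptions, not the authors' formulation.

```python
import numpy as np

def cspca_sketch(X, y, k, alpha=0.5):
    """Illustrative CSPCA-style projection (assumed objective, not the
    paper's exact one): eigendecomposition of a structured matrix that
    trades off response covariance against explained variance."""
    Xc = X - X.mean(axis=0)                    # center features
    Y = np.asarray(y).reshape(len(X), -1)
    Yc = Y - Y.mean(axis=0)                    # center responses
    n = len(X)
    C_xy = Xc.T @ Yc / n                       # cross-covariance, (d, q)
    C_xx = Xc.T @ Xc / n                       # feature covariance, (d, d)
    # Structured matrix: alpha weights alignment with the response,
    # (1 - alpha) weights variance explained by the components.
    M = alpha * (C_xy @ C_xy.T) + (1 - alpha) * C_xx
    eigvals, eigvecs = np.linalg.eigh(M)       # closed form: symmetric eig
    top = np.argsort(eigvals)[::-1][:k]        # indices of k largest
    return eigvecs[:, top]                     # (d, k) projection matrix

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = X[:, 0] + 0.1 * rng.normal(size=200)       # response driven by feature 0
W = cspca_sketch(X, y, k=2, alpha=0.7)
print(W.shape)  # (10, 2)
```

Because `M` is symmetric, `np.linalg.eigh` returns orthonormal eigenvectors, so the projection columns are orthonormal by construction, which is one route to the numerical stability the summary mentions.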
Results: Extensive experiments on synthetic and real-world datasets demonstrate that CSPCA significantly outperforms existing SPCA methods across multiple criteria, including predictive accuracy, component interpretability, and computational efficiency, while remaining theoretically grounded and scalable in practice.
📝 Abstract
Principal component analysis (PCA) is a widely used unsupervised dimensionality reduction technique in machine learning, applied across fields such as bioinformatics, computer vision, and finance. However, when response variables are available, PCA does not guarantee that the derived principal components are informative about them. Supervised PCA (SPCA) methods address this limitation by incorporating response variables into the learning process, typically through an objective function similar to PCA's. Existing SPCA methods do not adequately address the challenge of deriving projections that are both interpretable and informative with respect to the response variable. The only existing approach that attempts to overcome this relies on a mathematically complicated manifold optimization scheme that is sensitive to hyperparameter tuning. We propose covariance-supervised principal component analysis (CSPCA), a novel SPCA method that projects data into a lower-dimensional space by balancing (1) covariance between projections and responses and (2) explained variance, controlled via a regularization parameter. The projection matrix is derived in closed form via a simple eigenvalue decomposition. To enhance computational efficiency for high-dimensional datasets, we extend CSPCA using the standard Nyström method. Simulations and real-world applications demonstrate that CSPCA achieves strong performance across numerous metrics.