🤖 AI Summary
This paper addresses the theoretical disconnect between classical PCA and modern representation learning by introducing a novel perspective grounded in Difference-of-Convex (DC) optimization. Methodologically, it: (i) interprets simultaneous iteration, which underlies the classical QR algorithm, as a concrete instance of the difference-of-convex algorithm (DCA); (ii) builds on the recently shown theoretical link between (kernel) PCA and self-attention mechanisms; (iii) formulates kernel PCA within the DC framework, enabling out-of-sample extension; and (iv) derives a kernelizable dual formulation of ℓ₁-norm-based robust PCA. The contributions are threefold: (i) unifying the optimization principles underlying the QR algorithm, kernel methods, and self-attention; (ii) yielding a family of new PCA variants that are kernelizable, scalable, and robust; and (iii) providing rigorous theoretical analysis and empirical validation, with numerical experiments demonstrating competitive or superior performance versus state-of-the-art methods. Collectively, this work furnishes a principled, interpretable, and generalizable optimization framework for classical dimensionality reduction.
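To make the simultaneous-iteration scheme in point (i) concrete, here is a minimal NumPy sketch of the classical algorithm: repeatedly multiply an orthonormal block by the covariance matrix and re-orthonormalize via QR. This shows only the textbook iteration; the paper's contribution is reading each such step as a DCA step, and the specific DC decomposition is not reproduced here. All names (`simultaneous_iteration`, `iters`, `seed`) and the usage data are illustrative assumptions, not from the paper.

```python
import numpy as np

def simultaneous_iteration(C, k, iters=200, seed=0):
    """Classical simultaneous (orthogonal) iteration for the top-k
    eigenvectors of a symmetric PSD matrix C: multiply an orthonormal
    block by C, then re-orthonormalize with a QR factorization."""
    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(rng.standard_normal((C.shape[0], k)))
    for _ in range(iters):
        Q, _ = np.linalg.qr(C @ Q)  # power step + re-orthonormalization
    return Q

# usage sketch: leading principal directions of centered data (synthetic)
X = np.random.default_rng(1).standard_normal((500, 10))
Xc = X - X.mean(axis=0)
Q = simultaneous_iteration(Xc.T @ Xc / len(Xc), k=3)
```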
📝 Abstract
Motivated by the recently shown connection between self-attention and (kernel) principal component analysis (PCA), we revisit the fundamentals of PCA. Using the difference-of-convex (DC) framework, we present several novel formulations and provide new theoretical insights. In particular, we show the kernelizability and out-of-sample applicability of a PCA-like family of problems. Moreover, we uncover that simultaneous iteration, which is connected to the classical QR algorithm, is an instance of the difference-of-convex algorithm (DCA), offering an optimization perspective on this longstanding method. Further, we describe new algorithms for PCA and empirically compare them with state-of-the-art methods. Lastly, we introduce a kernelizable dual formulation for a robust variant of PCA that minimizes the $\ell_1$ deviation of the reconstruction errors.
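For readers unfamiliar with the out-of-sample extension the abstract refers to, the sketch below shows the standard (textbook) construction for kernel PCA: fit dual coefficients from the centered training kernel, then project new points through a consistently centered cross-kernel. This is not the paper's DC-based derivation; the RBF kernel, `gamma`, and all function names are assumptions made for illustration.

```python
import numpy as np

def rbf(A, B, gamma=0.5):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    d2 = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * d2)

def kpca_fit(X, k, gamma=0.5):
    """Fit kernel PCA: return dual coefficients and the training kernel."""
    n = len(X)
    K = rbf(X, X, gamma)
    H = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    vals, vecs = np.linalg.eigh(H @ K @ H)     # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:k]           # keep the k largest
    alphas = vecs[:, idx] / np.sqrt(np.maximum(vals[idx], 1e-12))
    return alphas, K

def kpca_transform(Xnew, X, alphas, K, gamma=0.5):
    """Out-of-sample projection of new points onto the fitted components."""
    Knew = rbf(Xnew, X, gamma)
    Kc = (Knew
          - Knew.mean(axis=1, keepdims=True)   # mean over training points
          - K.mean(axis=0)                     # training column means
          + K.mean())                          # grand mean
    return Kc @ alphas
```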