AI Summary
This paper addresses two fundamental limitations of CANDECOMP/PARAFAC (CP) tensor decomposition: the lack of statistical optimality guarantees under noise, non-orthogonality, and higher rank, and the weak convergence theory for the alternating least squares (ALS) algorithm. To resolve these, we propose a joint framework that integrates the Tucker-based Approximation with Simultaneous Diagonalization (TASD) initialization with ALS. We establish, for the first time, non-asymptotic minimax-optimal error bounds for CP decomposition under general order, dimension, and rank. We further characterize the two-stage convergence dynamics of ALS: quadratic convergence initially, followed by linear refinement; in the rank-one case, only 1–2 iterations suffice to achieve statistical optimality. Experiments demonstrate that TASD+ALS significantly improves estimation stability and accuracy in noisy settings. Our core contribution is a unified theoretical characterization of both the statistical limits and the algorithmic convergence rates of CP decomposition, filling a critical gap in high-dimensional tensor decomposition theory.
Abstract
Canonical Polyadic (CP) tensor decomposition is a fundamental technique for analyzing high-dimensional tensor data. While the Alternating Least Squares (ALS) algorithm is widely used for computing CP decompositions due to its simplicity and empirical success, its theoretical foundation, particularly regarding statistical optimality and convergence behavior, remains underdeveloped, especially in noisy, non-orthogonal, and higher-rank settings. In this work, we revisit CP tensor decomposition from a statistical perspective and provide a comprehensive theoretical analysis of ALS under a signal-plus-noise model. We establish non-asymptotic, minimax-optimal error bounds for tensors of general order, dimension, and rank, assuming suitable initialization. To enable such initialization, we propose Tucker-based Approximation with Simultaneous Diagonalization (TASD), a robust method that improves stability and accuracy in noisy regimes. Combined with ALS, TASD yields a statistically consistent estimator. We further analyze the convergence dynamics of ALS, identifying a two-phase pattern: initial quadratic convergence followed by linear refinement. In the rank-one setting, we show that ALS with an appropriately chosen initialization attains the optimal error within just one or two iterations.
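The rank-one setting described above can be sketched in a few lines of NumPy. The sketch below is illustrative only: it uses a simple HOSVD-style spectral initialization (leading singular vectors of the mode unfoldings) as a stand-in for the paper's TASD procedure, with hypothetical parameter values for the signal strength and noise level.

```python
import numpy as np

rng = np.random.default_rng(0)

# Signal-plus-noise model: T = lam * (a ⊗ b ⊗ c) + sigma * Z  (illustrative values)
n, lam, sigma = 30, 10.0, 0.1
a = rng.standard_normal(n); a /= np.linalg.norm(a)
b = rng.standard_normal(n); b /= np.linalg.norm(b)
c = rng.standard_normal(n); c /= np.linalg.norm(c)
T = lam * np.einsum('i,j,k->ijk', a, b, c) + sigma * rng.standard_normal((n, n, n))

def lead_sv(M):
    """Leading left singular vector of a matrix unfolding."""
    u, _, _ = np.linalg.svd(M, full_matrices=False)
    return u[:, 0]

# Spectral initialization from the three mode unfoldings (stand-in for TASD).
a_hat = lead_sv(T.reshape(n, -1))
b_hat = lead_sv(T.transpose(1, 0, 2).reshape(n, -1))
c_hat = lead_sv(T.transpose(2, 0, 1).reshape(n, -1))

# Rank-one ALS: each update contracts T against the other two factors.
for _ in range(2):  # one or two iterations, as in the rank-one analysis
    a_hat = np.einsum('ijk,j,k->i', T, b_hat, c_hat); a_hat /= np.linalg.norm(a_hat)
    b_hat = np.einsum('ijk,i,k->j', T, a_hat, c_hat); b_hat /= np.linalg.norm(b_hat)
    c_hat = np.einsum('ijk,i,j->k', T, a_hat, b_hat); c_hat /= np.linalg.norm(c_hat)

lam_hat = np.einsum('ijk,i,j,k->', T, a_hat, b_hat, c_hat)
# Factor error up to the inherent sign ambiguity of rank-one components.
err = min(np.linalg.norm(a_hat - a), np.linalg.norm(a_hat + a))
```

At this signal-to-noise ratio the spectral initialization lands close enough to the truth that two ALS sweeps already recover the factor and the scale to within the noise level.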