🤖 AI Summary
This work addresses asymmetric tensor principal component analysis (ATPCA) for even tensor order $k \geq 4$ under a stringent memory budget. The authors propose a matrix-parameterized method, trained with a three-phase alternating-update algorithm based on (stochastic) gradient descent, that requires only $d^2$ state memory, whereas existing algorithms typically require at least $d^{\lceil k/2 \rceil}$. To the authors' knowledge, this is the first tractable ATPCA algorithm whose memory cost is independent of $d^k$. Mild over-parameterization helps in two ways. First, it improves sample efficiency: the method achieves a near-optimal sample complexity of $d^{k-2}$ in the limited-memory setting. Second, it makes the method adaptive to problem structure: the required sample size decreases as consecutive signal vectors become more aligned and, in the symmetric limit, reaches $d^{k/2}$, matching the best known polynomial-time sample complexity.
📝 Abstract
Asymmetric Tensor PCA (ATPCA) is a prototypical model for studying the trade-offs between sample complexity, computation, and memory. Existing algorithms for this problem typically require state memory of at least $d^{\left\lceil\overline{k}/2\right\rceil}$ to recover the signal, where $d$ is the vector dimension and $\overline{k}$ is the tensor order. We focus on the setting where $\overline{k} \geq 4$ is even and consider (stochastic) gradient descent-based algorithms under a limited memory budget, which permits only mild over-parameterization of the model. We propose a matrix-parameterized method (with $d^{2}$ state memory cost) that uses a novel three-phase alternating-update algorithm, and we demonstrate how mild over-parameterization facilitates learning in two key aspects: (i) it improves sample efficiency, allowing our method to achieve \emph{near-optimal} $d^{\overline{k}-2}$ sample complexity in our limited memory setting; and (ii) it enhances adaptivity to problem structure, a previously unrecognized phenomenon in which the required sample size naturally decreases as consecutive vectors become more aligned and, in the symmetric limit, attains $d^{\overline{k}/2}$, matching the \emph{best} known polynomial-time complexity. To our knowledge, this is the \emph{first} tractable algorithm for ATPCA with $d^{\overline{k}}$-independent memory costs.
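To make the quoted scalings concrete, the following minimal sketch tabulates the exponents of $d$ stated in the abstract for a few even tensor orders. It is an order-of-magnitude illustration only: constants and logarithmic factors are ignored because the abstract does not state them. The function name `scaling_exponents` and the dictionary keys are hypothetical labels chosen for this example, not notation from the paper, and `k` stands for the tensor order written $\overline{k}$ in the abstract.

```python
from math import ceil


def scaling_exponents(k: int) -> dict:
    """Return the exponents of d quoted in the abstract for even order k >= 4.

    Hypothetical helper for illustration; constants and log factors are ignored.
    """
    if k < 4 or k % 2 != 0:
        raise ValueError("the abstract's setting assumes an even tensor order k >= 4")
    return {
        "memory_existing": ceil(k / 2),     # existing methods: at least d^ceil(k/2)
        "memory_proposed": 2,               # matrix parameterization: d^2
        "samples_general": k - 2,           # near-optimal in the limited-memory setting
        "samples_symmetric_limit": k // 2,  # best known polynomial-time rate
    }


if __name__ == "__main__":
    d = 100  # example dimension, chosen arbitrarily
    for k in (4, 6, 8):
        # Turn each exponent p into the corresponding power d**p.
        magnitudes = {name: d ** p for name, p in scaling_exponents(k).items()}
        print(k, magnitudes)
```

Under this reading, the memory of existing methods grows with the tensor order, while the proposed parameterization stays at $d^2$ for every even order.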