🤖 AI Summary
This work addresses the training of low-rank functional tree tensor networks (TTNs) under losses other than least squares, such as multinomial logistic regression, where alternating optimization is not directly applicable. The authors propose a natural Riemannian gradient descent method, based on Amari’s natural gradient, that applies to arbitrary loss functions. The resulting search direction is independent of the choice of basis of the underlying functional tensor product space, and the framework covers both the factorized and the manifold-based representation of the functional TTN. To keep the computation practical, the authors propose a hierarchy of efficient approximations to the true natural Riemannian gradient. Numerical experiments on common classification datasets confirm the theoretical findings and show considerably improved convergence behavior compared with standard Riemannian gradient methods.
📝 Abstract
We consider machine learning tasks with low-rank functional tree tensor networks (TTNs) as the learning model. While in the case of least-squares regression, low-rank functional TTNs can be efficiently optimized using alternating optimization, this is not directly possible in other problems, such as multinomial logistic regression. We propose a natural Riemannian gradient descent approach, applicable to arbitrary losses, which is based on the natural gradient by Amari. In particular, the search direction obtained by the natural gradient is independent of the choice of basis of the underlying functional tensor product space. Our framework applies to both the factorized and the manifold-based approaches for representing the functional TTN. For practical application, we propose a hierarchy of efficient approximations to the true natural Riemannian gradient for computing the updates in the parameter space. Numerical experiments on common classification datasets confirm our theoretical findings and show that using natural Riemannian gradient descent for learning considerably improves convergence behavior when compared to standard Riemannian gradient methods.
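The basis-independence claim can be illustrated on a much simpler model than a functional TTN. The sketch below is not the paper's algorithm: it assumes a plain linear model `Phi @ theta` with a binary logistic loss and uses the empirical Fisher information as the metric, which may differ from the metric chosen in the paper. All names (`directions`, `Phi`, `A`, and so on) are hypothetical. It compares how the Euclidean gradient and the natural gradient change the model output on the samples when the feature basis is replaced by `Phi @ A` for an invertible matrix `A`.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def directions(Phi, theta, y):
    """Return the change of the model output on the samples induced by the
    Euclidean gradient and by the natural gradient in parameter space."""
    n = Phi.shape[0]
    p = sigmoid(Phi @ theta)
    # Euclidean gradient of the mean binary logistic loss.
    grad = Phi.T @ (p - y) / n
    # Empirical Fisher information of the logistic model (assumed metric).
    fisher = Phi.T @ (Phi * (p * (1.0 - p))[:, None]) / n
    nat = np.linalg.solve(fisher, grad)
    return Phi @ grad, Phi @ nat

rng = np.random.default_rng(0)
n, d = 200, 5
Phi = rng.normal(size=(n, d))            # features in the original basis
theta = 0.1 * rng.normal(size=d)
y = (rng.random(n) < 0.5).astype(float)  # synthetic binary labels

# Invertible change of basis: upper triangular with nonzero diagonal.
A = np.triu(rng.normal(size=(d, d)), k=1) + np.diag(np.arange(1.0, d + 1))
Phi_new = Phi @ A                        # same function space, new basis
theta_new = np.linalg.solve(A, theta)    # same function, new coordinates

euc_old, nat_old = directions(Phi, theta, y)
euc_new, nat_new = directions(Phi_new, theta_new, y)

natural_invariant = np.allclose(nat_old, nat_new)
euclidean_invariant = np.allclose(euc_old, euc_new)
print("natural gradient direction invariant:  ", natural_invariant)
print("Euclidean gradient direction invariant:", euclidean_invariant)
```

In this toy setting the Fisher matrix transforms as `A.T @ F @ A` and the gradient as `A.T @ g`, so the factors of `A` cancel in the natural direction but not in the Euclidean one. The low-rank constraint and the Riemannian projection that the abstract refers to are deliberately left out.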