🤖 AI Summary
Problem: High-order tensor-on-tensor (ToT) regression suffers from the “curse of dimensionality”: storage and computational cost grow exponentially with the tensor order, and a gap remains between theoretical analysis and practical performance.
Method: This paper develops a statistical theory for ToT regression under the tensor train (TT) decomposition. Assuming the regression operator satisfies the restricted isometry property (RIP), it analyzes the solution of a constrained least-squares problem and proposes two algorithms with convergence guarantees: iterative hard thresholding (IHT), which combines gradient descent with TT-SVD truncation, and a factorization approach based on Riemannian gradient descent (RGD). Both rely on spectral initialization.
Contributions/Results: The paper derives an upper bound on the estimation error and a minimax lower bound, both depending polynomially on the total order $N+M$ of the input and output tensors. Under RIP and with spectral initialization, IHT and RGD both converge linearly to solutions meeting these error bounds. The TT parameterization itself is what reduces memory requirements, computational cost, and sample complexity, making ToT regression scalable.
📝 Abstract
Recently, a tensor-on-tensor (ToT) regression model has been proposed to generalize tensor recovery, encompassing scenarios like scalar-on-tensor regression and tensor-on-vector regression. However, the exponential growth in tensor complexity poses challenges for storage and computation in ToT regression. To overcome this hurdle, tensor decompositions have been introduced, with the tensor train (TT)-based ToT model proving efficient in practice due to reduced memory requirements, enhanced computational efficiency, and decreased sample complexity. Despite these practical benefits, a gap exists between theoretical analysis and real-world performance. In this paper, we study the theoretical and algorithmic aspects of the TT-based ToT regression model. Assuming the regression operator satisfies the restricted isometry property (RIP), we conduct an error analysis for the solution to a constrained least-squares optimization problem. This analysis yields an upper error bound and a minimax lower bound, revealing that both bounds depend polynomially on the order $N+M$. To efficiently find solutions meeting such error bounds, we propose two optimization algorithms: the iterative hard thresholding (IHT) algorithm (employing gradient descent with TT-singular value decomposition (TT-SVD)) and the factorization approach using the Riemannian gradient descent (RGD) algorithm. When RIP is satisfied, spectral initialization provides a suitable starting point, and we establish the linear convergence rate of both IHT and RGD.
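To make the IHT idea concrete, here is a minimal NumPy sketch, not the paper's implementation. It covers only the scalar-on-tensor special case, where each response is $y_i = \langle \mathcal{A}_i, \mathcal{X} \rangle$, and it alternates a gradient step on the least-squares loss with a rank-truncated TT-SVD. Everything specific in it is an assumption made for illustration: the function names (`tt_svd`, `tt_to_full`, `iht`, `random_tt_tensor`), the Gaussian design scaled by $1/\sqrt{m}$, the unit step size, the uniform TT rank, and the noiseless data.

```python
import numpy as np

def tt_svd(tensor, max_rank):
    """Truncated TT-SVD: sequential SVDs, returning cores of shape (r_prev, d_k, r_k)."""
    dims = tensor.shape
    cores, r_prev = [], 1
    mat = tensor.reshape(dims[0], -1)
    for k in range(len(dims) - 1):
        u, s, vt = np.linalg.svd(mat, full_matrices=False)
        r = min(max_rank, len(s))                      # truncate to the target TT rank
        cores.append(u[:, :r].reshape(r_prev, dims[k], r))
        # Carry the remainder forward and fold the next mode into the rows.
        mat = (s[:r, None] * vt[:r]).reshape(r * dims[k + 1], -1)
        r_prev = r
    cores.append(mat.reshape(r_prev, dims[-1], 1))
    return cores

def tt_to_full(cores):
    """Contract TT cores back into the full tensor."""
    full = cores[0]
    for core in cores[1:]:
        full = np.tensordot(full, core, axes=([-1], [0]))
    return full.reshape([c.shape[1] for c in cores])

def iht(A, y, dims, max_rank, step=1.0, iters=50):
    """IHT sketch: gradient step on 0.5 * ||A vec(X) - y||^2, then TT-SVD truncation."""
    x = np.zeros(dims)
    for _ in range(iters):
        grad = (A.T @ (A @ x.ravel() - y)).reshape(dims)
        x = tt_to_full(tt_svd(x - step * grad, max_rank))
    return x

def random_tt_tensor(dims, rank, rng):
    """Hypothetical helper: a random tensor whose TT ranks are at most `rank`."""
    ranks = [1] + [rank] * (len(dims) - 1) + [1]
    cores = [rng.standard_normal((ranks[k], dims[k], ranks[k + 1]))
             for k in range(len(dims))]
    return tt_to_full(cores)

# Synthetic, noiseless scalar-on-tensor regression (illustrative sizes only).
rng = np.random.default_rng(0)
dims, rank, m = (3, 4, 5), 2, 6000
x_true = random_tt_tensor(dims, rank, rng)
A = rng.standard_normal((m, x_true.size)) / np.sqrt(m)   # scaled Gaussian design
y = A @ x_true.ravel()
x_hat = iht(A, y, dims, rank)
rel_err = np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true)
print(f"relative error: {rel_err:.2e}")
```

With a zero starting point and unit step, the first iterate is the TT-SVD truncation of $A^\top y$, which matches a common form of spectral initialization. The sketch forms the full tensor at every step for readability, so it does not show the memory savings that a TT-native implementation would aim for.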