Improving Runtime Performance of Tensor Computations using Rust From Python

📅 2025-10-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
Problem: Key computational kernels in the Python Tensor Toolbox (pyttb), ranging from sums of products over single and nested loops to the tensor multiplication kernels used in low-rank tensor decomposition and tensor regression, have suboptimal runtime performance. Method: The work implements these kernels in Rust, a compiled language, and calls them from Python through extension modules built on the Python C API, an approach that has recently shown runtime gains in areas such as web applications, data parsing, and data validation. Results: In numerical experiments on synthetically generated tensor data of various sizes, using Rust from Python consistently improves runtime over Python alone, over Python with the Numba just-in-time compiler (for the loop-based kernels), and over NumPy (for the pyttb kernels).

📝 Abstract
In this work, we investigate improving the runtime performance of key computational kernels in the Python Tensor Toolbox (pyttb), a package for analyzing tensor data across a wide variety of applications. Recent runtime performance improvements have been demonstrated in areas such as web applications, data parsing, and data validation by using Rust, a compiled language, from Python via extension modules that leverage the Python C API. Using this same approach, we study the runtime performance of key tensor kernels of increasing complexity, from simple kernels involving sums of products over data accessed through single and nested loops to more advanced tensor multiplication kernels that are key in low-rank tensor decomposition and tensor regression algorithms. In numerical experiments involving synthetically generated tensor data of various sizes and these tensor kernels, we demonstrate consistent improvements in runtime performance when using Rust from Python over 1) using Python alone, 2) using Python and the Numba just-in-time Python compiler (for loop-based kernels), and 3) using the NumPy Python package for scientific computing (for pyttb kernels).
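
To make the "simple kernels" mentioned in the abstract concrete, the following is a minimal pure-Python sketch of a sum of products over a single loop and over nested loops. The function names and the specific forms chosen (a dot product and a bilinear form) are assumptions for illustration only; the abstract does not state which exact kernels were benchmarked.

```python
def dot_single_loop(x, y):
    """Sum of products over a single loop: sum_i x[i] * y[i]."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    total = 0.0
    for i in range(len(x)):
        total += x[i] * y[i]
    return total


def bilinear_nested_loops(a, x, y):
    """Sum of products over nested loops: sum_i sum_j x[i] * a[i][j] * y[j].

    `a` is a list of rows; row i must have len(y) entries.
    """
    if len(a) != len(x):
        raise ValueError("a must have one row per entry of x")
    total = 0.0
    for i in range(len(x)):
        row = a[i]
        if len(row) != len(y):
            raise ValueError("each row of a must match the length of y")
        for j in range(len(y)):
            total += x[i] * row[j] * y[j]
    return total
```

Loops of this shape run one interpreted iteration per element in Python, which is the kind of overhead a compiled extension module is meant to remove.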
Problem

Research questions and friction points this paper is trying to address.

Improving runtime performance of Python tensor computations
Using Rust to accelerate key tensor kernels
Comparing Rust implementations with Python and Numba alternatives
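
A comparison like the one listed above could be organized roughly as follows. This is a hypothetical harness using only the Python standard library, not the authors' benchmark code: it times two pure-Python variants of one sum-of-products kernel on synthetic data of several sizes. The names `make_data`, `best_time`, and `compare` are assumptions; Rust, Numba, or NumPy variants would be added as further callables in the same dictionary.

```python
import random
import timeit


def sum_of_products_loop(x, y):
    """Explicit index loop, the baseline a compiled kernel would replace."""
    total = 0.0
    for i in range(len(x)):
        total += x[i] * y[i]
    return total


def sum_of_products_builtin(x, y):
    """Same kernel written with the built-in sum over a generator."""
    return sum(a * b for a, b in zip(x, y))


def make_data(n, seed=0):
    """Synthetic input: two lists of n uniform random floats (seeded)."""
    rng = random.Random(seed)
    x = [rng.random() for _ in range(n)]
    y = [rng.random() for _ in range(n)]
    return x, y


def best_time(func, x, y, repeats=3, number=5):
    """Best-of-`repeats` average seconds per call."""
    timings = timeit.repeat(lambda: func(x, y), repeat=repeats, number=number)
    return min(timings) / number


def compare(sizes=(1_000, 10_000)):
    """Return {size: {variant name: seconds per call}}."""
    variants = {
        "loop": sum_of_products_loop,
        "builtin": sum_of_products_builtin,
    }
    results = {}
    for n in sizes:
        x, y = make_data(n)
        results[n] = {name: best_time(f, x, y) for name, f in variants.items()}
    return results


if __name__ == "__main__":
    for n, timings in compare().items():
        print(n, timings)
```

Taking the minimum over several repeats reduces noise from other processes. The two variants can differ in the last few bits because the built-in `sum` may use a more accurate floating-point summation, so results should be compared with a tolerance rather than exact equality.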
Innovation

Methods, ideas, or system contributions that make the work stand out.

Using Rust from Python for performance
Replacing Python loops with compiled code
Optimizing tensor kernels via extension modules
Kimmie Harding
New Jersey Institute of Technology; Sandia National Laboratories
Daniel M. Dunlavy
Sandia National Laboratories
Tensors, Machine Learning, High Performance Computing, Numerical Optimization