🤖 AI Summary
Traditional tensor models support only discrete integer indices, limiting their applicability to continuous-domain problems in computational geometry, computer graphics, and related fields. This paper introduces **continuous tensor algebra**, the first formal extension of tensor indexing to the real-number domain (e.g., A[3.14]). We propose a **piecewise-constant tensor representation format** and a **domain-specific compilation framework for automatic kernel generation** tailored to continuous spaces, enabling efficient, unified tensor-based modeling of continuous computations. Evaluated on 2D radius search, genomic interval-overlap queries, and NeRF trilinear interpolation, our approach achieves 9.20×, 1.22×, and 1.69× speedups, respectively, while reducing code size by 6–60× and matching or exceeding the performance of hand-optimized libraries. Our core contributions are: (i) establishing the theoretical foundations of continuous tensor algebra, and (ii) delivering an end-to-end, compilable implementation that bridges continuous mathematics with high-performance tensor computation.
📝 Abstract
This paper introduces the continuous tensor abstraction, allowing indices to take real-number values (e.g., A[3.14]). It also presents continuous tensor algebra expressions, such as C_{x,y} = A_{x,y} ∗ B_{x,y}, where indices are defined over a continuous domain. This work expands the traditional tensor model to include continuous tensors. Our implementation supports piecewise-constant tensors, on which infinite domains can be processed in finite time. We also introduce a new tensor format for efficient storage and a code-generation technique for automatic kernel generation. For the first time, our abstraction expresses domains like computational geometry and computer graphics in the language of tensor programming. Our approach demonstrates competitive or better performance than hand-optimized kernels in leading libraries across diverse applications. Compared to hand-implemented libraries on a CPU, our compiler-based implementation achieves an average speedup of 9.20× on 2D radius search with ∼60× fewer lines of code (LoC), 1.22× on genomic interval overlapping queries (with ∼18× LoC savings), and 1.69× on trilinear interpolation in Neural Radiance Fields (with ∼6× LoC savings).
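To make the abstraction concrete, here is a minimal sketch (not the paper's implementation; the class name, API, and break-merging strategy are our own) of a 1D piecewise-constant tensor that supports real-valued indexing like A[3.14] and a pointwise product C_{x} = A_{x} ∗ B_{x}. Because each tensor is constant on finitely many intervals, the infinite continuous domain is processed in finite time by iterating over pieces rather than points:

```python
import bisect

class PiecewiseConstantTensor:
    """1D piecewise-constant tensor over the reals.

    The domain is split at sorted breakpoints `breaks`; values[i] is the
    constant value on the half-open interval [breaks[i], breaks[i+1]).
    Points outside [breaks[0], breaks[-1]) take `default` (here 0.0).
    """

    def __init__(self, breaks, values, default=0.0):
        assert len(values) == len(breaks) - 1
        self.breaks = breaks
        self.values = values
        self.default = default

    def __getitem__(self, x):
        # Real-valued indexing: find the piece containing x by binary search.
        if x < self.breaks[0] or x >= self.breaks[-1]:
            return self.default
        i = bisect.bisect_right(self.breaks, x) - 1
        return self.values[i]

    def pointwise_mul(self, other):
        """C[x] = A[x] * B[x]: merge both break sets, multiply per piece."""
        breaks = sorted(set(self.breaks) | set(other.breaks))
        values = [self[lo] * other[lo] for lo in breaks[:-1]]
        return PiecewiseConstantTensor(breaks, values,
                                       self.default * other.default)

# A[x] = 2.0 on [0, 5); B[x] = 3.0 on [3, 8); both are 0 elsewhere.
A = PiecewiseConstantTensor([0.0, 5.0], [2.0])
B = PiecewiseConstantTensor([3.0, 8.0], [3.0])
C = A.pointwise_mul(B)
print(C[3.14])  # 6.0 on the overlap [3, 5)
print(C[6.0])   # 0.0 outside the overlap
```

The pointwise product only needs one multiplication per merged interval, which mirrors why a compiler can generate finite loops over pieces for expressions whose indices range over all of ℝ.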