SoftDTW-CUDA-Torch: Memory-Efficient GPU-Accelerated Soft Dynamic Time Warping for PyTorch

📅 2026-02-19
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses key limitations in existing GPU implementations of SoftDTW, which are constrained by maximum sequence lengths (≤1024), numerical instability in backpropagation under small smoothing parameters, and excessive memory consumption due to the explicit construction of pairwise distance tensors. To overcome these issues, the authors introduce an efficient, open-source PyTorch library that supports sequences of arbitrary length through a block-wise anti-diagonal CUDA kernel, enhances numerical stability via log-space dynamic programming, and fuses distance computation with alignment operations to eliminate intermediate tensor allocations. Fully integrated with PyTorch's autograd system and supporting SoftDTW barycenter computation, the proposed method achieves up to 98% reduction in GPU memory usage while maintaining high accuracy, substantially outperforming current implementations.
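The log-space trick mentioned in the summary can be illustrated with a minimal pure-Python sketch (not the library's CUDA implementation): the soft-min at the heart of SoftDTW is evaluated as a max-shifted log-sum-exp, so small smoothing parameters gamma do not overflow the exponentials. The function names below are illustrative.

```python
import math

def softmin(a, b, c, gamma):
    # Soft-min via the log-sum-exp trick:
    #   softmin_gamma(a, b, c) = -gamma * log(e^(-a/g) + e^(-b/g) + e^(-c/g))
    # Subtracting the max before exponentiating keeps every exponent <= 0,
    # which is what prevents overflow for small gamma.
    vals = [-a / gamma, -b / gamma, -c / gamma]
    m = max(vals)
    return -gamma * (m + math.log(sum(math.exp(v - m) for v in vals)))

def softdtw(D, gamma=1.0):
    # Reference SoftDTW forward recursion over an N x M distance matrix D
    # (list of lists); returns the smoothed alignment cost R[N][M].
    N, M = len(D), len(D[0])
    INF = float("inf")
    R = [[INF] * (M + 1) for _ in range(N + 1)]
    R[0][0] = 0.0
    for i in range(1, N + 1):
        for j in range(1, M + 1):
            R[i][j] = D[i - 1][j - 1] + softmin(
                R[i - 1][j], R[i][j - 1], R[i - 1][j - 1], gamma
            )
    return R[N][M]
```

As gamma approaches zero the soft-min approaches the hard minimum, so the result converges to the classical DTW cost; for larger gamma it is a smooth lower bound.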

πŸ“ Abstract
We present softdtw-cuda-torch, an open-source PyTorch library for computing Soft Dynamic Time Warping (SoftDTW) on GPUs. Our implementation addresses three key limitations of existing GPU implementations of SoftDTW: a hard sequence-length cap of 1024, numerical instability in the backward pass for small smoothing parameters, and excessive GPU memory consumption from materializing pairwise distance tensors. We introduce (1) tiled anti-diagonal kernel execution that removes the sequence-length constraint, (2) a log-space backward pass that prevents floating-point overflow, and (3) a fused distance-computation mode that eliminates the O(BNM) intermediate distance tensor, achieving up to 98% memory reduction compared to prior work. The library supports arbitrary sequence lengths, full PyTorch autograd integration, and SoftDTW barycenter computation. Code is available at https://github.com/BGU-CS-VIL/sdtw-cuda-torch.
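The anti-diagonal (wavefront) traversal and the fused-distance idea from the abstract can be sketched together in pure Python, under the assumption of 1-D inputs with a squared-Euclidean cost; this is an illustration of the scheduling pattern, not the library's API. All cells with the same i + j depend only on earlier anti-diagonals, so on a GPU each cell on a diagonal could be updated by its own thread, and the cost is computed inline instead of materializing an N x M distance matrix.

```python
import math

def softmin(a, b, c, gamma):
    # Numerically stable soft-min via max-shifted log-sum-exp.
    vals = [-a / gamma, -b / gamma, -c / gamma]
    m = max(vals)
    return -gamma * (m + math.log(sum(math.exp(v - m) for v in vals)))

def softdtw_wavefront(x, y, gamma=1.0):
    # Visit cells one anti-diagonal (i + j = d) at a time. Cells on the
    # same diagonal are mutually independent, which is what a blockwise
    # anti-diagonal CUDA kernel exploits for parallelism. The pairwise
    # distance d(x_i, y_j) is computed on the fly ("fused"), so no
    # N x M distance tensor is ever allocated.
    N, M = len(x), len(y)
    INF = float("inf")
    R = [[INF] * (M + 1) for _ in range(N + 1)]
    R[0][0] = 0.0
    for d in range(2, N + M + 1):                  # anti-diagonal index
        for i in range(max(1, d - M), min(N, d - 1) + 1):
            j = d - i
            cost = (x[i - 1] - y[j - 1]) ** 2      # fused squared-Euclidean cost
            R[i][j] = cost + softmin(
                R[i - 1][j], R[i][j - 1], R[i - 1][j - 1], gamma
            )
    return R[N][M]
```

Because the traversal order only reorders independent updates, the result is identical to the usual row-major recursion; only the parallelism and memory profile change.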
Problem

Research questions and friction points this paper is trying to address.

SoftDTW
GPU memory efficiency
sequence length limitation
numerical instability
PyTorch
Innovation

Methods, ideas, or system contributions that make the work stand out.

SoftDTW
GPU acceleration
memory efficiency
log-space backpropagation
fused kernel