AI Summary
This work addresses the high computational cost of eigendecomposition (ED) for mini-batches of small matrices (dimension < 64) in deep neural networks. Existing QR-based batched algorithms, including the authors' earlier one, are efficient only for very small matrices (dimension < 32). To lift this limitation, the authors propose a batch-efficient divide-and-conquer eigendecomposition algorithm tailored to GPUs and other parallel architectures. The divide-and-conquer strategy significantly improves throughput on batched inputs while maintaining numerical stability. Experiments show that, for matrices with dimension below 64, the proposed approach runs substantially faster than PyTorch's built-in SVD without compromising accuracy.
Abstract
Eigendecomposition (ED) is at the heart of many computer vision algorithms and applications. A crucial bottleneck limiting its use is the high computational cost, particularly for a mini-batch of matrices in deep neural networks. Our previous work proposed a dedicated QR-based ED algorithm for batches of small matrices (dim $<32$). This short paper targets that size limitation and proposes a batch-efficient Divide-and-Conquer-based ED algorithm for larger matrices. Numerical tests show that, for a mini-batch of matrices whose dimensions are smaller than $64$, our method can be much faster than the PyTorch SVD function.
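The abstract describes the method only at a high level. To illustrate the general divide-and-conquer idea, here is a minimal NumPy sketch of the classical approach (Cuppen's split plus the Gu-Eisenstat eigenvector correction) for symmetric tridiagonal matrices. It is not the authors' batched GPU algorithm. A tridiagonal $T$ is split into two halves plus a rank-one term $\beta v v^T$ coupling rows $m-1$ and $m$. Both halves are solved recursively, and the merge step solves the secular equation $1 + \rho \sum_i z_i^2/(d_i - \lambda) = 0$. The sketch makes several assumptions. Inputs are already reduced to tridiagonal form (`diag`, `off`). Poles within each merge are pairwise distinct, so the rotation-based deflation that LAPACK's `dstedc` applies to near-equal poles is omitted. Leaf blocks use `np.linalg.eigh`. The mini-batch is processed with a plain loop rather than in parallel. All function names are hypothetical.

```python
import numpy as np


def _secular_solve(p, z, rho):
    """Eigenpairs of diag(p) + rho * z z^T, assuming rho >= 0, p ascending and
    the entries of p pairwise distinct (no rotation-based deflation here)."""
    m = p.size
    tol = 8 * np.finfo(float).eps * max(np.abs(p).max(), np.abs(rho * z).max())
    keep = np.abs(rho * z) > tol          # tiny z_i: (p_i, e_i) is already an eigenpair
    lam, vecs = p.copy(), np.eye(m)
    if not keep.any():
        return lam, vecs
    idx = np.flatnonzero(keep)
    d, w = p[idx], z[idx] ** 2
    k = d.size
    # Root j of f(x) = 1 + rho * sum(w / (d - x)) lies in (d_j, d_{j+1});
    # the last root lies in (d_{k-1}, d_{k-1} + rho * sum(w)].
    width = np.append(np.diff(d), rho * w.sum())
    mid = d + 0.5 * width
    with np.errstate(divide="ignore"):
        fmid = 1 + rho * np.sum(w / (d[None, :] - mid[:, None]), axis=1)
        upper = fmid < 0                   # root in upper half: measure it from d_{j+1}
        upper[-1] = False                  # no pole to the right of the last root
        org = np.arange(k) + upper         # index of the pole used as origin
        lo = np.where(upper, -0.5 * width, 0.0)
        hi = np.where(upper, 0.0, 0.5 * width)
        hi[-1] = width[-1]
        delta = d[None, :] - d[org][:, None]   # delta[j, i] = d_i - d_org(j)
        # Vectorized bisection on tau = lambda_j - d_org(j); f increases with tau.
        for _ in range(200):
            tau = 0.5 * (lo + hi)
            neg = 1 + rho * np.sum(w / (delta - tau[:, None]), axis=1) < 0
            lo, hi = np.where(neg, tau, lo), np.where(neg, hi, tau)
    tau = 0.5 * (lo + hi)
    gaps = delta.T - tau[None, :]          # gaps[i, j] = d_i - lambda_j, no cancellation
    # Gu-Eisenstat: rebuild z from the computed roots so that the eigenvectors
    # z_hat / (d - lambda_j) come out numerically orthogonal.
    dd = d[None, :] - d[:, None]
    np.fill_diagonal(dd, 1.0)
    log_z2 = np.log(np.abs(gaps)).sum(axis=1) - np.log(np.abs(dd)).sum(axis=1)
    zhat = np.copysign(np.sqrt(np.exp(log_z2) / rho), z[idx])
    u = zhat[:, None] / gaps               # column j: eigenvector for lambda_j
    u /= np.linalg.norm(u, axis=0)
    lam[idx] = d[org] + tau
    vecs[:, idx] = 0.0
    vecs[np.ix_(idx, idx)] = u
    return lam, vecs


def _rank_one_eig(p, z, rho):
    """Eigenpairs of diag(p) + rho * z z^T for either sign of rho."""
    s = 1.0 if rho >= 0 else -1.0          # equals s * (diag(s * p) + |rho| z z^T)
    order = np.argsort(s * p)
    lam, u = _secular_solve(s * p[order], z[order], abs(rho))
    vecs = np.empty_like(u)
    vecs[order] = u                        # undo the coordinate permutation
    return s * lam, vecs


def tridiag_eigh_dc(diag, off, leaf=8):
    """Cuppen-style divide-and-conquer for the symmetric tridiagonal matrix with
    main diagonal `diag` (n,) and off-diagonal `off` (n-1,)."""
    n = diag.size
    if n <= leaf:                          # small block: dense solver
        T = np.diag(diag) + np.diag(off, 1) + np.diag(off, -1)
        return np.linalg.eigh(T)
    m = n // 2
    beta = off[m - 1]                      # entry coupling rows m-1 and m
    d1, d2 = diag[:m].copy(), diag[m:].copy()
    d1[-1] -= beta                         # T = blockdiag(T1, T2) + beta * v v^T
    d2[0] -= beta                          # with v = e_{m-1} + e_m
    lam1, q1 = tridiag_eigh_dc(d1, off[:m - 1], leaf)
    lam2, q2 = tridiag_eigh_dc(d2, off[m:], leaf)
    z = np.concatenate([q1[-1], q2[0]])    # z = blockdiag(q1, q2)^T v
    lam, u = _rank_one_eig(np.concatenate([lam1, lam2]), z, beta)
    q = np.zeros((n, n))
    q[:m, :m], q[m:, m:] = q1, q2
    order = np.argsort(lam)
    return lam[order], (q @ u)[:, order]


def batched_tridiag_eigh(diags, offs):
    """Mini-batch driver; this sketch loops instead of running in parallel."""
    out = [tridiag_eigh_dc(d, e) for d, e in zip(diags, offs)]
    return np.stack([w for w, _ in out]), np.stack([v for _, v in out])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    diags, offs = rng.standard_normal((4, 48)), rng.standard_normal((4, 47))
    lams, vecs = batched_tridiag_eigh(diags, offs)
    T0 = np.diag(diags[0]) + np.diag(offs[0], 1) + np.diag(offs[0], -1)
    print("max residual:", np.abs(vecs[0] * lams[0] @ vecs[0].T - T0).max())
```

Each root is computed as an offset from its nearer pole. This keeps small gaps $d_i - \lambda_j$ accurate, and the Gu-Eisenstat step depends on that accuracy. Inside a merge, every step is a dense array operation over all roots at once, the kind of work that batches well on parallel hardware.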