Batched DGEMMs for scientific codes running on long vector architectures

📅 2025-01-10
🤖 AI Summary
This work targets the poor performance of batched double-precision general matrix multiplication (DGEMM) on long-vector architectures such as RISC-V, where existing GEMM libraries offer limited support. The motivating application is SeisSol, a simulator of seismic wave phenomena and earthquake dynamics. To address this, the authors design and implement a lightweight batched DGEMM library in plain C, tailored to long-vector processors. The approach combines batch-level optimization, fine-grained vectorization, memory-layout reorganization, and a cross-architecture abstraction layer to balance portability with competitive performance. On the RISC-V platform, the library achieves speedups of approximately 3.5× to 32.6× over the reference implementation. The batched approach is then integrated into SeisSol in a way that stays portable across CPU architectures, and an evaluation on an Intel CPU shows improved execution times in most cases.
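The summary mentions memory-layout reorganization without describing it. One common way to prepare a batch of small matrices for long vectors is to convert them from "one matrix after another" storage into a batch-interleaved layout, where the copies of element (i, j) from every matrix in the batch sit next to each other. The sketch below shows that conversion only as an illustration; it assumes column-major matrices of identical shape, and the function names `pack_interleaved` and `unpack_interleaved` are hypothetical, not taken from the paper's library.

```c
#include <stddef.h>

/* Hypothetical helper (illustration only, not the paper's API):
 * copy nb column-major m x n matrices stored back to back
 * (matrix b starts at src + b*m*n) into a batch-interleaved buffer
 * where the nb copies of element (i, j) are contiguous. */
void pack_interleaved(size_t m, size_t n, size_t nb,
                      const double *src, double *dst)
{
    for (size_t b = 0; b < nb; ++b)
        for (size_t j = 0; j < n; ++j)
            for (size_t i = 0; i < m; ++i)
                dst[(j * m + i) * nb + b] = src[b * m * n + j * m + i];
}

/* Hypothetical inverse of pack_interleaved: restore the
 * one-matrix-after-another column-major layout. */
void unpack_interleaved(size_t m, size_t n, size_t nb,
                        const double *src, double *dst)
{
    for (size_t b = 0; b < nb; ++b)
        for (size_t j = 0; j < n; ++j)
            for (size_t i = 0; i < m; ++i)
                dst[b * m * n + j * m + i] = src[(j * m + i) * nb + b];
}
```

In the interleaved buffer, any loop that walks over the batch index b for a fixed (i, j) touches consecutive doubles, which is the access pattern long vector units handle best.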

📝 Abstract
In this work, we evaluate the performance of SeisSol, a simulator of seismic wave phenomena and earthquake dynamics, on a RISC-V-based system utilizing a vector processing unit. We focus on GEMM libraries and address their limited ability to leverage long vector architectures by developing a batched DGEMM library in plain C. This library achieves speedups ranging from approximately 3.5x to 32.6x compared to the reference implementation. We then integrate the batched approach into the SeisSol application, ensuring portability across different CPU architectures. Lastly, we demonstrate that our implementation is portable to an Intel CPU, resulting in improved execution times in most cases.
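The abstract does not spell out the kernel design, so the following is only a hedged sketch of what a batched DGEMM written in plain C can look like. It assumes every operand is stored batch-interleaved and column-major, and it computes C_b += A_b · B_b for all matrices in the batch at once. The function name `dgemm_batched_interleaved` and the layout are assumptions chosen for illustration, not the library's actual interface.

```c
#include <stddef.h>

/* Hypothetical kernel (illustration only): C_b += A_b * B_b for every
 * b in [0, nb). Operands are batch-interleaved and column-major:
 *   A_b(i, p) at A[(p * m + i) * nb + b]   (m x k)
 *   B_b(p, j) at B[(j * k + p) * nb + b]   (k x n)
 *   C_b(i, j) at C[(j * m + i) * nb + b]   (m x n)
 * The innermost loop runs over the batch index b with unit stride, so
 * its trip count is nb rather than the matrix dimensions. */
void dgemm_batched_interleaved(size_t m, size_t n, size_t k, size_t nb,
                               const double *restrict A,
                               const double *restrict B,
                               double *restrict C)
{
    for (size_t j = 0; j < n; ++j)
        for (size_t p = 0; p < k; ++p)
            for (size_t i = 0; i < m; ++i) {
                const double *a = A + (p * m + i) * nb; /* A_b(i, p), all b */
                const double *bv = B + (j * k + p) * nb; /* B_b(p, j), all b */
                double *c = C + (j * m + i) * nb;        /* C_b(i, j), all b */
                /* Contiguous, dependency-free loop: a candidate for
                 * auto-vectorization on long vector units. */
                for (size_t b = 0; b < nb; ++b)
                    c[b] += a[b] * bv[b];
            }
}
```

The design choice here is to vectorize across the batch instead of inside each matrix: when individual matrices are small, a per-matrix loop fills only a fraction of a long vector register, whereas a loop over many matrices can use the full vector length.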
Problem

RISC-V architecture
GEMM performance
vector processor
Innovation

Batched DGEMM
Performance Optimization
Vector Processor Compatibility