SparkAttention: high-performance multi-head attention for large models on Volta GPU architecture

📅 2025-02-12
🏛️ CCF Transactions on High Performance Computing
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the low computational efficiency and high memory pressure of Multi-Head Attention (MHA) when training large Transformer models on the NVIDIA Volta architecture (e.g., the V100), this work proposes a fine-grained kernel fusion and dynamic shared-memory scheduling strategy tailored to Volta Tensor Cores. The method combines low-level CUDA optimizations, FP16/INT8 mixed-precision arithmetic, attention computation-graph rewriting, and Tensor Core (TCU)-customized scheduling, achieving performance gains without accuracy degradation. Experiments show a 3.2× improvement in MHA throughput and a 67% reduction in end-to-end inference latency. Notably, the authors report real-time inference for a 13B-parameter model on a single V100 GPU, the first such result on this hardware, while substantially improving GPU memory utilization and training scalability.
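The paper's CUDA kernels are not reproduced here, but the general principle behind fusing attention into a single kernel, computing softmax(QKᵀ)V with a streaming "online softmax" so the full n×n score matrix is never materialized, can be sketched in plain Python. This is an illustrative sketch of the technique family, not the paper's implementation; all function names are hypothetical:

```python
import math

def naive_attention(Q, K, V):
    """Reference: softmax(Q K^T / sqrt(d)) V, materializing the full score row."""
    n, d = len(Q), len(Q[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for i in range(n):
        scores = [scale * sum(Q[i][k] * K[j][k] for k in range(d)) for j in range(n)]
        m = max(scores)
        e = [math.exp(s - m) for s in scores]
        z = sum(e)
        out.append([sum(e[j] / z * V[j][k] for j in range(n)) for k in range(d)])
    return out

def fused_attention(Q, K, V, tile=2):
    """Online-softmax tiling: stream K/V in tiles, keep a running max and
    running denominator, and rescale the accumulator on the fly. The n-by-n
    score matrix never exists in full -- the memory-traffic saving that
    kernel-fused MHA implementations exploit."""
    n, d = len(Q), len(Q[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for i in range(n):
        m = float("-inf")   # running max of scores seen so far
        z = 0.0             # running softmax denominator
        acc = [0.0] * d     # running weighted sum of V rows
        for t0 in range(0, n, tile):            # one K/V tile at a time
            for j in range(t0, min(t0 + tile, n)):
                s = scale * sum(Q[i][k] * K[j][k] for k in range(d))
                new_m = max(m, s)
                corr = math.exp(m - new_m)      # rescale old state to new max
                w = math.exp(s - new_m)
                z = z * corr + w
                acc = [a * corr + w * V[j][k] for k, a in enumerate(acc)]
                m = new_m
        out.append([a / z for a in acc])
    return out
```

The fused variant produces the same result as the naive one up to floating-point rounding; on a GPU the same rescaling trick lets the per-tile partial products stay in registers and shared memory instead of round-tripping through global memory.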

Problem

Research questions and friction points this paper is trying to address.

Accelerate Multi-Head Attention training
Optimize for Volta GPU architecture
Reduce memory access and overhead
Innovation

Methods, ideas, or system contributions that make the work stand out.

Optimizes Multi-Head Attention on Volta GPU
Utilizes Tensor Core Units efficiently
Reduces memory access with kernel fusion
Youxuan Xu
School of Computer Science, Beijing University of Posts and Telecommunications
Tong Wu
School of Computer Science, Beijing University of Posts and Telecommunications
Shigang Li
Professor, ParCIS Lab, Beijing University of Posts and Telecommunications
High Performance Computing · Deep Learning Systems · Parallel Computing · Computer Architecture
Xueying Wang
School of Computer Science, Beijing University of Posts and Telecommunications
Jingjing Wang
Professor, School of Cyber Science and Technology, Beihang University
AI for Wireless · UAV Networks · Space-Air-Ground-Sea Networks · Communication Security