To 2:4 Sparsity and Beyond: Neuron-level Activation Function to Accelerate LLM Pre-Training

📅 2026-02-05
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the computational bottleneck in large language model pretraining, where matrix multiplications in the feed-forward networks (FFNs) account for up to 50% of the total floating-point operations. To alleviate this, the authors propose Venom, a neuron-level activation sparsity scheme based on a generalized v:n:m structured sparsity pattern, extending hardware-supported 2:4 weight sparsity to activations for the first time. The method uses a sparse-then-dense hybrid training pipeline and runs on NVIDIA A100 and newer GPUs. Venom is orthogonal to other optimization techniques such as quantization, preserves model performance on standard benchmarks, and achieves a 1.4–1.7× end-to-end pretraining speedup.
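To make the core mechanism concrete, here is a minimal sketch of 2:4 structured sparsity: in every contiguous group of four values, only the two largest-magnitude entries are kept. This is an illustrative NumPy reference implementation, not the paper's kernel (the paper relies on hardware-accelerated sparse matmuls, and applies a generalized v:n:m pattern to activations).

```python
import numpy as np

def two_four_sparsify(x: np.ndarray) -> np.ndarray:
    """Zero the 2 smallest-magnitude values in every group of 4 along
    the last axis (illustrative 2:4 structured sparsity, not the
    paper's GPU kernel)."""
    assert x.shape[-1] % 4 == 0, "last dim must be divisible by 4"
    groups = x.reshape(-1, 4)
    # Indices of the 2 largest magnitudes in each group of 4.
    keep = np.argsort(np.abs(groups), axis=1)[:, 2:]
    mask = np.zeros_like(groups, dtype=bool)
    np.put_along_axis(mask, keep, True, axis=1)
    return (groups * mask).reshape(x.shape)

w = np.array([[0.1, -2.0, 0.5, 3.0, -1.0, 0.2, 0.0, 4.0]])
print(two_four_sparsify(w))
# Each group of 4 retains exactly its 2 largest-magnitude entries.
```

Hardware such as the A100's sparse tensor cores can then skip the zeroed half of each group, which is where the matmul speedup comes from.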

📝 Abstract
Training of Large Language Models is generally bottlenecked by matrix multiplications. In the Transformer architecture, a large portion of these operations occurs in the Feed Forward Network (FFN), and this portion grows with model size, reaching up to 50% of the total pretraining floating-point operations. We show that hardware-accelerated sparsity can accelerate all matrix multiplications in the FFN, using 2:4 sparsity for weights and v:n:m (Venom) sparsity for activations. Our recipe relies on sparse training steps to accelerate a large part of the pretraining, combined with regular dense training steps toward the end. Overall, models trained with this approach exhibit the same performance on our quality benchmarks and speed up training end-to-end by 1.4 to 1.7x. This approach is applicable to all NVIDIA GPUs starting with the A100 generation, is orthogonal to common optimization techniques such as quantization, and can also be applied to mixture-of-experts model architectures.
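The sparse-then-dense recipe described in the abstract can be sketched as a simple step schedule. The 90% sparse fraction below is an illustrative assumption for the example, not a value reported in the paper.

```python
def is_sparse_step(step: int, total_steps: int,
                   sparse_fraction: float = 0.9) -> bool:
    """Run hardware-sparse matmuls for the first `sparse_fraction` of
    pretraining, then switch to regular dense steps for the remainder.
    `sparse_fraction` is an illustrative assumption, not the paper's
    reported value."""
    return step < int(total_steps * sparse_fraction)

# Example: in a 1000-step run, the schedule flips from sparse to
# dense training at step 900.
schedule = ["sparse" if is_sparse_step(s, 1000) else "dense"
            for s in range(1000)]
```

The dense steps at the end let the model recover any quality lost to sparse computation before training finishes.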
Problem

Research questions and friction points this paper is trying to address.

Large Language Models
Pre-training
Matrix Multiplication
Sparsity
Feed Forward Network
Innovation

Methods, ideas, or system contributions that make the work stand out.

2:4 sparsity
neuron-level activation sparsity
sparse training
Venom sparsity
LLM pre-training acceleration