Sparse-BitNet: 1.58-bit LLMs are Naturally Friendly to Semi-Structured Sparsity

📅 2026-03-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work proposes Sparse-BitNet, a framework that enables the first stable joint training of 1.58-bit BitNet quantization and dynamic N:M semi-structured sparsity in large language models. The study finds that 1.58-bit ternary quantization inherently aligns with N:M sparsity patterns, motivating a unified training architecture and custom sparse tensor kernels. By leveraging sparse pretraining and a dense-to-sparse transfer strategy, Sparse-BitNet incurs smaller performance degradation than full-precision baselines at equivalent sparsity levels and tolerates higher sparsity ratios before accuracy collapses. The method yields up to a 1.30× speedup in both training and inference while preserving model accuracy.
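The core intuition, that ternary weights are "naturally friendly" to N:M sparsity, can be made concrete with a small sketch. The snippet below is illustrative only, not the paper's implementation: it uses BitNet b1.58-style absmean quantization to {-1, 0, +1} and a magnitude-based 2:4 mask (keep the 2 largest-magnitude weights in every group of 4). Because ternary quantization already zeroes out many weights, the 2:4 mask overlaps heavily with existing zeros, so enforcing the pattern discards fewer informative weights than it would in a full-precision matrix.

```python
import numpy as np

def ternary_quantize(w):
    # BitNet b1.58-style absmean quantization: scale by the mean
    # absolute weight, then round and clip every entry to {-1, 0, +1}.
    scale = np.mean(np.abs(w)) + 1e-8
    return np.clip(np.round(w / scale), -1, 1), scale

def two_four_mask(w):
    # 2:4 semi-structured sparsity: in each group of 4 consecutive
    # weights, keep the 2 with the largest magnitude, zero the rest.
    groups = w.reshape(-1, 4)
    keep = np.argsort(-np.abs(groups), axis=1)[:, :2]
    mask = np.zeros_like(groups)
    np.put_along_axis(mask, keep, 1.0, axis=1)
    return (groups * mask).reshape(w.shape)

rng = np.random.default_rng(0)
w = rng.normal(size=(8, 16))

wq, scale = ternary_quantize(w)
sparse_wq = two_four_mask(wq)

# Ternary weights already contain a sizeable zero fraction, so the
# additional zeros forced by the 2:4 pattern are comparatively few.
print("ternary zero fraction:", np.mean(wq == 0))
print("2:4-masked zero fraction:", np.mean(sparse_wq == 0))
```

For a standard-normal weight matrix, roughly a third of the ternary weights land on zero before any mask is applied, which is exactly the alignment effect the paper's title alludes to.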

📝 Abstract
Semi-structured N:M sparsity and low-bit quantization (e.g., 1.58-bit BitNet) are two promising approaches for improving the efficiency of large language models (LLMs), yet they have largely been studied in isolation. In this work, we investigate their interaction and show that 1.58-bit BitNet is naturally more compatible with N:M sparsity than full-precision models. To study this effect, we propose Sparse-BitNet, a unified framework that, for the first time, jointly applies 1.58-bit quantization and dynamic N:M sparsification with stable training. Across multiple model scales and training regimes (sparse pretraining and dense-to-sparse schedules), 1.58-bit BitNet consistently exhibits smaller performance degradation than full-precision baselines at the same sparsity levels and can tolerate higher structured sparsity before accuracy collapse. Moreover, using our custom sparse tensor core, Sparse-BitNet achieves substantial speedups in both training and inference, reaching up to 1.30×. These results highlight that combining extremely low-bit quantization with semi-structured N:M sparsity is a promising direction for efficient LLMs. Code is available at https://github.com/AAzdi/Sparse-BitNet.
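The abstract mentions dense-to-sparse schedules without detailing them. One common way to realize such a schedule with N:M patterns is to tighten the pattern stepwise during training; the sketch below is a hypothetical illustration (the function name `nm_keep` and the milestone values are invented here, not taken from the paper).

```python
def nm_keep(step, milestones):
    """Hypothetical dense-to-sparse transfer schedule (illustrative only).

    `milestones` maps a training step to the number of weights kept per
    group of 4. For example, {0: 4, 2000: 3, 4000: 2} starts fully dense
    (4:4), tightens to 3:4, and finishes at the 2:4 target pattern.
    """
    keep = 4  # default: dense
    for start, n in sorted(milestones.items()):
        if step >= start:
            keep = n
    return keep

# Example: query the active pattern at a few points in training.
schedule = {0: 4, 2000: 3, 4000: 2}
for step in (0, 2500, 6000):
    print(f"step {step}: keep {nm_keep(step, schedule)} of every 4 weights")
```

Tightening the pattern gradually, rather than imposing 2:4 from the start, gives the ternary weights time to redistribute before the final sparsity constraint is enforced; the paper's actual transfer strategy may differ in its milestones and mechanics.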
Problem

Research questions and friction points this paper is trying to address.

semi-structured sparsity
low-bit quantization
large language models
N:M sparsity
model efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

1.58-bit quantization
N:M sparsity
Sparse-BitNet
low-bit LLMs
structured sparsification