🤖 AI Summary
This study investigates how weight pruning, weight quantization, and activation quantization affect the pretraining scaling laws of large language models (LLMs), aiming to establish a unified effective-parameter scaling framework. It is the first to incorporate quantization into such a framework, theoretically modeling and empirically validating parameter efficiency across varying bit-widths and sparsity levels. Results show that weight-only quantization substantially improves parameter efficiency, while full quantization of both weights and activations exhibits diminishing returns at ultra-low bit-widths. Through systematic ablation and cross-configuration modeling, the authors derive a unified scaling formula that predicts performance under diverse compression strategies. Key contributions are: (1) demonstrating that disparate compression techniques share a common effective-parameter scaling mechanism; (2) unifying quantization and pruning within a single theoretical framework; and (3) providing a composable, predictive foundation and optimization paradigm for efficient LLM design.
📝 Abstract
We investigate how different compression techniques -- such as weight and activation quantization, and weight sparsity -- affect the scaling behavior of large language models (LLMs) during pretraining. Building on previous work showing that weight sparsity acts as a constant multiplier on model size in scaling laws, we demonstrate that this "effective parameter" scaling pattern extends to quantization as well. Specifically, we establish that weight-only quantization achieves strong parameter-efficiency multipliers, while full quantization of both weights and activations shows diminishing returns at lower bit-widths. Our results suggest that different compression techniques can be unified under a common scaling law framework, enabling principled comparison and combination of these methods.
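The "effective parameter" idea can be sketched numerically. Below is a minimal illustration, not the paper's actual fit: it plugs a compression-dependent multiplier `m` into a Chinchilla-style loss curve, so that a compressed model with `N` parameters is predicted to behave like a dense model with `m * N` parameters. The constants are the Hoffmann et al. (2022) fits and the function name `predicted_loss` is hypothetical, used purely for illustration.

```python
def predicted_loss(N, D, m=1.0, E=1.69, A=406.4, B=410.7,
                   alpha=0.34, beta=0.28):
    """Chinchilla-style pretraining loss with an effective parameter count.

    L(N, D) = E + A / (m * N)**alpha + B / D**beta

    N: raw parameter count, D: training tokens,
    m: parameter-efficiency multiplier of the compression scheme
       (m = 1 for a dense, full-precision model).
    Constants are illustrative Chinchilla fits, not the paper's values.
    """
    return E + A / (m * N) ** alpha + B / D ** beta

# A compressed model with multiplier m behaves exactly like a dense
# model with m * N parameters under this functional form:
dense_half = predicted_loss(0.5e9, 2e10)          # dense, 0.5B params
compressed = predicted_loss(1e9, 2e10, m=0.5)     # 1B params, m = 0.5
print(dense_half == compressed)
```

Under this framing, comparing two compression schemes reduces to comparing their fitted multipliers, which is what makes the framework composable across quantization and sparsity.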