The Super Weight in Large Language Models

📅 2024-11-11
🏛️ arXiv.org
📈 Citations: 13
Influential: 2
🤖 AI Summary
LLMs contain a tiny set of "super weights": individual parameters whose removal triggers catastrophic performance collapse, raising perplexity by orders of magnitude and dropping zero-shot accuracy to chance. This work introduces a data-free method that identifies super weights with a single forward pass through the model, and shows that they induce correspondingly rare, large activation outliers, termed "super activations." Preserving the super weight and super activations at high precision, while clipping other weight outliers, lets simple round-to-nearest quantization become competitive with state-of-the-art methods and scale to much larger block sizes than previously considered. The authors release an index of super-weight coordinates for common open-source LLMs.

📝 Abstract
Recent works have shown a surprising result: a small fraction of Large Language Model (LLM) parameter outliers are disproportionately important to the quality of the model. LLMs contain billions of parameters, so these small fractions, such as 0.01%, translate to hundreds of thousands of parameters. In this work, we present an even more surprising finding: Pruning as few as a single parameter can destroy an LLM's ability to generate text -- increasing perplexity by 3 orders of magnitude and reducing zero-shot accuracy to guessing. We propose a data-free method for identifying such parameters, termed super weights, using a single forward pass through the model. We additionally find that these super weights induce correspondingly rare and large activation outliers, termed super activations. When preserved with high precision, super activations can improve simple round-to-nearest quantization to become competitive with state-of-the-art methods. For weight quantization, we similarly find that by preserving the super weight and clipping other weight outliers, round-to-nearest quantization can scale to much larger block sizes than previously considered. To facilitate further research into super weights, we provide an index of super weight coordinates for common, openly available LLMs.
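The single-forward-pass detection described above can be sketched with a toy layer: large super activations appear as a spike in one output channel, and the responsible weight coordinate is the intersection of that output channel with the largest-magnitude input channel. This is an illustrative reconstruction (using numpy and a hypothetical `find_super_weight_candidate` helper), not the paper's exact algorithm, which hooks real transformer layers.

```python
import numpy as np

def find_super_weight_candidate(W, X):
    """Locate the weight coordinate most likely responsible for an output
    activation spike in a linear layer.

    W: weight matrix, shape (out_dim, in_dim)
    X: input activations, shape (tokens, in_dim)

    Sketch of the detection idea: the output channel holding the
    largest-magnitude activation gives the row, and the input channel
    with the largest magnitude gives the column.
    """
    Y = X @ W.T  # forward pass through the layer
    # Output channel of the largest activation spike -> row index.
    row = np.unravel_index(np.argmax(np.abs(Y)), Y.shape)[1]
    # Input channel with the largest magnitude -> column index.
    col = np.argmax(np.abs(X).max(axis=0))
    return int(row), int(col)
```

A single oversized weight entry dominates one output channel, so both indices are recoverable from one forward pass without any labeled data.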
Problem

Research questions and friction points this paper is trying to address.

Identify critical parameters (super weights) in LLMs
Preserve super activations to improve quantization methods
Enable aggressive quantization without destroying model performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Data-free method identifies super weights
Preserve super activations for better quantization
Clip weight outliers to enable larger blocks
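The bullets above can be combined into one simple recipe: clip ordinary outliers to a threshold, apply round-to-nearest quantization, and restore the single super weight at full precision. The sketch below is a minimal illustration under assumed choices (numpy, symmetric quantization, a percentile-based clip, and a hypothetical `rtn_quantize_preserving_super_weight` name); the paper's actual clipping threshold and block-wise grouping differ.

```python
import numpy as np

def rtn_quantize_preserving_super_weight(W, sw_coord, n_bits=4, clip_pct=99.9):
    """Round-to-nearest quantization that (1) clips ordinary weight outliers
    to a percentile threshold and (2) restores the super weight at full
    precision afterwards. Illustrative sketch, not the paper's exact recipe.
    """
    sw_value = W[sw_coord]                    # stash the super weight
    clip = np.percentile(np.abs(W), clip_pct)
    Wc = np.clip(W, -clip, clip)              # tame ordinary outliers
    scale = clip / (2 ** (n_bits - 1) - 1)    # symmetric quantization step
    Wq = np.round(Wc / scale) * scale         # round-to-nearest, dequantized
    Wq[sw_coord] = sw_value                   # keep the super weight exact
    return Wq
```

Clipping shrinks the quantization step for the bulk of the weights, while the restored super weight keeps the model's quality intact.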