Astro: Activation-guided Structured Regularization for Outlier-Robust LLM Post-Training Quantization

📅 2026-02-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the significant accuracy degradation in post-training quantization of large language models caused by outliers in weights and activations. Existing approaches either suppress these outliers inadequately or introduce additional latency and computational overhead. Leveraging the observation that well-trained models converge to flat minima, the authors propose an activation-guided structured regularization framework that incorporates a hardware-friendly, zero-latency outlier suppression mechanism. This approach mitigates weight outliers associated with high-magnitude activations without increasing inference cost or requiring complex operator fusion. The method is orthogonal to and compatible with mainstream quantization schemes such as GPTQ, achieving higher accuracy than sophisticated learned rotation methods on LLaMA-2-7B while cutting quantization time by nearly two-thirds, balancing precision and efficiency.
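To make "activation-guided structured regularization" concrete, here is a minimal sketch, assuming the regularizer penalizes weight magnitudes in proportion to per-channel activation statistics gathered from calibration data. The function names, the max-based channel statistic, and the squared-penalty form are illustrative assumptions, not the paper's definitions.

```python
import torch

def activation_channel_scales(acts: torch.Tensor) -> torch.Tensor:
    # acts: (num_tokens, in_features) calibration activations feeding one
    # linear layer. Channels with large magnitudes are where weight outliers
    # hurt quantization most.
    return acts.abs().amax(dim=0)

def activation_guided_penalty(weight: torch.Tensor,
                              scales: torch.Tensor,
                              lam: float = 1e-4) -> torch.Tensor:
    # weight: (out_features, in_features); scales: (in_features,) broadcasts
    # over rows, so each input channel's weights are penalized in proportion
    # to that channel's activation magnitude.
    return lam * (weight * scales).pow(2).sum()
```

Under this reading, the penalty is "structured" because it acts per input channel rather than on individual scalar weights, which keeps it hardware-friendly and free of any inference-time transform.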

📝 Abstract
Weight-only post-training quantization (PTQ) is crucial for efficient Large Language Model (LLM) deployment but suffers from accuracy degradation caused by weight and activation outliers. Existing mitigation strategies often face critical limitations: they either yield insufficient outlier suppression or incur significant deployment inefficiencies, such as added inference latency, heavy preprocessing, or reliance on complex operator fusion. To resolve these limitations, we leverage a key insight: over-parameterized LLMs often converge to Flat Minima, implying a vast equivalent solution space in which weights can be adjusted without compromising accuracy. Building on this, we propose Astro, an Activation-guided Structured Regularization framework designed to suppress the negative effects of outliers in a hardware-friendly and efficient manner. Through its activation-guided regularization objective, Astro actively reconstructs intrinsically robust weights, aggressively suppressing weight outliers corresponding to high-magnitude activations without sacrificing model accuracy. Crucially, Astro introduces zero inference latency and is orthogonal to mainstream quantization methods such as GPTQ. Extensive experiments show that Astro achieves highly competitive performance; notably, on LLaMA-2-7B, it outperforms complex learning-based rotation methods in almost one-third of the quantization time.
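The "Flat Minima" insight can be made tangible: if many weight settings reproduce a layer's outputs, outlier weights can be traded for quantization-friendly ones before quantizing as usual. The sketch below is a hypothetical reconstruction loop under that assumption; the optimizer, loss form, and hyperparameters are placeholders, not the paper's algorithm. The adjusted weight would then go to an off-the-shelf quantizer such as GPTQ, which is the sense in which the approach is orthogonal to existing methods.

```python
import torch

def reconstruct_robust_weight(weight: torch.Tensor,
                              acts: torch.Tensor,
                              lam: float = 1e-4,
                              lr: float = 1e-3,
                              steps: int = 200) -> torch.Tensor:
    # Preserve the layer's outputs on calibration data (exploiting the
    # flat-minima solution space) while an activation-weighted penalty pulls
    # outlier weights toward small, quantization-friendly values.
    scales = acts.abs().amax(dim=0)            # (in_features,) channel stats
    target = acts @ weight.t()                 # original layer outputs
    w = weight.detach().clone().requires_grad_(True)
    opt = torch.optim.Adam([w], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        recon = (acts @ w.t() - target).pow(2).mean()  # keep behavior
        penalty = lam * (w * scales).pow(2).sum()      # suppress outliers
        (recon + penalty).backward()
        opt.step()
    return w.detach()  # hand this to a standard PTQ method, e.g. GPTQ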
Problem

Research questions and friction points this paper is trying to address.

outlier
post-training quantization
Large Language Model
weight quantization
activation outliers
Innovation

Methods, ideas, or system contributions that make the work stand out.

Activation-guided Regularization
Outlier-Robust Quantization
Flat Minima
Weight-only PTQ
Hardware-efficient LLM
Authors

Xi Chen, Beijing Institute of Technology
Ming Li, Beijing Institute of Technology
Junxi Li, Beijing Institute of Technology
Changsheng Li, Beijing Institute of Technology (Flexible Robotics, Mechanical Design, Robotics, Medical Robotics, Surgical Robotics)
Peisong Wang, CASIA (Deep Neural Network Acceleration and Compression)
Lizhong Ding, Beijing Institute of Technology
Ye Yuan, Beijing Institute of Technology
Guoren Wang, Beijing Institute of Technology