S2D: Selective Spectral Decay for Quantization-Friendly Conditioning of Neural Activations

šŸ“… 2026-02-16
šŸ“ˆ Citations: 0
✨ Influential: 0
šŸ“„ PDF
šŸ¤– AI Summary
This work addresses the significant accuracy degradation caused by outlier activations during quantization of large-scale pretrained models. It reveals, for the first time, a direct connection between activation outliers and the largest singular values of weight matrices. Building on this insight, the authors propose a geometry-inspired Selective Spectral Decay (S²D) strategy that regularizes dominant singular directions during fine-tuning to reshape activation distributions and enhance quantization robustness. The method requires no architectural modifications and achieves up to a 7% accuracy gain under W4A4 post-training quantization (PTQ) on ImageNet; when combined with quantization-aware training (QAT), it yields an additional 4% improvement. Furthermore, S²D demonstrates strong generalization across downstream tasks and vision-language models.

šŸ“ Abstract
Activation outliers in large-scale transformer models pose a fundamental challenge to model quantization, creating excessively large value ranges that cause severe accuracy drops during quantization. We empirically observe that outlier severity intensifies with pre-training scale (e.g., progressing from CLIP to the more extensively trained SigLIP and SigLIP2). Through theoretical analysis as well as empirical correlation studies, we establish a direct link between these activation outliers and the dominant singular values of the weights. Building on this insight, we propose Selective Spectral Decay ($S^2D$), a geometrically principled conditioning method that surgically regularizes only the weight components corresponding to the largest singular values during fine-tuning. Through extensive experiments, we demonstrate that $S^2D$ significantly reduces activation outliers and produces well-conditioned representations that are inherently quantization-friendly. Models trained with $S^2D$ achieve up to 7% improved PTQ accuracy on ImageNet under W4A4 quantization and 4% additional gains when combined with QAT. These improvements also generalize across downstream tasks and vision-language models, enabling the scaling of increasingly large and rigorously trained models without sacrificing deployment efficiency.
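The abstract describes regularizing only the weight components associated with the largest singular values, leaving the rest of the spectrum intact. A minimal NumPy sketch of that idea is below; the function names, the rank cutoff `k`, and the decay factor are illustrative assumptions, not the paper's actual implementation or hyperparameters:

```python
import numpy as np


def selective_spectral_decay_penalty(W, k=4, lam=1e-2):
    """Regularization term that penalizes only the k largest
    singular values of W, leaving the tail of the spectrum free."""
    s = np.linalg.svd(W, compute_uv=False)  # singular values, descending
    return lam * np.sum(s[:k] ** 2)


def apply_selective_spectral_decay(W, k=4, decay=0.99):
    """Proximal-style variant: directly shrink the top-k singular
    values and reconstruct W, keeping all singular vectors fixed."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    s[:k] *= decay  # shrink only the dominant directions
    return U @ np.diag(s) @ Vt
```

Unlike uniform weight decay, which shrinks the whole spectrum, this only dampens the dominant singular directions that the paper links to activation outliers, so the bulk of the representation is left untouched.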
Problem

Research questions and friction points this paper is trying to address.

activation outliers
model quantization
quantization-friendly
transformer models
singular values
Innovation

Methods, ideas, or system contributions that make the work stand out.

Selective Spectral Decay
activation outliers
quantization-friendly
singular values
model quantization