🤖 AI Summary
This study addresses the severe performance degradation commonly observed in post-training quantization of large language models to ultra-low bit-widths (e.g., 2-bit), a phenomenon whose underlying mechanisms remain poorly understood. The work systematically identifies two fundamental failure modes—signal degradation and computational collapse—and introduces the first diagnostic framework capable of distinguishing between them. Building on this mechanistic analysis, the authors propose a training-free, targeted repair strategy informed by signal propagation tracing and component-wise functional assessment. Experiments demonstrate that signal degradation can be effectively mitigated through precise interventions, whereas computational collapse necessitates architectural restructuring, rendering existing compensation methods ineffective. This research provides both theoretical insights and practical pathways for advancing ultra-low-bit quantization of large language models.
📝 Abstract
Post-Training Quantization (PTQ) is critical for the efficient deployment of Large Language Models (LLMs). While 4-bit quantization is widely regarded as an optimal trade-off, reducing the precision to 2-bit usually triggers a catastrophic ``performance cliff.'' It remains unclear whether the underlying mechanisms differ fundamentally. Consequently, we conduct a systematic mechanistic analysis, revealing two qualitatively distinct failure modes: Signal Degradation, where the computational patterns remain intact but information precision is impaired by cumulative error; and Computation Collapse, where key components fail to function, preventing correct information processing and destroying the signal in the early layers. Guided by this diagnosis, we conduct mechanism-aware interventions, demonstrating that targeted, training-free repair can mitigate Signal Degradation, but remains ineffective for Computation Collapse. Our findings provide a systematic diagnostic framework for PTQ failures and suggest that addressing Computation Collapse requires structural reconstruction rather than mere compensation.