On the Importance of a Multi-Scale Calibration for Quantization

📅 2026-02-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses a limitation of conventional post-training quantization (PTQ): calibration with fixed-length sequences fails to capture weight importance accurately across varying input lengths. To overcome this, we propose MaCa (Matryoshka Calibration) and systematically demonstrate the critical role of multi-scale calibration in quantization performance. MaCa integrates input-length information from multiple scales into Hessian-based weight-importance estimation and treats each sequence as an independent sample for regularization, thereby establishing a length-aware, quantization-sensitive calibration mechanism. MaCa requires no architectural modifications and integrates seamlessly into existing PTQ pipelines. Extensive experiments show that it significantly improves low-bit quantization accuracy across prominent large language models, including Qwen3, Gemma3, and LLaMA3.
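The summary describes the Hessian-based importance estimate being built from calibration sequences truncated to several lengths, with each truncated sequence normalized as an independent sample. A minimal sketch of that idea follows; it is not the authors' code, and the function name maca_hessian, the scale set, and the exact per-view weighting are illustrative assumptions.

```python
import torch

def maca_hessian(layer_inputs, scales=(128, 512, 2048)):
    """Accumulate a GPTQ-style Hessian proxy over multiple sequence-length
    "scales": each calibration sequence is truncated to every scale, and each
    truncated view contributes as one independent, equally weighted sample.

    layer_inputs: list of (seq_len, d) activation tensors for one linear layer.
    """
    d = layer_inputs[0].shape[-1]                      # input dimension of the layer
    H = torch.zeros(d, d, dtype=torch.float64)
    count = 0
    for x in layer_inputs:                             # one calibration sequence
        for s in scales:
            xs = x[: min(s, x.shape[0])].to(torch.float64)
            H += (xs.T @ xs) / xs.shape[0]             # length-normalized: one sample per view
            count += 1
    return H / count                                   # average over all (sequence, scale) views
```

The resulting H can then be handed to any Hessian-based PTQ solver (e.g., GPTQ-style error-compensated rounding) in place of the usual fixed-length estimate.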

📝 Abstract
Post-training quantization (PTQ) is a cornerstone for efficiently deploying large language models (LLMs), where a small calibration set critically affects quantization performance. However, conventional practices rely on random sequences of fixed length, overlooking the variable-length nature of LLM inputs. Input length directly influences the activation distribution and, consequently, the weight importance captured by the Hessian, which in turn affects quantization outcomes. As a result, Hessian estimates derived from fixed-length calibration may fail to represent the true importance of weights across diverse input scenarios. We propose MaCa (Matryoshka Calibration), a simple yet effective method for length-aware Hessian construction. MaCa (i) incorporates multi-scale sequence-length information into Hessian estimation and (ii) regularizes each sequence as an independent sample, yielding a more stable and informative Hessian for accurate quantization. Experiments on state-of-the-art LLMs (e.g., Qwen3, Gemma3, LLaMA3) demonstrate that MaCa consistently improves accuracy under low-bit quantization, offering a lightweight enhancement compatible with existing PTQ frameworks. To the best of our knowledge, this is the first work to systematically highlight the role of multi-scale calibration in LLM quantization.
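One hedged way to read the contrast the abstract draws (the exact weighting below is an assumption, not stated in the abstract): with M calibration sequences, fixed length L, and a scale set S,

```latex
% Fixed-length calibration: all M sequences truncated to a single length L
H_{\text{fixed}} = \frac{1}{M L}\sum_{m=1}^{M} X_m^{\top} X_m
% Multi-scale calibration: every sequence truncated to each scale s \in S,
% with the 1/s factor treating each truncated view as one independent sample
H_{\text{multi}} = \frac{1}{M\,|S|}\sum_{m=1}^{M}\sum_{s \in S}\frac{1}{s}\, X_{m,1:s}^{\top} X_{m,1:s}
```

where X_{m,1:s} denotes the activations of the first s tokens of sequence m.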
Problem

Research questions and friction points this paper is trying to address.

post-training quantization
large language models
calibration
Hessian
input length
Innovation

Methods, ideas, or system contributions that make the work stand out.

post-training quantization
multi-scale calibration
Hessian estimation
length-aware quantization
large language models
Authors
Seungwoo Son (Samsung Research, Seoul, South Korea)
Ingyu Seong (Samsung Research, Seoul, South Korea)
Junhan Kim (Samsung Research, Seoul, South Korea)
Hyemi Jang (Seoul National University; Vision, Security & Privacy, Continual Learning)
Yongkweon Jeon (Samsung Research, Seoul, South Korea)