Beyond Outliers: A Data-Free Layer-wise Mixed-Precision Quantization Approach Driven by Numerical and Structural Dual-Sensitivity

📅 2026-03-18

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses a limitation of existing layer-wise mixed-precision quantization methods: they assess sensitivity with a single numerical metric and overlook the distinct computational roles and structural differences among the weight modules within each layer. To overcome this, the authors propose the NSDS framework, which introduces, for the first time, a dual-sensitivity mechanism that jointly considers numerical and structural aspects. NSDS decomposes each layer into its computational roles, scores each role from both perspectives, and aggregates the scores into a single layer-wise sensitivity using MAD-Sigmoid and Soft-OR operators. This enables fine-grained bit allocation without calibration data. In the reported experiments, NSDS consistently outperforms current baselines across various models and downstream tasks and maintains strong performance under extremely low-bit compression.
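The aggregation step can be illustrated with a minimal sketch. The summary names the operators but gives no formulas, so everything below is an assumption: `mad_sigmoid` is taken to mean a median/MAD robust z-score passed through a sigmoid, `soft_or` is taken to be the probabilistic OR `1 - (1 - a)(1 - b)`, and the function names, the 1.4826 scaling constant, and the clamp value are hypothetical choices, not the authors' implementation.

```python
import math
from statistics import median


def mad_sigmoid(scores, eps=1e-12, clamp=60.0):
    """Map raw scores to (0, 1): robust z-score (median/MAD), then sigmoid.

    Assumed interpretation of "MAD-Sigmoid"; the paper may define it differently.
    """
    med = median(scores)
    mad = median([abs(s - med) for s in scores])
    scale = 1.4826 * mad + eps  # 1.4826 makes MAD comparable to a std. dev.
    out = []
    for s in scores:
        # Clamp so math.exp cannot overflow when MAD is (near) zero.
        z = max(-clamp, min(clamp, (s - med) / scale))
        out.append(1.0 / (1.0 + math.exp(-z)))
    return out


def soft_or(*probs):
    """Probabilistic OR: high if ANY input is high (assumed form of "Soft-OR")."""
    remaining = 1.0
    for p in probs:
        remaining *= 1.0 - p
    return 1.0 - remaining


def layer_sensitivity(numerical, structural):
    """Combine per-layer numerical and structural scores into one score each."""
    n = mad_sigmoid(numerical)
    s = mad_sigmoid(structural)
    return [soft_or(a, b) for a, b in zip(n, s)]


# Hypothetical raw scores for four layers; the last has a numerical outlier.
numerical = [0.1, 0.2, 0.15, 5.0]
structural = [0.3, 0.1, 0.2, 0.25]
scores = layer_sensitivity(numerical, structural)
```

Under these assumptions, a layer is flagged as sensitive if either its numerical or its structural score is high, and the median/MAD normalization keeps a single outlier from compressing the scores of all other layers.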

📝 Abstract

Layer-wise mixed-precision quantization (LMPQ) enables effective compression under extreme low-bit settings by allocating higher precision to sensitive layers. However, existing methods typically treat all intra-layer weight modules uniformly and rely on a single numerical property when estimating sensitivity, overlooking their distinct operational roles and structural characteristics. To address this, we propose NSDS, a novel calibration-free LMPQ framework driven by Numerical and Structural Dual-Sensitivity. Specifically, it first mechanistically decomposes each layer into distinct operational roles and quantifies their sensitivity from both numerical and structural perspectives. These dual-aspect scores are then aggregated into a unified layer-wise metric through a robust aggregation scheme based on MAD-Sigmoid and Soft-OR to guide bit allocation. Extensive experiments demonstrate that NSDS consistently achieves superior performance compared to various baselines across diverse models and downstream tasks, without relying on any calibration data.
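To make "allocating higher precision to sensitive layers" concrete, here is a small sketch of a budgeted bit assignment. The abstract does not specify the allocation rule, so this is only one plausible scheme: `allocate_bits`, the 2-bit/4-bit choices, and the equal-layer-size assumption are all hypothetical and introduced purely for illustration.

```python
def allocate_bits(sensitivity, avg_bits, low=2, high=4):
    """Give `high` bits to the most sensitive layers under an average-bit budget.

    Assumes every layer has the same number of weights, so the budget is
    low * (n - k) + high * k <= avg_bits * n for k high-precision layers.
    """
    n = len(sensitivity)
    # Largest k that satisfies the budget (small epsilon guards float rounding).
    k = int((avg_bits - low) * n / (high - low) + 1e-9)
    k = max(0, min(n, k))

    # Rank layer indices from most to least sensitive.
    ranked = sorted(range(n), key=lambda i: sensitivity[i], reverse=True)

    bits = [low] * n
    for i in ranked[:k]:
        bits[i] = high
    return bits


# Hypothetical unified sensitivity scores for four layers.
layer_scores = [0.80, 0.65, 0.66, 1.00]
plan = allocate_bits(layer_scores, avg_bits=3.0)
```

With a 3-bit average budget and these example scores, the two highest-scoring layers (indices 3 and 0) receive 4 bits and the rest receive 2 bits.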

Problem

Research questions and friction points this paper is trying to address.

layer-wise mixed-precision quantization

sensitivity estimation

numerical sensitivity

structural sensitivity

weight modules

Innovation

Methods, ideas, or system contributions that make the work stand out.

mixed-precision quantization

data-free

numerical sensitivity