BATQuant: Outlier-resilient MXFP4 Quantization via Learnable Block-wise Optimization

📅 2026-03-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing post-training quantization methods suffer significant performance degradation under the MXFP4 format due to outlier propagation across blocks and bimodal activation distributions. To address these challenges, this work proposes BATQuant, which introduces, for the first time, a learnable block-level affine transformation aligned with the MXFP block structure. This approach suppresses outlier diffusion while adhering to MXFP granularity constraints, and reshapes activation distributions through non-orthogonal optimization. By combining Global and Private Kronecker (GPK) decomposition with block-level learnable clipping, BATQuant recovers up to 96.43% of full-precision performance on multimodal tasks under the W4A4KV16 configuration, substantially outperforming current quantization schemes.

📝 Abstract
Microscaling floating-point (MXFP) formats have emerged as a promising standard for deploying Multi-modal Large Language Models (MLLMs) and Large Language Models (LLMs) on modern accelerator architectures. However, existing Post-Training Quantization (PTQ) methods, particularly rotation-based techniques designed for integer formats, suffer severe performance collapse when applied to MXFP4. Recent studies attribute this failure to a fundamental format mismatch: global orthogonal rotations inadvertently transfer outlier energy across quantization blocks, inducing new outliers that disrupt local block-wise scaling, while often creating bimodal activation distributions that underutilize the limited quantization range. To address these issues, we propose BATQuant (Block-wise Affine Transformation), which restricts transformations to align with MXFP granularity to prevent cross-block outlier propagation, while relaxing orthogonality constraints to optimize distribution shaping. To ensure parameter efficiency, we introduce Global and Private Kronecker (GPK) decomposition, which effectively reduces storage and runtime overhead, and incorporate Block-wise Learnable Clipping to suppress residual outliers. Extensive experiments on both MLLMs and LLMs demonstrate that BATQuant establishes new state-of-the-art results under aggressive W4A4KV16 configurations, recovering up to 96.43% of full-precision performance on multimodal benchmarks and clearly outperforming existing methods across diverse tasks.
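The block structure the abstract refers to can be made concrete. Below is a minimal numpy sketch of MXFP4 fake-quantization (32-element blocks, one shared power-of-two E8M0 scale per block, FP4 E2M1 elements) and of where a block-aligned affine transform would sit in the pipeline. The matrix `A` is a hypothetical placeholder for the transform BATQuant learns, not the paper's actual parameterization, and no Kronecker decomposition or clipping is modeled here.

```python
import numpy as np

# FP4 E2M1 representable magnitudes (the MXFP4 element format).
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
BLOCK = 32  # MXFP block size: 32 elements share one scale

def mxfp4_fake_quant(x):
    """Simulate MXFP4: per-32-element block, one shared power-of-two
    (E8M0) scale, elements rounded to the nearest FP4 E2M1 value."""
    xb = x.reshape(-1, BLOCK)
    amax = np.abs(xb).max(axis=1, keepdims=True)
    amax = np.where(amax == 0, 1.0, amax)  # avoid log2(0) on all-zero blocks
    # Shared scale: 2^(floor(log2(amax)) - emax), with emax = 2 for E2M1.
    scale = 2.0 ** (np.floor(np.log2(amax)) - 2)
    scaled = xb / scale
    # Round each scaled magnitude to the nearest grid point (clamps at 6.0).
    idx = np.abs(np.abs(scaled)[..., None] - FP4_GRID).argmin(axis=-1)
    q = np.sign(scaled) * FP4_GRID[idx]
    return (q * scale).reshape(x.shape)

def block_affine_quant(x, A):
    """Apply an invertible BLOCK x BLOCK transform A to each block before
    quantization (A is a stand-in for the learned block-wise affine
    transform). Returns the quantized result and A^{-1}, which would be
    folded into the next layer's weights to keep the computation
    equivalent in full precision."""
    xb = x.reshape(-1, BLOCK) @ A
    return mxfp4_fake_quant(xb.ravel()).reshape(x.shape), np.linalg.inv(A)
```

Because `A` acts only within a 32-element block, it can reshape that block's distribution without pushing outlier energy into neighboring blocks, which is the granularity constraint the abstract describes; a global rotation, by contrast, mixes values across block boundaries.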
Problem

Research questions and friction points this paper is trying to address.

MXFP4
Post-Training Quantization
outlier propagation
block-wise scaling
activation distribution
Innovation

Methods, ideas, or system contributions that make the work stand out.

MXFP4
Block-wise Quantization
Outlier Resilience
Learnable Clipping
Kronecker Decomposition