📝 Abstract
This report describes Tail-Aware HiFloat4, our submission to the low-bit text-to-video generation quantization challenge. Our method adapts the public ViDiT-Q post-training quantization (PTQ) pipeline to Wan2.2 using the HiFloat4 numerical format. We apply W4A4 (4-bit weight, 4-bit activation) HiFloat4 fake quantization to the main linear layers of both Wan2.2 transformer modules, keep numerically sensitive boundary modules in high precision, and introduce an activation-tail-aware percentile calibration module to construct channel masks. Combined with compact PTQ-state restoration, this design reduces the influence of rare calibration outliers while leaving the runtime HiFloat4 arithmetic and the sampling pipeline unchanged.
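The abstract does not spell out the calibration rule, so the following is only a minimal sketch of why a percentile statistic is less sensitive to rare outliers than a max statistic when building a channel mask. It assumes calibration activations are gathered as a `(tokens, channels)` array and uses a hypothetical rule that flags a channel when its tail statistic exceeds a fixed multiple of the median across channels. The function `build_channel_mask`, the 99.9th percentile, and the ratio of 8 are illustrative assumptions, not the authors' settings, and HiFloat4 quantization itself is not modeled.

```python
import numpy as np


def build_channel_mask(acts, use_percentile=True, percentile=99.9, ratio=8.0):
    """Hypothetical channel-mask rule (illustrative, not the paper's method).

    acts: calibration activations of shape (num_tokens, num_channels).
    A channel is flagged when its tail statistic (a high percentile of |x|,
    or max |x| as the baseline) exceeds `ratio` times the median statistic
    across all channels.
    """
    mag = np.abs(acts)
    if use_percentile:
        # Tail-aware statistic: ignores a handful of extreme tokens.
        stat = np.percentile(mag, percentile, axis=0)
    else:
        # Max statistic: a single spike is enough to dominate it.
        stat = mag.max(axis=0)
    return stat > ratio * np.median(stat)


rng = np.random.default_rng(0)
acts = rng.standard_normal((4096, 8))
acts[:, 3] *= 20.0   # persistent outlier channel: large on every token
acts[100, 5] = 500.0  # rare outlier: one spike in a single calibration token

mask_pct = build_channel_mask(acts, use_percentile=True)
mask_max = build_channel_mask(acts, use_percentile=False)

print(np.flatnonzero(mask_pct))  # → [3]
print(np.flatnonzero(mask_max))  # → [3 5]
```

In this toy setting, the max-based rule also flags channel 5 because of a single spike, while the percentile-based rule flags only the channel that is consistently large. This matches the stated goal of reducing the influence of rare calibration outliers, although the actual statistic and threshold used in the submission may differ.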