🤖 AI Summary
This work addresses the challenge of post-training model compression at extremely low bitrates (<4 bits), where existing data-free methods often suffer from catastrophic performance degradation, while calibration-based approaches incur high computational costs and are sensitive to distribution shifts. The authors propose EntQuant, a novel framework that uniquely integrates entropy coding with floating-point quantization (e.g., Float8@2bit), decoupling numerical precision from storage cost to enable efficient, calibration-free compression. EntQuant achieves both the universality of data-free methods and the high fidelity typically associated with data-dependent techniques. It compresses 70B-parameter models within 30 minutes while attaining state-of-the-art accuracy and preserving strong functional capabilities on complex instruction-tuned models, all with manageable inference overhead.
📝 Abstract
Post-training compression is currently divided into two contrasting regimes. On the one hand, fast, data-free, model-agnostic methods (e.g., NF4 or HQQ) offer maximum accessibility but suffer functional collapse at extreme bitrates below 4 bits. On the other hand, techniques that leverage calibration data or extensive recovery training achieve superior fidelity but impose high computational costs and face uncertain robustness under data distribution shifts. We introduce EntQuant, the first framework to unite the advantages of these distinct paradigms. By matching the performance of data-dependent methods with the speed and universality of data-free techniques, EntQuant makes the extreme compression regime practically usable. Our method decouples numerical precision from storage cost via entropy coding, compressing a 70B-parameter model in under 30 minutes. We demonstrate that EntQuant not only achieves state-of-the-art results on standard evaluation sets and models but also retains functional performance on more complex benchmarks with instruction-tuned models, all at modest inference overhead.
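The decoupling of numerical precision from storage cost can be illustrated with a toy sketch (this is not the paper's actual method, codebook, or FP8 format, all of which are assumed here for illustration): quantize Gaussian-like weights onto a 16-level nonuniform grid, then entropy-code the resulting level indices with a Huffman code. Because real weight distributions are sharply peaked around zero, the index distribution is highly skewed, and the average storage per weight falls well below the nominal 4 bits even though each weight is still represented at full grid precision.

```python
import heapq
import math
import random
from collections import Counter

def huffman_code_lengths(freqs):
    """Build a Huffman tree over symbol frequencies; return {symbol: code length}."""
    # Each heap entry: (total frequency, unique tiebreaker, {symbol: depth so far})
    heap = [(f, i, {s: 0}) for i, (s, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        f1, _, d1 = heapq.heappop(heap)
        f2, _, d2 = heapq.heappop(heap)
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (f1 + f2, counter, merged))
        counter += 1
    return heap[0][2]

# Hypothetical 16-level nonuniform grid (sign * mantissa * 2^exponent),
# denser near zero, loosely in the spirit of a low-bit float format.
levels = sorted({s * m * 2.0 ** e
                 for s in (-1, 1) for m in (1.0, 1.5) for e in range(-3, 1)})

random.seed(0)
weights = [random.gauss(0.0, 0.05) for _ in range(20_000)]
# Nearest-neighbor quantization: store only the index into `levels`.
idx = [min(range(len(levels)), key=lambda i: abs(levels[i] - w)) for w in weights]

freqs = Counter(idx)
lengths = huffman_code_lengths(freqs)
n = len(idx)
avg_bits = sum(freqs[s] * lengths[s] for s in freqs) / n
entropy = -sum((f / n) * math.log2(f / n) for f in freqs.values())
print(f"nominal: {math.log2(len(levels)):.1f} bits/weight, "
      f"entropy: {entropy:.2f}, huffman: {avg_bits:.2f}")
```

The grid still has 16 levels (a nominal 4 bits/weight), but the entropy-coded stream averages far fewer bits because most weights land on the few smallest-magnitude levels; this is the sense in which storage cost is decoupled from the precision of the representation.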