CoDeQ: End-to-End Joint Model Compression with Dead-Zone Quantizer for High-Sparsity and Low-Precision Networks

📅 2025-12-14
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing joint pruning and quantization methods rely on external auxiliary procedures to determine compression parameters, which adds engineering complexity and hyperparameter sensitivity and removes any end-to-end gradient signal, leading to suboptimal solutions. This paper proposes CoDeQ (Compression with Dead-zone Quantizer), a framework that models magnitude-based pruning as the differentiable dead-zone width of a scalar quantizer, thereby unifying pruning and quantization into a single, end-to-end trainable optimization process. CoDeQ decouples sparsity control from bit-width control, requires only one global hyperparameter, and natively supports both fixed- and mixed-precision quantization without additional pipelines. On ImageNet, ResNet-18 compressed with CoDeQ needs only ~5% of the full-precision BOPs with negligible accuracy degradation, while jointly attaining high sparsity (>90%) and low-bit precision (2–4 bits).

📝 Abstract
While joint pruning–quantization is theoretically superior to sequential application, current joint methods rely on auxiliary procedures outside the training loop for finding compression parameters. This reliance adds engineering complexity and hyperparameter tuning, while also lacking a direct data-driven gradient signal, which might result in sub-optimal compression. In this paper, we introduce CoDeQ, a simple, fully differentiable method for joint pruning–quantization. Our approach builds on a key observation: the dead-zone of a scalar quantizer is equivalent to magnitude pruning, and can be used to induce sparsity directly within the quantization operator. Concretely, we parameterize the dead-zone width and learn it via backpropagation, alongside the quantization parameters. This design provides explicit control of sparsity, regularized by a single global hyperparameter, while decoupling sparsity selection from bit-width selection. The result is a method for Compression with Dead-zone Quantizer (CoDeQ) that supports both fixed-precision and mixed-precision quantization (controlled by an optional second hyperparameter). It simultaneously determines the sparsity pattern and quantization parameters in a single end-to-end optimization. Consequently, CoDeQ does not require any auxiliary procedures, making the method architecture-agnostic and straightforward to implement. On ImageNet with ResNet-18, CoDeQ reduces bit operations to ~5% while maintaining close to full precision accuracy in both fixed and mixed-precision regimes.
Problem

Research questions and friction points this paper is trying to address.

Develops joint pruning-quantization method for neural networks
Eliminates external procedures for compression parameter tuning
Enables end-to-end optimization of sparsity and quantization parameters
Innovation

Methods, ideas, or system contributions that make the work stand out.

Joint pruning-quantization via differentiable dead-zone quantizer
Learns dead-zone width and quantization parameters via backpropagation
Single end-to-end optimization for sparsity and quantization parameters
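The dead-zone mechanism described above can be sketched in a few lines. The following NumPy forward pass is an illustrative assumption, not the authors' implementation: the function name, argument names, and the choice of reconstruction levels (`dz + k * step`) are hypothetical, but it shows how a dead zone in a scalar quantizer acts exactly as magnitude pruning.

```python
import numpy as np

def dead_zone_quantize(w, step, dz, n_bits=4):
    """Dead-zone scalar quantizer (illustrative sketch, not the paper's code).

    Weights whose magnitude falls inside the dead zone [-dz, dz] map to
    level 0, which is exactly magnitude pruning; the remaining weights are
    uniformly quantized with step size `step` on signed n_bits levels.
    """
    q_max = 2 ** (n_bits - 1) - 1            # largest level index per sign
    k = np.round((np.abs(w) - dz) / step)    # nearest level index outside the dead zone
    k = np.clip(k, 0, q_max)                 # magnitudes inside the dead zone -> k = 0
    return np.sign(w) * np.where(k > 0, dz + k * step, 0.0)

# Small weights are pruned to exactly zero; larger ones snap to the grid.
out = dead_zone_quantize(np.array([0.05, -0.3, 0.8, -0.02]), step=0.2, dz=0.1)
```

In the paper, the dead-zone width and the quantization parameters are trainable and updated by backpropagation (which requires a gradient surrogate such as a straight-through estimator for the rounding); here they are plain floats for clarity. Widening `dz` increases sparsity while `step` and `n_bits` set the precision, which is how sparsity control is decoupled from bit-width control.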
Jonathan Wenshøj
Department of Computer Science, University of Copenhagen
Tong Chen
Department of Computer Science, University of Copenhagen
Bob Pepin
Department of Computer Science, University of Copenhagen
Raghavendra Selvan
Assistant Professor (TT), University of Copenhagen
Sustainable AI · Efficient Machine Learning · Medical Image Analysis · AI for Sciences