Punching Above Precision: Small Quantized Model Distillation with Learnable Regularizer

📅 2025-09-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
In low-bit quantization-aware training (QAT) combined with knowledge distillation (KD), optimization conflicts arise due to heterogeneous gradient magnitudes between task loss and distillation loss. Method: We propose the Game of Regularizer (GoR), a lightweight dynamic regularization mechanism with only two learnable parameters, which adaptively balances supervision signals to mitigate gradient conflict. Building upon GoR, we introduce QAT-EKD-GoR—a unified framework supporting ensemble KD from multiple teachers. Contribution/Results: QAT-EKD-GoR achieves state-of-the-art performance across image classification, object detection, and large language model compression. In several cases, it even surpasses full-precision baselines in accuracy while significantly improving inference efficiency on edge devices—effectively reconciling high accuracy with ultra-low power consumption.

📝 Abstract
Quantization-aware training (QAT) combined with knowledge distillation (KD) is a promising strategy for compressing Artificial Intelligence (AI) models for deployment on resource-constrained hardware. However, existing QAT-KD methods often struggle to balance task-specific (TS) and distillation losses due to heterogeneous gradient magnitudes, especially under low-bit quantization. We propose Game of Regularizer (GoR), a novel learnable regularization method that adaptively balances TS and KD objectives using only two trainable parameters for dynamic loss weighting. GoR reduces conflict between supervision signals, improves convergence, and boosts the performance of small quantized models (SQMs). Experiments on image classification, object detection (OD), and large language model (LLM) compression show that GoR consistently outperforms state-of-the-art QAT-KD methods. On low-power edge devices, it delivers faster inference while maintaining full-precision accuracy. We also introduce QAT-EKD-GoR, an ensemble distillation framework that uses multiple heterogeneous teacher models. Under optimal conditions, the proposed EKD-GoR can outperform full-precision models, providing a robust solution for real-world deployment.
Problem

Research questions and friction points this paper is trying to address.

Balancing task-specific and distillation losses in quantized model training
Improving performance of small quantized models under low-bit constraints
Enabling efficient AI deployment on resource-constrained edge devices
Innovation

Methods, ideas, or system contributions that make the work stand out.

Learnable regularization adaptively balances distillation objectives
Two trainable parameters enable dynamic loss weighting
Ensemble distillation uses multiple heterogeneous teacher models
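The paper does not ship code on this page, but the core idea in the bullets above, two trainable scalars that produce dynamic weights over the task-specific and distillation losses, can be sketched minimally. This is an illustrative guess at the parameterization only (a 2-way softmax over learnable scalars), not the paper's actual GoR formulation; the function names and loss values below are hypothetical, and a naive gradient descent like this simply shifts weight toward the smaller loss, whereas GoR adds a game-theoretic balancing on top.

```python
import math

def loss_weights(a, b):
    # Softmax over two learnable scalars: weights are positive and sum to 1,
    # so neither supervision signal can be silenced entirely.
    ea, eb = math.exp(a), math.exp(b)
    return ea / (ea + eb), eb / (ea + eb)

def combined_loss(task_loss, kd_loss, a, b):
    # Dynamically weighted sum of the task-specific and distillation losses.
    w_ts, w_kd = loss_weights(a, b)
    return w_ts * task_loss + w_kd * kd_loss

# Toy loop with scalar stand-ins for the two losses; for a 2-way softmax the
# gradients of the combined loss w.r.t. a and b have a closed form.
a, b, lr = 0.0, 0.0, 0.1
for _ in range(50):
    task_loss, kd_loss = 2.0, 0.5  # hypothetical values for illustration
    w_ts, w_kd = loss_weights(a, b)
    grad_a = w_ts * w_kd * (task_loss - kd_loss)
    grad_b = w_ts * w_kd * (kd_loss - task_loss)
    a, b = a - lr * grad_a, b - lr * grad_b

w_ts, w_kd = loss_weights(a, b)
```

Only two scalars (`a`, `b`) are trained, matching the "two trainable parameters" claim; everything else about the update rule here is a placeholder for the regularization game described in the paper.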
Abdur Rehman
University of Engineering & Technology Lahore
Linear Algebra · Matrix Theory
S M A Sharif
Research Team Lead, Opt-AI Inc. (LG Sciencepark)
Computer Vision · Computational Photography · Deep Learning · Model Compression
Md Abdur Rahaman
Opt-AI, Seoul, South Korea
Mohamed Jismy Aashik Rasool
Opt-AI, Seoul, South Korea
Seongwan Kim
Opt-AI, Seoul, South Korea
Jaeho Lee
Opt-AI, Seoul, South Korea