QUADS: QUAntized Distillation Framework for Efficient Speech Language Understanding

📅 2025-05-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the challenge of balancing performance and efficiency in spoken language understanding (SLU) models under resource constraints, this paper proposes the first multi-stage training framework that jointly optimizes knowledge distillation and neural-network quantization. Departing from conventional decoupled two-stage pipelines, the approach tightly integrates distillation with fine-grained quantization, including 1–2-bit asymmetric and layer-wise calibrated quantization, and introduces an SLU-specific loss function to improve robustness and generalization in extreme low-bit regimes. Evaluated on the SLURP and FSC benchmarks, the method achieves accuracies of 71.13% and 99.20%, respectively, while reducing computational cost by 60–73× and model size by 83–700×, with accuracy degradation bounded at ≤5.56%. This work unifies high-fidelity inference with aggressive model compression for edge-deployable SLU systems.

📝 Abstract
Spoken Language Understanding (SLU) systems must balance performance and efficiency, particularly in resource-constrained environments. Existing methods apply distillation and quantization separately, leading to suboptimal compression as distillation ignores quantization constraints. We propose QUADS, a unified framework that optimizes both through multi-stage training with a pre-tuned model, enhancing adaptability to low-bit regimes while maintaining accuracy. QUADS achieves 71.13% accuracy on SLURP and 99.20% on FSC, with only minor degradations of up to 5.56% compared to state-of-the-art models. Additionally, it reduces computational complexity by 60–73× (GMACs) and model size by 83–700×, demonstrating strong robustness under extreme quantization. These results establish QUADS as a highly efficient solution for real-world, resource-constrained SLU applications.
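The paper's exact objective and quantizer are not reproduced here, but the two ingredients the abstract combines, low-bit asymmetric quantization and knowledge distillation, can be illustrated with a minimal sketch. All function names, the bit width, and the loss weighting below are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def asymmetric_quantize(w, bits=2):
    """Asymmetric uniform quantization: map floats to 2**bits levels,
    then dequantize ("fake quantization", as used in QAT-style training)."""
    qmax = 2 ** bits - 1
    scale = max((w.max() - w.min()) / qmax, 1e-8)  # guard against constant w
    zero_point = np.round(-w.min() / scale)
    q = np.clip(np.round(w / scale + zero_point), 0, qmax)
    return (q - zero_point) * scale

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, alpha=0.5, T=2.0):
    """Combined objective: task cross-entropy on hard labels plus
    temperature-scaled KL divergence to the teacher's soft targets."""
    p_s = softmax(student_logits)
    ce = -np.mean(np.log(p_s[np.arange(len(labels)), labels] + 1e-12))
    p_t = softmax(teacher_logits / T)
    p_s_T = softmax(student_logits / T)
    kl = np.mean(np.sum(p_t * (np.log(p_t + 1e-12) - np.log(p_s_T + 1e-12)),
                        axis=-1))
    return alpha * ce + (1 - alpha) * (T ** 2) * kl
```

In a joint scheme of this kind, the student's weights would be fake-quantized in the forward pass during distillation, so the distillation signal is computed under the same low-bit constraint the deployed model will face, rather than being applied to a full-precision student first and quantized afterward.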
Problem

Research questions and friction points this paper is trying to address.

Balancing performance and efficiency in SLU systems
Unifying distillation and quantization for optimal compression
Maintaining accuracy under extreme resource constraints
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified distillation and quantization framework
Multi-stage training with pre-tuned model
Maintains accuracy in low-bit regimes