🤖 AI Summary
To enable deployment of Conformer-based automatic speech recognition (ASR) models on resource-constrained edge devices, this paper proposes extremely low-bit (1-bit and 2-bit) weight quantization methods. To mitigate the severe accuracy degradation inherent in such extreme quantization, the authors introduce multi-precision model co-training, a stochastic precision mechanism, and tensor-wise learnable scaling factors. On Conformer systems trained on the 300-hr Switchboard and 960-hr LibriSpeech corpora, the approach achieves performance-lossless 2-bit and 1-bit quantization, i.e., no statistically significant word error rate (WER) increase over the full-precision baselines, at maximum overall compression ratios of 16.2× and 16.6×, respectively. The authors present this as the first performance-lossless ultra-low-bit quantization of mainstream ASR architectures.
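The core quantization mechanics can be sketched roughly as follows. This is a minimal NumPy illustration, not the paper's implementation: the symmetric level placement, the tensor-wise scale `alpha`, and the straight-through-estimator (STE) gradient mask are all assumptions chosen for clarity.

```python
import numpy as np

def quantize(w, alpha, bits):
    """Symmetric low-bit fake-quantization of a weight tensor.

    alpha: tensor-wise (learnable) scaling factor, an assumption here;
    bits:  target precision (1 or 2 in the paper's setting).
    """
    if bits == 1:
        # 1-bit: every weight collapses to +/- alpha (binarization)
        return alpha * np.sign(w)
    # Uniform symmetric quantizer: snap w/alpha to a low-bit grid in [-1, 1]
    levels = 2 ** (bits - 1)
    q = np.clip(np.round(w / alpha * levels) / levels, -1.0, 1.0)
    return alpha * q

def ste_grad(w, alpha, upstream):
    """Straight-through estimator for the non-differentiable quantizer:
    pass the upstream gradient through where |w| <= alpha, zero elsewhere.
    (A common choice; the paper's exact gradient approximation may differ.)
    """
    return upstream * (np.abs(w) <= alpha)
```

During quantization-aware training, the forward pass uses `quantize(w, alpha, bits)` while the backward pass updates the full-precision shadow weights via `ste_grad`; `alpha` can itself receive gradients, which is what makes the scaling factor learnable per tensor.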
📝 Abstract
Model compression has become a pressing need as the sizes of modern speech systems rapidly increase. In this paper, we study model weight quantization, which directly reduces the memory footprint to accommodate computationally resource-constrained applications. We propose novel approaches to perform extremely low-bit (i.e., 2-bit and 1-bit) quantization of Conformer automatic speech recognition systems using multiple-precision model co-training, stochastic precision, and tensor-wise learnable scaling factors to alleviate quantization-incurred performance loss. The proposed methods achieve performance-lossless 2-bit and 1-bit quantization of Conformer ASR systems trained on the 300-hr Switchboard and 960-hr LibriSpeech corpora. Maximum overall performance-lossless compression ratios of 16.2 and 16.6 times, respectively, are achieved without a statistically significant increase in word error rate (WER) over the full-precision baseline systems.