🤖 AI Summary
To address accuracy degradation and efficiency bottlenecks in compressing speech foundation models (e.g., wav2vec 2.0, HuBERT), this paper proposes an end-to-end mixed-precision quantization method. Unlike conventional two-stage, decoupled approaches, it unifies bit-width assignment learning and parameter quantization within a single-stage differentiable optimization framework: layer-wise precision configurations are learned via gradient-driven optimization jointly with the quantized parameters, so precision allocation and quantization happen simultaneously during training. On HuBERT-large, the method achieves an 8.6× lossless compression ratio, up to 1.9× higher than the two-stage mixed-precision baseline, with no statistically significant increase in word error rate (WER) over the full-precision model. Compression time is also reduced by 1.5–1.9×. These results demonstrate significant improvements in model compression efficiency and deployment feasibility.
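To make the single-stage idea concrete, here is a minimal, framework-free sketch of one common way such gradient-driven bit-width learning can be set up: each layer holds learnable logits over a set of candidate bit-widths, and the effective quantized weight is a softmax-weighted mixture of per-precision fake-quantized weights, so gradients can flow into both the weights and the precision logits in the same training stage. The function names, candidate bit-widths, and mixture relaxation are illustrative assumptions, not the paper's exact formulation.

```python
import math

def fake_quantize(weights, bits):
    """Uniform symmetric fake quantization: snap each weight onto a
    (2**bits)-level grid and map back to float. Illustrative only."""
    qmax = 2.0 ** (bits - 1) - 1.0
    # Guard against an all-zero layer (scale would otherwise be 0).
    scale = (max(abs(w) for w in weights) / qmax) or 1.0
    return [round(w / scale) * scale for w in weights]

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def mixed_precision_forward(weights, alpha, candidate_bits=(2, 4, 8)):
    """Differentiable relaxation over candidate bit-widths: the effective
    weight is a softmax(alpha)-weighted mixture of the per-precision
    quantizations, so a gradient-based optimizer can update the precision
    logits `alpha` jointly with the model parameters. Returns the mixed
    weights and the expected (fractional) bit-width of the layer."""
    probs = softmax(alpha)
    quantized = [fake_quantize(weights, b) for b in candidate_bits]
    mixed = [sum(p * q[i] for p, q in zip(probs, quantized))
             for i in range(len(weights))]
    expected_bits = sum(p * b for p, b in zip(probs, candidate_bits))
    return mixed, expected_bits
```

In this relaxation, a compression-ratio target can be enforced by penalizing `expected_bits` in the training loss; after convergence, each layer is committed to its highest-probability bit-width, avoiding a separate precision-search stage.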
📝 Abstract
This paper presents a novel mixed-precision quantization approach for speech foundation models that tightly integrates mixed-precision learning and quantized model parameter estimation into a single model compression stage. Experiments conducted on the LibriSpeech dataset with fine-tuned wav2vec2.0-base and HuBERT-large models suggest that the resulting mixed-precision quantized models increase the lossless compression ratio by factors of up to 1.7x and 1.9x over the respective uniform-precision and two-stage mixed-precision quantized baselines, which perform precision learning and model parameter quantization in separate, disjoint stages, while incurring no statistically significant word error rate (WER) increase over the 32-bit full-precision models. The system compression time of the wav2vec2.0-base and HuBERT-large models is reduced by up to 1.9 and 1.5 times over the two-stage mixed-precision baselines, while both produce lower WERs. The best-performing 3.5-bit mixed-precision quantized HuBERT-large model achieves a lossless compression ratio of 8.6x over the 32-bit full-precision system.