🤖 AI Summary
This work addresses the issue of error accumulation in layer-wise quantization of encoder-decoder automatic speech recognition (ASR) models, which often leads to performance instability and degraded accuracy. To mitigate this, the authors propose Fine-grained Alpha for Dynamic Quantization Error Propagation (FADE), a post-training quantization method that introduces an adaptive, cross-layer error-correction mechanism tailored to the heterogeneous characteristics of ASR encoders and decoders. FADE dynamically balances local quantization against cross-layer error compensation to improve quantization fidelity. Experimental results demonstrate that FADE significantly reduces the variance of word error rate (WER) across multiple runs and achieves consistently lower average WER than existing baseline methods.
📝 Abstract
Running Automatic Speech Recognition (ASR) models on memory-constrained edge devices requires efficient compression. While layer-wise post-training quantization is effective, it suffers from error accumulation, especially in encoder-decoder architectures. Existing solutions such as Quantization Error Propagation (QEP) are suboptimal for ASR because the model is heterogeneous: the encoder processes acoustic features while the decoder generates text. To address this, we propose Fine-grained Alpha for Dynamic Quantization Error Propagation (FADE), which adaptively controls the trade-off between cross-layer error correction and local quantization. Experiments show that FADE significantly improves stability by reducing performance variance across runs, while simultaneously surpassing baselines in mean WER.
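The core idea described above, interpolating between full cross-layer error propagation and purely local quantization with a per-layer coefficient, can be sketched in a few lines. The helper names, the uniform symmetric quantizer, and the blending formula below are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

def quantize(w, num_bits=8):
    """Uniform symmetric quantization of a weight matrix (illustrative)."""
    scale = np.abs(w).max() / (2 ** (num_bits - 1) - 1)
    return np.round(w / scale) * scale

def layerwise_ptq_with_alpha(weights, x, alphas):
    """Hypothetical sketch of layer-wise PTQ with per-layer alpha.

    Each layer's quantized-path input blends the propagated quantized
    activations (carrying upstream quantization error) with the clean
    full-precision activations. alpha near 1 -> full cross-layer error
    propagation (QEP-style); alpha near 0 -> purely local quantization.
    """
    x_fp, x_q = x, x
    q_weights = []
    for w, alpha in zip(weights, alphas):
        w_q = quantize(w)
        q_weights.append(w_q)
        # Blend propagated and clean inputs before applying the layer.
        x_in = alpha * x_q + (1 - alpha) * x_fp
        x_q = x_in @ w_q.T
        x_fp = x_fp @ w.T
    return q_weights, x_q, x_fp
```

In a real method the alphas would be chosen adaptively (e.g. per layer, from calibration data) rather than fixed; here they are simply passed in to expose the trade-off the abstract describes.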