TARQ: Tail-Aware Reconstruction Quantization for Rare-Word Robust Automatic Speech Recognition

📅 2026-05-26
🤖 AI Summary
This work addresses the poor calibration of low-frequency words, such as proper nouns and numerals, in conventional data-aware post-training quantization (PTQ) for automatic speech recognition (ASR), which leads to high rare-word error rates. Because the calibration loss weights each token position by its empirical frequency, rare words receive little calibration mass. The authors propose a label-free PTQ framework that rebalances this weighting with rareBAL, a closed-form, per-Linear-layer rule that equalizes the calibration mass assigned to common and tail words. It is paired with a metric-consistent residual correction that needs no entity labels and no additional training. At W4G128 (4-bit weights, group size 128), the method lowers mean rare-word error rate across eight ASR backbones and six datasets without increasing aggregate WER, shows the lowest cross-corpus rare-WER swing among the compared methods, and transfers to entity-rich benchmarks without entity supervision.
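Both the summary and the abstract report results at W4G128, the common shorthand for 4-bit weight quantization with one scale per group of 128 consecutive weights. As a minimal sketch of that storage format only (not of TARQ's calibration), the code below applies plain round-to-nearest asymmetric quantization to one weight row. The min/max range choice, the zero point, and the helper names (`quantize_group`, `quantize_row_w4g128`, `dequantize_row`) are assumptions for illustration.

```python
import random
from typing import List, Tuple

Group = Tuple[List[int], float, int]  # (4-bit codes, scale, zero point)

def quantize_group(values: List[float], bits: int = 4) -> Group:
    """Asymmetric round-to-nearest quantization of one group to unsigned codes."""
    qmax = (1 << bits) - 1  # 15 for 4-bit weights
    # Assumption: the range is the group's min/max, widened to include 0.0.
    lo, hi = min(min(values), 0.0), max(max(values), 0.0)
    scale = (hi - lo) / qmax if hi > lo else 1.0
    zero = round(-lo / scale)  # integer code that represents 0.0
    codes = [min(max(round(v / scale) + zero, 0), qmax) for v in values]
    return codes, scale, zero

def quantize_row_w4g128(row: List[float], group_size: int = 128, bits: int = 4) -> List[Group]:
    """Split one weight row into groups of `group_size` and quantize each independently."""
    return [quantize_group(row[i:i + group_size], bits) for i in range(0, len(row), group_size)]

def dequantize_row(groups: List[Group]) -> List[float]:
    """Map codes back to floats using each group's own scale and zero point."""
    out: List[float] = []
    for codes, scale, zero in groups:
        out.extend((c - zero) * scale for c in codes)
    return out

random.seed(0)
row = [random.gauss(0.0, 0.02) for _ in range(300)]  # 300 weights -> groups of 128, 128, 44
groups = quantize_row_w4g128(row)
recon = dequantize_row(groups)
print(len(groups), max(abs(a - b) for a, b in zip(row, recon)))
```

Because each group carries its own scale, an outlier weight only coarsens the 128 weights around it rather than the whole row; the per-weight rounding error stays within half of that group's scale.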
📝 Abstract
Data-aware post-training quantization (PTQ) minimizes a per-token reconstruction loss on a small calibration corpus, implicitly weighting positions by their empirical frequency. For Automatic Speech Recognition (ASR), this misaligns with tail-sensitive risk: names, numerals, and domain-specific words receive proportionally little calibration mass. We propose Tail-Aware Reconstruction Quantization (TARQ), a label-free PTQ framework that shifts calibration toward the lexical tail via rareBAL, a closed-form per-Linear-layer rule equalizing common/tail mass, paired with a metric-consistent residual correction. TARQ requires no entity labels, no curated calibration set, no validation decoding, and no additional training. Across eight ASR backbones and six datasets at W4G128, TARQ improves mean rare-word error rate (rare-WER) without an aggregate-WER regression, achieves the lowest cross-corpus rare-WER swing among compared methods, and transfers to entity-rich benchmarks (ProfASR, ContextASR-Speech-En) without entity supervision.
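The abstract's core contrast is between the standard per-token reconstruction loss, in which every calibration position counts equally (so each word type is weighted by its empirical frequency), and rareBAL's equalized common/tail mass. The exact rareBAL rule is not stated on this page, so the following is a hypothetical single-layer illustration. A position counts as tail when its token appears at most `tail_max_count` times in the calibration corpus (an assumed, label-free criterion), and weights are set so that common and tail positions each carry half the total mass. `balanced_weights`, `tail_max_count`, and the toy layer are invented for this sketch; it does not model the per-Linear-layer closed form or the residual correction.

```python
from collections import Counter
from typing import List, Sequence

Matrix = List[List[float]]

def balanced_weights(token_ids: Sequence[int], tail_max_count: int) -> List[float]:
    """Hypothetical common/tail rebalancing of per-position calibration weights.

    A position is 'tail' if its token occurs at most `tail_max_count` times in the
    calibration corpus. Common and tail positions then each receive total weight 0.5.
    If either group is empty, fall back to uniform per-position weights.
    """
    counts = Counter(token_ids)
    is_tail = [counts[t] <= tail_max_count for t in token_ids]
    n_tail = sum(is_tail)
    n_common = len(token_ids) - n_tail
    if n_tail == 0 or n_common == 0:
        return [1.0 / len(token_ids)] * len(token_ids)
    return [0.5 / n_tail if tail else 0.5 / n_common for tail in is_tail]

def matvec(W: Matrix, x: List[float]) -> List[float]:
    """Dense matrix-vector product W @ x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def weighted_reconstruction_loss(W: Matrix, W_q: Matrix, X: Matrix, weights: List[float]) -> float:
    """Sum over calibration positions t of weights[t] * ||W x_t - W_q x_t||^2."""
    loss = 0.0
    for x, w in zip(X, weights):
        diff = [a - b for a, b in zip(matvec(W, x), matvec(W_q, x))]
        loss += w * sum(d * d for d in diff)
    return loss

# Toy layer: the quantized weights are wrong only along input dimension 1,
# which (by construction) is activated only by the two tail tokens.
tokens = [7, 7, 7, 7, 7, 7, 42, 99]
X = [[1.0, 0.0] if t == 7 else [0.0, 1.0] for t in tokens]
W = [[1.0, 0.0], [0.0, 1.0]]
W_q = [[1.0, 0.0], [0.0, 0.5]]

uniform = [1.0 / len(tokens)] * len(tokens)  # standard PTQ: tail mass 2/8
balanced = balanced_weights(tokens, tail_max_count=1)  # tail mass 0.5
print(weighted_reconstruction_loss(W, W_q, X, uniform))   # 0.0625
print(weighted_reconstruction_loss(W, W_q, X, balanced))  # 0.125
```

With uniform weights, the error that only tail tokens expose carries a quarter of the calibration mass; after rebalancing it carries half, so an objective built on the balanced weights penalizes that error twice as heavily.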
Problem

Automatic Speech Recognition
Post-Training Quantization
Rare Words
Tail Sensitivity
Word Error Rate
Innovation

Tail-Aware Quantization
Post-Training Quantization
Rare-Word Robustness
Automatic Speech Recognition
Data-Efficient Calibration