TARQ: Tail-Aware Reconstruction Quantization for Rare-Word Robust Automatic Speech Recognition

📅 2026-05-26

🤖 AI Summary

This work addresses a weakness of data-aware post-training quantization (PTQ) for automatic speech recognition (ASR): low-frequency words such as proper nouns and numerals receive little calibration mass, which raises rare-word error rates. The authors propose TARQ, a label-free PTQ framework that shifts calibration toward the lexical tail through rareBAL, a closed-form rule applied per Linear layer that equalizes the calibration mass of common and tail words. A metric-consistent residual correction complements the reweighting. The method needs no entity labels, curated calibration set, validation decoding, or additional training. Across eight ASR backbones and six datasets at W4G128, it improves mean rare-word error rate (rare-WER) without regressing aggregate WER, shows the lowest cross-corpus rare-WER swing among the compared methods, and transfers to entity-rich benchmarks (ProfASR, ContextASR-Speech-En) without entity supervision.
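For readers unfamiliar with the W4G128 setting, the sketch below shows plain round-to-nearest group-wise quantization, assuming the conventional reading of the name: 4-bit weights with one scale and offset per group of 128 input channels. It is a baseline illustration only, not TARQ; the function name `quantize_w4_grouped` and the asymmetric min/max scheme are assumptions made for this example.

```python
import numpy as np

def quantize_w4_grouped(weight, group_size=128, bits=4):
    """Round-to-nearest quantization with one scale/offset per group.

    weight: (out_features, in_features) array; in_features must be a
    multiple of group_size. Returns integer codes and dequantized weights.
    Hypothetical helper for illustration, not the paper's method.
    """
    out_features, in_features = weight.shape
    assert in_features % group_size == 0, "in_features must divide into groups"

    # Split each output row into contiguous groups of input channels.
    groups = weight.reshape(out_features, in_features // group_size, group_size)

    # Per-group range defines the quantization grid.
    w_min = groups.min(axis=-1, keepdims=True)
    w_max = groups.max(axis=-1, keepdims=True)
    levels = 2 ** bits - 1  # 15 steps for 4 bits
    scale = np.maximum(w_max - w_min, 1e-8) / levels

    # Map to integer codes in [0, levels], then back to floats.
    codes = np.clip(np.round((groups - w_min) / scale), 0, levels)
    dequant = codes * scale + w_min

    return codes.astype(np.uint8).reshape(weight.shape), dequant.reshape(weight.shape)
```

Data-aware PTQ methods start from a grid like this one and then choose the codes (or adjust the weights) to minimize a reconstruction loss on calibration data, rather than rounding each weight independently.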

📝 Abstract

Data-aware post-training quantization (PTQ) minimizes a per-token reconstruction loss on a small calibration corpus, implicitly weighting positions by their empirical frequency. For Automatic Speech Recognition (ASR), this misaligns with tail-sensitive risk: names, numerals, and domain-specific words receive proportionally little calibration mass. We propose Tail-Aware Reconstruction Quantization (TARQ), a label-free PTQ framework that shifts calibration toward the lexical tail via rareBAL, a closed-form per-Linear-layer rule equalizing common/tail mass, paired with a metric-consistent residual correction. TARQ requires no entity labels, no curated calibration set, no validation decoding, and no additional training. Across eight ASR backbones and six datasets at W4G128, TARQ improves mean rare-Word Error Rate (rare-WER) without an aggregate-WER regression, achieves the lowest cross-corpus rare-WER swing among compared methods, and transfers to entity-rich benchmarks (ProfASR, ContextASR-Speech-En) without entity supervision.
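The idea of equalizing common/tail mass in a per-token reconstruction loss can be illustrated with a small sketch. The abstract does not state the rareBAL rule itself, so the code below is only the simplest weighting consistent with that description: every tail position shares half of the total mass and every common position shares the other half. The names `equal_mass_weights` and `weighted_reconstruction_loss`, and the boolean `is_tail` mask supplied as input, are hypothetical; how TARQ identifies tail positions without labels is not specified here.

```python
import numpy as np

def equal_mass_weights(is_tail):
    """Per-position weights giving tail and common positions equal total mass.

    Assumed illustration, not the paper's rareBAL rule. Weights sum to the
    number of positions, so a uniform corpus keeps its original loss scale.
    """
    is_tail = np.asarray(is_tail, dtype=bool)
    n = is_tail.size
    n_tail = int(is_tail.sum())
    n_common = n - n_tail
    if n_tail == 0 or n_common == 0:
        return np.ones(n)  # nothing to balance; fall back to uniform weights
    return np.where(is_tail, n / (2.0 * n_tail), n / (2.0 * n_common))

def weighted_reconstruction_loss(w_fp, w_q, x, position_weights):
    """Weighted layer-output error: sum_t a_t * ||(W - W_q) x_t||^2.

    w_fp, w_q: (out_features, in_features) full-precision and quantized weights.
    x: (positions, in_features) calibration activations for one Linear layer.
    """
    err = x @ (w_fp - w_q).T                     # (positions, out_features)
    return float(np.sum(position_weights * np.sum(err ** 2, axis=1)))
```

With uniform weights this reduces to the standard per-token objective, in which a corpus with one tail position per hundred gives the tail about one percent of the loss. With the equal-mass weights, that single position carries as much total weight as the other ninety-nine combined.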

Problem

Research questions and friction points this paper is trying to address.

Automatic Speech Recognition

Post-Training Quantization

Rare Words

Tail Sensitivity

Word Error Rate

Innovation

Methods, ideas, or system contributions that make the work stand out.

Tail-Aware Quantization

Post-Training Quantization

Rare-Word Robustness

Automatic Speech Recognition

Data-Efficient Calibration
