🤖 AI Summary
This paper addresses critical post-processing challenges in commercial ASR output, namely missing punctuation, inconsistent capitalization, and unnormalized numerals and abbreviations, by proposing an end-to-end multi-objective text formatting framework. Methodologically, it departs from rule-based and hybrid approaches, introducing a lightweight, fully neural two-stage architecture: (1) a multi-task token classifier that jointly predicts punctuation, capitalization, and inverse text normalization (ITN) labels; and (2) a sequence-to-sequence model for fine-grained correction. Both stages are jointly trained and integrated into the Universal-2 ASR system. Experiments demonstrate significant improvements over strong baselines on objective metrics (e.g., +5.1 F1 points, −40% inference latency) and in subjective human evaluations. The framework further exhibits superior cross-domain generalization and enhanced hallucination suppression.
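To make the first stage concrete, here is a minimal sketch of what "per-token formatting labels" could look like and how they render into formatted text. The label inventory (`PERIOD`, `COMMA`, `CAP_FIRST`, `B-CARDINAL`, etc.) is an illustrative assumption, not the paper's actual tag set, and the classifier itself is elided; only the tag-application step is shown.

```python
# Stage 1 of the two-stage design described above: a token classifier
# assigns each word a punctuation, casing, and ITN label. Applying those
# labels deterministically yields punctuated, truecased text, with ITN
# spans left for stage 2. The tag names here are assumed for illustration.
from dataclasses import dataclass

@dataclass
class TokenLabels:
    punct: str  # "O", "COMMA", or "PERIOD": mark appended after the token
    case: str   # "LOWER", "CAP_FIRST", or "UPPER": casing of the token
    itn: str    # "O", "B-CARDINAL", "I-CARDINAL": span tags for stage 2

def apply_labels(tokens, labels):
    """Render stage-1 labels into formatted text (ITN spans untouched)."""
    out = []
    for tok, lab in zip(tokens, labels):
        if lab.case == "CAP_FIRST":
            tok = tok.capitalize()
        elif lab.case == "UPPER":
            tok = tok.upper()
        if lab.punct == "COMMA":
            tok += ","
        elif lab.punct == "PERIOD":
            tok += "."
        out.append(tok)
    return " ".join(out)

tokens = ["hello", "nasa", "launched", "two", "rockets"]
labels = [
    TokenLabels("COMMA", "CAP_FIRST", "O"),
    TokenLabels("O", "UPPER", "O"),
    TokenLabels("O", "LOWER", "O"),
    TokenLabels("O", "LOWER", "B-CARDINAL"),
    TokenLabels("PERIOD", "LOWER", "O"),
]
print(apply_labels(tokens, labels))  # Hello, NASA launched two rockets.
```

Framing all three objectives as token classification is what lets one encoder serve punctuation, truecasing, and ITN detection jointly, which is the source of the latency savings the summary reports.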
📝 Abstract
This paper introduces an all-neural text formatting (TF) model designed for commercial automatic speech recognition (ASR) systems, encompassing punctuation restoration (PR), truecasing, and inverse text normalization (ITN). Unlike traditional rule-based or hybrid approaches, this method leverages a two-stage neural architecture comprising a multi-objective token classifier and a sequence-to-sequence (seq2seq) model. This design minimizes computational costs and reduces hallucinations while ensuring flexibility and robustness across diverse linguistic entities and text domains. Developed as part of the Universal-2 ASR system, the proposed method demonstrates superior performance in TF accuracy, computational efficiency, and perceptual quality, as validated through comprehensive evaluations using both objective and subjective methods. This work underscores the importance of holistic TF models in enhancing ASR usability in practical settings.
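The second stage can be sketched by showing its interface: it rewrites only the spans stage 1 tagged for ITN and copies everything else verbatim, which is one way the two-stage design limits hallucination. The paper uses a seq2seq model for this rewriting; a table-lookup cardinal converter stands in for it below, and the BIO-style span tags are an assumed encoding.

```python
# Toy stand-in for stage 2 (the paper's actual rewriter is a seq2seq
# model): only tokens inside stage-1 ITN spans are converted; all other
# tokens pass through unchanged.

# Minimal lookup covering tens + units, e.g. "twenty five" -> 25.
WORD_TO_DIGIT = {
    "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9,
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
}

def words_to_number(span_tokens):
    """Convert a tagged cardinal span to digits (tens + units only)."""
    return str(sum(WORD_TO_DIGIT[t] for t in span_tokens))

def normalize(tokens, itn_tags):
    """Apply the span rewriter to B-/I-CARDINAL spans, copy the rest."""
    out, span = [], []
    for tok, tag in zip(tokens, itn_tags):
        if tag == "B-CARDINAL":
            if span:
                out.append(words_to_number(span))
            span = [tok]
        elif tag == "I-CARDINAL":
            span.append(tok)
        else:
            if span:
                out.append(words_to_number(span))
                span = []
            out.append(tok)
    if span:
        out.append(words_to_number(span))
    return " ".join(out)

tokens = ["it", "costs", "twenty", "five", "dollars"]
tags = ["O", "O", "B-CARDINAL", "I-CARDINAL", "O"]
print(normalize(tokens, tags))  # it costs 25 dollars
```

Restricting the rewriter's input to short tagged spans, rather than feeding whole utterances through a generative model, is the design choice the abstract credits with reducing both computational cost and hallucinations.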