Analyzing the Importance of Blank for CTC-Based Knowledge Distillation

📅 2025-06-02
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
In knowledge distillation for CTC-based automatic speech recognition (ASR) models, performance can degrade when the blank token is handled naively, and standard recipes remain tied to ground-truth transcripts through the CTC loss. Method: The paper systematically analyzes the role of the blank token in distillation and proposes a symmetric blank selection strategy that sits between standard knowledge distillation and full blank elimination. Because the training signal then comes entirely from the teacher's outputs, the CTC loss can be dropped, enabling end-to-end audio-only distillation that depends on neither transcript annotations nor the CTC objective. Contribution/Results: Evaluated on LibriSpeech and AISHELL benchmarks, the method achieves ASR accuracy comparable to supervised distillation while enabling text-free model compression, and the distilled student models reduce inference cost, pointing toward efficient, low-dependency ASR model compression.
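The paper's exact loss is not reproduced on this page; as a minimal sketch, frame-level CTC distillation can be written as a KL divergence between teacher and student per-frame posteriors, with blank elimination implemented as a mask that drops frames the teacher considers blank. The blank index, threshold, and function names below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

BLANK = 0  # assumed blank index; CTC vocabularies reserve one symbol for blank

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def frame_kd_loss(student_logits, teacher_logits, blank_threshold=0.5):
    """Frame-level KL(teacher || student) over selected frames.

    student_logits, teacher_logits: (T, V) per-frame vocabulary logits.
    Frames where the teacher's blank posterior exceeds `blank_threshold`
    are excluded -- a simple blank-elimination variant.
    """
    p_t = softmax(teacher_logits)                  # (T, V) teacher posteriors
    keep = p_t[:, BLANK] < blank_threshold         # boolean frame mask
    if not keep.any():
        return 0.0
    p_t = p_t[keep]
    log_p_s = np.log(softmax(student_logits[keep]))
    # mean over kept frames of per-frame KL divergence
    return float(np.mean(np.sum(p_t * (np.log(p_t) - log_p_s), axis=-1)))

# toy demo: identical teacher/student gives zero loss on the kept frames
rng = np.random.default_rng(0)
logits = rng.normal(size=(6, 8))
print(round(frame_kd_loss(logits, logits), 6))  # 0.0
```

Note that this objective uses only teacher outputs, which is what makes transcript-free distillation possible once the CTC loss is removed.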

📝 Abstract
With the rise of large pre-trained foundation models for automatic speech recognition, new challenges appear. While these models perform well, the runtime and cost of inference increase. One approach to exploiting their strength while retaining efficiency is to distill their knowledge into smaller models during training. In this work, we explore different CTC-based distillation variants, focusing on blank token handling. We show that common approaches like blank elimination do not always work off the shelf. We explore new blank selection patterns as a potential sweet spot between standard knowledge distillation and blank elimination mechanisms. Through the introduction of a symmetric selection method, we are able to remove the CTC loss during knowledge distillation with minimal to no performance degradation. With this, we make the training independent of target labels, potentially allowing for distillation on untranscribed audio data.
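The abstract positions blank selection between two extremes but does not spell out the selection rule. As one illustrative reading only, the sketch below contrasts standard KD (keep all frames), blank elimination (drop all blank frames), and a symmetric compromise that keeps blank frames lying within a small window on either side of a non-blank frame; the `select_frames` helper and its modes are assumptions for illustration, not the paper's method.

```python
import numpy as np

def select_frames(is_blank, mode="symmetric", window=1):
    """Build a boolean frame-selection mask for distillation.

    is_blank: boolean array (T,), True where the teacher's argmax is blank.
    mode:
      "all"       -> standard KD: distill every frame
      "eliminate" -> blank elimination: drop every blank frame
      "symmetric" -> illustrative compromise: additionally keep blank frames
                     within `window` frames of a non-blank frame, on both sides
    """
    is_blank = np.asarray(is_blank, dtype=bool)
    if mode == "all":
        return np.ones_like(is_blank)
    nonblank = ~is_blank
    if mode == "eliminate":
        return nonblank
    keep = nonblank.copy()
    for s in range(1, window + 1):
        keep[s:] |= nonblank[:-s]   # non-blank neighbor to the left
        keep[:-s] |= nonblank[s:]   # non-blank neighbor to the right
    return keep

is_blank = np.array([False, True, True, True, False])
print(select_frames(is_blank, "eliminate").tolist())  # [True, False, False, False, True]
print(select_frames(is_blank, "symmetric").tolist())  # [True, True, False, True, True]
```

The symmetric mask retains some blank context around emitted tokens, which is one way a "sweet spot" between the two extremes could behave.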
Problem

Research questions and friction points this paper is trying to address.

Optimizing blank token handling in CTC-based knowledge distillation
Removing the CTC loss from distillation while maintaining model performance
Enabling distillation on untranscribed audio data via symmetric selection
Innovation

Methods, ideas, or system contributions that make the work stand out.

CTC-based distillation with blank token handling
Newly introduced symmetric blank selection method
Training made independent of target labels