Analyzing the Importance of Blank for CTC-Based Knowledge Distillation

📅 2025-06-02
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
In knowledge distillation for CTC-based automatic speech recognition (ASR) models, performance can degrade when the blank token is handled naively, and standard recipes remain tied to ground-truth transcripts through the CTC loss. Method: The paper systematically analyzes the role of the blank token in distillation and proposes a symmetric blank selection strategy that sits between standard knowledge distillation and full blank elimination. Because the training signal then comes entirely from the teacher's outputs, the CTC loss can be dropped, enabling end-to-end audio-only distillation that depends on neither transcript annotations nor the CTC objective. Contribution/Results: Evaluated on LibriSpeech and AISHELL benchmarks, the method achieves ASR accuracy comparable to supervised distillation while enabling text-free model compression, and the distilled student models reduce inference cost, pointing toward efficient, low-dependency ASR model compression.
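The paper's exact loss is not reproduced on this page; as a minimal sketch, frame-level CTC distillation can be written as a KL divergence between teacher and student per-frame posteriors, with blank elimination implemented as a mask that drops frames the teacher considers blank. The blank index, threshold, and function names below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

BLANK = 0  # assumed blank index; CTC vocabularies reserve one symbol for blank

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def frame_kd_loss(student_logits, teacher_logits, blank_threshold=0.5):
    """Frame-level KL(teacher || student) over selected frames.

    student_logits, teacher_logits: (T, V) per-frame vocabulary logits.
    Frames where the teacher's blank posterior exceeds `blank_threshold`
    are excluded -- a simple blank-elimination variant.
    """
    p_t = softmax(teacher_logits)                  # (T, V) teacher posteriors
    keep = p_t[:, BLANK] < blank_threshold         # boolean frame mask
    if not keep.any():
        return 0.0
    p_t = p_t[keep]
    log_p_s = np.log(softmax(student_logits[keep]))
    # mean over kept frames of per-frame KL divergence
    return float(np.mean(np.sum(p_t * (np.log(p_t) - log_p_s), axis=-1)))

# toy demo: identical teacher/student gives zero loss on the kept frames
rng = np.random.default_rng(0)
logits = rng.normal(size=(6, 8))
print(round(frame_kd_loss(logits, logits), 6))  # 0.0
```

Note that this objective uses only teacher outputs, which is what makes transcript-free distillation possible once the CTC loss is removed.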

📝 Abstract
With the rise of large pre-trained foundation models for automatic speech recognition, new challenges appear. While these models perform well, the runtime and cost of inference increase. One approach to exploiting their strength while retaining efficiency is to distill their knowledge into smaller models during training. In this work, we explore different CTC-based distillation variants, focusing on blank token handling. We show that common approaches like blank elimination do not always work off the shelf. We explore new blank selection patterns as a potential sweet spot between standard knowledge distillation and blank elimination mechanisms. Through the introduction of a symmetric selection method, we are able to remove the CTC loss during knowledge distillation with minimal to no performance degradation. With this, we make the training independent of target labels, potentially allowing for distillation on untranscribed audio data.
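The abstract positions blank selection between two extremes but does not spell out the selection rule. As one illustrative reading only, the sketch below contrasts standard KD (keep all frames), blank elimination (drop all blank frames), and a symmetric compromise that keeps blank frames lying within a small window on either side of a non-blank frame; the `select_frames` helper and its modes are assumptions for illustration, not the paper's method.

```python
import numpy as np

def select_frames(is_blank, mode="symmetric", window=1):
    """Build a boolean frame-selection mask for distillation.

    is_blank: boolean array (T,), True where the teacher's argmax is blank.
    mode:
      "all"       -> standard KD: distill every frame
      "eliminate" -> blank elimination: drop every blank frame
      "symmetric" -> illustrative compromise: additionally keep blank frames
                     within `window` frames of a non-blank frame, on both sides
    """
    is_blank = np.asarray(is_blank, dtype=bool)
    if mode == "all":
        return np.ones_like(is_blank)
    nonblank = ~is_blank
    if mode == "eliminate":
        return nonblank
    keep = nonblank.copy()
    for s in range(1, window + 1):
        keep[s:] |= nonblank[:-s]   # non-blank neighbor to the left
        keep[:-s] |= nonblank[s:]   # non-blank neighbor to the right
    return keep

is_blank = np.array([False, True, True, True, False])
print(select_frames(is_blank, "eliminate").tolist())  # [True, False, False, False, True]
print(select_frames(is_blank, "symmetric").tolist())  # [True, True, False, True, True]
```

The symmetric mask retains some blank context around emitted tokens, which is one way a "sweet spot" between the two extremes could behave.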
Problem

Research questions and friction points this paper is trying to address.

Optimizing blank token handling in CTC-based knowledge distillation
Removing the CTC loss from distillation while maintaining model performance
Enabling distillation on untranscribed audio data via symmetric selection
Innovation

Methods, ideas, or system contributions that make the work stand out.

CTC-based distillation with blank token handling
Newly introduced symmetric blank selection method
Training made independent of target labels