Re-evaluating Minimum Bayes Risk Decoding for Automatic Speech Recognition

📅 2025-10-22

🤖 AI Summary

Problem: Beam search has long been the standard decoding method for automatic speech recognition (ASR) and speech translation (ST), yet its pruned, locally greedy search can yield suboptimal outputs. Method: This work systematically evaluates sample-based Minimum Bayes Risk (MBR) decoding, applied at inference time to hypotheses sampled from the model, on ASR and ST tasks using Whisper and its derivative models. The experiments are offline, cover ASR and ST in English and Japanese, and are evaluated with WER, BLEU, and related metrics. Contribution/Results: MBR improves recognition accuracy and translation quality over beam search in most settings, and shows greater robustness under low-resource conditions and acoustic noise. Crucially, it requires no architectural modifications to the underlying model and scales efficiently with sampling. To our knowledge, this is the first study to validate MBR’s cross-task gains for multilingual speech-to-text generation within a unified framework. The results establish MBR as a lightweight, high-accuracy alternative to beam search for offline speech-to-text tasks.
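
For orientation, the sample-based MBR decision rule referred to above is commonly written as follows. The notation is an assumption made for this illustration and may differ from the paper's: x is the input audio, p_θ is the speech model, H is the set of sampled candidate hypotheses, R is the set of sampled pseudo-references (often the same set as H), and u is a utility function such as negative WER for ASR or BLEU for ST.

```latex
\hat{y}_{\mathrm{MBR}}
  = \operatorname*{arg\,max}_{h \in \mathcal{H}}
    \frac{1}{|\mathcal{R}|} \sum_{r \in \mathcal{R}} u(h, r),
\qquad
\mathcal{H},\, \mathcal{R} \sim p_{\theta}(\cdot \mid x)
```

Beam search, by contrast, looks for a single high-probability hypothesis; MBR instead picks the hypothesis that agrees best, on average, with the other samples.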

📝 Abstract

Recent work has shown that sample-based Minimum Bayes Risk (MBR) decoding outperforms beam search in text-to-text generation tasks, such as machine translation, text summarization, and image captioning. In contrast, beam search remains the current practice for speech-to-text tasks such as automatic speech recognition (ASR) and speech translation (ST). Given that MBR decoding is effective in text-to-text generation tasks, it is reasonable to expect it to also be effective for speech-to-text tasks. In this paper, we evaluate MBR decoding for ASR and ST tasks on English and Japanese using Whisper and its derivative models. We observe that MBR decoding is more accurate than beam search in most of the experimental settings we have evaluated. The results show that MBR decoding is a promising method for offline ASR and ST tasks that require high accuracy. The code is available at https://github.com/CyberAgentAILab/mbr-for-asr
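
As a rough illustration of what sample-based MBR decoding does for ASR, the sketch below selects, from a list of sampled transcripts, the one with the lowest average word error rate against all the samples. It is a minimal sketch and not the authors' implementation (see their repository for that): the function names `edit_distance`, `word_error_rate`, and `mbr_decode` are hypothetical, the samples are hard-coded strings standing in for transcripts that would be drawn from a model such as Whisper, and the same list is assumed to serve as both the candidate set and the pseudo-reference set.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Word-level Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER of `hypothesis` against `reference`, using whitespace tokens."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        return 0.0 if not hyp else 1.0
    return edit_distance(ref, hyp) / len(ref)


def mbr_decode(samples: List[str]) -> str:
    """Return the sample with the lowest expected WER against all samples.

    Assumption: the sampled transcripts act as both candidates and
    pseudo-references, and the utility is negative WER.
    """
    if not samples:
        raise ValueError("mbr_decode needs at least one sample")
    best, best_risk = samples[0], float("inf")
    for hyp in samples:
        # Monte Carlo estimate of the expected risk of this candidate.
        risk = sum(word_error_rate(ref, hyp) for ref in samples) / len(samples)
        if risk < best_risk:
            best, best_risk = hyp, risk
    return best


if __name__ == "__main__":
    # Hypothetical transcripts, as if sampled from an ASR model.
    sampled = [
        "the cat sat on the mat",
        "the cat sat on a mat",
        "the cat sat on the mat",
        "a cat sits on the mat",
    ]
    print(mbr_decode(sampled))
```

The pairwise comparison costs a number of utility evaluations quadratic in the sample count, which is one reason this style of decoding is usually discussed for offline rather than streaming use. For ST, the WER-based risk would be swapped for a translation metric such as BLEU.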

Problem

Research questions and friction points this paper is trying to address.

Evaluating MBR decoding for speech recognition tasks

Comparing MBR with beam search in ASR performance

Assessing MBR effectiveness for speech-to-text generation

Innovation

Methods, ideas, or system contributions that make the work stand out.

MBR decoding replaces beam search in ASR

MBR improves speech recognition and translation accuracy

MBR is effective for offline high-accuracy speech tasks
