🤖 AI Summary
Low-resource automatic speech recognition (ASR) suffers from data scarcity and poor cross-lingual generalization; existing adaptation methods often rely on external resources, incur high computational costs, or lack support for test-time adaptation. To address these limitations, this work introduces Meta-In-Context Learning (Meta-ICL) to ASR for the first time, proposing a lightweight, fine-tuning-free adaptation framework. It uses k-nearest-neighbor (k-NN) retrieval to dynamically identify semantically similar speech samples and construct task-aware in-context examples, enabling zero-shot or few-shot cross-lingual transfer for Whisper. Evaluated on the ML-SUPERB multilingual benchmark, the approach substantially reduces the character error rate (CER) for low-resource languages, achieving an average improvement of 32.7%. These results demonstrate the method's effectiveness, efficiency, and scalability, all without parameter updates or external linguistic resources.
📝 Abstract
This paper presents Meta-Whisper, a novel approach to improving automatic speech recognition (ASR) for low-resource languages using the Whisper model. By leveraging Meta-In-Context Learning (Meta-ICL) and a k-nearest-neighbor (k-NN) algorithm for sample selection, Meta-Whisper enhances Whisper's ability to recognize speech in unfamiliar languages without extensive fine-tuning. Experiments on the ML-SUPERB dataset show that Meta-Whisper significantly reduces the Character Error Rate (CER) for low-resource languages compared to the original Whisper model. This method offers a promising solution for developing more adaptable multilingual ASR systems, particularly for languages with limited resources.
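The sample-selection step described above (retrieve the k labeled speech samples nearest to the query utterance, then supply them as in-context examples) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the use of cosine similarity, and the toy 2-D embeddings are all assumptions standing in for real Whisper encoder features and the paper's actual prompt format.

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def knn_select(query_emb, pool_embs, k=3):
    """Indices of the k pool embeddings most similar to the query
    (hypothetical stand-in for Meta-Whisper's k-NN retrieval)."""
    ranked = sorted(range(len(pool_embs)),
                    key=lambda i: cosine(query_emb, pool_embs[i]),
                    reverse=True)
    return ranked[:k]

# Toy 2-D "speech embeddings" for a pool of labeled samples; in the
# real system these would come from the speech encoder.
pool = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
transcripts = ["hello", "hallo", "bonjour", "hola"]

neighbors = knn_select([1.0, 0.05], pool, k=2)
# The retrieved (audio, transcript) pairs would then be prepended to
# Whisper's input as few-shot in-context examples.
context = [transcripts[i] for i in neighbors]
```

The key design point is that retrieval replaces gradient updates: adapting to a new language only requires a small pool of labeled samples to search over, so no Whisper parameters change at test time.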