Meta-Whisper: Speech-Based Meta-ICL for ASR on Low-Resource Languages

📅 2024-09-16
🏛️ arXiv.org
📈 Citations: 3
Influential: 0
🤖 AI Summary
Low-resource automatic speech recognition (ASR) suffers from data scarcity and poor cross-lingual generalization; existing adaptation methods often rely on external resources, incur high computational costs, or lack support for test-time adaptation. To address these limitations, this work introduces Meta-In-Context Learning (Meta-ICL) to ASR for the first time, proposing a lightweight, fine-tuning-free adaptation framework. It leverages k-nearest neighbor (k-NN) retrieval to dynamically identify semantically similar speech samples and construct task-aware in-context examples, enabling zero-shot or few-shot cross-lingual transfer for Whisper. Evaluated on the ML-SUPERB multilingual benchmark, our approach substantially reduces character error rates (CER) for low-resource languages, achieving an average improvement of 32.7%. Results demonstrate the method's effectiveness, efficiency, and scalability, without requiring parameter updates or external linguistic resources.
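The retrieval step described above can be sketched in a few lines: given a fixed-size embedding for the query utterance (e.g., a mean-pooled Whisper encoder representation), select the k most cosine-similar utterances from a candidate pool to serve as in-context examples. This is an illustrative sketch under those assumptions, not the authors' exact pipeline; the function name and the use of precomputed embeddings are hypothetical.

```python
import numpy as np

def knn_select_examples(query_emb, pool_embs, k=3):
    """Return indices of the k pool utterances most similar to the query.

    Assumes fixed-size speech embeddings (e.g., mean-pooled Whisper
    encoder states) are precomputed for the query and the pool;
    similarity is cosine similarity. Illustrative sketch only.
    """
    q = query_emb / np.linalg.norm(query_emb)
    p = pool_embs / np.linalg.norm(pool_embs, axis=1, keepdims=True)
    sims = p @ q                   # cosine similarity of each pool item
    return np.argsort(-sims)[:k]  # indices of the top-k neighbours

# The selected (audio, transcript) pairs would then be prepended as
# in-context examples before decoding the query utterance with Whisper.
```

The same nearest-neighbour selection works at test time without any parameter updates, which is what makes the adaptation fine-tuning-free.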

📝 Abstract
This paper presents Meta-Whisper, a novel approach to improve automatic speech recognition (ASR) for low-resource languages using the Whisper model. By leveraging Meta In-Context Learning (Meta-ICL) and a k-Nearest Neighbors (KNN) algorithm for sample selection, Meta-Whisper enhances Whisper's ability to recognize speech in unfamiliar languages without extensive fine-tuning. Experiments on the ML-SUPERB dataset show that Meta-Whisper significantly reduces the Character Error Rate (CER) for low-resource languages compared to the original Whisper model. This method offers a promising solution for developing more adaptable multilingual ASR systems, particularly for languages with limited resources.
Problem

Research questions and friction points this paper is trying to address.

Improving ASR for low-resource languages with limited data
Overcoming inefficiencies in existing adaptation strategies
Enabling few-shot generalization without target domain fine-tuning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Meta-learning for low-resource ASR adaptation
Speech in-context learning without fine-tuning
Few-shot multilingual ASR performance improvement
Ming-Hao Hsu
Electrical Engineering, National Taiwan University, Taipei, Taiwan
Kuan-Po Huang
Computer Science and Information Engineering, National Taiwan University, AICS ASUS, Taipei, Taiwan
Hung-yi Lee
National Taiwan University
deep learning · spoken language understanding · speech processing